Distributed Systems — Practical
Distributed Systems — Practical patterns
Section titled “Distributed Systems — Practical patterns”Designing for partial failure
Section titled “Designing for partial failure”Every remote call must answer:
- What’s the timeout?
- What’s the retry policy (count, backoff, jitter, idempotency)?
- What’s the fallback? (default value, cached value, degraded mode, error to user)
- Is this call inside a circuit breaker?
- Is this call cancellable (deadline propagation)?
Distributed lock with fencing token (etcd)
Section titled “Distributed lock with fencing token (etcd)”// Acquire lease + lock; fencing token = revisionsession, _ := concurrency.NewSession(client, concurrency.WithTTL(30))mu := concurrency.NewMutex(session, "/locks/job-X")mu.Lock(ctx)fencing := mu.Header().Revision // monotonic increasing
// Use fencing in downstream:// downstream rejects writes with fencing <= last seendb.Exec("UPDATE state SET v=$1, last_token=$2 WHERE last_token < $2", v, fencing)
mu.Unlock(ctx)Lock alone is not enough — process can pause (GC, swap, OS scheduling) and another process acquires lock. Old process wakes up, still thinks it has lock, mutates state. Fencing prevents this.
Idempotency via inbox table
Section titled “Idempotency via inbox table”CREATE TABLE inbox ( message_id UUID PRIMARY KEY, received_at TIMESTAMPTZ NOT NULL DEFAULT now());async function handle(msg) { await db.tx(async t => { const r = await t.query( `INSERT INTO inbox (message_id) VALUES ($1) ON CONFLICT DO NOTHING RETURNING 1`, [msg.id]); if (!r.rowCount) return; await applyEffect(t, msg); });}Hedged requests (reduce tail latency)
Section titled “Hedged requests (reduce tail latency)”async function hedged<T>(call: () => Promise<T>, hedgeAfterMs: number): Promise<T> { const first = call(); const hedge = new Promise<T>((resolve, reject) => { setTimeout(() => call().then(resolve, reject), hedgeAfterMs); }); return Promise.race([first, hedge]);}Consistent hashing (sketch)
Section titled “Consistent hashing (sketch)”class HashRing { private ring = new Map<number, string>(); // hash → node private sortedHashes: number[] = [];
add(node: string, virtual = 100) { for (let i = 0; i < virtual; i++) { const h = hash(`${node}#${i}`); this.ring.set(h, node); } this.sortedHashes = [...this.ring.keys()].sort((a, b) => a - b); }
pick(key: string): string { const h = hash(key); const idx = this.sortedHashes.findIndex(x => x >= h); return this.ring.get(this.sortedHashes[idx === -1 ? 0 : idx])!; }}Use virtual nodes (100+ per physical) to smooth distribution.
Time-based dedup (sliding window)
Section titled “Time-based dedup (sliding window)”For event streams where exact dedup table would grow forever, keep a TTL’d Bloom filter or LRU. Acceptable false-positive rate in exchange for bounded memory.
Outbox + relay (cross-system reliability)
Section titled “Outbox + relay (cross-system reliability)”-- in producer DB transactionBEGIN;UPDATE orders SET status='paid' WHERE id=$1;INSERT INTO outbox(id, event_type, payload) VALUES (gen_random_uuid(), 'OrderPaid', $2);COMMIT;Relay (separate worker / Debezium) reads outbox and publishes to broker. Idempotent consumer dedupes.
Health/readiness for orchestrators
Section titled “Health/readiness for orchestrators”- liveness — am I alive? If false, restart me.
- readiness — should I receive traffic now? If false, drain.
- startup — initial probe with longer grace (K8s).
Don’t conflate them. A loaded service is alive but might be unready.
Backoff with jitter (essential)
Section titled “Backoff with jitter (essential)”const sleep = (base: number, attempt: number) => base * 2 ** attempt * (0.5 + Math.random() * 0.5);“Full jitter” is a recommended variant: Math.random() * base * 2**attempt.
Without jitter, retries synchronize → thundering herd.
Distributed tracing — what to record per span
Section titled “Distributed tracing — what to record per span”- Service name, span name, kind (server/client/internal/producer/consumer).
- Start + end time.
trace_id,span_id,parent_span_id.- Attributes: HTTP method/status, DB statement (sanitized), entity ids, user id, region.
- Events: error, retry, cache hit/miss.
Propagate via traceparent header (W3C Trace Context).
Useful “back of envelope” numbers
Section titled “Useful “back of envelope” numbers”| Op | Time |
|---|---|
| L1 cache | 0.5 ns |
| Branch mispredict | 5 ns |
| Memory access | 100 ns |
| SSD random read | 100 µs |
| Round trip same DC | 0.5 ms |
| Disk seek (HDD) | 5-10 ms |
| Round trip same continent | 50 ms |
| Round trip cross-Atlantic | 150 ms |
| HTTP request to typical API | 100-300 ms |
(From Jeff Dean’s “Numbers Every Engineer Should Know”.)
Common production issues & fixes
Section titled “Common production issues & fixes”| Symptom | Cause | Fix |
|---|---|---|
| Spike in p99 only | Tail latency in one shard | Hedging, replication |
| Cascading 500s | No timeout on downstream | Per-call deadline |
| OOM after deploy | Connection leak | Pool with max + close in finally |
| Slow recovery from outage | Retry storm | Jitter + breaker |
| Duplicate orders | At-least-once + non-idempotent | Idempotency key |
| Phantom split-brain writes | Lock without fencing | Fencing token |
Tools to know
Section titled “Tools to know”- etcd / ZooKeeper / Consul — coordination.
- Temporal / Cadence — durable workflows.
- Vitess / CockroachDB / TiDB / YugabyteDB — distributed SQL.
- Cassandra / ScyllaDB — eventually consistent KV at scale.
- OpenTelemetry — tracing/metrics/logs standard.
- Jepsen — testing distributed systems for consistency violations.