System Design — Theory (deep concepts)

Always-asked tradeoffs

Axis	Trade-off
Consistency	latency / availability
Read vs write throughput	replication (scales reads) vs sharding (scales writes)
Sync vs async	latency vs decoupling
Cache	freshness vs hit rate
Single-region vs multi-region	latency vs availability vs cost
SQL vs NoSQL	query flexibility and transactions vs simpler data model and horizontal scale
Server-side vs client-side rendering	TTFB vs interactivity
Push vs pull	freshness vs efficiency

CAP / PACELC (revisited from distributed-systems)

Under partition (P): pick consistency (C) or availability (A). Else (E), in normal operation: pick latency (L) or consistency (C).

Many distributed NoSQL stores (Cassandra, Dynamo-style systems) lean AP so they stay available during partitions. Strongly consistent systems (Spanner, etcd) accept a latency cost for coordination.

Sharding strategy

Range — by key range; risk: hot range (timestamps).
Hash — even distribution; range queries hard.
Geo — by region; user data near user; data residency.
Tenant — per-customer shard; isolation.

Re-sharding is painful. Plan for it: consistent hashing, virtual nodes, migration paths.
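
A minimal sketch of consistent hashing with virtual nodes, assuming shard names are plain strings and MD5 is an acceptable ring hash. `HashRing`, `vnodes`, and the `shard#i` naming are illustrative choices, not taken from any particular system.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> shard name

    def add(self, shard: str) -> None:
        # Each shard owns many small arcs, which evens out the load.
        for i in range(self.vnodes):
            point = _hash(f"{shard}#{i}")
            if point in self._owner:
                continue  # position already taken (extremely unlikely)
            bisect.insort(self._points, point)
            self._owner[point] = shard

    def remove(self, shard: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth shard to a three-shard ring moves only the keys the new shard takes over (about a quarter on average), whereas `hash(key) % N` would remap most keys.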

Read scale

Read replicas (sync or async).
Materialized views.
CDN cache.
Application-tier cache (Redis).
Per-request cache (memoize within request).
CQRS — separate read schema optimized for queries.

Write scale

Sharding.
Async write paths (queue → worker).
Write-behind cache.
Batch writes.
Append-only log instead of mutate-in-place.

Hot keys

When one key gets disproportionate traffic:

Add an in-process L1 cache in front of the shared L2 cache.
Replicate that key across many cache nodes.
Split the key into N suffixed copies (key.1 .. key.N) and have each reader pick one at random, so the load spreads across buckets.
Read replicas close to clients.
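
The suffix trick can be sketched as follows, assuming a dict stands in for the cache cluster and that writers update every copy. `N_BUCKETS`, `write_hot`, and `read_hot` are hypothetical names for illustration.

```python
import random

N_BUCKETS = 8  # assumed fan-out; roughly the number of cache nodes to spread over


def bucket_keys(key: str, n: int = N_BUCKETS) -> list[str]:
    """All suffixed copies of a hot key: key.1 .. key.n."""
    return [f"{key}.{i}" for i in range(1, n + 1)]


def write_hot(cache: dict, key: str, value, n: int = N_BUCKETS) -> None:
    # Writes cost n times more: every copy must be refreshed.
    for k in bucket_keys(key, n):
        cache[k] = value


def read_hot(cache: dict, key: str, n: int = N_BUCKETS):
    # Each read picks one copy at random, so no single node sees all traffic.
    return cache.get(f"{key}.{random.randint(1, n)}")
```

The trade is write amplification for read spread, which suits keys that are read far more often than they change.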

Idempotency

Every request that may be retried must be safe to apply more than once. Achieved via:

Idempotency key (client-supplied).
Natural keys (INSERT ... ON CONFLICT DO NOTHING).
State machines (only apply transitions that move forward).

Required for any HTTP API likely to be retried by clients.
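
A small sketch of two of these techniques, assuming an in-memory dict in place of a durable store. `IdempotentHandler`, `advance`, and the order states are hypothetical names; a real service would persist keys with an expiry.

```python
import threading
from typing import Any, Callable


class IdempotentHandler:
    """Runs the wrapped handler at most once per idempotency key."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        self._results: dict[str, Any] = {}
        self._lock = threading.Lock()

    def handle(self, idempotency_key: str, request: Any) -> Any:
        # Holding the lock while the handler runs serializes requests;
        # simple and safe for a sketch, too coarse for production.
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # replay stored result
            result = self._handler(request)
            self._results[idempotency_key] = result
            return result


ORDER_STATES = ["created", "authorized", "captured"]  # assumed lifecycle


def advance(current: str, target: str) -> str:
    """Apply a transition only if it moves forward; otherwise keep the state."""
    if ORDER_STATES.index(target) > ORDER_STATES.index(current):
        return target
    return current
```

A retried request with the same key gets the stored result back instead of triggering the side effect again, and a late or duplicated "authorized" event cannot roll back an order that is already "captured".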

Backpressure

When demand exceeds capacity, decide what gives:

Reject (load shed).
Queue (the buffer grows; an unbounded queue risks OOM, so bound it).
Slow the upstream (propagate backpressure to callers).
Degrade response quality.

Principle: failing fast and visibly beats silent latency growth.
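
The "reject" option combined with a bounded queue might look like this. `BoundedIntake` and its `rejected` counter are illustrative names; the sketch assumes callers turn a `False` return into an HTTP 429 or 503.

```python
import queue


class BoundedIntake:
    """Accepts work until the buffer is full, then sheds load immediately."""

    def __init__(self, capacity: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self.rejected = 0  # export as a metric so shedding is visible

    def submit(self, job) -> bool:
        try:
            self._q.put_nowait(job)  # never blocks the caller
            return True
        except queue.Full:
            self.rejected += 1       # fail fast instead of queueing forever
            return False

    def take(self):
        """Called by workers; raises queue.Empty when there is nothing to do."""
        return self._q.get_nowait()
```

Because the queue has a fixed capacity, memory use and worst-case queueing delay are both bounded, and overload shows up as a rising rejection count rather than creeping latency.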

Tail latency strategies

Hedged requests.
Replicate slow shards.
Tighter timeouts on inner calls.
Fewer round trips (combine, prefetch).
Async paths for non-critical work (the “eventual” path).
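
A hedged request can be sketched with asyncio: send to one replica, and if it has not answered within a short delay, send the same request to a second replica and take whichever finishes first. `hedged`, `fake_call`, and the replica names are hypothetical; the sketch assumes the call is idempotent and that exactly two replicas are supplied.

```python
import asyncio
from typing import Awaitable, Callable


async def hedged(call: Callable[[str], Awaitable[str]],
                 replicas: list[str], hedge_after: float) -> str:
    """Return the first result; hedge to replicas[1] after hedge_after seconds."""
    tasks = [asyncio.create_task(call(replicas[0]))]
    done, _ = await asyncio.wait(tasks, timeout=hedge_after)
    if not done:
        # Primary is slow: fire the backup and race the two.
        tasks.append(asyncio.create_task(call(replicas[1])))
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in tasks:
        if t not in done:
            t.cancel()  # stop paying for the loser
    # An exception raised by the winning call propagates to the caller.
    return next(iter(done)).result()


async def fake_call(replica: str) -> str:
    """Stand-in for a network call; the replica named 'slow' takes longer."""
    await asyncio.sleep(0.5 if replica == "slow" else 0.01)
    return replica
```

A common choice is to set `hedge_after` near the observed p95 latency, so only a small fraction of requests are duplicated.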

Geo-distributed designs

Single-region: simplest, lowest latency for nearby users, fails as a unit.
Multi-region active-passive: failover. RTO/RPO tradeoffs. Cost: replicating data + idle standby.
Multi-region active-active: reads and writes served locally; consistency and conflict resolution get complex. Common with eventually consistent storage.
Edge / regional partition: each region serves its tenants exclusively (data locality, GDPR).

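Spanner and CockroachDB offer global strong consistency at a latency cost. DynamoDB Global Tables replicate asynchronously across regions by default (last-writer-wins), so treat them as eventually consistent unless configured otherwise.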

Common pitfalls in design interviews

Skipping clarifying questions.
Overengineering for scale that wasn’t asked for.
Ignoring write path.
No mention of failure modes.
Not handling concurrent updates.
Forgetting auth/observability/deploy.
Picking exotic tech (Cassandra) for a simple problem.
Not addressing the interviewer’s prompts.

Frequently-asked deep dives

URL shortener

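ID generation: random 7-character string, base62 encoding of an integer counter, or distributed Snowflake-style IDs.
DB: KV store (Redis/DynamoDB) for lookups; an RDBMS if analytics queries are also needed.
Reads outnumber writes roughly 100:1 → lean heavily on cache and CDN.
Custom slugs can collide → enforce a unique constraint and use INSERT ... ON CONFLICT to detect the clash.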
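
The base62-over-a-counter option can be sketched as below. The alphabet order is an arbitrary assumption; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short slug."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(slug: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in slug:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven characters cover 62^7 ≈ 3.5 trillion IDs. Sequential counters produce guessable slugs, which is one reason to prefer random or Snowflake-style IDs when enumeration matters.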

Twitter feed

Fanout-on-write: precompute timelines at tweet time. Fast reads, but write fan-out equals follower count (millions for celebrities, so handle them separately).
Fanout-on-read: assemble the timeline at read time. Expensive for users who read often or follow many accounts.
Hybrid: fanout for normal users; pull-on-read for celebrity authors.
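
A toy in-memory version of the hybrid approach, assuming a follower-count threshold decides who counts as a celebrity. `FeedService`, `CELEBRITY_THRESHOLD`, and the tuple layout are hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tune from real fan-out cost


class FeedService:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> precomputed items
        self.posts = defaultdict(list)      # author -> their own items

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author: str, ts: int, text: str) -> None:
        item = (ts, author, text)
        self.posts[author].append(item)
        if not self._is_celebrity(author):
            # Fanout-on-write: push into every follower's timeline now.
            for follower in self.followers[author]:
                self.timelines[follower].append(item)

    def read(self, user: str, limit: int = 20) -> list:
        items = set(self.timelines[user])
        # Fanout-on-read: pull celebrity posts at read time.
        for author in self.following[user]:
            if self._is_celebrity(author):
                items.update(self.posts[author])
        # The set dedupes posts pushed before an author crossed the threshold.
        return sorted(items, reverse=True)[:limit]
```

Reads stay cheap for most users because only the few celebrity authors they follow are merged in at read time.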

Chat

Connection management: WebSocket behind a sticky load balancer.
Storage: per-conversation partition.
Presence: ephemeral, Redis.
Push delivery: APNs/FCM for offline users.
Retention policy and message search.

Rate limiter

Fixed window: simple, but a burst straddling a window boundary can reach up to 2x the limit.
Sliding window log (Redis sorted set): exact, more memory.
Token bucket: bursts allowed up to bucket size.
Distributed: Redis Lua atomic check-and-decrement.
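
A single-process token bucket, as a sketch. The injectable `now` clock is an assumption added to make the behavior easy to test; a distributed version would run the same refill-check-decrement logic atomically on the shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._now = now
        self._tokens = float(capacity)  # start full
        self._last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self._now()
        # Lazy refill: credit the tokens earned since the last call.
        self._tokens = min(self.capacity,
                           self._tokens + (t - self._last) * self.rate)
        self._last = t
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The bucket stores only two numbers per client (token count and last timestamp), which is why it is cheaper than a sliding window log.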

Notification system

Fanout via queue.
Per-user dedupe.
Retry with a TTL, so stale notifications are dropped rather than delivered late.
Per-channel adapter (email, push, SMS).
Quiet hours / preferences.
Audit log.

Payment flow

Idempotency key per request.
Saga: authorize → fulfill → capture (or cancel).
Double-entry ledger for accounting.
Reconciliation against payment provider.
Watch for race conditions in balance updates (SELECT FOR UPDATE or atomic increment).
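
A sketch that combines three of these points using SQLite: an idempotent transfer ID, double-entry ledger rows, and an atomic conditional decrement. SQLite has no SELECT FOR UPDATE, so the guard lives in the UPDATE's WHERE clause. Table names, `transfer`, and the return strings are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE ledger (transfer_id TEXT, account TEXT, amount INTEGER, "
        "PRIMARY KEY (transfer_id, account))"
    )
    return db


def transfer(db: sqlite3.Connection, transfer_id: str,
             src: str, dst: str, amount: int) -> str:
    """Returns 'applied', 'duplicate', or 'rejected'. Amounts are integer cents."""
    if src == dst or amount <= 0:
        raise ValueError("need two distinct accounts and a positive amount")
    try:
        with db:  # one transaction: commit on success, roll back on exception
            # Double entry: the two rows always sum to zero.
            db.execute("INSERT INTO ledger VALUES (?, ?, ?)", (transfer_id, src, -amount))
            db.execute("INSERT INTO ledger VALUES (?, ?, ?)", (transfer_id, dst, amount))
            # Atomic check-and-decrement: no read-then-write race.
            cur = db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
                (amount, src, amount),
            )
            if cur.rowcount != 1:
                raise LookupError("insufficient funds or unknown account")
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate"  # transfer_id already recorded: the retry is a no-op
    except LookupError:
        return "rejected"
```

The ledger's primary key doubles as the idempotency check, and the sum of all ledger amounts staying at zero is an invariant that reconciliation can verify.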

Geo dispatch (Uber-like)

Geohash / S2 / quadtree for spatial index.
Match: nearest available drivers within radius.
Driver location updates: high write QPS — use streaming + in-memory grid (Redis Geo, Tile38).
Surge: time-window aggregation per cell.
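
An in-memory grid index, as a simplified stand-in for geohash or S2 cells. `DriverGrid`, `CELL_DEG`, and the fixed one-cell search ring are assumptions for illustration; 0.01 degrees of latitude is roughly 1.1 km.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees


def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class DriverGrid:
    def __init__(self) -> None:
        self._cells = defaultdict(set)  # cell -> driver ids
        self._pos: dict[str, tuple[float, float]] = {}

    def update(self, driver_id: str, lat: float, lon: float) -> None:
        old = self._pos.get(driver_id)
        if old is not None:
            self._cells[cell_of(*old)].discard(driver_id)  # leave the old cell
        self._pos[driver_id] = (lat, lon)
        self._cells[cell_of(lat, lon)].add(driver_id)

    def nearest(self, lat: float, lon: float, k: int = 3, ring: int = 1) -> list[str]:
        """Scan the rider's cell plus `ring` neighbouring cells, nearest first."""
        cx, cy = cell_of(lat, lon)
        found = []
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for d in self._cells.get((cx + dx, cy + dy), ()):
                    found.append((haversine_km(lat, lon, *self._pos[d]), d))
        return [d for _, d in sorted(found)[:k]]
```

Only a handful of cells are scanned per match, so lookup cost depends on local driver density rather than the total fleet size. A fuller version would widen the ring when too few candidates are found.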

News feed ranking

Candidate generation (followed users, popular).
Feature retrieval (recency, affinity, content type).
Scoring (ML model serving with low p99).
Dedup, diversify, pagination.

Distributed file sync

Chunk file, hash chunks, dedupe.
Local cache + lazy sync.
Conflict resolution (last-write-wins, or keep both versions as a conflicted copy).
Delta sync.
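
Chunking, hashing, and dedupe can be sketched with a content-addressed store. `ChunkStore` is a hypothetical name and the 4-byte chunk size is only to keep the example readable; real systems use chunks in the megabyte range.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo


class ChunkStore:
    """Content-addressed chunk storage: identical chunks are stored once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # sha256 hex digest -> chunk bytes

    def upload(self, data: bytes, size: int = CHUNK_SIZE) -> tuple[list[str], int]:
        """Return (manifest of chunk hashes, number of chunks actually sent)."""
        manifest, sent = [], 0
        for i in range(0, len(data), size):
            chunk = data[i:i + size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:  # delta sync: skip chunks already known
                self.blobs[digest] = chunk
                sent += 1
            manifest.append(digest)
        return manifest, sent

    def download(self, manifest: list[str]) -> bytes:
        return b"".join(self.blobs[d] for d in manifest)
```

After an edit that changes one chunk, only that chunk is transferred; the new version's manifest reuses the hashes of the unchanged chunks.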

Search

Inverted index (Elasticsearch / Lucene).
Indexing pipeline (Kafka → indexer with idempotency).
Query: filters first, then scoring.
Personalization layer.
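
A toy inverted index showing idempotent indexing and filter-then-score querying. Raw term frequency stands in for a real relevance function such as BM25, and `InvertedIndex` with its keyword-argument metadata is a hypothetical interface.

```python
from collections import Counter, defaultdict


class InvertedIndex:
    def __init__(self) -> None:
        self._postings = defaultdict(Counter)  # term -> {doc_id: term frequency}
        self._meta: dict[str, dict] = {}       # doc_id -> filterable fields

    def index(self, doc_id: str, text: str, **fields) -> None:
        # Idempotent: re-indexing replaces the old postings rather than adding.
        self.delete(doc_id)
        for term in text.lower().split():
            self._postings[term][doc_id] += 1
        self._meta[doc_id] = fields

    def delete(self, doc_id: str) -> None:
        if doc_id in self._meta:
            for counts in self._postings.values():
                counts.pop(doc_id, None)
            del self._meta[doc_id]

    def search(self, query: str, **filters) -> list[str]:
        # Filters first: cheap exact matches shrink the candidate set.
        allowed = {
            d for d, m in self._meta.items()
            if all(m.get(k) == v for k, v in filters.items())
        }
        # Then scoring: sum term frequencies over the query terms.
        scores = Counter()
        for term in query.lower().split():
            for d, tf in self._postings.get(term, {}).items():
                if d in allowed:
                    scores[d] += tf
        return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Idempotent indexing matters because a Kafka consumer may redeliver a message; replaying the same document must leave the index unchanged.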