System Design — Basics
Interview framework (for any system design question)
- Clarify requirements — functional + non-functional. Don’t jump to architecture.
- Estimate scale — DAU, QPS, data size, read/write ratio.
- High-level design — boxes & arrows.
- API design — exact endpoints / events.
- Data model — tables / collections / events.
- Deep dive on 1-2 components the interviewer cares about.
- Address bottlenecks & tradeoffs — what scales, what doesn’t, what fails.
- Operational concerns — observability, deploy, on-call.
Speak through your reasoning; the interviewer cares about how you think.
Numbers to memorize (Jeff Dean’s)
| Operation | Approx. latency |
|---|---|
| L1 cache | 0.5 ns |
| Branch mispredict | 5 ns |
| Mutex lock/unlock | 25 ns |
| Memory access | 100 ns |
| Compress 1KB w/ Snappy | 3 µs |
| 1Gbps net send 1KB | 10 µs |
| SSD random read | 100 µs |
| SSD seq read 1MB | 1 ms |
| Round trip same DC | 0.5 ms |
| HDD seek | 10 ms |
| Round trip same continent | 50 ms |
| Round trip cross-Atlantic | 150 ms |
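
These figures are mostly useful for quick latency-budget checks. A minimal sketch, assuming a hypothetical request made of sequential steps; `LATENCY_NS` and `request_cost_ms` are illustrative names, and the values are copied from the table above:

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "memory_access": 100,
    "ssd_random_read": 100_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "cross_atlantic_round_trip": 150_000_000,
}


def request_cost_ms(ops: dict[str, int]) -> float:
    """Sum the cost of sequential operations; ops maps operation name -> count."""
    total_ns = sum(LATENCY_NS[name] * count for name, count in ops.items())
    return total_ns / 1_000_000


# Hypothetical request: 3 sequential same-DC calls plus 2 SSD random reads.
in_dc = request_cost_ms({"same_dc_round_trip": 3, "ssd_random_read": 2})

# The same request with one cross-Atlantic round trip added.
cross_region = request_cost_ms(
    {"same_dc_round_trip": 3, "ssd_random_read": 2, "cross_atlantic_round_trip": 1}
)

print(f"in-DC: {in_dc:.1f} ms, with one cross-Atlantic hop: {cross_region:.1f} ms")
```

Under these assumptions the in-DC path costs about 1.7 ms, and a single cross-Atlantic hop dominates everything else at roughly 150 ms.
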
Capacity estimation method
- DAU × actions/day × bytes/action = data/day.
- Data/day × retention = storage.
- DAU × actions/day / 86400 (seconds/day) × peak factor = peak QPS.
- QPS × avg payload = bandwidth.
- Round generously; show your reasoning.
Example (Twitter feed):
- 200M DAU × (50 reads + 5 writes) / day = 10B reads + 1B writes / day.
- 10B / 86400 ≈ 116k reads/sec mean; ≈ 350k at peak (3× peak factor).
- 1 tweet ~ 280 bytes; 1B writes/day × 280B = 280GB/day; × 365 ~ 100TB/year.
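
The arithmetic above can be checked with a few lines of Python. This is a sketch of the method, not a sizing tool: the `estimate` helper and its parameter names are made up for illustration, and the peak factor of 3 is an assumption inferred from the 116k → 350k step.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user, writes_per_user, bytes_per_write,
             peak_factor, retention_days):
    """Back-of-envelope capacity numbers from per-user daily activity."""
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    mean_read_qps = reads_per_day / SECONDS_PER_DAY
    bytes_per_day = writes_per_day * bytes_per_write
    return {
        "reads_per_day": reads_per_day,
        "writes_per_day": writes_per_day,
        "mean_read_qps": mean_read_qps,
        "peak_read_qps": mean_read_qps * peak_factor,
        "bytes_per_day": bytes_per_day,
        "storage_bytes": bytes_per_day * retention_days,
    }


# Inputs from the Twitter feed example; peak_factor=3 is an assumption.
est = estimate(dau=200_000_000, reads_per_user=50, writes_per_user=5,
               bytes_per_write=280, peak_factor=3, retention_days=365)

print(f"mean read QPS: {est['mean_read_qps']:,.0f}")
print(f"peak read QPS: {est['peak_read_qps']:,.0f}")
print(f"data per day:  {est['bytes_per_day'] / 1e9:,.0f} GB")
print(f"data per year: {est['storage_bytes'] / 1e12:,.1f} TB")
```

The exact results (about 115,741 mean reads/sec, 280 GB/day, 102.2 TB/year) round to the figures in the example, which is the level of precision an interview needs.
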
Building blocks (toolbox)
- Load balancer — L4/L7. Traffic distribution + TLS termination.
- Reverse proxy — Nginx, Envoy, HAProxy.
- CDN — edge cache.
- API gateway — auth, rate limit, routing.
- Stateless app servers — horizontal scale.
- Cache — Redis / Memcached.
- DB: SQL (PG/MySQL) or NoSQL (DynamoDB/Mongo/Cassandra).
- Search: Elasticsearch.
- Object store: S3.
- Queue / pub-sub: SQS / Kafka / RabbitMQ.
- Stream processor: Flink / Kafka Streams.
- Workflow: Temporal / Step Functions.
- Object cache layer (CDN-like) for API.
- Read replicas for read scale.
- Sharding for write scale.
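
Sharding needs a rule that maps each key to a node. One common option is consistent hashing, which this document lists under the distributed cache problem. The sketch below is a minimal ring with virtual nodes; the `HashRing` class, the node names, and the choice of SHA-256 are illustrative assumptions, not a reference to any particular product.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each node owns many points ("virtual nodes") on a circle."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical shard names; adding a fourth shard should move only some of the keys.
before = HashRing(["db-a", "db-b", "db-c"])
after = HashRing(["db-a", "db-b", "db-c", "db-d"])

keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding db-d")
```

Every key that moves lands on the new node, and the rest stay where they were. With plain `hash(key) % N` sharding, changing `N` would instead remap most keys.
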
Core patterns
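- Stateless services + DB — easy to scale horizontally.
- Cache-aside — read the cache first; on a miss, read the DB and populate the cache.
- Write-through / write-behind — write-through updates cache and DB synchronously; write-behind updates the cache and flushes to the DB asynchronously, for lower write latency at the risk of losing buffered writes.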
- CQRS — separate read and write models.
- Event sourcing — store events instead of state.
- Saga — distributed transaction via local txns + compensations.
- Outbox — atomic DB write + event.
- CDC — replicate DB changes downstream.
- Materialized views — precompute reads.
- Sharding — partition data.
- Replication — sync (RDBMS HA) or async (read replicas, regional).
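
Of the patterns above, the outbox is the easiest to get subtly wrong, so here is a minimal sketch. It assumes SQLite as a stand-in for the service's database and a `publish` callback as a stand-in for a broker client; the `orders` and `outbox` tables and the `place_order` and `relay_once` helpers are hypothetical names.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def place_order(conn, order_id, item):
    """Write the order and its event in one transaction: both commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.created", json.dumps({"order_id": order_id, "item": item})),
        )


def relay_once(conn, publish):
    """Publish pending events in insertion order, marking each one after it is sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, payload)  # a crash before the UPDATE means this event is sent again
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


sent = []
place_order(conn, 1, "book")
relay_once(conn, lambda topic, payload: sent.append((topic, json.loads(payload))))
print(sent)
```

Because the relay marks an event only after publishing it, delivery is at-least-once, so consumers should be idempotent.
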
Pick the right DB
| Workload | Pick |
|---|---|
| Relational, transactions | Postgres / MySQL |
| Massive write scale, key-based access | Cassandra / DynamoDB |
| Document model | MongoDB |
| Search / analytics | Elasticsearch / OpenSearch |
| Time series | Timescale / InfluxDB / Timestream |
| Graph | Neo4j / Neptune |
| Cache | Redis / Memcached |
| Analytical OLAP | Snowflake / BigQuery / ClickHouse |
| Strong global consistency | Spanner / CockroachDB |
Communication patterns
- Sync RPC — REST / gRPC. Simple, but latency adds up across hops.
- Async event — Kafka / SQS. Decoupled, eventually consistent.
- Request-reply over MQ — rare; usually sync RPC instead.
- Pub/sub — fanout.
- Streaming — gRPC streams, WebSocket, SSE.
Caching layers
client cache → CDN → API gateway cache → app L1 (in-memory) → Redis (L2) → DB
Each layer cuts latency and DB load. Each layer also needs its own staleness handling (TTL or invalidation).
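
The last three hops of that chain (app L1 → L2 → DB) can be sketched as a layered lookup with TTL-based staleness. This is an illustration under stated assumptions: `TTLCache` is a tiny in-process stand-in used for both layers (a real L2 would be a shared store such as Redis), and `layered_get` and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny cache with per-entry expiry; a stand-in for any cache layer."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)


def layered_get(key, l1, l2, load_from_db):
    """Check L1, then L2, then the DB; backfill the faster layers on the way back."""
    value = l1.get(key)
    if value is not None:
        return value
    value = l2.get(key)
    if value is None:
        value = load_from_db(key)
        if value is None:
            return None  # not found anywhere; nothing to cache
        l2.set(key, value)
    l1.set(key, value)
    return value


# Fake clock so the example is deterministic.
now = [0.0]
l1 = TTLCache(ttl_seconds=5, clock=lambda: now[0])
l2 = TTLCache(ttl_seconds=60, clock=lambda: now[0])
db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return f"row-for-{key}"


layered_get("user:1", l1, l2, load_from_db)  # miss everywhere, reads the DB
layered_get("user:1", l1, l2, load_from_db)  # served from L1
now[0] = 10.0                                # L1 entry has expired, L2 is still fresh
layered_get("user:1", l1, l2, load_from_db)  # served from L2, L1 backfilled
print(db_calls)
```

Giving L1 a shorter TTL than L2 bounds how stale a single app server can be while still shielding the DB.
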
Reliability patterns
- Timeouts on every external call.
- Retries with backoff + jitter on idempotent ops.
- Circuit breakers.
- Bulkheads (resource isolation).
- Rate limiting.
- Fallback / graceful degradation.
- Health checks.
- Multi-AZ / multi-region replication.
- Auto-scaling and auto-healing.
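
Retries with backoff and jitter are short enough to sketch in full. The version below uses exponential backoff with "full jitter" (a random wait between zero and the current cap); the `retry_with_backoff` helper, its defaults, and the choice of retryable exceptions are assumptions for illustration. It should only wrap idempotent operations.

```python
import random
import time


def retry_with_backoff(op, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call op(); on a retryable error, wait a random time under a growing cap, then retry."""
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads retries over [0, cap]


# Hypothetical flaky call that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # capture the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, calls["n"], len(delays))
```

The jitter matters as much as the backoff: without it, clients that failed together retry together and hit the recovering service in synchronized waves.
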
Common interview problems
- Design a URL shortener (TinyURL) — id generation, redirects, analytics.
- Design Twitter feed — fanout-on-write vs fanout-on-read, hot users.
- Design a chat — WebSocket fanout, presence, history, push.
- Design Uber/dispatch — geo index (geohash), matching, surge.
- Design rate limiter — token bucket, distributed counter.
- Design notification system — fanout + retries + dedupe.
- Design payment flow — idempotency, saga, double-entry, reconciliation.
- Design news feed / timeline — caching, ranking.
- Design distributed file storage (Dropbox) — chunking, dedupe, sync.
- Design search autocomplete — trie or ngram, freshness.
- Design ad serving — low latency, budgets, fraud.
- Design distributed cache — consistent hashing, replication, eviction.
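
As a concrete starting point for the rate limiter problem, here is a minimal single-process token bucket. The `TokenBucket` class and its parameters are illustrative; a distributed limiter would keep the token count and timestamp in a shared store rather than in local fields, which this sketch does not attempt.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Fake clock so the example is deterministic: 2 tokens/sec, burst of 3.
now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: now[0])

burst = [bucket.allow() for _ in range(4)]  # three pass, the fourth is rejected
now[0] = 0.5                                # half a second later: one token refilled
after_wait = [bucket.allow(), bucket.allow()]
print(burst, after_wait)
```

Capacity sets the burst size and rate sets the sustained throughput, which are the two knobs worth calling out when discussing tradeoffs.
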
Don’t forget
- Identity (auth/authz).
- Observability (logs, metrics, traces).
- Deploys (CI/CD, blue-green, canary).
- Failover testing.
- Cost.
- Privacy / compliance.
- Internationalization (where relevant).
- Mobile / web / partner clients (different SLAs).