Networking — Theory
Networking — Theory (interview deep-dive)
TCP slow start & congestion control
When a connection opens, TCP doesn’t blast at full bandwidth: it grows the congestion window (cwnd) exponentially during slow start, then linearly (congestion avoidance) once cwnd reaches the slow-start threshold (ssthresh). On loss, it backs off.
Implications:
- Fresh connections are slow for the first few RTTs. Persistent connections (keep-alive) avoid repeated cold start.
- Loss = backoff. Even brief congestion can cause noticeable slowdown.
- Modern algorithms: CUBIC (the Linux default, loss-based) and BBR (Google), which estimates bottleneck bandwidth and RTT instead of using loss as the signal.
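The growth pattern described above can be made concrete with a toy model. This is a minimal sketch of Reno-style behavior, not what CUBIC or BBR actually do; the function name, the initial window of 10 segments, and the "halve on loss" rule are simplifying assumptions chosen for illustration.

```python
def simulate_cwnd(rtts, ssthresh, loss_at=frozenset(), initial_cwnd=10):
    """Return the congestion window (in segments) at the start of each RTT.

    Simplified model (assumption): double per RTT below ssthresh (slow start),
    add one segment per RTT at or above it (congestion avoidance), and on a
    loss set ssthresh to half the window and continue from there.
    """
    cwnd = initial_cwnd
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)    # back off on loss
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential growth
        else:
            cwnd += 1                       # congestion avoidance: linear growth
    return history
```

With `simulate_cwnd(10, ssthresh=64, loss_at={6})` the window goes 10, 20, 40, 64, 65, 66, 67, then drops to 33 after the loss and climbs by one per RTT again. The first few RTTs carry far less data than later ones, which is why connection reuse matters.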
Head-of-line blocking — at multiple layers
- HTTP/1.1: only one in-flight request per connection. Browsers work around this by opening about 6 connections per host.
- HTTP/2 over TCP: streams are multiplexed, but a TCP retransmit on packet loss stalls all streams, because TCP delivers one ordered byte stream.
- HTTP/3 over QUIC/UDP: per-stream isolation; loss on one stream doesn’t stall the others.
TLS handshake cost
- TLS 1.2: 2 RTTs for a full handshake (a large certificate chain that overflows the initial congestion window can add a third).
- TLS 1.3: 1 RTT for new sessions; 0-RTT for resumed sessions (with replay risk).
- Optimizations: session resumption (tickets), terminating TLS at the CDN edge, OCSP stapling, ALPN.
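To see what those round-trip counts mean in wall-clock terms, the arithmetic can be sketched as below. The helper name and the 80 ms RTT in the usage note are illustrative assumptions; the TCP three-way handshake is counted as 1 RTT on top of the TLS figures listed above.

```python
# Round trips spent before the first HTTP request byte can be sent:
# 1 RTT for the TCP handshake plus the TLS handshake round trips.
HANDSHAKE_RTTS = {
    "TLS 1.2 (full handshake)": 1 + 2,
    "TLS 1.3 (new session)": 1 + 1,
    "TLS 1.3 (0-RTT resumption)": 1 + 0,
}


def setup_latency_ms(rtt_ms):
    """Map each handshake variant to its connection setup time in milliseconds."""
    return {name: rtts * rtt_ms for name, rtts in HANDSHAKE_RTTS.items()}
```

On an 80 ms path, `setup_latency_ms(80)` gives 240 ms for a full TLS 1.2 handshake versus 160 ms for a new TLS 1.3 session, before any application data moves.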
TCP_NODELAY (disable Nagle)
Nagle’s algorithm batches small writes to reduce overhead. Combined with delayed ACKs, it can add about 40 ms of latency to small interactive payloads. For latency-sensitive traffic (e.g., RPC, gaming), set TCP_NODELAY to disable it.
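As a minimal sketch, this is how the option is set on a socket with Python's standard library. The helper name is hypothetical; most RPC frameworks expose their own switch for this, so treat it as an illustration of the underlying socket option only.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more small packets on the wire, so applications that disable Nagle usually batch their own writes where they can.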
Keep-alive vs reconnection
- HTTP keep-alive keeps the TCP connection open after a response and reuses it for the next request.
- TCP keepalive (kernel option) probes idle connections to detect dead peers. The Linux default idle time before the first probe is 2 h, which is too long; tune to 30-60 s for backend services.
- App-level pings are often more reliable than TCP keepalives.
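The kernel keepalive timers can also be tuned per socket rather than system-wide. The sketch below assumes Linux option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) and applies each one only if the platform exposes it; the function name and the default values are illustrative, not recommendations from the source beyond the 30-60 s range mentioned above.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so look them up defensively.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the peer is declared dead
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With these values a dead peer is detected after roughly 30 + 3 × 10 = 60 seconds of silence instead of more than two hours.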
DNS deep notes
- Cache hierarchy: app → resolver library → OS → recursive resolver (often the ISP’s) → authoritative servers.
- TTL too long: changes propagate slowly. Too short: lookup spam.
- Pre-resolve in advance for latency-sensitive workloads.
- Round-robin DNS is a poor LB; clients cache. Use real LB.
- GeoDNS routes to nearest region (CloudFront, Route53).
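- Watch out: Java caches successful lookups forever (networkaddress.cache.ttl=-1) when a security manager is installed; without one, the default is implementation-specific (30 s in OpenJDK). Set it explicitly for cloud deployments, where IPs change.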
TCP TIME_WAIT
After an active close, the closing side holds the 4-tuple in TIME_WAIT for 2× MSL (60 s on Linux). This prevents stale packets from confusing a new connection on the same 4-tuple.
Server-side this is rarely an issue (the listening port stays the same). Client-side, at a high connection rate, it causes ephemeral port exhaustion. Mitigations:
- Persistent connections.
- SO_REUSEPORT / tcp_tw_reuse (Linux).
- Multiple source IPs.
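A back-of-the-envelope calculation shows how quickly a client can hit the ceiling. This sketch assumes the common Linux default ephemeral range of 32768-60999 and a 60 s TIME_WAIT; the function name is hypothetical, and the limit applies per (source IP, destination IP, destination port) combination.

```python
def max_new_connections_per_second(port_low=32768, port_high=60999, time_wait_s=60):
    """Sustainable rate of new outbound connections to one destination.

    Each closed connection pins its source port for time_wait_s seconds,
    so the port pool is recycled at most once per TIME_WAIT period.
    """
    ports = port_high - port_low + 1
    return ports / time_wait_s
```

With the defaults this comes to 28232 ports / 60 s, roughly 470 new connections per second to a single backend address, which a busy proxy without connection pooling can exceed.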
Connection limits
- Per-process: ulimit -n (file descriptors).
- Linux default soft limit: 1024. Raise to 65535+ for high-connection servers.
- Each TCP conn ~ a few KB kernel memory + epoll registration.
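A process can also inspect and raise its own descriptor limit at startup instead of relying on the shell. The following is a sketch using Python's Unix-only resource module; the function name and the 65535 target are assumptions, and an unprivileged process can only raise the soft limit up to the hard limit.

```python
import resource


def raise_fd_soft_limit(target=65535):
    """Raise the soft RLIMIT_NOFILE toward target; return the resulting (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited
    # Never ask for more than the hard limit allows.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # some platforms cap the value further; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For services managed by an init system or container runtime, the limit is usually set in the unit or container configuration instead, so this is mainly useful as a startup check.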
Load balancers — important details
L4 vs L7
- L4 forwards TCP based on IP/port (some L4 LBs terminate the connection, others pass packets through). No HTTP awareness. Faster, simpler. Good for non-HTTP traffic, raw TCP, and gRPC behind a dedicated LB.
- L7 terminates HTTP. Routes by path, header, cookie. Can do retries, rewrites, auth. Needed for path-based routing, sticky sessions, gRPC-Web translation.
Algorithms
- Round robin: simple, but ignores capacity differences.
- Least connections: preferred for varied workloads.
- Least response time: adaptive.
- IP hash / consistent hash: sticky sessions, cache affinity.
- Power of two choices: pick 2 backends at random and send to the less loaded one; near-optimal in practice.
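Power of two choices is simple enough to sketch in a few lines. The simulation below is an illustrative assumption: "load" is just the number of requests assigned so far (nothing ever completes), and the function names are hypothetical. It compares the busiest backend under power-of-two selection against purely random selection.

```python
import random


def pick_power_of_two(loads, rng=random):
    """Sample two distinct backends and return the index of the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(num_backends=10, num_requests=10_000, seed=0):
    """Return (max load with power of two choices, max load with random choice)."""
    rng = random.Random(seed)
    p2c = [0] * num_backends
    uniform = [0] * num_backends
    for _ in range(num_requests):
        p2c[pick_power_of_two(p2c, rng)] += 1
        uniform[rng.randrange(num_backends)] += 1
    return max(p2c), max(uniform)
```

The appeal is that each decision needs load information from only two backends, so it avoids both the global state of least-connections and the herd effect of everyone picking the single least-loaded server.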
Health checks
- Active (LB pings) vs passive (LB monitors actual responses).
- Keep HTTP /healthz minimal; check critical deps in /readyz.
- Tune: interval, timeout, healthy_threshold, unhealthy_threshold.
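The two thresholds exist to stop a single flaky probe from flipping a backend in or out of rotation. A minimal sketch of that state machine follows; the class name and default thresholds are assumptions for illustration, and real load balancers differ in details such as how the initial state is chosen.

```python
class HealthTracker:
    """Flip state only after N consecutive probe results that contradict it."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results disagreeing with the current state

    def record(self, check_passed):
        """Feed one probe result and return the (possibly updated) health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset the streak
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy
```

Detection time is roughly interval × unhealthy_threshold, so lowering either one speeds up failover at the cost of more false positives.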
Anycast vs unicast
- Unicast: one IP, one host.
- Anycast: same IP advertised from many places; BGP routes you to nearest. Used by DNS root servers, CDNs, Cloudflare.
Load balancing layers (typical edge → app)
- DNS-based geo routing → region.
- Anycast IP → nearest PoP.
- CDN → cached or origin pull.
- Regional L7 LB → service.
- Service mesh sidecar (Envoy) → app instance.
CDN basics
- Edge caching close to the user. Origin pull on miss.
- Cache key = URL + the request headers named in Vary.
- Purge / invalidation: purge by URL/tag.
- Used for static assets, API responses (with cache-control), images, video.
- Workers / Edge functions — run code at edge (Cloudflare Workers, Lambda@Edge, Fastly Compute).
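The cache-key rule above (URL plus the headers listed in Vary) can be sketched as follows. This is an illustrative assumption about how a cache might build its key, not any specific CDN's implementation; the function name and the choice of SHA-256 are hypothetical.

```python
import hashlib


def cache_key(method, url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    parts = [method.upper(), url]
    # Only headers listed in Vary participate; sorting makes the key order-independent.
    for name in sorted(h.strip().lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()
```

Two requests for the same URL with different Accept-Encoding values get separate entries when Vary names Accept-Encoding, while a differing User-Agent is ignored. This is also why a Vary on a high-cardinality header tends to wreck the hit rate.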
HTTP/2 settings to know
- SETTINGS_MAX_CONCURRENT_STREAMS (unlimited by default in the spec; servers commonly set about 100, often raised to 1000+ for gRPC).
- SETTINGS_INITIAL_WINDOW_SIZE: flow-control window per stream (default 65,535 bytes).
- HPACK dynamic table size (SETTINGS_HEADER_TABLE_SIZE, default 4,096 bytes).
Common interview Qs
- What happens when you type https://x.com in a browser? DNS lookup → TCP connect (handshake) → TLS handshake → HTTP request → HTML parse → CSS/JS fetch → render.
- HTTP/1 vs HTTP/2 vs HTTP/3: when would you choose each?
- Why might TLS 1.3 0-RTT be dangerous? Replay attacks for non-idempotent requests.
- TCP retransmit timer — what is it? Adaptive based on RTT estimate. Min/max bounded.
- What is sticky session, and why might you want or avoid it?
- Difference between forward and reverse proxy.
- Path MTU discovery — why might packets be silently dropped? ICMP filtering breaks it; large packets just disappear. Tune MSS clamp.
- CORS preflight: when is it triggered? Non-simple methods (PUT, DELETE), custom headers, or a Content-Type other than application/x-www-form-urlencoded, multipart/form-data, or text/plain.
- WebSocket vs SSE — pick one for: live stock prices, chat, notifications.
- You see high p99 only for one client country — debug. Geo-routing miss, peering issue, IPv6 path, MTU. Use traceroute, mtr.
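For the retransmit-timer question, the "adaptive based on RTT estimate" answer can be backed with the standard smoothing formulas from RFC 6298. The sketch below omits the clock-granularity term and uses the RFC's 1 s floor and a 60 s ceiling as defaults; the class name is hypothetical, and real stacks use different bounds.

```python
class RtoEstimator:
    """Retransmission timeout from smoothed RTT and RTT variance (RFC 6298 style)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.max_rto = max_rto

    def update(self, rtt_sample):
        """Feed one RTT measurement (seconds) and return the new RTO (seconds)."""
        if self.srtt is None:
            # First measurement seeds both estimates.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # Variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt_sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        rto = self.srtt + self.K * self.rttvar
        return min(max(rto, self.min_rto), self.max_rto)
```

A stable 100 ms path converges toward an RTO close to the floor, while jittery samples inflate the variance term and push the timeout up, which is the "adaptive, min/max bounded" behavior the answer refers to.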
Production gotchas
- TCP keepalive interval longer than the load balancer’s idle timeout → LB drops the connection without the app knowing → next write fails → 5xx.
- DNS TTL too short + many clients → query storms against resolvers and authoritative servers.
- Slowloris-style clients holding connections: set client_header_timeout, client_body_timeout, keepalive_timeout (nginx directive names).
- No graceful drain on shutdown → in-flight requests fail. The SIGTERM handler should stop accepting new requests, drain, then exit.
- Hairpin routing: client → LB → service in same VPC → upstream over external interface — adds latency and cost.
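The graceful-drain sequence from the list above (stop accepting, drain, exit) can be sketched with a small coordinator object. Everything here is a hypothetical illustration: the class and method names are invented, and a real server would call begin_request and end_request from its request-handling path.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and coordinates a drain on SIGTERM."""

    def __init__(self):
        self._stopping = threading.Event()
        self._idle = threading.Condition()
        self._in_flight = 0

    def install(self):
        # Call from the main thread. The handler only sets a flag;
        # the main loop is expected to notice it and call drain().
        signal.signal(signal.SIGTERM, lambda signum, frame: self._stopping.set())

    @property
    def accepting(self):
        return not self._stopping.is_set()

    def begin_request(self):
        """Return False if the server is draining and the request must be refused."""
        with self._idle:
            if self._stopping.is_set():
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        with self._idle:
            self._in_flight -= 1
            self._idle.notify_all()

    def drain(self, timeout_s=30.0):
        """Stop accepting and wait for in-flight work; True if it all finished in time."""
        self._stopping.set()
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)
```

The drain timeout should sit below whatever grace period the orchestrator allows before it sends SIGKILL, otherwise the process is killed mid-drain anyway.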