Networking — Theory (interview deep-dive)

TCP slow start & congestion control

When a connection opens, TCP doesn’t send at full bandwidth: it grows the congestion window (cwnd) exponentially during slow start (roughly doubling each RTT), then linearly once cwnd reaches the slow-start threshold (ssthresh). On loss, it cuts cwnd back.

Implications:

Fresh connections are slow for the first few RTTs. Persistent connections (keep-alive) avoid repeated cold starts.
Loss = backoff. Even brief congestion can cause a noticeable slowdown.
Modern algorithms: CUBIC (Linux default), BBR (Google), which estimates bottleneck bandwidth and RTT instead of treating loss as the primary congestion signal.
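A toy model makes the growth pattern above concrete. It is a sketch under simplifying assumptions: Reno-style rules (double per RTT below ssthresh, +1 MSS per RTT above it, halve on loss), cwnd counted in MSS, an initial window of 10 (the Linux default), and a hypothetical `cwnd_trace` helper. CUBIC and BBR grow and back off differently.

```python
def cwnd_trace(rtts, ssthresh, loss_at=(), initial_cwnd=10):
    """Toy Reno-style cwnd model (MSS units), one value per RTT.

    Slow start: cwnd doubles each RTT while below ssthresh (capped at it).
    Congestion avoidance: cwnd grows by 1 MSS per RTT.
    Loss at an RTT in loss_at: ssthresh = cwnd // 2 (min 2), cwnd = ssthresh
    (fast-recovery style halving, not a timeout reset to 1 MSS).
    """
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)
        else:
            cwnd += 1
    return trace

print(cwnd_trace(8, ssthresh=64, loss_at={5}))
# [10, 20, 40, 64, 65, 66, 33, 34]
```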

Head-of-line blocking — at multiple layers

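HTTP/1.1: effectively one in-flight request per connection (pipelining exists but is rarely enabled). Browsers open ~6 connections per host to compensate.
HTTP/2 over TCP: streams are multiplexed, BUT TCP is a single ordered byte stream, so one lost packet stalls every stream until it is retransmitted.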
HTTP/3 over QUIC/UDP: per-stream isolation; one stream’s loss doesn’t stall others.
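To see why the transport matters, here is a toy receiver model (hypothetical `delivered_before_retransmit`, exactly one lost packet, timing ignored): with TCP every byte behind the gap waits, while QUIC-style per-stream ordering only holds back the stream that lost data.

```python
def delivered_before_retransmit(packets, lost, per_stream_order):
    """Indices of packets the receiver can hand to the app before the
    lost packet is retransmitted.

    packets: stream IDs in send order, e.g. ["A", "B", "A", "C"].
    lost: index of the single lost packet.
    per_stream_order: False models TCP (one ordered byte stream shared by
    all HTTP/2 streams); True models QUIC (ordering enforced per stream).
    """
    lost_stream = packets[lost]
    ready = []
    for i, stream in enumerate(packets):
        if i < lost:
            ready.append(i)        # arrived in order, before the gap
        elif i == lost:
            continue               # still missing
        elif per_stream_order and stream != lost_stream:
            ready.append(i)        # QUIC: other streams are not blocked
        # otherwise: buffered behind the gap until the retransmit arrives
    return ready

pkts = ["A", "B", "A", "C", "B"]
print(delivered_before_retransmit(pkts, lost=1, per_stream_order=False))  # [0]
print(delivered_before_retransmit(pkts, lost=1, per_stream_order=True))   # [0, 2, 3]
```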

TLS handshake cost

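TLS 1.2: 2 RTTs for a full handshake, on top of 1 RTT for the TCP handshake (a large certificate chain that overflows the initial cwnd can add another).
TLS 1.3: 1 RTT for new sessions; 0-RTT for resumed sessions (early data is replayable).
Optimizations: session resumption (tickets), terminating TLS at a nearby CDN edge (shorter RTT), OCSP stapling (client skips a separate revocation lookup), ALPN (HTTP version negotiated inside the handshake).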
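Rough arithmetic for the handshake costs above, assuming a fresh TCP connection each time (no TCP Fast Open), one RTT for the request/response itself, and ignoring DNS and server think time. The table and function names are illustrative.

```python
# Round trips spent on connection setup before the request can be sent.
SETUP_RTTS = {
    "TCP + TLS 1.2 full handshake": 1 + 2,
    "TCP + TLS 1.3 full handshake": 1 + 1,
    "TCP + TLS 1.3 0-RTT resumption": 1 + 0,  # request rides with the ClientHello
}

def time_to_first_byte_ms(rtt_ms):
    """Setup RTTs plus one RTT for the request/response itself."""
    return {name: (setup + 1) * rtt_ms for name, setup in SETUP_RTTS.items()}

for name, ms in time_to_first_byte_ms(50).items():
    print(f"{name}: {ms} ms")
# TCP + TLS 1.2 full handshake: 200 ms
# TCP + TLS 1.3 full handshake: 150 ms
# TCP + TLS 1.3 0-RTT resumption: 100 ms
```

At long RTTs (e.g., a user on another continent) each saved round trip is worth more, which is why terminating TLS at a nearby edge helps so much.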

TCP_NODELAY (disable Nagle)

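Nagle’s algorithm holds back small writes while earlier data is still unacknowledged, to reduce per-packet overhead. Combined with the peer’s delayed ACKs, this can add ~40ms (the Linux delayed-ACK timer; up to 200ms on some stacks) to small interactive payloads. For latency-sensitive traffic (e.g., RPC, gaming), set TCP_NODELAY to disable it.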
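A minimal Python sketch of disabling Nagle on a client socket (`connect_low_latency` is a hypothetical helper; `socket.TCP_NODELAY` is the standard option). Some runtimes already default to this; Go’s net package, for instance, disables Nagle on TCP connections.

```python
import socket

def connect_low_latency(host, port, timeout=5.0):
    """Open a TCP connection with Nagle's algorithm disabled."""
    sock = socket.create_connection((host, port), timeout=timeout)
    # TCP_NODELAY=1: send small writes immediately instead of holding them
    # until earlier in-flight data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Even with Nagle off, writing each message in one call (not header and body separately) avoids sending extra tiny packets.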

Keep-alive vs reconnection

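HTTP keep-alive keeps the TCP connection open after a response and reuses it for the next request.
TCP keepalive (kernel socket option) probes idle connections to detect dead peers. Linux default: first probe after 2h idle (net.ipv4.tcp_keepalive_time=7200), far too long. Tune to 30-60s for backend services.
App-level pings are often more reliable than TCP keepalives: the peer’s kernel answers keepalive probes even when its application is hung.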
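Sketch of tuning TCP keepalive per socket, assuming Linux, where `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are available (other platforms name or expose them differently, hence the guards). `enable_keepalive` is a hypothetical helper, and 45s/10s/3 are illustrative values within the 30-60s range above.

```python
import socket

def enable_keepalive(sock, idle_s=45, interval_s=10, probes=3):
    """Enable TCP keepalive with tighter timers than the Linux defaults.

    A dead peer is detected after roughly idle_s + interval_s * probes
    seconds (45 + 10 * 3 = 75 s here), versus 7200 + 75 * 9 s with the
    Linux defaults (tcp_keepalive_time/intvl/probes = 7200/75/9).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides; guarded because not every platform exposes them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```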

DNS deep notes

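Cache hierarchy: app/runtime → resolver library → OS cache → recursive resolver (ISP, corporate, or public) → authoritative servers.
TTL too long: changes propagate slowly. Too short: excessive lookups and added latency.
Pre-resolve (warm the cache) for latency-sensitive workloads.
Round-robin DNS is a poor load balancer: clients and resolvers cache answers, so traffic skews and dead hosts keep receiving requests. Use a real LB.
GeoDNS routes clients to the nearest region (e.g., Route 53 geolocation/latency routing, CloudFront).
Watch out: the JVM caches successful lookups forever (networkaddress.cache.ttl=-1) when a security manager is installed; otherwise the default is implementation-specific (30s in OpenJDK). Set it explicitly in the cloud.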
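A minimal app-level cache showing the TTL trade-off and pre-resolving. Assumptions: `TTLResolverCache` and the injected `resolve_fn` are hypothetical, and one fixed TTL is used for every name because `socket.getaddrinfo` does not expose record TTLs.

```python
import time

class TTLResolverCache:
    """Tiny app-level DNS cache with one fixed TTL (illustrative only)."""

    def __init__(self, resolve_fn, ttl_s=30.0, clock=time.monotonic):
        self._resolve = resolve_fn      # e.g. a wrapper around socket.getaddrinfo
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}              # host -> (expires_at, addresses)

    def lookup(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry and entry[0] > now:
            return entry[1]             # fresh: no resolver round trip
        addrs = self._resolve(host)     # missing or expired: re-resolve
        self._entries[host] = (now + self._ttl, addrs)
        return addrs

    def prewarm(self, hosts):
        """Resolve names before latency-sensitive traffic needs them."""
        for host in hosts:
            self.lookup(host)
```

A longer `ttl_s` means fewer resolver calls but slower reaction to record changes; a shorter one is the reverse.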

TCP TIME_WAIT

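After an active close, the closing side holds the 4-tuple in TIME_WAIT for 2× MSL (hardcoded to 60s on Linux). This lets the final ACK be retransmitted if it is lost, and keeps delayed segments from the old connection from being accepted by a new connection on the same 4-tuple.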

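Server side it is rarely an issue: TIME_WAIT entries differ by client address/port, so the listening port is never exhausted. Client side at high connection rates to one destination: ephemeral port exhaustion. Mitigations: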

Persistent connections.
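net.ipv4.tcp_tw_reuse (Linux): lets new outgoing connections reuse TIME_WAIT ports when TCP timestamps make it safe; also widen net.ipv4.ip_local_port_range. (SO_REUSEPORT spreads load across several listeners on one port; it is not a TIME_WAIT fix.)
Multiple source IPs (each source IP has its own ephemeral port space).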
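Back-of-envelope for ephemeral port exhaustion, assuming Linux’s default `ip_local_port_range` (32768-60999), a 60s TIME_WAIT, no `tcp_tw_reuse`, and one source IP talking to one destination IP:port (the helper name is illustrative):

```python
def max_new_conns_per_sec(port_low=32768, port_high=60999, time_wait_s=60):
    """Sustained ceiling on new outbound connections/s from ONE source IP to
    ONE destination IP:port when every connection ends in TIME_WAIT locally."""
    ports = port_high - port_low + 1
    return ports // time_wait_s

print(max_new_conns_per_sec())             # 470   (28,232 ports / 60 s)
print(max_new_conns_per_sec(1024, 65535))  # 1075  (64,512 ports / 60 s)
```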

Connection limits

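Per-process: ulimit -n (open file descriptors; every socket uses one).
Linux default soft limit is typically 1024; raise it to 65535+ for high-connection servers (also check the hard limit and the system-wide fs.file-max).
Each TCP connection costs a few KB of kernel memory (more when socket buffers fill) plus an epoll registration.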
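A Unix-only sketch of raising the soft descriptor limit from inside a process with the standard `resource` module (`raise_fd_limit` is a hypothetical helper). It can only go up to the hard limit; raising that takes root or a service-manager setting such as systemd’s `LimitNOFILE=`.

```python
import resource

def raise_fd_limit(target=65535):
    """Raise the soft RLIMIT_NOFILE toward target; never lowers it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard                       # already unlimited
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass                                # e.g. platform cap below target: keep old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```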

Load balancers — important details

L4 vs L7

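L4 balances on IP/port, either proxying TCP or forwarding packets (NAT/DSR). No HTTP awareness. Faster, simpler. Good for non-HTTP and raw TCP. Caveat for gRPC: an L4 LB balances connections, not requests, so long-lived HTTP/2 connections can pin load to a few backends unless clients balance on their own.
L7 terminates HTTP. Routes by path, header, cookie. Can do retries, rewrites, auth. Needed for path-based routing, sticky sessions, gRPC-Web translation, and per-request gRPC balancing.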

Algorithms

Round robin — simple, ignores capacity differences.
Least connections — preferred for varied workload.
Least response time — adaptive.
IP hash / consistent hash — sticky sessions, cache affinity.
Power of two choices — pick 2 random, send to less loaded; near-optimal in practice.
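A sketch of power of two choices (`pick_p2c` and the caller-maintained `in_flight` dict are hypothetical). Compared with least connections it needs only two load lookups per request, and the random sampling keeps many independent LBs from all piling onto the same least-loaded backend.

```python
import random

def pick_p2c(backends, in_flight, rng=random):
    """Sample two distinct backends uniformly at random and return the one
    with fewer in-flight requests (ties go to the first sample)."""
    if len(backends) < 2:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if in_flight[a] <= in_flight[b] else b

# Usage: the LB increments in_flight[chosen] when a request starts and
# decrements it when the request completes.
rng = random.Random(42)
in_flight = {"app-1": 1, "app-2": 2, "app-3": 50}
chosen = pick_p2c(list(in_flight), in_flight, rng)
```

A heavily loaded backend is only chosen if both samples are worse, so with three backends the one at 50 in-flight is never picked here.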

Health checks

Active (LB pings) vs passive (LB monitors actual responses).
Keep /healthz (liveness) minimal; check critical dependencies in /readyz (readiness).
Tune: interval, timeout, healthy_threshold, unhealthy_threshold.
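How `healthy_threshold` and `unhealthy_threshold` are typically applied, as a small state machine (hypothetical `HealthState`; a probe that exceeds `timeout` would be recorded as a failure). The hysteresis means one slow probe does not eject a backend and one lucky probe does not readmit it.

```python
class HealthState:
    """Flip to unhealthy after `unhealthy_threshold` consecutive failed
    probes, and back to healthy after `healthy_threshold` consecutive
    successful ones."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive probe results disagreeing with current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self._streak = 0              # result agrees with current state
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy

h = HealthState(healthy_threshold=2, unhealthy_threshold=3)
print([h.record(ok) for ok in [False, False, True, False, False, False, True, True]])
# [True, True, True, True, True, False, False, True]
```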

Anycast vs unicast

Unicast: one IP, one host.
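Anycast: same IP advertised from many places; BGP routes each client to the topologically nearest site (per routing policy, not necessarily the geographically closest). Used by DNS root servers and CDNs such as Cloudflare.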

Load balancing layers (typical edge → app)

DNS-based geo routing → region.
Anycast IP → nearest PoP.
CDN → cached or origin pull.
Regional L7 LB → service.
Service mesh sidecar (Envoy) → app instance.

CDN basics

Edge caching close to user. Origin pull on miss.
Cache key = URL + the request headers named in the response’s Vary header.
Purge / invalidation: purge by URL or by tag (surrogate key).
Used for static assets, API responses (with cache-control), images, video.
Workers / Edge functions — run code at edge (Cloudflare Workers, Lambda@Edge, Fastly Compute).
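Sketch of building a cache key from the URL plus the request headers named in Vary (hypothetical `cache_key`). Many CDNs also normalize header values (e.g., bucketing Accept-Encoding); without that, something like `Vary: User-Agent` splits one object into many cache entries.

```python
from urllib.parse import urlsplit

def cache_key(url, request_headers, vary):
    """Return a hashable cache key, or None for `Vary: *` (never reusable).
    Header names are compared case-insensitively."""
    parts = urlsplit(url)
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted({h.strip().lower() for h in vary.split(",") if h.strip()})
    if "*" in varied:
        return None
    return (parts.scheme, parts.netloc.lower(), parts.path, parts.query,
            tuple((h, headers.get(h, "")) for h in varied))
```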

HTTP/2 settings to know

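SETTINGS_MAX_CONCURRENT_STREAMS: unlimited per the spec until a peer advertises a value (the spec recommends at least 100); servers commonly advertise ~100, and gRPC setups often raise it much higher.
SETTINGS_INITIAL_WINDOW_SIZE: per-stream flow-control window (default 65,535 bytes); the connection-level window is separate and grows only via WINDOW_UPDATE.
SETTINGS_HEADER_TABLE_SIZE: HPACK dynamic table size (default 4,096 bytes).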
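Why the initial window matters: at most one window of unacknowledged data can be in flight per round trip, so a stream’s throughput is capped at window / RTT. A quick calculation, assuming the window is the only bottleneck (the helper name is illustrative):

```python
def max_stream_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on one stream's throughput: one window per RTT."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

print(round(max_stream_throughput_mbps(65_535, 100), 1))       # 5.2   (spec default window)
print(round(max_stream_throughput_mbps(4 * 1024**2, 100), 1))  # 335.5 (4 MiB window)
```

Inverting it gives the bandwidth-delay product: to fill a link, the window must be at least bandwidth × RTT.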

Common interview Qs

What happens when you type https://x.com in a browser? DNS lookup → TCP connect (handshake) → TLS handshake → HTTP request → HTML parse → CSS/JS fetch → render.
HTTP/1 vs HTTP/2 vs HTTP/3 — when would you choose each?
Why might TLS 1.3 0-RTT be dangerous? Replay attacks for non-idempotent requests.
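TCP retransmit timer: what is it? The retransmission timeout (RTO), computed from smoothed RTT and RTT variance (RFC 6298: RTO = SRTT + 4·RTTVAR), doubling on each successive timeout. Bounded: RFC minimum 1s (Linux uses 200ms), maximum at least 60s (Linux 120s).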
What is sticky session, and why might you want or avoid it?
Difference between forward and reverse proxy.
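Path MTU discovery: why might packets be silently dropped? PMTUD relies on ICMP "Fragmentation Needed" (IPv4) / "Packet Too Big" (IPv6); if firewalls filter that ICMP, oversized packets that cannot be fragmented just disappear (a PMTU black hole). Fix: allow those ICMP types, or clamp the TCP MSS.
CORS preflight: when triggered? Methods other than GET/HEAD/POST (e.g., PUT, DELETE), non-safelisted request headers (e.g., custom headers), or a Content-Type other than application/x-www-form-urlencoded, multipart/form-data, or text/plain.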
WebSocket vs SSE — pick one for: live stock prices, chat, notifications.
You see high p99 only for one client country — debug. Geo-routing miss, peering issue, IPv6 path, MTU. Use traceroute, mtr.
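For the retransmit-timer question, the standard computation is RFC 6298’s. A sketch (hypothetical `RtoEstimator`, times in seconds, clock granularity ignored; Linux uses its own variant with a 200ms floor and 120s ceiling):

```python
class RtoEstimator:
    """RFC 6298 retransmission timeout (seconds)."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.min_rto, self.max_rto = min_rto, max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                    # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        if self.srtt is None:             # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated with the OLD SRTT, so it goes first.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self._set(self.srtt + self.K * self.rttvar)

    def on_timeout(self):
        self._set(self.rto * 2)           # exponential backoff

    def _set(self, value):
        self.rto = min(max(value, self.min_rto), self.max_rto)

est = RtoEstimator(min_rto=0.2)           # Linux-like floor
est.on_rtt_sample(0.100)                  # SRTT 0.1, RTTVAR 0.05 → RTO ≈ 0.3
est.on_rtt_sample(0.100)                  # RTTVAR decays to 0.0375 → RTO ≈ 0.25
est.on_timeout()                          # backoff → RTO ≈ 0.5
```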

Production gotchas

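Misaligned idle timeouts between hops (e.g., app connection pool vs LB) → one side silently closes an idle connection the other still considers reusable → next request on it fails (reset, 502/5xx). Rule: the side that reuses pooled connections must time them out sooner than its peer does.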
DNS TTL too short + many clients → query storm on resolvers and authoritative servers (more load, added lookup latency, possible rate limiting).
Slowloris-style clients hold connections open by sending requests very slowly; set timeouts (nginx: client_header_timeout, client_body_timeout, keepalive_timeout).
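No graceful drain on shutdown → in-flight requests fail. A SIGTERM handler should stop accepting new connections (ideally after failing readiness so the LB deregisters the instance), drain in-flight requests, then exit.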
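Hairpin routing: client → LB → service, which then reaches an upstream in the same VPC via its external (public) address, so traffic leaves and re-enters the network; this adds latency and cost. Use internal endpoints.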
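A minimal graceful-drain sketch with Python’s standard `http.server`, assuming a single-process server that owns its listening socket (`DrainingHTTPServer`, `Handler`, and `serve_until_sigterm` are hypothetical names). Behind an LB you would usually also fail readiness and wait for deregistration before calling it; that step is not shown.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class DrainingHTTPServer(ThreadingHTTPServer):
    # ThreadingHTTPServer defaults to daemon handler threads, which
    # server_close() does NOT wait for. Non-daemon threads are tracked and
    # joined by server_close() (block_on_close, Python 3.7+): the drain.
    daemon_threads = False

class Handler(BaseHTTPRequestHandler):
    # Default protocol is HTTP/1.0 (one request per connection), so the
    # drain cannot hang on an idle keep-alive connection.
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

def serve_until_sigterm(server):
    """Serve until SIGTERM, then stop accepting and drain in-flight
    requests. Must run on the main thread (signal handlers live there)."""
    def on_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, and this handler
        # runs on the thread inside serve_forever(), so hand it off.
        threading.Thread(target=server.shutdown, daemon=True).start()

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        server.serve_forever()   # returns after shutdown(): no more accepts
    finally:
        server.server_close()    # close listener, join in-flight handlers
        signal.signal(signal.SIGTERM, previous)
```

Usage: `serve_until_sigterm(DrainingHTTPServer(("0.0.0.0", 8080), Handler))`. Orchestrators usually send SIGKILL after a grace period, so the drain has to finish within it.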