Networking — Theory

Networking — Theory (interview deep-dive)

When a connection opens, TCP does not send at full bandwidth immediately: it grows the congestion window (cwnd) exponentially during slow start, then linearly once cwnd reaches the slow-start threshold (ssthresh). On loss, it backs off.

Implications:

  • Fresh connections are slow for the first few RTTs. Persistent connections (keep-alive) avoid repeated cold start.
  • Loss = backoff. Even brief congestion can cause noticeable slowdown.
  • Modern algorithms: CUBIC (the Linux default) is still loss-based; BBR (Google) models bottleneck bandwidth and RTT instead of using loss as the primary signal.
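
To make the growth pattern above concrete, here is a minimal, loss-free sketch of cwnd per RTT. The function name `simulate_cwnd`, the initial window of 10 segments, and the example `ssthresh` of 64 are illustrative assumptions rather than values from these notes, and real stacks grow cwnd per ACK rather than in whole-RTT steps.

```python
def simulate_cwnd(rtts: int, ssthresh: int, initial_cwnd: int = 10) -> list[int]:
    """Return cwnd (in segments) at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: double every RTT, capped at ssthresh (a simplification).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one segment per RTT.
            cwnd += 1
    return history


print(simulate_cwnd(8, ssthresh=64))  # [10, 20, 40, 64, 65, 66, 67, 68]
```

The first three RTTs carry 10, 20, and 40 segments, which is why a short transfer on a fresh connection rarely gets close to the link's capacity.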

Head-of-line blocking — at multiple layers

  • HTTP/1.1: only one in-flight request per connection, so browsers open about 6 connections per host.
  • HTTP/2 over TCP: streams multiplexed BUT TCP retransmit on packet loss stalls all streams (TCP is byte-stream).
  • HTTP/3 over QUIC/UDP: per-stream isolation; one stream’s loss doesn’t stall others.

TLS handshake

  • TLS 1.2: 2 RTTs for a full handshake (a large certificate chain that exceeds the initial congestion window can cost a third).
  • TLS 1.3: 1 RTT for new sessions; 0-RTT for resumed (with replay risk).
  • Optimizations: session resumption (tickets), TLS-on-CDN, OCSP stapling, ALPN.
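
The RTT counts above translate directly into time before the first request byte can be sent. The sketch below is simple arithmetic under stated assumptions: the TCP handshake costs one RTT before TLS starts, the certificate chain fits in the initial window, and the names `TLS_RTTS` and `setup_latency_ms` are made up for illustration.

```python
# Round trips spent in the TLS handshake, taken from the list above.
TLS_RTTS = {"tls1.2": 2, "tls1.3": 1, "tls1.3-0rtt": 0}


def setup_latency_ms(rtt_ms: float, tls: str, tcp_rtts: int = 1) -> float:
    """Time spent on TCP + TLS handshakes before the first HTTP request."""
    return (tcp_rtts + TLS_RTTS[tls]) * rtt_ms


for version in TLS_RTTS:
    print(version, setup_latency_ms(80, version), "ms")
```

On an 80 ms path this gives 240 ms for TLS 1.2, 160 ms for TLS 1.3, and 80 ms for a resumed 0-RTT session, which is the main reason connection reuse and resumption matter for distant clients.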

Nagle’s algorithm and TCP_NODELAY

Nagle’s algorithm batches small writes to reduce per-packet overhead. Combined with delayed ACKs, it can add around 40ms of latency to small interactive payloads. For latency-sensitive traffic (e.g., RPC, gaming), set TCP_NODELAY to disable it.
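
As a minimal sketch of the TCP_NODELAY setting described above, the standard `socket` module exposes the option directly. The helper name `make_low_latency_socket` is hypothetical; the socket is created but not connected.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value disables Nagle: small writes are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

With Nagle disabled, the application becomes responsible for batching: many tiny `send` calls will turn into many tiny packets.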

Keep-alive: HTTP vs TCP

  • HTTP keep-alive keeps the TCP connection open after a response so the next request can reuse it.
  • TCP keepalive (kernel option) probes idle connections to detect dead peers (default 2h on Linux — too long!). Tune to 30-60s for backend services.
  • App-level pings often more reliable than TCP keepalives.

DNS

  • Cache hierarchy: app → resolver lib → OS → ISP → recursive → authoritative.
  • TTL too long: changes propagate slowly. Too short: lookup spam.
  • Pre-resolve in advance for latency-sensitive workloads.
  • Round-robin DNS is a poor LB; clients cache. Use real LB.
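  • GeoDNS routes to the nearest region (e.g., Route 53 geolocation or latency-based routing).
  • Watch out: the JVM can cache successful lookups forever (networkaddress.cache.ttl=-1, the default when a security manager is installed; otherwise the default is implementation-specific, typically 30s). Set it explicitly in cloud environments where IPs change.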
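
An application-level cache with an explicit TTL, as suggested by the caching and pre-resolve points above, could be sketched as follows. This is an illustration under assumptions: `TtlDnsCache` and its parameters are hypothetical names, and because `socket.getaddrinfo` does not expose the record's real TTL, a fixed TTL is used instead.

```python
import socket
import time
from typing import Callable, Optional


class TtlDnsCache:
    """Cache resolved addresses for a fixed TTL (the real record TTL is not visible here)."""

    def __init__(self, ttl_seconds: float = 30.0,
                 resolve: Optional[Callable[[str], list]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._resolve = resolve or self._system_resolve
        self._clock = clock
        self._entries = {}  # host -> (expiry time, addresses)

    @staticmethod
    def _system_resolve(host: str) -> list:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def lookup(self, host: str) -> list:
        now = self._clock()
        cached = self._entries.get(host)
        if cached and cached[0] > now:
            return cached[1]  # still fresh: no resolver call
        addrs = self._resolve(host)
        self._entries[host] = (now + self._ttl, addrs)
        return addrs
```

Calling `lookup` for known hostnames at startup is one way to pre-resolve; the bounded TTL avoids the cache-forever behavior called out in the last bullet.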

TIME_WAIT

After an active close, the closing side holds the 4-tuple in TIME_WAIT for 2× MSL (60s on Linux). This prevents stale packets from confusing a new connection on the same 4-tuple.
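
The hold time above puts a ceiling on how fast one client can open and actively close connections to a single destination. The arithmetic below is a rough sketch: it assumes the usual Linux ephemeral range of 32768 to 60999 and a 60s TIME_WAIT, and the function name is hypothetical.

```python
def max_new_conns_per_sec(port_low: int = 32768, port_high: int = 60999,
                          time_wait_s: float = 60.0) -> float:
    """Sustained connection rate from one source IP to one (dest IP, dest port)."""
    ports = port_high - port_low + 1  # each closed connection pins one port
    return ports / time_wait_s


print(int(max_new_conns_per_sec()))  # 470
```

Roughly 470 short-lived connections per second per destination is easy to exceed from a busy proxy or load generator, which is where reuse of connections or additional source IPs comes in.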

Rarely an issue server-side (the listening port does not change). Client-side, a high connection rate to one destination can exhaust ephemeral ports. Mitigations:

  • Persistent connections.
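  • net.ipv4.tcp_tw_reuse (Linux) lets new outbound connections reuse TIME_WAIT sockets. SO_REUSEPORT shares a listening port across sockets and does not by itself relieve ephemeral port exhaustion.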
  • Multiple source IPs.

Connection limits

  • Per-process: ulimit -n (file descriptors).
  • The default soft limit on Linux is commonly 1024; raise to 65535+ for high-connection servers.
  • Each TCP conn ~ a few KB kernel memory + epoll registration.

L4 vs L7 load balancing

  • L4 forwards (or proxies) TCP/UDP connections based on IP and port, with no HTTP awareness. Faster and simpler. Good for non-HTTP protocols, raw TCP, or gRPC behind a dedicated LB.
  • L7 terminates HTTP. Routes by path, header, cookie. Can do retries, rewrites, auth. Needed for path-based routing, sticky sessions, gRPC-Web translation.

Load balancing algorithms

  • Round robin — simple, ignores capacity differences.
  • Least connections — preferred for varied workload.
  • Least response time — adaptive.
  • IP hash / consistent hash — sticky sessions, cache affinity.
  • Power of two choices — pick 2 random, send to less loaded; near-optimal in practice.

Health checks

  • Active (LB pings) vs passive (LB monitors actual responses).
  • HTTP /healthz minimal; check critical deps in /readyz.
  • Tune: interval, timeout, healthy_threshold, unhealthy_threshold.

Unicast vs anycast

  • Unicast: one IP, one host.
  • Anycast: same IP advertised from many places; BGP routes you to nearest. Used by DNS root servers, CDNs, Cloudflare.
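
The power-of-two-choices rule from the load-balancing algorithms listed above fits in a few lines. This is a sketch, not a production balancer: `pick_backend` and the `in_flight` mapping (backend name to current request count) are hypothetical, and a real LB would update the counts as requests start and finish.

```python
import random


def pick_backend(in_flight: dict, rng=random) -> str:
    """Sample two distinct backends at random and return the less loaded one."""
    backends = list(in_flight)
    if len(backends) == 1:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if in_flight[a] <= in_flight[b] else b


loads = {"app-1": 3, "app-2": 12, "app-3": 7}
print(pick_backend(loads))  # never "app-2": it loses every comparison
```

Unlike least-connections, this needs only two load lookups per request, and unlike pure random choice, the most loaded backend is never selected while a less loaded one is in the sample.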

Load balancing layers (typical edge → app)

  1. DNS-based geo routing → region.
  2. Anycast IP → nearest PoP.
  3. CDN → cached or origin pull.
  4. Regional L7 LB → service.
  5. Service mesh sidecar (Envoy) → app instance.

CDN

  • Edge caching close to user. Origin pull on miss.
  • Cache key = URL + Vary headers.
  • Purge / invalidation: purge by URL/tag.
  • Used for static assets, API responses (with cache-control), images, video.
  • Workers / Edge functions — run code at edge (Cloudflare Workers, Lambda@Edge, Fastly Compute).

HTTP/2 tuning

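  • SETTINGS_MAX_CONCURRENT_STREAMS (unlimited by default in the spec, which recommends at least 100; many servers default to about 100, often raised to 1000+ for gRPC).
  • SETTINGS_INITIAL_WINDOW_SIZE: initial flow-control window per stream (default 65,535 bytes).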
  • HPACK dynamic table size.
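
Why the per-stream flow-control window is worth tuning can be shown with one line of arithmetic: a sender can have at most one window of unacknowledged data in flight per round trip. The function name below is hypothetical, and the bound ignores congestion control and the connection-level window.

```python
def max_stream_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one stream's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_s


# Default 65,535-byte window on a 100 ms path: about 5.2 Mbit/s per stream.
print(round(max_stream_throughput_bps(65535, 0.1) / 1e6, 1))
# A 1 MiB window on the same path: about 83.9 Mbit/s.
print(round(max_stream_throughput_bps(1024 * 1024, 0.1) / 1e6, 1))
```

On low-latency links inside a datacenter the default is rarely the bottleneck; on long-haul paths with large payloads it often is.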

Interview questions

  1. What happens when you type https://x.com in a browser? DNS lookup → TCP connect (handshake) → TLS handshake → HTTP request → HTML parse → CSS/JS fetch → render.
  2. HTTP/1 vs HTTP/2 vs HTTP/3 — when would you choose each?
  3. Why might TLS 1.3 0-RTT be dangerous? Replay attacks for non-idempotent requests.
  4. TCP retransmit timer: what is it? The retransmission timeout (RTO) adapts to a smoothed RTT estimate and its variance, and is bounded by a minimum and a maximum.
  5. What is sticky session, and why might you want or avoid it?
  6. Difference between forward and reverse proxy.
  7. Path MTU discovery: why might packets be silently dropped? ICMP filtering breaks it, so large packets just disappear. Mitigate with MSS clamping.
  8. CORS preflight: when is it triggered? Non-simple methods (PUT, DELETE), custom headers, or a Content-Type other than application/x-www-form-urlencoded, multipart/form-data, or text/plain.
  9. WebSocket vs SSE — pick one for: live stock prices, chat, notifications.
  10. You see high p99 only for one client country — debug. Geo-routing miss, peering issue, IPv6 path, MTU. Use traceroute, mtr.
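
For question 4, the adaptive timer can be sketched with the standard estimator from RFC 6298 (smoothing factors 1/8 and 1/4, RTO = SRTT + 4 × RTTVAR). The class name and defaults are illustrative; clock granularity and exponential backoff after a timeout are left out.

```python
class RtoEstimator:
    """RFC 6298-style retransmission timeout estimate, in seconds."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variance

    def __init__(self, min_rto: float = 1.0, max_rto: float = 60.0) -> None:
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = min_rto

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:
            # First measurement seeds both estimates.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Variance is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        raw = self.srtt + 4 * self.rttvar
        self.rto = min(max(raw, self.min_rto), self.max_rto)
        return self.rto
```

With `min_rto=0.2` and a steady 100 ms RTT, the estimate starts near 0.3s and decays toward the floor as the variance term shrinks; with the RFC's 1s minimum, the floor dominates on fast paths.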
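
Common pitfalls

  • TCP keepalive interval longer than the load balancer's idle timeout → load balancer drops the connection without the app knowing → next write fails → 5xx.
  • DNS TTL too short + many clients → heavy query load on resolvers and authoritative servers, plus extra lookup latency.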
  • Slowloris-style clients holding connections open: set timeouts such as client_header_timeout, client_body_timeout, keepalive_timeout (nginx).
  • No graceful drain on shutdown → in-flight requests fail. SIGTERM handler should: stop accepting new, drain, then exit.
  • Hairpin routing: client → LB → service in same VPC → upstream over external interface — adds latency and cost.
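
The graceful-drain sequence from the shutdown pitfall above (stop accepting, drain, exit) can be sketched with a small in-flight counter. `InFlightTracker` and its method names are hypothetical, and wiring it to SIGTERM is left to the surrounding server.

```python
import threading


class InFlightTracker:
    """Track in-flight requests so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_begin(self) -> bool:
        """Call when a request arrives; False means the server is draining."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Call when a request finishes, successfully or not."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop accepting, then wait for in-flight work; True if fully drained."""
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

One reasonable arrangement is for the SIGTERM handler to only set a flag or event, and for the main thread to call `drain` with a timeout shorter than the orchestrator's kill grace period before exiting.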