Kubernetes — Theory
Kubernetes — Theory (interview deep-dive)
Why K8s exists
Standard API for deploying / running / scaling / healing containerized workloads on any infra. Replaces hand-rolled scripts + cron + LB config with a declarative loop.
Reconciliation loop
Heart of K8s. Every controller continuously:
- Observes current state (via API).
- Compares to desired state (spec).
- Acts to converge.
Eventually-consistent. Loops are safe to retry — operations should be idempotent.
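A minimal sketch of one reconcile pass, assuming a toy in-memory cluster. `FakeCluster`, `reconcile`, and the pod names are hypothetical stand-ins for illustration, not Kubernetes client APIs. Running the pass twice shows why idempotency matters: the second call finds nothing left to do.

```python
class FakeCluster:
    """Hypothetical in-memory stand-in for the API server."""

    def __init__(self, pods):
        self.pods = list(pods)

    def observe(self):
        # Current state: how many pods exist right now.
        return len(self.pods)

    def create_pod(self):
        self.pods.append(f"pod-{len(self.pods)}")

    def delete_pod(self):
        self.pods.pop()


def reconcile(cluster, desired_replicas):
    """One pass: observe, compare, act. Returns the number of actions taken."""
    current = cluster.observe()
    diff = desired_replicas - current
    for _ in range(max(diff, 0)):
        cluster.create_pod()
    for _ in range(max(-diff, 0)):
        cluster.delete_pod()
    return abs(diff)


cluster = FakeCluster(["pod-0"])
first_pass = reconcile(cluster, desired_replicas=3)   # creates the 2 missing pods
second_pass = reconcile(cluster, desired_replicas=3)  # already converged, no actions
print(first_pass, second_pass, cluster.pods)
```

A real controller runs this pass repeatedly (on watch events and periodic resyncs), so a crashed or repeated pass is harmless as long as each pass only closes the gap it observes.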
Pod scheduling
- Scheduler picks a node in two phases:
  - Filtering: drops nodes where resource requests don’t fit, or where taints/tolerations, node selectors, affinity/anti-affinity, or topology spread rule the pod out.
  - Scoring: ranks the remaining nodes (least-loaded, locality, image already cached, etc.).
- Once scheduled, kubelet runs the pod via container runtime.
Affinity examples:
- Pod anti-affinity — spread replicas across nodes / AZs.
- Pod affinity — colocate with another service.
- Topology spread — even distribution across zones.
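The filter-then-score idea can be sketched as below. This is a simplified model, not the real kube-scheduler: `Node`, `Pod`, `feasible`, `score`, and the 500-point image bonus are all assumptions made up for illustration, and only resource fit and taints are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int                      # free CPU in millicores
    free_mem_mi: int                     # free memory in MiB
    taints: set = field(default_factory=set)
    cached_images: set = field(default_factory=set)


@dataclass
class Pod:
    image: str
    cpu_m: int
    mem_mi: int
    tolerations: set = field(default_factory=set)


def feasible(pod, node):
    """Filtering: requests must fit and every taint must be tolerated."""
    fits = pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi
    tolerated = node.taints <= pod.tolerations
    return fits and tolerated


def score(pod, node):
    """Scoring: prefer nodes with more CPU left over, plus an arbitrary image-cache bonus."""
    points = node.free_cpu_m - pod.cpu_m
    if pod.image in node.cached_images:
        points += 500
    return points


def schedule(pod, nodes):
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no feasible node: the pod stays Pending
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("a", free_cpu_m=1000, free_mem_mi=2048),
    Node("b", free_cpu_m=4000, free_mem_mi=8192, taints={"gpu"}),
    Node("c", free_cpu_m=800, free_mem_mi=2048, cached_images={"api:1.2"}),
]
pod = Pod(image="api:1.2", cpu_m=500, mem_mi=512)
print(schedule(pod, nodes))
```

Node b is filtered out by its taint even though it has the most free capacity, which mirrors the "Pending forever" failure mode when no node passes filtering.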
Deployment rollout
RollingUpdate (default): create a batch of new pods, wait until they are Ready, then kill a batch of old ones. Batch sizes are tunable via maxSurge and maxUnavailable (both default to 25%).
Recreate: kill all old, then create new — outage but no version mix.
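A worked example of how maxSurge and maxUnavailable bound a rolling update. The helper names `resolve` and `rollout_bounds` are hypothetical; the rounding rule (percentages round up for maxSurge and down for maxUnavailable) follows the Deployment documentation.

```python
def resolve(value, replicas, round_up):
    """Resolve an absolute number or an 'NN%' string against the replica count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling or floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,           # old + new pods at any moment
        "min_available_pods": replicas - unavailable,  # pods that must stay available
    }


print(rollout_bounds(10))
print(rollout_bounds(3, max_surge=1, max_unavailable=0))
```

With 10 replicas and the defaults, the rollout may run up to 13 pods and must keep at least 8 available. Setting maxUnavailable to 0 trades rollout speed for never dipping below the desired count.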
Health gates:
- readinessProbe must pass for pod to receive traffic.
- livenessProbe failure → restart.
- startupProbe for slow boot.
minReadySeconds: soak time a new pod must stay Ready before it counts as available and the rollout proceeds.
kubectl rollout undo reverts to previous ReplicaSet.
Services & endpoints
Service selects pods by label → Endpoints (or EndpointSlice) tracks IPs.
kube-proxy programs iptables/IPVS rules to DNAT to a backing pod.
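A rough sketch of the selector-to-endpoints step, assuming pods are plain dicts. The `endpoints` function and the pod fields are hypothetical; the real controller watches the API and writes EndpointSlice objects.

```python
def endpoints(selector, pods):
    """Return IPs of pods that match every selector label and are Ready."""
    return [
        p["ip"]
        for p in pods
        if p["ready"] and all(p["labels"].get(k) == v for k, v in selector.items())
    ]


pods = [
    {"ip": "10.0.0.1", "labels": {"app": "api"}, "ready": True},
    {"ip": "10.0.0.2", "labels": {"app": "api"}, "ready": False},  # failing readiness
    {"ip": "10.0.0.3", "labels": {"app": "web"}, "ready": True},   # label mismatch
]
print(endpoints({"app": "api"}, pods))
print(endpoints({"app": "apii"}, pods))  # typo in selector: no endpoints at all
```

The two empty-result causes shown here (selector mismatch, pods not Ready) are the first things to check when a Service has no endpoints.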
Headless Service (clusterIP: None) returns an A record per pod instead of a single ClusterIP. Used by StatefulSets and by clients doing their own LB.
CoreDNS resolves cluster-internal names. Pod’s /etc/resolv.conf points to it.
- api.default.svc.cluster.local → ClusterIP
- api → search list expands using `default` namespace
DNS issues are common: CoreDNS pod CPU caps, large clusters need more replicas.
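How the resolver turns a short name into lookups can be sketched as below. It assumes the usual pod resolv.conf (three cluster search suffixes and ndots:5) and the default cluster domain cluster.local; real pods may carry extra search domains inherited from the node. `search_list` and `candidate_names` are hypothetical helper names.

```python
def search_list(namespace, cluster_domain="cluster.local"):
    """Search suffixes a pod in `namespace` typically gets."""
    return [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]


def candidate_names(name, namespace, ndots=5):
    """Names the resolver tries, in order, for a lookup from a pod."""
    if name.endswith("."):
        # Trailing dot marks a fully qualified name: no search expansion.
        return [name.rstrip(".")]
    expanded = [f"{name}.{suffix}" for suffix in search_list(namespace)]
    if name.count(".") >= ndots:
        return [name] + expanded
    # Fewer dots than ndots: search suffixes are tried before the name itself.
    return expanded + [name]


print(candidate_names("api", "default"))
print(candidate_names("example.com", "default"))
print(candidate_names("example.com.", "default"))
```

The second call shows a common source of DNS load: an external name with fewer than five dots costs three failed cluster lookups before the real one, unless the client uses a trailing dot.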
Storage
PersistentVolumeClaim declares “I need 10Gi RWO”. StorageClass knows how to provision it (CSI driver). PV is the resulting backing volume.
Access modes:
- RWO ReadWriteOnce — one node.
- RWX ReadWriteMany — multiple nodes (rare; needs NFS/EFS).
- ROX ReadOnlyMany.
StatefulSets bind each replica to a stable PVC by ordinal.
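The per-ordinal binding is visible in the naming scheme. A small sketch, assuming a StatefulSet named db with a volumeClaimTemplate named data and a headless Service named db-headless (all three names are made up for illustration):

```python
def stateful_identities(statefulset, claim_template, replicas, service, namespace="default"):
    """Pod name, PVC name, and stable DNS name for each ordinal."""
    return [
        {
            "pod": f"{statefulset}-{i}",
            "pvc": f"{claim_template}-{statefulset}-{i}",
            "dns": f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local",
        }
        for i in range(replicas)
    ]


for identity in stateful_identities("db", "data", 3, "db-headless"):
    print(identity)
```

Because the PVC name is derived from the ordinal, a rescheduled db-1 reattaches to data-db-1 rather than getting a fresh volume.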
Networking model
- Every pod has its own IP.
- All pods can reach all pods (no NAT) by default.
- NetworkPolicy lets you restrict (default-deny + selectively allow). CNI plugin must support it (Calico, Cilium do).
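The default-allow versus default-deny behaviour can be modelled in a few lines. This is a deliberately simplified sketch: policies are plain dicts with hypothetical keys `pod_selector` and `allow_from`, and namespaces, ports, egress, and policyTypes are ignored.

```python
def selects(selector, labels):
    """An empty selector matches every pod, as in Kubernetes label selectors."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(dst_labels, src_labels, policies):
    applicable = [p for p in policies if selects(p["pod_selector"], dst_labels)]
    if not applicable:
        return True  # no policy selects the pod: all traffic allowed
    # Policies are additive: traffic passes if any applicable policy allows it.
    return any(
        selects(allowed, src_labels)
        for p in applicable
        for allowed in p["allow_from"]
    )


default_deny = {"pod_selector": {}, "allow_from": []}
allow_api_to_db = {"pod_selector": {"app": "db"}, "allow_from": [{"app": "api"}]}
policies = [default_deny, allow_api_to_db]

print(ingress_allowed({"app": "db"}, {"app": "api"}, policies))   # allowed by second policy
print(ingress_allowed({"app": "db"}, {"app": "web"}, policies))   # blocked
print(ingress_allowed({"app": "db"}, {"app": "web"}, []))         # no policies at all
```

The default-deny policy adds no allow rules of its own; it only ensures every pod is selected by something, so anything not explicitly allowed is dropped.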
Service mesh (Istio, Linkerd) layers mTLS, retries, traffic shifting on top.
Autoscaling
- HPA (Horizontal Pod Autoscaler): scales replicas by metric (CPU / memory / custom). Needs metrics-server for CPU/memory, or an adapter such as Prometheus Adapter for custom metrics.
- VPA (Vertical Pod Autoscaler): adjusts requests/limits. Don’t use with HPA on the same metric, or the two fight each other.
- Cluster Autoscaler: adds nodes when pods can’t schedule, removes underused ones.
- Karpenter (AWS): modern node provisioner, faster and more flexible than CA.
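The core HPA calculation is a ratio, sketched below. The function name and defaults are illustrative; the 0.1 tolerance matches the documented default, while stabilization windows and the handling of not-yet-ready pods are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """desired = ceil(current * currentMetric / targetMetric), clamped to min/max."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # scale up
print(desired_replicas(4, current_metric=63, target_metric=60))  # within tolerance
print(desired_replicas(4, current_metric=20, target_metric=60))  # scale down
```

For CPU, the metric is utilization relative to the pod's CPU request, which is why an HPA on CPU cannot work when requests are missing.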
RBAC, service accounts
- Each pod runs as a ServiceAccount (default = `default` SA).
- Token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
- Use specific SAs per workload, bind minimal Role.
- For cloud auth: IRSA (AWS), Workload Identity (GCP), Azure AD Workload Identity → no static creds.
Secrets — the elephant
- K8s Secrets are base64-encoded, not encrypted, by default. Enable etcd encryption at rest.
- RBAC: don’t grant get/list on secrets to most subjects (RBAC is allow-only, so there is no explicit deny rule).
- Better: External Secrets Operator pulls from Vault / Secrets Manager.
- Sealed Secrets / SOPS for GitOps with encrypted secrets in repo.
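To make the "base64 is not encryption" point concrete, here is a short sketch. The `secret_data` dict mimics the data field of a Secret manifest; the key and value are made up.

```python
import base64

# What ends up in the Secret's data field: plain base64 of the value.
secret_data = {"password": base64.b64encode(b"s3cr3t").decode("ascii")}

# Anyone who can read the object can reverse it without any key.
decoded = {
    key: base64.b64decode(value).decode("utf-8")
    for key, value in secret_data.items()
}
print(secret_data)
print(decoded)
```

No key is involved at any step, so protection has to come from RBAC, encryption at rest, or keeping the value out of the cluster API altogether.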
Common interview questions
- What’s the difference between Deployment and StatefulSet? Deployment → identical interchangeable pods, no stable identity. StatefulSet → ordered, stable hostnames + PVC per ordinal.
- A pod is in CrashLoopBackOff: how do you debug? kubectl describe pod, kubectl logs --previous, look at exit code, recent image change, env, volumes.
- Pod is Pending forever. Resource requests too high, no matching nodes, taints not tolerated, image pull error.
- Service has no endpoints. Selector mismatch with pod labels, or pods not Ready. A blocking NetworkPolicy usually shows up differently: endpoints are listed but connections time out.
- OOMKilled: what now? Raise memory limit; profile app; check node memory; consider GOMEMLIMIT for Go, --max-old-space-size for Node.js.
- Rolling update: do requests hit old or new pods during it? Both serve. Make handlers idempotent and contracts forward-compatible. A preStop hook gives in-flight requests time to drain.
- HPA isn’t scaling. Metrics-server installed? CPU-based metrics need requests defined. Custom metrics need an adapter.
- What’s a sidecar? Helper container in the pod (proxy, log forwarder, secrets). Shares network/volumes.
- Why use init containers? Run-once setup before app starts (DB migrations, fetch certs).
- Operator pattern: when? When you need to encode operational knowledge (HA cluster bootstrap, backups, upgrades) as a controller for a CRD.
- Why is kubectl exec not for “deploys”? Imperative, untracked, no audit, lost on pod restart. Use deployments.
- What’s a PodDisruptionBudget? Guarantees minimum available pods during voluntary disruptions (drain). minAvailable: 2 or maxUnavailable: 1.
- NetworkPolicy default behavior. Without policies, all traffic allowed. Once at least one policy selects a pod → that pod is restricted to the traffic its policies allow.
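For the CrashLoopBackOff and OOMKilled questions, reading the exit code is often the fastest clue. A small sketch, assuming the common convention that a code above 128 means "killed by signal (code minus 128)". `explain_exit_code` and the short signal table are illustrative helpers, not part of any Kubernetes tooling.

```python
# Subset of POSIX signal numbers, hard-coded so the sketch runs on any platform.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit_code(code):
    if code == 0:
        return "exited cleanly"
    if code > 128:
        number = code - 128
        name = SIGNAL_NAMES.get(number, f"signal {number}")
        return f"killed by {name}"
    return f"application error (exit {code})"


for code in (0, 1, 137, 143):
    print(code, explain_exit_code(code))
```

Exit 137 (SIGKILL) alongside a last state reason of OOMKilled points at the memory limit, while 143 (SIGTERM) usually means the container was asked to stop and exited during the grace period.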
Common pitfalls
- Same requests and limits (Guaranteed QoS) for everything: wastes capacity.
- No requests or limits (BestEffort): first to be evicted.
- Liveness probe identical to readiness: a failure triggers a restart loop instead of just pulling the pod out of the LB.
- Forgetting terminationGracePeriodSeconds and preStop: connections cut mid-flight.
- DNS lookups for every request without resolver caching.
- `latest` image tag → pods restart with different code.
- Cluster-scoped admin role to a pod’s SA.
- Using hostPath (couples to a specific node).
- One huge cluster shared by many tenants without isolation.
- kubectl edit in prod without GitOps trail.
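The QoS classes behind the first two pitfalls follow from requests and limits. A sketch of the classification, assuming containers are dicts with optional requests and limits maps. `qos_class` is a hypothetical helper, and quantities are compared as raw strings, so "500m" and "0.5" would wrongly count as different here.

```python
def qos_class(containers):
    """Classify a pod as BestEffort, Burstable, or Guaranteed."""
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"  # nothing set anywhere: first to be evicted
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                return "Burstable"
    return "Guaranteed"


print(qos_class([{}]))
print(qos_class([{"requests": {"cpu": "100m"}}]))
print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits": {"cpu": "500m", "memory": "256Mi"}}]))
```

A common middle ground is Burstable: set requests to typical usage and a memory limit somewhat above it, rather than making every pod Guaranteed.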
Helm vs Kustomize
- Helm: templating + package manager. Charts with values. Good for complex apps with many options.
- Kustomize: patches over base manifests. Built into kubectl. Cleaner for env overlays.
- Combine: Helm for upstream charts, Kustomize on top for env-specific patches.
Modern stack pieces
- Argo CD / Flux: GitOps.
- Cert-Manager: TLS certificates via Let’s Encrypt / Vault.
- External DNS: syncs DNS records from Services and Ingresses to the DNS provider.
- Prometheus + Grafana + Alertmanager: metrics, dashboards, alerting.
- Loki / Elastic: logs.
- OpenTelemetry Collector: tracing.
- Istio / Linkerd: service mesh.
- Karpenter: node provisioning.
- Velero: backup and restore.
- Cilium: eBPF-based CNI.
When NOT to use Kubernetes
- Small team, few services: too much ops overhead.
- Stateless workloads that run cheaper on serverless.
- One-off batch: managed batch services exist.
- No on-call discipline: clusters need operators.