Kubernetes — Theory (interview deep-dive)

Why K8s exists

Standard API for deploying / running / scaling / healing containerized workloads on any infra. Replaces hand-rolled scripts + cron + LB config with a declarative loop.

Reconciliation loop

Heart of K8s. Every controller continuously:

Observes current state (via API).
Compares to desired state (spec).
Acts to converge.

Eventually consistent. Each pass may be retried or re-run at any time, so every action must be idempotent.
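
As a rough illustration of observe / compare / act, here is a minimal sketch of a single reconcile pass. The `Cluster` class and `reconcile` function are hypothetical stand-ins for the API server and a controller, not real Kubernetes APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the API server: just a set of running pod names."""
    pods: set[str] = field(default_factory=set)


def reconcile(cluster: Cluster, app: str, desired_replicas: int) -> list[str]:
    """Run one observe -> compare -> act pass and return the actions taken."""
    # Observe: which pods currently exist for this app?
    current = sorted(p for p in cluster.pods if p.startswith(f"{app}-"))
    actions: list[str] = []

    # Compare: how far is current state from desired state?
    missing = desired_replicas - len(current)

    # Act: create missing pods, or delete surplus ones.
    if missing > 0:
        index = 0
        while missing > 0:
            name = f"{app}-{index}"
            if name not in cluster.pods:
                cluster.pods.add(name)
                actions.append(f"create {name}")
                missing -= 1
            index += 1
    elif missing < 0:
        for name in current[desired_replicas:]:
            cluster.pods.remove(name)
            actions.append(f"delete {name}")
    return actions
```

Calling `reconcile` a second time with the same arguments returns an empty list, which is the idempotency property the loop relies on.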

Pod scheduling

Scheduler picks node based on:
- Filtering: resource requests fit, taints/tolerations, node selectors, affinity/anti-affinity, topology spread.
- Scoring: rank the nodes that passed filtering (least loaded, locality, image already cached, etc.); the highest score wins.
Once scheduled, kubelet runs the pod via container runtime.
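
The filter-then-score flow can be sketched as below. This is a simplified model, not the real scheduler: the `Node` and `Pod` fields, the taint representation (plain strings), and the scoring weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int   # unrequested CPU in millicores
    free_mem_mi: int  # unrequested memory in MiB
    labels: dict[str, str] = field(default_factory=dict)
    taints: set[str] = field(default_factory=set)
    cached_images: set[str] = field(default_factory=set)


@dataclass
class Pod:
    image: str
    cpu_m: int
    mem_mi: int
    node_selector: dict[str, str] = field(default_factory=dict)
    tolerations: set[str] = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Filtering: requests fit, node selector matches, all taints tolerated."""
    fits = pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi
    selected = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    tolerated = node.taints <= pod.tolerations
    return fits and selected and tolerated


def score(pod: Pod, node: Node) -> float:
    """Scoring: prefer CPU headroom after placement, plus a cached-image bonus."""
    headroom = (node.free_cpu_m - pod.cpu_m) / 1000
    return headroom + (1.0 if pod.image in node.cached_images else 0.0)


def schedule(pod: Pod, nodes: list[Node]) -> Optional[str]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no feasible node: the pod stays Pending
    return max(candidates, key=lambda n: score(pod, n)).name
```

Returning `None` corresponds to the pod staying Pending until a node becomes feasible.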

Affinity examples:

Pod anti-affinity — spread replicas across nodes / AZs.
Pod affinity — colocate with another service.
Topology spread — even distribution across zones.

Deployment rollout

RollingUpdate (default): create N new, wait for ready, kill N old. Tunable via maxSurge and maxUnavailable (each defaults to 25%).

Recreate: kill all old, then create new — outage but no version mix.
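
For RollingUpdate, the bounds implied by maxSurge and maxUnavailable can be worked out with a little arithmetic. The sketch below assumes the documented rounding (percentages round up for maxSurge and down for maxUnavailable); the function names are hypothetical.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Resolve an absolute int or a percentage string such as "25%"."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer math avoids floating-point surprises when rounding.
    return (scaled + 99) // 100 if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,            # old + new at any moment
        "min_available_pods": replicas - unavailable,  # floor during the rollout
    }
```

With 10 replicas and the defaults, the rollout may run up to 13 pods at once and must keep at least 8 available.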

Health gates:

readinessProbe must pass for pod to receive traffic.
livenessProbe failure → restart.
startupProbe for slow boot.
minReadySeconds for soak time before rollout proceeds.

kubectl rollout undo reverts to previous ReplicaSet.

Services & endpoints

Service selects pods by label → Endpoints (or EndpointSlice) tracks IPs. kube-proxy programs iptables/IPVS rules to DNAT to a backing pod.

Headless Service (clusterIP: None) returns an A record per ready pod instead of a single ClusterIP. Used by StatefulSets and by clients that do their own load balancing.
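
How a selector turns into a list of backends can be sketched as follows. The pod dictionaries are a hypothetical, simplified shape, and the model ignores namespaces and ports.

```python
def ready_endpoints(selector: dict[str, str], pods: list[dict]) -> list[str]:
    """Return the IPs a Service with this selector would route to."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"]  # not-ready pods are tracked but receive no traffic
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    ]
```

An empty result here is the "Service has no endpoints" symptom: either no pod carries every selector label, or the matching pods are not Ready.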

DNS

CoreDNS resolves cluster-internal names. Pod’s /etc/resolv.conf points to it.

api.default.svc.cluster.local → ClusterIP
api → search list expands using `default` namespace

DNS issues are common: CoreDNS pods get throttled by CPU limits, and large clusters need more CoreDNS replicas.
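
The search-list expansion can be sketched as below. It assumes the usual pod resolv.conf for the default DNS policy (three cluster search domains and `options ndots:5`), which the text above does not spell out; a real pod may also inherit extra search domains from its node.

```python
def candidate_queries(name: str, namespace: str, ndots: int = 5,
                      cluster_domain: str = "cluster.local") -> list[str]:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: no search-list expansion
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the literal name first
    return expanded + [absolute]      # otherwise walk the search list first
```

For `api` in namespace `default`, the first candidate is `api.default.svc.cluster.local.`. An external name such as `example.com` walks all three search domains before the literal name is tried, which is one reason DNS load grows quickly.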

Storage

PersistentVolumeClaim declares “I need 10Gi RWO”. StorageClass knows how to provision it (CSI driver). PV is the resulting backing volume.

Access modes:

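RWO ReadWriteOnce: mounted read-write by one node (several pods on that node can share it).
RWX ReadWriteMany: read-write from multiple nodes (rare; needs a shared filesystem such as NFS/EFS).
ROX ReadOnlyMany: read-only from multiple nodes.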

StatefulSets bind each replica to a stable PVC by ordinal.
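
The stable identity follows a fixed naming scheme, sketched below. The function and its parameter names are hypothetical; the patterns themselves (pod `<name>-<ordinal>`, PVC `<claim template>-<pod name>`, per-pod DNS under the governing headless Service) are the standard StatefulSet conventions.

```python
def statefulset_identities(name: str, service: str, namespace: str,
                           replicas: int, claim_template: str,
                           cluster_domain: str = "cluster.local") -> list[dict]:
    """List the pod name, DNS name, and PVC name for each ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "pvc": f"{claim_template}-{pod}",
        })
    return identities
```

If pod `db-1` is rescheduled to another node, it comes back with the same name and reattaches to PVC `data-db-1`.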

Networking model

Every pod has its own IP.
All pods can reach all pods (no NAT) by default.
NetworkPolicy lets you restrict (default-deny + selectively allow). CNI plugin must support it (Calico, Cilium do).
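
The allow-by-default, deny-once-selected behavior can be sketched as below. This is a deliberately reduced model (ingress only, a single namespace, label selectors only, no ports), and the policy dictionary shape is hypothetical.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """An empty selector matches every pod, as in Kubernetes."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src_labels: dict[str, str], dst_labels: dict[str, str],
                    policies: list[dict]) -> bool:
    # Only policies whose pod selector matches the destination apply to it.
    selecting = [p for p in policies if matches(p["pod_selector"], dst_labels)]
    if not selecting:
        return True  # pod is not isolated: all ingress is allowed
    # Policies are additive: traffic passes if any selecting policy allows it.
    return any(
        matches(allowed, src_labels)
        for policy in selecting
        for allowed in policy["allow_from"]
    )
```

A policy with an empty `pod_selector` and an empty `allow_from` list acts as a namespace-wide default deny; further policies then open specific paths.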

Service mesh (Istio, Linkerd) layers mTLS, retries, traffic shifting on top.

Autoscaling

HPA (Horizontal Pod Autoscaler) — scales replicas by metric (CPU / memory / custom). Needs metrics-server or Prometheus Adapter.
VPA (Vertical) — adjusts requests/limits. Don’t use with HPA on same metric.
Cluster Autoscaler: adds nodes when pods can’t schedule and removes nodes that stay underutilized.
Karpenter (AWS) — modern node provisioner, faster, more flexible than CA.
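
The core HPA calculation is desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). A minimal sketch follows; the function name and the min/max defaults are hypothetical, and the real controller also handles missing metrics, not-ready pods, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Apply the basic HPA scaling formula, clamped to the min/max bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6.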

RBAC, service accounts

Each pod runs as a ServiceAccount (default = default SA).
Token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
Use specific SAs per workload, bind minimal Role.
For cloud auth: IRSA (AWS), Workload Identity (GCP), Azure AD Workload Identity → no static creds.

Secrets — the elephant

K8s Secrets are only base64-encoded, not encrypted by default; anyone who can read the object or etcd can decode them. Enable etcd encryption at rest.
RBAC: deny get/list on secrets to most subjects.
Better: External Secrets Operator pulls from Vault / Secrets Manager.
Sealed Secrets / SOPS for GitOps with encrypted secrets in repo.
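
To make the "base64 is not encryption" point concrete, here is a short sketch using only the standard library. The helper names and the sample value are made up for illustration.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key is involved, so anyone can do this."""
    return base64.b64decode(encoded).decode("utf-8")
```

Because decoding needs no key, a Secret committed to Git or readable via RBAC is effectively plaintext.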

Common interview questions

What’s the difference between Deployment and StatefulSet? Deployment → identical interchangeable pods, no stable identity. StatefulSet → ordered, stable hostnames + PVC per ordinal.
A pod is CrashLoopBackOff — debug. kubectl describe pod, kubectl logs --previous, look at exit code, recent image change, env, volumes.
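Pod is Pending forever. Usually unschedulable: resource requests larger than any node's free capacity, no node matching the selector/affinity, taints not tolerated, or an unbound PVC (check Events in kubectl describe pod). An image pull error also keeps the phase Pending, but shows up as ErrImagePull / ImagePullBackOff.
Service has no endpoints. Selector mismatch with pod labels, pods in a different namespace, or pods not Ready. A NetworkPolicy does not remove endpoints; it blocks traffic to endpoints that exist.
OOMKilled: what now? Profile the app's memory use; raise the memory limit if the working set is legitimately larger; check memory pressure on the node; consider GOMEMLIMIT for Go, --max-old-space-size for Node.js.
Rolling update: which version serves requests mid-rollout? Both old and new pods serve until the rollout completes. Make handlers idempotent and keep contracts compatible in both directions. A preStop hook gives in-flight requests time to drain.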
HPA isn’t scaling. Metrics-server installed? CPU-based metrics need requests defined. Custom metrics need adapter.
What’s a sidecar? Helper container in pod (proxy, log forwarder, secrets). Shares network/volume.
Why use init containers? Run-once setup before app starts (DB migrations, fetch certs).
Operator pattern — when? When you need to encode operational knowledge (HA cluster bootstrap, backups, upgrades) as a controller for a CRD.
Why is kubectl exec not for “deploys”? Imperative, untracked, no audit, lost on pod restart. Use deployments.
What’s PodDisruptionBudget? Guarantees minimum available pods during voluntary disruptions (drain). minAvailable: 2 or maxUnavailable: 1.
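NetworkPolicy default behavior. Without policies, all traffic is allowed. Once at least one policy selects a pod, that pod is restricted to the allowed traffic in the direction (ingress and/or egress) that policy covers; the other direction stays open.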
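
The PodDisruptionBudget answer above comes down to simple arithmetic, sketched here for absolute (non-percentage) values. The function name is hypothetical.

```python
from typing import Optional


def allowed_disruptions(healthy: int, desired: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Return how many pods may be evicted voluntarily right now."""
    if min_available is not None:
        required_healthy = min_available
    elif max_unavailable is not None:
        required_healthy = desired - max_unavailable
    else:
        raise ValueError("set min_available or max_unavailable")
    return max(0, healthy - required_healthy)
```

When the result is 0, an eviction request is refused and kubectl drain waits and retries. That is also why minAvailable equal to the replica count blocks node drains entirely.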

Common pitfalls

Same requests and limits (Guaranteed QoS) for everything — wastes capacity.
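No requests or limits at all (BestEffort): first to be evicted under node pressure.
Liveness probe identical to readiness: a transient failure (slow dependency, load spike) restarts the container, possibly in a loop, instead of just pulling the pod out of the LB.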
Forgetting terminationGracePeriodSeconds and preStop — connections cut mid-flight.
DNS lookups for every request without resolver caching.
latest image tag → pods restart with different code.
Binding a cluster-scoped admin role (such as cluster-admin) to a pod’s SA.
Using hostPath (couples to a specific node).
One huge cluster shared by many tenants without isolation.
kubectl edit in prod without GitOps trail.
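
The first two pitfalls depend on how a QoS class is derived from requests and limits. The sketch below follows the documented rules for CPU and memory; the container dictionary shape is hypothetical, and quantities are compared as plain strings, so "1" and "1000m" would wrongly count as different.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    # BestEffort: no container sets any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"

    # Guaranteed: every container has CPU and memory limits, and its
    # requests equal those limits.
    for container in containers:
        limits = container.get("limits", {})
        # A request that is omitted defaults to the limit for that resource.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource not in limits or requests[resource] != limits[resource]:
                return "Burstable"
    return "Guaranteed"
```

Most workloads land in Burstable: requests sized for typical usage, with a higher memory limit as a safety net.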

Helm vs Kustomize

Helm — templating + package manager. Charts with values. Good for complex apps with many opts.
Kustomize — patches over base manifests. Built into kubectl. Cleaner for env overlays.
Combine: helm for upstream, kustomize on top for env-specific.

Modern stack pieces

Argo CD / Flux — GitOps.
Cert-Manager — TLS via Let’s Encrypt / Vault.
External DNS — sync DNS records.
Prometheus + Grafana + Alertmanager — metrics.
Loki / Elastic — logs.
OpenTelemetry Collector: receives and exports traces (also metrics and logs).
Istio / Linkerd — service mesh.
Karpenter — node provisioning.
Velero — backup.
Cilium — eBPF-based CNI.

When NOT to use Kubernetes

Small team, few services — too much ops overhead.
Stateless workloads that run more cheaply on serverless.
One-off batch — managed batch services exist.
No on-call discipline — clusters need operators.