Linux — Theory

User space ↔ syscalls ↔ kernel space. Every I/O, fork, signal, etc. = syscall.

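Process states (visible in ps): R (running or runnable), S (interruptible sleep), D (uninterruptible sleep, usually I/O), T (stopped), Z (zombie).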

Many tasks in D state = stuck on I/O (slow disk, hung NFS). D-state tasks also count toward load average.
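Counting tasks per state makes a D-state pileup easy to spot. The following is a minimal sketch, assuming the input is the output of `ps -eo stat` split into lines; `count_states` is a hypothetical helper and the sample values are made up.

```python
from collections import Counter

def count_states(ps_stat_lines):
    """Tally the first letter of each STAT value (R, S, D, T, Z, ...)."""
    counts = Counter()
    for line in ps_stat_lines:
        stat = line.strip()
        if not stat or stat == "STAT":  # skip blank lines and the ps header
            continue
        counts[stat[0]] += 1            # modifiers such as 's', '+', '<' follow the state letter
    return counts

# Hypothetical sample, not taken from a real host.
sample = ["STAT", "Ss", "R+", "D", "D", "S<", "Z", "I"]
counts = count_states(sample)
print(counts["D"])  # tasks in uninterruptible sleep
```

On a live system the list could come from running `ps -eo stat` and splitting its output into lines.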

free -h: usable memory = the available column, not free. The free column excludes reclaimable page cache.
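The same distinction is visible in /proc/meminfo, where MemAvailable and MemFree are separate fields. A rough sketch, assuming the usual "Name: value kB" layout; `parse_meminfo` is a hypothetical helper and the numbers are invented for illustration.

```python
def parse_meminfo(text):
    """Return {field: kibibytes} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

# Invented sample values, for illustration only.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          500000 kB
MemAvailable:    9000000 kB
Cached:          8000000 kB
"""
info = parse_meminfo(SAMPLE)
# Memory that looks "not free" but could be reclaimed for new allocations.
reclaimable_gap_kb = info["MemAvailable"] - info["MemFree"]
print(reclaimable_gap_kb)
```

On a real host the text would come from reading /proc/meminfo.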

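OOM killer picks a victim by oom_score (see /proc/PID/oom_score, adjustable via oom_score_adj) when RAM is exhausted. Kubernetes container OOMKilled = usually the container's cgroup hit its memory limit, not the host running out.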
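To tell a cgroup-limit kill from a host-level OOM, one place to look is the cgroup's own counters. This is a sketch under the assumption of cgroup v2, where the memory.events file carries an oom_kill counter; `parse_memory_events` is a hypothetical helper and the sample text is made up.

```python
def parse_memory_events(text):
    """Parse cgroup v2 memory.events-style text ("key value" per line) into a dict."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events

# Made-up sample; on a real host this would be read from the container's cgroup directory.
SAMPLE = """low 0
high 0
max 12
oom 3
oom_kill 3
"""
events = parse_memory_events(SAMPLE)
cgroup_killed = events.get("oom_kill", 0) > 0
print(cgroup_killed)
```

A non-zero oom_kill in the container's cgroup points at the memory limit rather than at host-wide exhaustion.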

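Load average: 1/5/15-minute averages of runnable plus D-state tasks. Compare to core count: load ≈ core count = fully busy, sustained load above it = tasks queuing.
Steal time (st in top): the hypervisor ran something else while this VM's vCPU was ready to run. Common on noisy AWS hosts.
iowait (wa in top): CPU idle while I/O is outstanding. High = waiting on disk or network storage.
Context switches (cs in vmstat): high without obvious cause may mean lock contention.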
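The load, steal, and iowait figures above can be derived from /proc/loadavg and from two samples of the aggregate "cpu" line in /proc/stat. A minimal sketch, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the helper names and sample values are hypothetical.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def load_per_core(loadavg_text, cores):
    """Divide the 1/5/15-minute load averages by the core count."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / cores, five / cores, fifteen / cores

def cpu_percentages(before, after):
    """Percentage of CPU time per category between two /proc/stat 'cpu' lines."""
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:                       # no time elapsed between samples
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}

# Invented samples, for illustration only.
ratios = load_per_core("8.00 4.00 2.00 3/500 1234", cores=4)
pct = cpu_percentages("cpu 1000 0 500 8000 400 0 50 50 0 0",
                      "cpu 1100 0 550 8500 700 0 50 100 0 0")
print(ratios, pct["iowait"], pct["steal"])
```

A per-core ratio above 1.0 suggests tasks are queuing; a high iowait or steal share shows where the time is going.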

Packet → NIC → softirq → kernel TCP stack → socket buffer → app. Backwards on send.

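nf_conntrack table tracks connections (needed for NAT and stateful firewall rules). Full table = new connections dropped ("nf_conntrack: table full, dropping packet" in dmesg). Tune net.netfilter.nf_conntrack_max.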
net.core.somaxconn — listen backlog cap.
net.ipv4.ip_local_port_range — ephemeral ports.
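These tunables are plain files under /proc/sys, so a few lines of arithmetic are enough to reason about them. A small sketch; the three helpers are hypothetical names, and the figures passed in are examples rather than recommendations.

```python
def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")

def ephemeral_port_count(range_text):
    """Number of ports in an ip_local_port_range value such as '32768 60999'."""
    low, high = (int(x) for x in range_text.split())
    return high - low + 1

def conntrack_usage(count, maximum):
    """Fraction of the conntrack table in use (0.0 to 1.0)."""
    return count / maximum if maximum > 0 else 0.0

print(sysctl_path("net.core.somaxconn"))
print(ephemeral_port_count("32768\t60999"))
print(conntrack_usage(196608, 262144))
```

On a host with conntrack loaded, the current entry count can be read from net.netfilter.nf_conntrack_count and compared with nf_conntrack_max in the same way.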

Common in interviews: app should handle SIGTERM (drain in-flight work, close connections), not just exit. SIGKILL (9) cannot be caught, so cleanup has to happen on SIGTERM.
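A graceful shutdown usually means the handler only sets a flag and the main loop finishes its current item before stopping. A minimal sketch, assuming a single-threaded worker loop; `serve` and `stop_requested` are hypothetical names, and real draining (closing sockets, flushing buffers) is left out.

```python
import signal
import threading

stop_requested = threading.Event()

def handle_sigterm(signum, frame):
    """Only record the request; the main loop decides when to stop."""
    stop_requested.set()

# Must be registered from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)

def serve(work_items):
    """Process items until a shutdown is requested; return what was completed."""
    done = []
    for item in work_items:
        if stop_requested.is_set():
            break                 # stop taking new work, then fall through to cleanup
        done.append(item)         # stand-in for handling one request
    return done

print(serve(["req-1", "req-2"]))
```

Keeping the handler this small avoids doing non-reentrant work inside a signal handler.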

Most modern Linux distributions use systemd. Boot order: kernel → initramfs → systemd (PID 1) → unit dependency tree → multi-user.target / graphical.target.

Unit types: service (.service), timer (.timer), socket (.socket, socket activation), mount (.mount), etc.

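Process in D state for hours: what now? Signals (even SIGKILL) are not delivered while it is uninterruptible, so it cannot be killed. Usually stuck I/O: check /proc/PID/wchan or /proc/PID/stack, then fix the underlying device or mount, or reboot.
High load but low CPU: explanation? Many processes in D state (I/O wait), or thread contention.
iowait is 50%: what to check? Slow disk (iostat -x for latency and %util), deep request queue, many fsyncs, NFS latency.
top shows 200% CPU: meaning? Process is using more than one core (multithreaded). In top's default mode 100% = one core, so 200% ≈ 2 cores fully used.
Process can’t open more sockets: debug. ulimit -n; cat /proc/PID/limits; ls /proc/PID/fd | wc -l; lsof -p PID to see what’s open.
kswapd at 100%: what? Memory pressure → the kernel is reclaiming pages. Free up or add RAM; tune vm.swappiness or, where appropriate, disable swap.
ss -s shows 100k orphan sockets: diagnose. Orphans are sockets the app has already closed but the kernel still holds (e.g. FIN_WAIT or LAST_ACK waiting on a slow or dead peer); ss -s counts TIME_WAIT separately. Check states with ss -tan; tune net.ipv4.tcp_max_orphans, tcp_orphan_retries, tcp_fin_timeout, or fix the app or peer behavior.
Out of inodes but disk has space. df -i. Many small files (tmp, sessions, logs); clean them up. On ext4 the inode count is fixed at mkfs time, so a rebuild needs mkfs.ext4 -i (bytes per inode) or -N (inode count).
systemd service crashes on startup: debug. systemctl status svc; journalctl -u svc -e (or -f to follow); check ExecStart syntax, env, dependencies, capabilities.
Container can’t reach the internet. Check pod IP, service DNS, NetworkPolicy, node iptables, security group, VPC route to NAT.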
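For the "can't open more sockets" case above, the check boils down to comparing the soft "Max open files" limit with the number of descriptors already open. A rough sketch, assuming the usual column layout of /proc/PID/limits; `parse_nofile` and `fd_headroom` are hypothetical helpers and the sample text is invented.

```python
def parse_nofile(limits_text):
    """Return (soft, hard) for 'Max open files'; None stands for 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    return None

def fd_headroom(soft_limit, open_fds):
    """Descriptors left before hitting the soft limit (None if unlimited)."""
    return None if soft_limit is None else soft_limit - open_fds

# Invented sample in the layout of /proc/PID/limits.
SAMPLE = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            1024                 4096                 files\n"
)
soft, hard = parse_nofile(SAMPLE)
print(fd_headroom(soft, 1000))
```

The open descriptor count would come from listing /proc/PID/fd, as in the ls command in that item.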