Linux — Theory (concise)
How the kernel runs your process
User space ↔ syscalls ↔ kernel space. Every I/O, fork, signal send, etc. goes through a syscall.
Process states (STAT column in ps):
- R running or runnable (on a run queue).
- S interruptible sleep (waiting on event).
- D uninterruptible sleep (waiting on I/O — usually disk).
- Z zombie (dead, parent hasn’t reaped).
- T stopped.
Many tasks in D state = stuck on I/O (slow disk, hung NFS mount).
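To see how many tasks sit in each state (and spot a D-state pileup), a minimal sketch can read /proc/<pid>/stat directly, which is roughly where ps gets its STAT column. It assumes Linux and Python 3.9+; parse_state and count_states are hypothetical helper names, not a library API.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Return the one-letter state (R, S, D, Z, T, ...) from a /proc/<pid>/stat line.

    Field 2 is the command name in parentheses and may itself contain spaces or ')',
    so split after the LAST ')' instead of splitting the whole line on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(proc: str = "/proc") -> Counter:
    """Count processes per state by reading every /proc/<pid>/stat (Linux only).

    Only processes are counted; individual threads live under /proc/<pid>/task.
    """
    states: Counter = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states[parse_state(f.read())] += 1
        except OSError:
            continue  # the process exited between listdir() and open()/read()
    return states
```

On a live host, count_states()["D"] gives the number of uninterruptible processes; other letters, such as I for idle kernel threads, may also show up.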
Memory
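- Resident (RSS): pages of the process currently in RAM.
- Virtual (VSZ): total mapped address space (includes mapped files, swapped-out pages, and allocated-but-untouched memory).
- Shared: pages shared with other processes (libraries, shared memory); counted in each sharing process's RSS.
- Page cache: file data cached in RAM. It shrinks the free column in free, but the kernel evicts it on demand.
free -h: the real headroom is the available column (MemAvailable), not free.
OOM killer: when memory can't be reclaimed, it kills the process with the highest oom_score (tunable via oom_score_adj). A Kubernetes container that is OOMKilled usually hit its own cgroup memory limit, not the host's.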
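The free versus available distinction can be checked straight from /proc/meminfo. A minimal sketch, assuming Linux and Python 3.9+; parse_meminfo and available_percent are illustrative names, not a library API.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: value}; values are kB except HugePages_* counts."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def available_percent(info: dict[str, int]) -> float:
    """MemAvailable (what `free` shows as 'available') as a percentage of MemTotal."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]
```

Feed it open("/proc/meminfo").read(). A low MemFree next to a healthy available percentage is normal, because most of the difference is reclaimable page cache.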
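CPU
- Load average: 1/5/15-minute averages of runnable tasks (Linux also counts D-state tasks). Compare to core count: load ≈ core count means saturated; above it, work is queueing.
- Steal time (st in top): time the hypervisor gave the physical CPU to other guests while this VM wanted to run. Common on noisy-neighbor cloud hosts (e.g. AWS).
- iowait (wa in top): CPU idle while I/O is outstanding; high means waiting on disk.
- Context switches (cs in vmstat): a high rate without obvious cause may mean lock contention.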
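Comparing load against core count, as described above, can be sketched by reading /proc/loadavg. It assumes Linux and Python 3.9+; the helper names are hypothetical, and os.cpu_count() reports the host's logical CPUs, not a container's cgroup CPU quota.

```python
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_per_core(load: float, cores: int) -> float:
    """Load divided by CPU count: about 1.0 = saturated, above 1.0 = tasks are queueing.

    Because D-state tasks are included in the load, a high ratio does not
    necessarily mean the CPUs themselves are busy.
    """
    return load / cores


def current_load_per_core() -> float:
    """1-minute load per logical CPU on this host (Linux only)."""
    with open("/proc/loadavg") as f:
        one, _, _ = parse_loadavg(f.read())
    # os.cpu_count() counts the host's logical CPUs, not a container's CPU quota.
    return load_per_core(one, os.cpu_count() or 1)
```

For example, a 1-minute load of 8.0 on a 4-core box gives a ratio of 2.0: on average, four tasks are waiting for a CPU (or for I/O).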
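Disk I/O
- Block layer + I/O scheduler (none, mq-deadline, bfq; none or mq-deadline is typical for SSD/NVMe).
- iostat -x 1: %util ~100% with high await → disk bottleneck. SSD/NVMe devices serve requests in parallel, so on them %util alone can hit 100% before the device is saturated; weigh await and queue size too.
- Page cache absorbs reads. Writes land in the page cache as dirty pages and are flushed by background writeback, or synchronously when the app calls fsync.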
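iostat -x derives %util and await from deltas of /proc/diskstats. The sketch below approximates that arithmetic for one device between two samples. It assumes the classic 11 statistics columns described in the kernel's iostats documentation; newer kernels append discard and flush columns, which it ignores, so real iostat numbers may differ slightly. The helper names are hypothetical.

```python
def parse_diskstats(text: str) -> dict[str, list[int]]:
    """Map device name -> numeric fields from /proc/diskstats (columns after major, minor, name)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:
            stats[parts[2]] = [int(x) for x in parts[3:]]
    return stats


def util_and_await(before: list[int], after: list[int], interval_ms: float) -> tuple[float, float]:
    """Approximate iostat -x %util and await (ms) between two samples of one device.

    Offsets into each list (0-based, after the device name):
      0 reads completed, 3 ms spent reading,
      4 writes completed, 7 ms spent writing,
      9 ms spent doing I/O (device busy time).
    """
    d = [a - b for a, b in zip(after, before)]
    ios = d[0] + d[4]
    util = 100.0 * d[9] / interval_ms  # share of the interval the device was busy
    await_ms = (d[3] + d[7]) / ios if ios else 0.0  # average time per completed I/O
    return util, await_ms
```

Usage: sample parse_diskstats(open("/proc/diskstats").read()) twice, one second apart, and pass the two lists for the same device with interval_ms=1000.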
Networking — kernel path
Receive path: packet → NIC → softirq → kernel TCP/IP stack → socket receive buffer → app. The send path runs the other way.
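- nf_conntrack table tracks connections (needed for NAT and stateful firewalling). When it's full, new connections are dropped; raise net.netfilter.nf_conntrack_max.
- net.core.somaxconn: upper cap on a socket's listen() backlog.
- net.ipv4.ip_local_port_range: range of ephemeral (source) ports for outgoing connections.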
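Sysctls like these are plain files under /proc/sys. A minimal sketch (Linux, Python 3.9+; helper names hypothetical) that reads them and does the arithmetic that tends to matter during incidents. With the common default range of 32768 to 60999 there are 28,232 ephemeral ports, which bounds concurrent outbound connections from one source IP to a single destination IP:port.

```python
def sysctl_path(name: str) -> str:
    """Map a sysctl name to its /proc/sys file, e.g. net.core.somaxconn -> /proc/sys/net/core/somaxconn.

    Simplification: names containing a literal dot (e.g. VLAN interface names) are not handled.
    """
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name: str) -> str:
    """Read a sysctl value, roughly what `sysctl -n name` prints (Linux only)."""
    with open(sysctl_path(name)) as f:
        return f.read().strip()


def ephemeral_port_count(port_range: str) -> int:
    """Number of ports in net.ipv4.ip_local_port_range (format: 'low<whitespace>high')."""
    low, high = (int(x) for x in port_range.split())
    return high - low + 1


def conntrack_usage(count: int, maximum: int) -> float:
    """Percentage of the conntrack table in use; near 100% new flows start getting dropped."""
    return 100.0 * count / maximum
```

For conntrack, compare read_sysctl("net.netfilter.nf_conntrack_count") with read_sysctl("net.netfilter.nf_conntrack_max"); those files exist only while the nf_conntrack module is loaded.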
Cgroups & namespaces (containers)
- Cgroups (v2) limit & account: cpu, memory, io, pids.
- Namespaces isolate: PID, NET, MNT, IPC, UTS, USER, CGROUP.
- Together = container.
- systemd uses cgroups for its services too (slice/scope).
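A process's cgroup and its memory limit (the one behind OOMKilled) can be looked up from /proc and the cgroup filesystem. This sketch assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup and Python 3.9+; the function names are illustrative, not a library API.

```python
from typing import Optional


def cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the unified (v2) cgroup path from /proc/<pid>/cgroup.

    Lines look like 'hierarchy-id:controllers:path'; the v2 entry is '0::/path'.
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None  # no v2 entry, e.g. a pure cgroup v1 host


def parse_memory_max(value: str) -> Optional[int]:
    """Parse a cgroup v2 memory.max file: 'max' means unlimited (None), else bytes."""
    value = value.strip()
    return None if value == "max" else int(value)


def memory_limit_file(cgroup_path: str, root: str = "/sys/fs/cgroup") -> str:
    """Location of memory.max for a cgroup, assuming cgroup2 is mounted at `root`."""
    return root + cgroup_path.rstrip("/") + "/memory.max"
```

Inside a container with its own cgroup namespace, /proc/self/cgroup usually shows "0::/", so the limit lives in /sys/fs/cgroup/memory.max.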
File descriptors
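- Small per-process integers referring to open files, sockets, pipes, epoll instances, etc. (0 = stdin, 1 = stdout, 2 = stderr).
- ulimit -n = max open fds per process (soft limit).
- “Too many open files” (EMFILE) → usually leaked fds (sockets, files, epoll).
- Diagnose: ls /proc/PID/fd | wc -l (exact count) or lsof -p PID | wc -l (also lists cwd, txt and memory-mapped files, so it overcounts), and watch whether it keeps growing.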
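A sketch for watching fd usage from Python on Linux: it counts entries in /proc/<pid>/fd and reads the soft and hard RLIMIT_NOFILE through the standard resource module. The helper names are hypothetical; reading another user's /proc/<pid>/fd requires the same UID or root.

```python
import os
import resource


def open_fd_count(pid: str = "self") -> int:
    """Number of open fds for a process, from /proc/<pid>/fd (Linux only).

    When pid is 'self', the directory handle opened by listdir() is itself counted.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_limits() -> tuple[int, int]:
    """(soft, hard) RLIMIT_NOFILE of this process; `ulimit -n` shows the soft value."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fd_usage_ratio(open_count: int, soft_limit: int) -> float:
    """Fraction of the soft limit in use; creeping toward 1.0 over time suggests a leak."""
    return open_count / soft_limit
```

Sampling open_fd_count(str(pid)) every minute and seeing it climb without leveling off is the usual leak signature.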
Signals
Common in interviews:
- SIGTERM (15) → graceful stop.
- SIGINT (2) → Ctrl+C in a terminal.
- SIGKILL (9) → can't be caught, blocked, or ignored; the process gets no chance to clean up.
- SIGHUP (1) → reload config (often).
- SIGUSR1/2 → app-specific.
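Apps should handle SIGTERM (stop accepting work, drain, close connections), not just exit. systemd and Kubernetes send SIGTERM first and follow with SIGKILL if the process is still running after the stop timeout or grace period.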
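A minimal sketch of SIGTERM handling in Python: the handler only sets a flag, and the main loop notices it, drains, and returns. GracefulShutdown and serve_until_stopped are hypothetical names. signal.signal() must be called from the main thread, and Python runs the handler in that thread.

```python
import signal


class GracefulShutdown:
    """Turns SIGTERM/SIGINT into a flag that the main loop polls."""

    def __init__(self) -> None:
        self.requested = False

    def install(self) -> None:
        # Must run in the main thread; Python then invokes the handler
        # in the main thread shortly after the signal arrives.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame) -> None:
        # Do as little as possible here: just record that a stop was requested.
        self.requested = True


def serve_until_stopped(step, drain, stop: GracefulShutdown) -> None:
    """Run step() until a stop is requested, then drain() once (finish work, close)."""
    while not stop.requested:
        step()
    drain()
```

In a real service, step() would handle one unit of work with a timeout so the flag gets checked regularly; drain() is where in-flight requests finish and connections close.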
Booting / init
Most modern distros use systemd. Order: kernel → initramfs → systemd as PID 1 → unit dependency tree → multi-user.target / graphical.target.
Unit types: service (.service), timer (.timer), socket (.socket, for socket activation), mount (.mount), etc.
Common interview Qs
- Process in D state for hours: what now? The kernel can’t kill it (uninterruptible; even SIGKILL stays pending until the I/O returns). Usually stuck I/O: fix the underlying device or mount, or reboot.
- High load but low CPU: explanation? Many processes in D state (I/O wait), which Linux counts in the load, or thread contention.
- iowait is 50%: what to check? Slow disk (iostat), large queue, many fsyncs, NFS lag.
- top shows 200% CPU: meaning? The process is using more than one core (multithreaded); 200% ≈ 2 cores fully busy.
- Process can’t open more sockets: debug. ulimit -n; cat /proc/PID/limits; ls /proc/PID/fd | wc -l; lsof -p PID to see what’s open.
- kswapd at 100%: what? Memory pressure: kswapd is reclaiming pages in the background. Find what is using RAM and reduce it or add RAM; tuning vm.swappiness only shifts the balance between swapping anonymous memory and dropping page cache.
- ss -s shows 100k orphaned sockets: diagnose. Orphans are sockets the app already closed that the kernel is still tearing down (FIN_WAIT1/2, LAST_ACK, CLOSING), often because peers are slow or gone; check net.ipv4.tcp_max_orphans and net.ipv4.tcp_fin_timeout. Also look for a TIME_WAIT pileup, and for CLOSE_WAIT sockets, which mean the app is not closing connections.
- Out of inodes but disk has space: df -i. Many small files (tmp, sessions, logs). ext4 sets its inode ratio at mkfs time, so a rebuild needs mkfs.ext4 -i (bytes-per-inode) or -N (inode count).
- systemd service crashes on startup: debug. systemctl status svc; journalctl -u svc -e -f; check ExecStart syntax, environment, dependencies, capabilities.
- Container can’t reach the internet: check pod IP, service DNS, NetworkPolicy, node iptables, security group, VPC route to NAT.
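For the socket questions above, counting sockets per TCP state shows whether the pile is TIME_WAIT, CLOSE_WAIT, or something else. This sketch parses /proc/net/tcp (use /proc/net/tcp6 for IPv6), whose st column holds the state in hex; the code table follows the kernel's tcp_states.h. The helper name is hypothetical, and on hosts with very many sockets ss is much faster than reading /proc.

```python
from collections import Counter

# Hex state codes as printed in the 'st' column of /proc/net/tcp (kernel tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def tcp_state_counts(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp (or tcp6) text.

    Each data row starts 'sl local_address rem_address st ...'; 'st' is field 3.
    """
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1  # unknown codes kept as raw hex
    return counts
```

A large CLOSE_WAIT count points at the application; a large TIME_WAIT count points at connection churn rather than a leak.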
Anti-patterns
- Running services as root.
- No log rotation.
- Manual, ad-hoc chown -R (recursive) on /.
- Adding swap to “fix” memory issues without diagnosing.
- Editing /etc/fstab without checking blkid first.
- Running every cron job as root.
- Ignoring dmesg / journalctl -k after weird issues.