Linux — Theory

User space ↔ syscalls ↔ kernel space. Every I/O, fork, signal, etc. = syscall.

Process states (visible in ps):

  • R running or runnable (on a CPU or waiting in the run queue).
  • S interruptible sleep (waiting on event).
  • D uninterruptible sleep (waiting on I/O — usually disk).
  • Z zombie (dead, parent hasn’t reaped).
  • T stopped.

Many processes in D state = stuck on I/O (slow disk, hung NFS). On Linux, D-state tasks also count toward the load average.
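
To see how many processes sit in each state without ps, the state letter can be read straight from /proc. This is a minimal sketch, assuming a Linux /proc filesystem; the helper names parse_state and count_states are illustrative only.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is the first field after the LAST ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes per state letter (R, S, D, Z, T, ...)."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        return counts  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited (or unreadable) while scanning
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_states().items()):
        print(state, n)
```

A D count that stays high across repeated runs points at stuck I/O rather than a brief burst.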

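Memory:

  • Resident (RSS): physical RAM the process currently occupies, including shared pages.
  • Virtual (VSZ): total address space mapped, including pages that are swapped out, file-backed, or never touched.
  • Shared: pages shared between processes (libraries, shared memory).
  • Page cache: file contents cached in RAM. Reported under buff/cache by free (so the free column looks low), but evictable.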

free -h: memory you can actually use = the available column, not the free column; available includes reclaimable page cache.
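
The difference between free and available can be checked directly in /proc/meminfo, which is where free gets its numbers. A minimal sketch, assuming a kernel that exposes MemAvailable; parse_meminfo and available_fraction are hypothetical helper names.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field_name: value}.

    Most values are in kB; a few counters (e.g. HugePages_Total) are
    plain numbers. Only the numeric part is kept.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_fraction(info: dict) -> float:
    """Fraction of RAM the kernel estimates is usable without swapping."""
    return info["MemAvailable"] / info["MemTotal"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
    except OSError:
        mem = {}
    if "MemAvailable" in mem and "MemTotal" in mem:
        print(f"MemFree:      {mem.get('MemFree', 0)} kB")
        print(f"MemAvailable: {mem['MemAvailable']} kB")
        print(f"available:    {available_fraction(mem):.1%} of RAM")
```

On a busy file server MemFree is often tiny while MemAvailable is large; that gap is mostly page cache.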

OOM killer picks a victim by oom_score (adjustable per process via oom_score_adj) when RAM is exhausted. A Kubernetes container marked OOMKilled usually hit its own cgroup memory limit, not host-wide exhaustion.
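
To get a feel for which processes the OOM killer would favour, the scores can be listed from /proc. A rough sketch, assuming Linux; read_oom_scores is an illustrative helper, not an existing tool.

```python
import os


def read_oom_scores(proc_root: str = "/proc") -> list:
    """Return [(oom_score, pid, name)] sorted with the likeliest victim first."""
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "sys"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process vanished or file unreadable
        results.append((score, int(entry), name))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    for score, pid, name in read_oom_scores()[:5]:
        print(f"{score:>6}  {pid:>7}  {name}")
```

Inside a container this only shows the processes visible in that PID namespace.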

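CPU:

  • Load average: 1/5/15 min. On Linux it counts runnable plus D-state tasks. Compare to core count: load ≈ core count = fully busy; well above = tasks queuing.
  • Steal time (st in top): time the hypervisor ran something else while this VM wanted CPU. Common on noisy AWS hosts.
  • iowait: CPU idle while I/O is outstanding; high = waiting on disk.
  • Context switches: a high rate without obvious cause may mean lock contention.

Disk I/O:

  • Block layer + I/O scheduler: none or mq-deadline for SSD/NVMe; bfq mainly for slower devices where fairness and interactivity matter.
  • iostat -x 1: %util ~100% with high await → disk bottleneck (on SSD/NVMe, which serve requests in parallel, rely more on await than on %util).
  • Page cache absorbs reads; writes land in the page cache as dirty pages and are flushed later, unless the app calls fsync to force them to disk.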
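
The load, iowait and steal figures above all come from a couple of files that top also reads. The sketch below, assuming Linux and the standard field order of the aggregate cpu line in /proc/stat, computes load per core and the share of CPU time per category between two samples; the function names are illustrative.

```python
import os
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu ...' line into cumulative tick counters."""
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Share of elapsed CPU time per category between two samples."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values()) or 1  # avoid division by zero
    return {k: 100.0 * d / total for k, d in deltas.items()}


def load_per_core(load1: float, cores: int) -> float:
    """1-minute load divided by core count; about 1.0 means fully busy."""
    return load1 / cores


def read_cpu_sample(path: str = "/proc/stat") -> dict:
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    try:
        cores = os.cpu_count() or 1
        print(f"load per core: {load_per_core(os.getloadavg()[0], cores):.2f}")
        first = read_cpu_sample()
        time.sleep(0.5)
        pct = cpu_percentages(first, read_cpu_sample())
        print(f"iowait {pct['iowait']:.1f}%  steal {pct['steal']:.1f}%")
    except (OSError, KeyError):
        print("load or /proc/stat data not available on this system")
```

High load per core with low user/system time and high iowait matches the "high load but low CPU" pattern.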

Network receive path: packet → NIC → softirq → kernel TCP stack → socket buffer → app. Reverse order on send.

  • nf_conntrack table tracks connections (stateful firewall, NAT). Full = new packets dropped. Tune net.netfilter.nf_conntrack_max.
  • net.core.somaxconn: upper bound on the listen() backlog (accept queue).
  • net.ipv4.ip_local_port_range: ephemeral port range for outbound connections.

Containers:

  • Cgroups (v2) limit & account: cpu, memory, io, pids.
  • Namespaces isolate: PID, NET, MNT, IPC, UTS, USER, CGROUP.
  • Together = container.
  • systemd uses cgroups for its services too (slice/scope).

File descriptors:

  • Small integers that reference open files, sockets, pipes, etc.
  • ulimit -n = max open fds per process (soft limit).
  • “Too many open files” (EMFILE) → usually leaked fds (sockets, files, epoll).
  • Diagnose: ls /proc/PID/fd | wc -l (or lsof -p PID), watch over time.
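
The fd check can also be done from inside a process, which is handy for exporting a metric before the limit is hit. A minimal sketch, assuming Linux /proc and the standard resource module; count_open_fds and fd_usage_percent are hypothetical names.

```python
import os
import resource


def count_open_fds(pid="self", proc_root: str = "/proc") -> int:
    """Number of entries in /proc/<pid>/fd.

    For pid="self" the count includes the descriptor os.listdir opens
    briefly to read the directory, so it can be one too high.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage_percent(open_fds: int, soft_limit: int) -> float:
    """Open descriptors as a percentage of the soft RLIMIT_NOFILE."""
    return 100.0 * open_fds / soft_limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        used = count_open_fds()
    except OSError:
        used = None  # no /proc/self/fd on this system
    if used is None:
        print("cannot read /proc/self/fd")
    elif soft == resource.RLIM_INFINITY:
        print(f"{used} fds open, no soft limit")
    else:
        print(f"{used}/{soft} fds open ({fd_usage_percent(used, soft):.1f}%)")
```

A count that only ever grows between samples is the usual signature of a leak.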

Signals common in interviews:

  • SIGTERM (15) → graceful stop.
  • SIGINT (2) → ctrl+c.
  • SIGKILL (9) → uncatchable; cannot be blocked or ignored.
  • SIGHUP (1) → reload config (often).
  • SIGUSR1/2 → app-specific.

App should handle SIGTERM (drain, close), not just exit.
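
One common shape for this is a handler that only sets a flag, with the main loop doing the drain. The sketch below is one possible pattern under that assumption, not a prescribed implementation; GracefulShutdown, run_worker, do_work and drain are all hypothetical names.

```python
import signal
import time


class GracefulShutdown:
    """Records that SIGTERM arrived; cleanup happens in the main loop."""

    def __init__(self):
        self.requested = False

    def install(self, signum=signal.SIGTERM):
        # Must be called from the main thread.
        signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler trivial: just set a flag.
        self.requested = True


def run_worker(shutdown, do_work, drain, poll_interval=0.1):
    """Call do_work() repeatedly until shutdown is requested, then drain()."""
    while not shutdown.requested:
        do_work()                  # one unit of work (hypothetical callback)
        time.sleep(poll_interval)  # the handler runs during or after the sleep
    drain()                        # finish in-flight work, close connections
```

Typical use would be shutdown = GracefulShutdown(); shutdown.install(); run_worker(shutdown, handle_one_request, close_everything), where the two callbacks are whatever the app needs. Anything longer than the supervisor's grace period still ends in SIGKILL.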

Most modern Linux distributions boot with systemd. Order: kernel → initramfs → systemd (PID 1) → unit dependency tree → multi-user.target / graphical.target.

Unit types: service (.service), timer (.timer), socket (.socket, used for socket activation), mount (.mount), etc.

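Scenario questions:

  1. Process in D state for hours: what now? It cannot be killed, not even with SIGKILL, until the blocking call returns (uninterruptible). Usually stuck I/O: fix the underlying device or mount (e.g. hung NFS), otherwise reboot.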
  2. High load but low CPU — explanation? Many processes in D state (I/O wait), or thread contention.
  3. iowait is 50% — what to check? Slow disk (iostat), large queue, many fsyncs, NFS lag.
  4. top shows 200% CPU — meaning? Process using >1 core (multi-thread). 200% ≈ 2 cores fully used.
  5. Process can’t open more sockets — debug. ulimit -n; cat /proc/PID/limits; ls /proc/PID/fd | wc -l; lsof to see what’s open.
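  6. kswapd at 100%: what? Memory pressure → the kernel is reclaiming pages continuously. Find what is consuming RAM (or add more), then consider tuning vm.swappiness. Disabling swap alone does not relieve the pressure.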
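  7. ss -s shows 100k orphan sockets: diagnose. Orphans are sockets the app has already closed but the kernel is still finishing (e.g. FIN_WAIT, LAST_ACK); ss -s counts TIME_WAIT separately. Usually slow or vanished peers, or an app dropping connections en masse; check net.ipv4.tcp_max_orphans and tcp_orphan_retries, or fix the app.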
  8. Out of inodes but disk has space. df -i. Many small files (tmp, sessions, logs). On ext4 the inode count is fixed at mkfs time; if rebuilding, raise it with mkfs.ext4 -N or a smaller -i bytes-per-inode.
  9. systemd service crashes on startup: debug. systemctl status svc; journalctl -u svc -e (or -f to follow); check ExecStart syntax, env, dependencies, capabilities.
  10. Container can’t reach the internet. Check pod IP, service DNS, NetworkPolicy, node iptables, security group, VPC route to NAT.
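
For the out-of-inodes case in the list above, the numbers df -i prints can be read with os.statvfs. A minimal sketch; inode_usage is an illustrative helper, and some filesystems (btrfs, for example) report zero total inodes, which is treated here as 0% used.

```python
import os


def inode_usage(path: str = "/"):
    """Return (used, total, percent_used) inodes for the filesystem at path."""
    st = os.statvfs(path)
    total = st.f_files            # total inodes on the filesystem
    used = total - st.f_ffree     # f_ffree = free inodes
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


if __name__ == "__main__":
    used, total, percent = inode_usage("/")
    print(f"inodes on /: {used}/{total} ({percent:.1f}% used)")
```

Running it against the mount point that reports "No space left on device" shows quickly whether inodes, not blocks, are exhausted.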

Anti-patterns:

  • Running services as root.
  • No log rotation.
  • Ad-hoc recursive chown -R on /.
  • Adding swap to “fix” memory issues without diagnosing.
  • Editing /etc/fstab without checking blkid first.
  • Running every cron job as root.
  • Ignoring dmesg / journalctl -k after weird issues.