Performance & Profiling — Basics

Profile types

CPU sampling — periodically capture stack; aggregate. Low overhead. Most common.
CPU instrumentation — exact counts, higher overhead. Rarely needed.
Heap / memory — allocations or live objects. Find leaks.
Wall-clock — includes time spent waiting (I/O), unlike CPU.
Off-CPU — time blocked. Useful for I/O-bound services.
Lock / contention — where threads wait for mutexes.
Goroutine / event loop — language-specific.
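
The wall-clock vs CPU distinction above can be seen directly with two standard-library clocks. This is a minimal sketch: `measure`, `io_like`, and `cpu_bound` are hypothetical names, and `time.sleep` stands in for a blocking I/O call.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()   # wall-clock: keeps ticking while blocked
    cpu_start = time.process_time()    # CPU time of this process: excludes sleep
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def io_like():
    # time.sleep stands in for a blocking I/O call.
    time.sleep(0.2)


def cpu_bound():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for fn in (io_like, cpu_bound):
        wall, cpu = measure(fn)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For `io_like` the wall time should be far larger than the CPU time; for `cpu_bound` the two should be close. A CPU profile alone would make the first function look nearly free.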

For each resource (CPU, mem, disk, net, FD, queue):

Track per-thread time in each state: on-CPU, runnable (waiting for a CPU), sleeping (I/O wait), blocked (lock).

Visualization (flame graph): x = aggregated sample time (identical stacks merged, not chronological), y = stack depth. Wide bars = hot. Excellent for spotting hot functions.

Tools: Brendan Gregg’s flamegraph.pl, async-profiler (Java), pprof (Go), pprof-rs (Rust), py-spy (Python), 0x (Node), Pyroscope (continuous).

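Read from the root frame toward the leaves: stacks that share a prefix (parent frames) are merged. The root sits at the bottom in classic flame graphs and at the top in icicle-style views. Plateaus at the leaf edge = hot leaf.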
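
How shared prefixes turn into bar widths can be sketched by aggregating raw stack samples by hand. The sample data and the helper names `fold` and `frame_widths` are hypothetical. The `root;...;leaf count` lines follow the "folded" text form that flamegraph.pl accepts as input.

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples into 'root;...;leaf count' lines.

    Each sample is a list of frame names ordered root first.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def frame_widths(samples):
    """Count the samples in which each stack prefix appears (its bar width)."""
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


# Hypothetical samples, as a sampling profiler might record them.
SAMPLES = [
    ["main", "handle", "parse"],
    ["main", "handle", "parse"],
    ["main", "handle", "render"],
    ["main", "gc"],
]

if __name__ == "__main__":
    for line in fold(SAMPLES):
        print(line)
    widths = frame_widths(SAMPLES)
    print(widths[("main",)], widths[("main", "handle")])
```

Here `main` spans all 4 samples, `main;handle` spans 3, and `parse` is the widest leaf with 2, so it would show up as the largest plateau.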

Heap tools: Chrome DevTools (Node), heaptrack (C/C++), pprof (Go heap), tracemalloc (Python), VisualVM/Eclipse MAT (Java).
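
As one illustration of heap profiling, the sketch below uses Python's `tracemalloc` to diff two snapshots and list the source lines whose live allocations grew the most. `top_growth` and `build_cache` are hypothetical names, and the workload is invented for the example.

```python
import tracemalloc


def top_growth(action, limit=3):
    """Run action() and return the source lines whose live allocations grew most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = action()          # keep the result alive until the second snapshot
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = after.compare_to(before, "lineno")   # largest absolute change first
    del result
    return stats[:limit]


def build_cache():
    # Hypothetical workload: retains roughly a megabyte of byte strings.
    return [bytes(1024) for _ in range(1000)]


if __name__ == "__main__":
    for stat in top_growth(build_cache):
        print(stat)
```

Taking the same diff repeatedly in a long-running process is one way to look for a leak: lines whose size keeps growing between snapshots are the first suspects.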

When demand > capacity:

Solutions:

Language	CPU	Heap
Node	`node --prof`, `0x`, `clinic`, Chrome DevTools	DevTools heap snapshot, `--inspect`
Python	`cProfile`, `py-spy`, `scalene`	`tracemalloc`, `objgraph`, `memray`
Go	`pprof` (CPU/heap/goroutine/block)	same
Java/Kotlin	`async-profiler`, JFR	Eclipse MAT, VisualVM
Ruby	`stackprof`, `vernier`	`derailed_benchmarks`
.NET	`dotnet-trace`, `PerfView`	`dotnet-dump`, `dotMemory`
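
To make the Python row concrete, here is a minimal sketch of `cProfile` with `pstats`. `cProfile` is a deterministic (instrumenting) profiler, so expect more overhead than with a sampler such as `py-spy`. The functions `slow_sum`, `work`, and `profile` are hypothetical.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    return sum(i * i for i in range(n))


def work():
    # Hypothetical workload standing in for real application code.
    return [slow_sum(20_000) for _ in range(20)]


def profile(fn):
    """Run fn() under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile(work))
```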

Continuous profiling: always-on, low-overhead sampling in production. Captures profiles of rare or intermittent issues as they happen.

Tools: Pyroscope, Parca, Datadog Continuous Profiler, Google Cloud Profiler, Polar Signals. Visualize flamegraphs over time, compare versions.
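
The idea behind always-on sampling can be sketched with a background thread that periodically records another thread's stack and aggregates the results. This is a toy illustration, not how the tools above are implemented: py-spy, for example, samples from outside the process. `Sampler` and `busy` are hypothetical names, and `sys._current_frames()` is a CPython-specific call.

```python
import collections
import sys
import threading
import time


class Sampler:
    """Background thread that periodically records another thread's stack."""

    def __init__(self, thread_id, interval=0.005):
        self.thread_id = thread_id
        self.interval = interval
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout, so this loops once per interval.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self.thread_id)
            names = []
            while frame is not None:          # walk from leaf to root
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            if names:
                # Store root first, matching the folded stack convention.
                self.counts[";".join(reversed(names))] += 1

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


def busy(seconds):
    deadline = time.perf_counter() + seconds
    x = 0
    while time.perf_counter() < deadline:
        x += 1
    return x


if __name__ == "__main__":
    with Sampler(threading.get_ident()) as sampler:
        busy(0.3)
    for stack, n in sampler.counts.most_common(3):
        print(n, stack)
```

The sampling interval is the main overhead knob: a longer interval costs less but needs more run time to gather enough samples, which is why continuous profilers can afford low rates.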

Use them together.