Performance & Profiling — Practical

Node.js:
# Built-in CPU profile
node --prof app.js
# generates isolate-*.log; process:
node --prof-process isolate-*.log > profile.txt
# clinic (event loop, GC, heap)
npm i -g clinic
clinic doctor -- node app.js # diagnoses
clinic flame -- node app.js # flame graph
clinic bubbleprof -- node app.js # async ops
# 0x flame graph
npx 0x app.js
# heap snapshot via SIGUSR2
node --heapsnapshot-signal=SIGUSR2 app.js
kill -USR2 <pid>

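Inspect with Chrome DevTools: node --inspect=0.0.0.0:9229 app.js → chrome://inspect. Binding to 0.0.0.0 exposes the debugger, and with it arbitrary code execution, on every network interface; prefer the default 127.0.0.1 plus an SSH tunnel unless the port is already firewalled.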

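// event loop lag in code
import { monitorEventLoopDelay } from 'node:perf_hooks';
const h = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms
h.enable();
setInterval(() => {
  console.log('p99 lag', h.percentile(99) / 1e6, 'ms'); // histogram values are nanoseconds
  h.reset(); // report each 5 s window on its own instead of a lifetime p99
}, 5000);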
Python:
# cProfile + snakeviz
python -m cProfile -o prof.out app.py
snakeviz prof.out
# py-spy (no code change, attaches)
py-spy record -o flame.svg --pid <PID> --duration 30
py-spy top --pid <PID>
# scalene (CPU + memory + GPU)
scalene app.py
# tracemalloc (memory); the lines below are Python code, not shell
import tracemalloc
tracemalloc.start()
# ...
snap = tracemalloc.take_snapshot()
for s in snap.statistics('lineno')[:10]:
    print(s)
# memray (-o pins the output name; the default name includes the script name and PID)
python -m memray run -o memray-app.bin app.py
python -m memray flamegraph memray-app.bin
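A single tracemalloc snapshot shows what is allocated; comparing two snapshots shows what grew, which is usually the question when chasing a leak. The following is a minimal sketch of that comparison. The `top_growth` helper and the `retained` list are hypothetical names made up for illustration, and the list stands in for whatever the real application keeps alive.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated bytes changed most between two snapshots."""
    # compare_to() sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical leak: roughly 1 MB of byte strings that stay referenced.
retained = [bytes(1_000) for _ in range(1_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

growth = top_growth(before, after)
for stat in growth:
    # Each entry reports file:line plus the size and count deltas.
    print(stat)
```

In a long-running service the two snapshots would typically be taken minutes apart under steady load, so that one-off startup allocations do not dominate the diff.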
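Go:
import _ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
// in main(), with "log" and "net/http" imported:
go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()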
# CPU
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'
# heap
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
# goroutine
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine
# benchmark
go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out
go tool pprof cpu.out
JVM:
# async-profiler (recommended); releases from 3.0 ship bin/asprof in place of profiler.sh
./profiler.sh -d 30 -f flame.html <pid>
# JFR (Java Flight Recorder)
jcmd <pid> JFR.start name=p settings=profile filename=p.jfr duration=30s
# heap dump
jcmd <pid> GC.heap_dump dump.hprof
# inspect with Eclipse MAT
Linux (perf, BCC):
# perf flame
sudo perf record -F 99 -p <pid> -g -- sleep 30
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
# on-CPU sampling via eBPF
sudo /usr/share/bcc/tools/profile -F 99 -p <pid> 30
# off-CPU (time spent blocked)
sudo /usr/share/bcc/tools/offcputime -p <pid> 30
# block latency
sudo /usr/share/bcc/tools/biolatency 5
# tcptracer
sudo /usr/share/bcc/tools/tcptracer

Pyroscope (Node example):

import Pyroscope from '@pyroscope/nodejs';
Pyroscope.init({
  serverAddress: 'http://pyroscope:4040',
  appName: 'api',
});
Pyroscope.start();

Alternatives: Parca for Go services and eBPF-based profiling, Datadog Continuous Profiler for a managed service.

Load testing:
# k6
k6 run --vus 100 --duration 1m script.js
# wrk2 (constant-arrival, no coordinated omission)
wrk2 -t8 -c200 -R 5000 --latency -d 1m http://localhost:3000/
# autocannon
autocannon -c 100 -d 30 -p 10 http://localhost:3000/

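Use constant-arrival-rate (open-model) load for honest tail latency. A closed-loop generator sends the next request only after the previous one returns, so it slows down whenever the server does and under-samples exactly the slow periods (coordinated omission). In k6, --vus with --duration is a closed model; the constant-arrival-rate executor is the open-model equivalent.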
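The effect of coordinated omission can be seen in a small deterministic simulation. The sketch below is not a load tool. It models a hypothetical single-threaded server with a 1 ms service time that stalls for one second, then measures it once with a closed-loop client and once with requests sent on a fixed schedule. All constants and function names are assumptions chosen for illustration, and times are integer microseconds to avoid floating-point drift.

```python
SERVICE_US = 1_000                 # assumed service time per request
STALL = (5_000_000, 6_000_000)     # assumed window in which the server is frozen
END_US = 10_000_000                # length of the simulated test


def serve(arrival_us, free_at_us):
    """Return the completion time of a request reaching a FIFO server at arrival_us."""
    begin = max(arrival_us, free_at_us)
    if STALL[0] <= begin < STALL[1]:
        begin = STALL[1]           # work resumes only when the stall ends
    return begin + SERVICE_US


def closed_loop():
    """One client that sends the next request only after the previous one returns."""
    latencies, now = [], 0
    while now < END_US:
        done = serve(now, now)
        latencies.append(done - now)
        now = done                 # the client waits, so it misses the stall
    return latencies


def open_model(interval_us=2_000):
    """Requests sent on a fixed schedule; latency counts from the intended send time."""
    latencies, free_at = [], 0
    for sent in range(0, END_US, interval_us):
        free_at = serve(sent, free_at)
        latencies.append(free_at - sent)
    return latencies


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100   # integer ceiling of p% of n
    return ordered[rank - 1]


closed_p99 = percentile(closed_loop(), 99)
open_p99 = percentile(open_model(), 99)
print(f"closed-loop p99: {closed_p99 / 1000:.0f} ms")
print(f"open-model  p99: {open_p99 / 1000:.0f} ms")
```

Under these assumptions the closed-loop client records the stall as a single slow sample and reports a p99 of 1 ms, while the fixed-schedule run reports 951 ms, because every request that should have been sent during the stall and the backlog after it is counted.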

PostgreSQL:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- Top queries
SELECT calls, total_exec_time::int AS total, mean_exec_time::int AS mean,
substr(query, 1, 80) AS q
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

MongoDB:

db.coll.explain('executionStats').find({...});
db.setProfilingLevel(1, { slowms: 50 });
db.system.profile.find().sort({millis:-1}).limit(10);
Request timing (curl):
curl -w '@-' -o /dev/null -s https://api/x <<'EOF'
namelookup: %{time_namelookup}s
connect: %{time_connect}s
appconnect: %{time_appconnect}s
pretransfer: %{time_pretransfer}s
starttransfer: %{time_starttransfer}s
total: %{time_total}s
EOF
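The timers curl prints are cumulative from the start of the request, so the cost of each phase comes from subtracting neighbouring values. A minimal sketch of that arithmetic, assuming output in the format of the template above; the `sample` numbers are invented for illustration and the function names are hypothetical.

```python
def parse_timings(text):
    """Parse 'name: 0.123s' lines into a {name: seconds} dict."""
    timings = {}
    for line in text.strip().splitlines():
        name, value = line.split(":", 1)
        timings[name.strip()] = float(value.strip().rstrip("s"))
    return timings


def phases(t):
    """Convert curl's cumulative timers into per-phase durations in seconds."""
    return {
        "dns": t["namelookup"],
        "tcp_connect": t["connect"] - t["namelookup"],
        # appconnect is 0 for plain HTTP, so clamp instead of going negative.
        "tls_handshake": max(t["appconnect"] - t["connect"], 0.0),
        "server_wait": t["starttransfer"] - t["pretransfer"],
        "download": t["total"] - t["starttransfer"],
    }


# Invented sample output, not a real measurement.
sample = """
namelookup: 0.012s
connect: 0.034s
appconnect: 0.081s
pretransfer: 0.081s
starttransfer: 0.230s
total: 0.245s
"""

result = phases(parse_timings(sample))
for name, seconds in result.items():
    print(f"{name:14s} {seconds * 1000:7.1f} ms")
```

A large `server_wait` points at the backend, while large `dns`, `tcp_connect` or `tls_handshake` values point at the network path or at missing connection reuse.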
Checklist:
  • DB queries indexed; no N+1.
  • No unbounded data fetches (paginate).
  • Caching with explicit TTL.
  • Async I/O used properly (no synchronous calls on the event loop).
  • Connection pooling sized to load.
  • Serialization minimized on hot paths.
  • Logging and metrics do not block the request path.
  • Resource limits set (memory, CPU, fds).
  • Backpressure handled (queues, rate limits).
  • Profiles captured in CI under representative load.
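As one concrete reading of the backpressure item, a bounded queue makes a fast producer wait for slow consumers instead of letting work pile up in memory. The sketch below uses asyncio for this. The `run_pipeline` function, its parameters, and the `asyncio.sleep(0)` stand-in for real I/O are assumptions for illustration only.

```python
import asyncio


async def run_pipeline(n_items=50, max_queue=5, workers=2):
    """Push n_items through a bounded queue and report the peak queue depth."""
    queue = asyncio.Queue(maxsize=max_queue)
    processed = []
    peak = 0

    async def producer():
        nonlocal peak
        for i in range(n_items):
            await queue.put(i)              # suspends while the queue is full
            peak = max(peak, queue.qsize())

    async def worker():
        while True:
            item = await queue.get()
            await asyncio.sleep(0)          # stand-in for real I/O
            processed.append(item)
            queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await producer()
    await queue.join()                      # wait until every item is handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return processed, peak


processed, peak = asyncio.run(run_pipeline())
print(f"processed {len(processed)} items, peak queue depth {peak}")
```

The queue depth never exceeds `max_queue`, so memory use stays bounded regardless of how quickly the producer could otherwise run.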
| Metric | Likely tool |
| --- | --- |
| p99 latency up | tracing, then CPU profile of hot service |
| Memory growth | heap snapshot, allocation profile |
| CPU saturation | sampling CPU profile |
| Event loop lag | perf_hooks, async profiler |
| GC time high | runtime stats, heap profile |
| DB slow | DB explain + indexes |
| Cold-start latency | per-step timing, lazy-load profile |