Performance & Profiling — Practical
Node.js

    # Built-in CPU profile
    node --prof app.js
    # generates isolate-*.log; process it with:
    node --prof-process isolate-*.log > profile.txt

    # clinic (event loop, GC, heap)
    npm i -g clinic
    clinic doctor -- node app.js      # diagnoses
    clinic flame -- node app.js       # flame graph
    clinic bubbleprof -- node app.js  # async ops

    # 0x flame graph
    npx 0x app.js
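
    # heap snapshot via SIGUSR2
    node --heapsnapshot-signal=SIGUSR2 app.js
    kill -USR2 <pid>

Inspect with Chrome DevTools: node --inspect=0.0.0.0:9229 app.js → chrome://inspect. Binding to 0.0.0.0 exposes the inspector on every network interface, so use it only on a trusted network; otherwise bind to 127.0.0.1 and tunnel over SSH.

    // event loop lag in code
    import { monitorEventLoopDelay } from 'node:perf_hooks';

    const h = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms
    h.enable();
    // percentile() returns nanoseconds; divide by 1e6 for milliseconds
    setInterval(() => console.log('p99 lag', h.percentile(99) / 1e6, 'ms'), 5000);

Python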
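For asyncio services, the event-loop lag measurement shown for Node has a rough equivalent: schedule a short sleep and record how late the wake-up arrives. This is a minimal sketch using only the standard library; `measure_loop_lag` and the 10 ms interval are illustrative choices, not part of any library.

```python
import asyncio
import time
from typing import List


async def measure_loop_lag(interval: float, samples: int) -> List[float]:
    """Sleep `interval` repeatedly and record how late each wake-up was, in seconds."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags


async def main() -> None:
    monitor = asyncio.create_task(measure_loop_lag(0.01, 20))
    await asyncio.sleep(0.05)
    time.sleep(0.2)  # deliberate blocking call that stalls the event loop
    lags = await monitor
    print(f"max lag {max(lags) * 1000:.1f} ms")


if __name__ == "__main__":
    asyncio.run(main())
```

In a real service the monitor would run for the process lifetime and export a percentile rather than print a maximum.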

    # cProfile + snakeviz
    python -m cProfile -o prof.out app.py
    snakeviz prof.out
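When profiling the whole program is too coarse, a single call path can be profiled in-process with the same cProfile module. This is a minimal sketch; `slow_sum` and `profile_call` are hypothetical names standing in for your own code.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical hot function standing in for real application code."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top: int = 5):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # top N by cumulative time
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_sum, 200_000)
    print(report)
```

Calling `profiler.dump_stats("prof.out")` instead of printing writes the same file format that snakeviz reads.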

    # py-spy (no code change, attaches to a running process)
    py-spy record -o flame.svg --pid <PID> --duration 30
    py-spy top --pid <PID>

    # scalene (CPU + memory + GPU)
    scalene app.py

    # tracemalloc (memory)
    import tracemalloc
    tracemalloc.start()
    # ...
    snap = tracemalloc.take_snapshot()
    for s in snap.statistics('lineno')[:10]:
        print(s)
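A single snapshot shows where memory is held; comparing two snapshots shows where it grew, which is usually the more useful question for a suspected leak. This is a minimal sketch, assuming the suspect code runs between the two snapshots; `top_growth` and the `retained` list are illustrative.

```python
import tracemalloc


def top_growth(before, after, limit: int = 5):
    """Return the largest per-line allocation deltas between two snapshots."""
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    retained = [bytes(1024) for _ in range(2000)]  # hypothetical leak: about 2 MB kept alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    for diff in top_growth(before, after):
        print(f"{diff.size_diff / 1024:9.1f} KiB  {diff.count_diff:+6d} blocks  {diff.traceback[0]}")
```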
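
    # memray (-o pins the output name; the default name also includes the script name and PID)
    python -m memray run -o memray-app.bin app.py
    python -m memray flamegraph memray-app.bin

Go

    // expose the pprof endpoints (also requires importing "net/http")
    import _ "net/http/pprof"
    go func() { http.ListenAndServe(":6060", nil) }()

    # CPU (URL quoted so the shell does not treat ? as a glob character)
    go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'

    # heap
    go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

    # goroutine
    go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine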

    # benchmark
    go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out
    go tool pprof cpu.out

Java

    # async-profiler (recommended); profiler.sh is the 2.x launcher, 3.x replaces it with asprof
    ./profiler.sh -d 30 -f flame.html <pid>

    # JFR (Java Flight Recorder)
    jcmd <pid> JFR.start name=p settings=profile filename=p.jfr duration=30s

    # heap dump
    jcmd <pid> GC.heap_dump dump.hprof
    # inspect with Eclipse MAT

Linux-level

    # perf flame graph
    sudo perf record -F 99 -p <pid> -g -- sleep 30
    sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg

    # on-CPU sampling via eBPF (bcc's profile samples running stacks, not blocked time)
    sudo /usr/share/bcc/tools/profile -F 99 -p <pid> 30

    # off-CPU (time spent blocked)
    sudo /usr/share/bcc/tools/offcputime -p <pid> 30

    # block I/O latency
    sudo /usr/share/bcc/tools/biolatency 5

    # tcptracer
    sudo /usr/share/bcc/tools/tcptracer

Continuous profiling
Pyroscope (Node example):

    import Pyroscope from '@pyroscope/nodejs';

    Pyroscope.init({
      serverAddress: 'http://pyroscope:4040',
      appName: 'api',
    });
    Pyroscope.start();

Parca is an eBPF-based alternative (a good fit for Go services), and the Datadog Continuous Profiler is a managed option.
Load test
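
    # k6 (fixed VUs is closed-loop; for a fixed request rate, use a constant-arrival-rate scenario in the script)
    k6 run --vus 100 --duration 1m script.js

    # wrk2 (constant-arrival, no coordinated omission)
    wrk2 -t8 -c200 -R 5000 --latency -d 1m http://localhost:3000/

    # autocannon
    autocannon -c 100 -d 30 -p 10 http://localhost:3000/

Always use constant-arrival-rate mode for honest tail latency: a closed-loop client slows down whenever the server does, so the requests that would have queued during a stall are never sent and never measured (coordinated omission).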
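The effect of coordinated omission can be shown with a small deterministic simulation. This is a sketch under made-up assumptions (a single FIFO server, 1 ms service time, one 1 s stall); `simulate` and `percentile` are illustrative helpers, not part of any load-testing tool.

```python
from typing import List


def simulate(service_times: List[float], interval: float, constant_arrival: bool) -> List[float]:
    """Single FIFO server; returns the latency each request would be reported with."""
    latencies = []
    free_at = 0.0  # time at which the server finishes its current request
    for i, service in enumerate(service_times):
        scheduled = i * interval         # when the request should be sent
        start = max(scheduled, free_at)  # when the server can actually take it
        done = start + service
        free_at = done
        # Constant arrival: the clock starts at the scheduled send time.
        # Closed loop: the client waits for the previous response, so it only
        # sends (and starts timing) at `start`, hiding the queueing delay.
        sent = scheduled if constant_arrival else start
        latencies.append(done - sent)
    return latencies


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up workload: 1000 requests at 100 req/s, 1 ms each, one 1 s stall.
    service = [0.001] * 1000
    service[500] = 1.0
    for mode in (False, True):
        p99 = percentile(simulate(service, 0.01, mode), 99)
        print(f"constant_arrival={mode}: p99 = {p99 * 1000:.1f} ms")
```

With these numbers the closed-loop p99 stays at about 1 ms while the constant-arrival p99 is roughly 0.9 s, even though the server behaved identically in both runs.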
DB profiling
PostgreSQL:

    EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

    -- Top queries
    SELECT calls,
           total_exec_time::int AS total,
           mean_exec_time::int AS mean,
           substr(query, 1, 80) AS q
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;

MongoDB:

    db.coll.explain('executionStats').find({...});
    db.setProfilingLevel(1, { slowms: 50 });
    db.system.profile.find().sort({ millis: -1 }).limit(10);

HTTP timing

    # curl drops line breaks when it reads the -w format from a file or stdin,
    # so each line ends with an explicit \n
    curl -w '@-' -o /dev/null -s https://api/x <<'EOF'
    namelookup:    %{time_namelookup}s\n
    connect:       %{time_connect}s\n
    appconnect:    %{time_appconnect}s\n
    pretransfer:   %{time_pretransfer}s\n
    starttransfer: %{time_starttransfer}s\n
    total:         %{time_total}s\n
    EOF

Optimization checklist
- DB queries indexed; no N+1.
- No unbounded data fetches (paginate).
- Caching with explicit TTL.
- Async I/O used properly (no synchronous calls on the event loop).
- Connection pooling sized to load.
- Serialization minimized for hot paths.
- Logs / metrics not blocking.
- Resource limits set (memory, CPU, fds).
- Backpressure handled (queues, rate limit).
- Profiles in CI on representative load.
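As an illustration of the "caching with explicit TTL" item, here is a minimal in-process sketch. `TTLCache` and `get_or_load` are hypothetical names; the class is not thread-safe and does not bound its size, both of which a production cache would need.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache where every entry carries an explicit expiry time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh
        value = loader()   # missing or expired: reload and reset the expiry
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_load("user:1", lambda: {"id": 1}))  # loader runs
    print(cache.get_or_load("user:1", lambda: {"id": 1}))  # served from cache
```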
When metrics tell vs profiles tell

| Metric | Likely tool |
|---|---|
| p99 latency up | tracing, then CPU profile of hot service |
| memory growth | heap snapshot, allocation profile |
| CPU saturation | sampling CPU profile |
| Event loop lag | perf_hooks, async profiler |
| GC time high | runtime stats, heap profile |
| DB slow | DB explain + indexes |
| Cold-start latency | per-step timing, lazy-load profile |
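For the cold-start row, "per-step timing" can be as simple as wrapping each startup phase in a timer and sorting the result. This is a minimal sketch; the `step` helper and the phase names are hypothetical, and the sleeps stand in for real startup work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

timings: Dict[str, float] = {}


@contextmanager
def step(name: str) -> Iterator[None]:
    """Accumulate wall-clock seconds spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


if __name__ == "__main__":
    with step("load_config"):
        time.sleep(0.01)  # placeholder for real work
    with step("connect_db"):
        time.sleep(0.02)  # placeholder for real work
    # Slowest phases first: these are the candidates for lazy loading.
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:12s} {secs * 1000:7.1f} ms")
```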