Performance Profiling
You are profiling and optimising: $ARGUMENTS
Performance work without measurement is guessing. Every optimisation must have a before number and an after number. Never optimise based on intuition alone. Profile first, then fix exactly what the data shows.
Principles
- Measure before you optimise. Write down current performance numbers before touching code.
- Find the bottleneck first. Optimising the wrong layer wastes time. Profile CPU, memory, I/O, and network before deciding where to focus.
- Optimise the hottest path. If one function consumes 80% of the time, optimise that. Don't spread effort across 10 functions that each contribute 2%.
- Validate every optimisation. Run the benchmark again. Confirm the improvement. Commit the numbers.
- Correctness first. An optimisation that introduces a bug is a regression, not an improvement.
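The measure-then-validate loop above can be sketched as a tiny benchmark harness. This is a hypothetical helper for illustration (the `benchmark` name, warm-up count, and percentile indexing are all choices of this sketch, not any tool's API):

```python
import statistics
import time

def benchmark(fn, *, warmup=10, runs=100):
    """Time fn() repeatedly and return P50/P95 latencies in milliseconds."""
    for _ in range(warmup):          # warm caches, JITs, connection pools
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(len(samples) * 0.95) - 1],
    }

before = benchmark(lambda: sum(range(100_000)))
# ...apply the optimisation, then re-run the same benchmark...
after = benchmark(lambda: sum(range(100_000)))
print(before, after)
```

Record both dicts verbatim; the before/after pair is the evidence the optimisation worked.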
1. Establish a Baseline
Before any changes, record:
| Metric | Current | Target | Notes |
|---|---|---|---|
| P50 response time | | | median request |
| P95 response time | | | slow requests |
| P99 response time | | | worst case |
| Throughput (req/s) | | | under normal load |
| Memory RSS | | | after warm-up |
| CPU % | | | under normal load |
| DB queries / request | | | via ORM logging or APM |
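Given raw latency samples from a load test, the percentiles in the table can be computed with the standard library; the sample values here are made up:

```python
import statistics

# Hypothetical raw request latencies from one load-test run, in milliseconds
latencies_ms = [12, 15, 14, 90, 13, 16, 250, 14, 15, 13]

# quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile
cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.1f}ms P95={p95:.1f}ms P99={p99:.1f}ms")
```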
2. Backend Profiling
Node.js
# CPU profile — find hot functions
node --prof server.js
# then run your load test
hey -n 1000 -c 50 http://localhost:3000/api/orders
# process the log
node --prof-process isolate-*.log > profile.txt
grep -A5 "Bottom up (heavy) profile" profile.txt
# Memory snapshot
node --inspect server.js
# Open chrome://inspect → take heap snapshot → analyse
# Attach to a running process (no restart needed)
kill -USR1 $(pgrep node)  # Node listens on SIGUSR1 and activates the inspector
# then connect via chrome://inspect and record a CPU profile or heap snapshot
Python
# cProfile — function-level CPU profiling
python -m cProfile -o profile.bin server.py
python -c "
import pstats
s = pstats.Stats('profile.bin')
s.sort_stats('cumulative')
s.print_stats(20)
"
# Line profiler — line-by-line for hot functions
pip install line_profiler
# add @profile decorator to the slow function
kernprof -l -v server.py
# Memory profiler — line-by-line memory usage
pip install memory_profiler
# add @profile decorator to the function under inspection, then:
python -m memory_profiler slow_function.py
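cProfile can also be driven from code, which scopes the profile to one hot path and skips interpreter start-up noise; `hot_path` here is a stand-in for your slow function:

```python
import cProfile
import io
import pstats

def hot_path():
    # stand-in for the function your profile data points at
    return sorted(str(i) for i in range(50_000))

profiler = cProfile.Profile()
profiler.enable()
hot_path()
profiler.disable()

# Render the stats into a string instead of stdout
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
report = stream.getvalue()
print(report)
```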
PHP
# Xdebug profiling
php -d xdebug.mode=profile -d xdebug.output_dir=/tmp script.php
# Open cachegrind file in KCachegrind or QCachegrind
# Blackfire (safe to run in production)
blackfire run php script.php
blackfire curl http://localhost:8080/api/orders
Go
# pprof CPU profile (one package at a time — -cpuprofile can't span multiple packages)
go test -cpuprofile=cpu.out -bench=. .
go tool pprof cpu.out
# In interactive mode:
# top10 — top 10 functions by CPU
# web — open graph in browser
# via HTTP endpoint — add to main.go (also needs an http.ListenAndServe(":6060", nil)):
import _ "net/http/pprof"
# then:
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
3. Database Profiling
Find the Slowest Queries
-- PostgreSQL: pg_stat_statements
SELECT
  left(query, 80)                AS query,
  calls,
  round(mean_exec_time)          AS avg_ms,
  round(total_exec_time / 60000) AS total_min,
  rows / calls                   AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
-- MySQL: slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1; -- log queries > 100ms
-- Review: /var/log/mysql/slow.log
-- Or use: pt-query-digest /var/log/mysql/slow.log
EXPLAIN the Slow Queries
-- Postgres: use EXPLAIN ANALYZE BUFFERS for real metrics
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;
Red flags in EXPLAIN output:
- `Seq Scan` on a large table — missing index
- `Nested Loop` with high row estimates — might need a Hash Join
- `cost=0..99999` — extremely high estimated cost
- actual rows >> estimated rows — stale statistics; run `ANALYZE`
N+1 Query Detection
# Enable query logging in development
# Laravel
DB::listen(fn($q) => Log::info($q->sql, ['bindings' => $q->bindings]));
# Rails
config.log_level = :debug
# Count queries during a request (look for repeated similar queries)
# A sign: "SELECT * FROM users WHERE id = ?" appearing 50 times in one request
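A hypothetical detector for that sign: normalise literals out of each logged query so structurally identical statements hash together, then count repeats per request. The function names and threshold are this sketch's own choices:

```python
import re
from collections import Counter

def normalise(sql: str) -> str:
    """Collapse literals so structurally identical queries compare equal."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip()

def n_plus_one_suspects(queries, threshold=10):
    """Query shapes repeated >= threshold times within one request."""
    counts = Counter(normalise(q) for q in queries)
    return {shape: n for shape, n in counts.items() if n >= threshold}

# Simulated log of one request: the classic N+1 pattern plus one normal query
request_log = ["SELECT * FROM users WHERE id = %d" % i for i in range(50)]
request_log.append("SELECT * FROM orders WHERE status = 'pending'")
print(n_plus_one_suspects(request_log))
# → {'SELECT * FROM users WHERE id = ?': 50}
```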
4. Network and I/O
# Find slow external calls
# Node.js: trace-events
node --trace-events-enabled --trace-event-categories node.async_hooks app.js
# strace — what syscalls is the process making?
strace -p $(pgrep -n php-fpm) -e trace=network,file 2>&1 | head -100
# tcpdump — actual network traffic
tcpdump -i lo -w /tmp/capture.pcap port 5432 # watch DB traffic
tcpdump -r /tmp/capture.pcap -A | grep -v "^$" | head -100
5. Load Testing
Establish the load profile before optimising for it:
# hey — simple HTTP load tester
hey -n 10000 -c 100 -m GET http://localhost:8080/api/orders
# wrk — advanced with Lua scripts
wrk -t4 -c100 -d30s --latency http://localhost:8080/api/orders
# k6 — scriptable, great for complex scenarios
k6 run --vus 50 --duration 60s load-test.js
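The same idea in miniature, for machines where none of these tools is installed — a toy sketch, not a replacement for hey/wrk/k6: start a throwaway local server, fire concurrent requests, and report latency percentiles. The worker counts and request totals are arbitrary:

```python
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to port 0 so the OS picks a free port
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

def one_request(_):
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000  # ms

with ThreadPoolExecutor(max_workers=20) as pool:         # 20 concurrent clients
    samples = sorted(pool.map(one_request, range(200)))  # 200 requests total

p50 = statistics.median(samples)
p95 = samples[int(len(samples) * 0.95) - 1]
print(f"P50={p50:.1f}ms  P95={p95:.1f}ms")
server.shutdown()
```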
6. Common Bottlenecks and Fixes
| Symptom | Likely Cause | Fix |
|---|---|---|
| High DB CPU, slow queries | Missing index | CREATE INDEX CONCURRENTLY |
| Many DB queries per request | N+1 query | Eager-load associations |
| High memory, growing over time | Memory leak | Heap snapshot diff |
| Slow on first request after idle | Cold start / connection setup | Connection pool + warmup |
| High P99, low P50 | Occasional GC pause or lock | Profile with APM; check lock contention |
| Slow file I/O | Synchronous reads in hot path | Use async I/O or cache in memory |
| Slow external API calls | No timeout / no caching | Add timeout; cache idempotent responses |
7. Documenting Results
After every optimisation, record in TODO.md or a PR description:
## Optimisation: Eager-load user associations in orders index
### Before
- P50: 320ms P95: 890ms
- 52 SQL queries per request (N+1 on user.email)
### Change
Added `.include(:user)` to orders query.
### After
- P50: 45ms P95: 62ms
- 2 SQL queries per request
### Benchmark
hey -n 1000 -c 20 http://localhost:3000/api/orders