Performance Profiling

You are profiling and optimising: $ARGUMENTS

Performance work without measurement is guessing. Every optimisation must have a before number and an after number. Never optimise based on intuition alone. Profile first, then fix exactly what the data shows.


Principles

  1. Measure before you optimise. Write down current performance numbers before touching code.
  2. Find the bottleneck first. Optimising the wrong layer wastes time. Profile CPU, memory, I/O, and network before deciding where to focus.
  3. Optimise the hottest path. If one function consumes 80% of the time, optimise that. Don't spread effort across 10 functions that each contribute 2%.
  4. Validate every optimisation. Run the benchmark again. Confirm the improvement. Commit the numbers.
  5. Correctness first. An optimisation that introduces a bug is a regression, not an improvement.
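
Principle 3 follows from simple arithmetic. A quick sketch (pure Python, illustrative numbers) shows why the hottest path dominates:

```python
# Amdahl's law: overall speedup when only a fraction of the
# runtime is made faster.
def overall_speedup(fraction, local_speedup):
    """fraction: share of total time spent in the optimised path (0..1).
    local_speedup: how much faster that path became."""
    return 1 / ((1 - fraction) + fraction / local_speedup)

# Halving a function that takes 80% of the time:
print(round(overall_speedup(0.80, 2), 2))   # 1.67x overall
# Halving a function that takes 2% of the time:
print(round(overall_speedup(0.02, 2), 2))   # 1.01x overall
```

Ten 2% functions halved still only yield ~1.11x combined; the 80% path alone is worth more than all of them together.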

1. Establish a Baseline

Before any changes, record:

Metric               | Current | Target | Notes
---------------------|---------|--------|----------------------------
P50 response time    |         |        | typical request
P95 response time    |         |        | slow requests
P99 response time    |         |        | worst case
Throughput (req/s)   |         |        | under normal load
Memory RSS           |         |        | after warm-up
CPU %                |         |        | under normal load
DB queries / request |         |        | via ORM logging or APM
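
To fill in the table, percentiles can be computed directly from raw latency samples. A minimal nearest-rank sketch (pure Python, no dependencies; the sample data is made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: p in (0, 100]."""
    xs = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(xs)) - 1)
    return xs[k]

# Hypothetical per-request latencies in ms; note the two outliers
latencies_ms = [12, 15, 14, 13, 250, 16, 18, 14, 900, 15]
for p in (50, 95, 99):
    # P95/P99 expose the outliers that the median hides
    print(f"P{p}: {percentile(latencies_ms, p)}ms")
```

This is why the table tracks P95 and P99 alongside P50: a healthy median can coexist with a terrible tail.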

2. Backend Profiling

Node.js

# CPU profile — find hot functions
node --prof server.js
# then run your load test
hey -n 1000 -c 50 http://localhost:3000/api/orders
# stop the server (Ctrl-C), then process the log it wrote
node --prof-process isolate-*.log > profile.txt
grep -A5 "Bottom up (heavy) profile" profile.txt

# Memory snapshot
node --inspect server.js
# Open chrome://inspect → take heap snapshot → analyse

# Attach the inspector to a running process (no restart needed)
kill -USR1 $(pgrep node)      # SIGUSR1 activates the inspector listener
# then connect via chrome://inspect and record a CPU profile in DevTools

Python

# cProfile — function-level CPU profiling
python -m cProfile -o profile.bin server.py
python -c "
import pstats
pstats.Stats('profile.bin').sort_stats('cumulative').print_stats(20)
"

# Line profiler — line-by-line for hot functions
pip install line_profiler
# add @profile decorator to the slow function
kernprof -l -v server.py

# Memory profiler
pip install memory_profiler
python -m memory_profiler slow_function.py
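
cProfile can also be driven in-process, which is handy for profiling a single code path without restarting anything. A minimal sketch (`slow_path` is a stand-in for the code under investigation):

```python
import cProfile
import io
import pstats

def slow_path():
    # stand-in for the suspect code path
    return sum(i * i for i in range(100_000))

prof = cProfile.Profile()
prof.enable()
slow_path()
prof.disable()

# print the 10 most expensive functions by cumulative time
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

The same pattern works inside a request handler: wrap just the handler body and log the stats, rather than profiling the whole server.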

PHP

# Xdebug profiling
php -d xdebug.mode=profile -d xdebug.output_dir=/tmp script.php
# Open cachegrind file in KCachegrind or QCachegrind

# Blackfire (production-safe, sampling-based)
blackfire run php script.php
blackfire curl http://localhost:8080/api/orders

Go

# pprof CPU profile (run per package; -cpuprofile cannot span multiple packages)
go test -bench=. -cpuprofile=cpu.out .
go tool pprof cpu.out
# In interactive mode:
# top10 — top 10 functions by CPU
# web   — open graph in browser

# via HTTP endpoint (add to main.go):
import _ "net/http/pprof"
# and serve it: go http.ListenAndServe("localhost:6060", nil)
# then:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

3. Database Profiling

Find the Slowest Queries

-- PostgreSQL: pg_stat_statements
SELECT
  left(query, 80)             AS query,
  calls,
  round(mean_exec_time)       AS avg_ms,
  round(total_exec_time / 60000) AS total_min,
  rows / calls                AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;

-- MySQL: slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;  -- log queries > 100ms
-- Review: /var/log/mysql/slow.log
-- Or use: pt-query-digest /var/log/mysql/slow.log

EXPLAIN the Slow Queries

-- Postgres: use EXPLAIN (ANALYZE, BUFFERS) for real metrics
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;

Red flags in EXPLAIN output:

  • Seq Scan on a large table (missing index)
  • Nested Loop with high row estimates (might need a Hash Join)
  • Extremely high estimated cost (e.g. cost=0.00..99999.00 on a single node)
  • actual rows >> estimated rows — stale statistics; run ANALYZE
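
These red-flag checks are mechanical enough to script against the plan text. A rough sketch (the plan string below is a made-up fragment, and the patterns only cover the cases listed above):

```python
import re

RED_FLAGS = [
    (r"Seq Scan on \w+", "sequential scan: check for a missing index"),
    (r"Nested Loop", "nested loop join: verify row estimates"),
]

def scan_plan(plan_text):
    hits = [hint for pattern, hint in RED_FLAGS
            if re.search(pattern, plan_text)]
    # actual rows far above the estimate suggests stale statistics
    m = re.search(r"rows=(\d+).*actual.*rows=(\d+)", plan_text)
    if m and int(m.group(2)) > 10 * int(m.group(1)):
        hits.append("actual rows >> estimated: run ANALYZE")
    return hits

plan = ("Seq Scan on orders  (cost=0.00..4425.00 rows=100 width=8) "
        "(actual time=0.01..35.2 rows=50000 loops=1)")
for hint in scan_plan(plan):
    print(hint)
```

A check like this can run in CI against the EXPLAIN output of known-hot queries, so a dropped index fails the build instead of the P95.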

N+1 Query Detection

# Enable query logging in development
# Laravel
DB::listen(fn($q) => Log::info($q->sql, ['bindings' => $q->bindings]));

# Rails
config.log_level = :debug   # or add the bullet gem to flag N+1s automatically

# Count queries during a request (look for repeated similar queries)
# A sign: "SELECT * FROM users WHERE id = ?" appearing 50 times in one request
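
The same detection can be automated by normalising query text and counting repeats per request. A minimal sketch (the query log below is illustrative):

```python
import re
from collections import Counter

def normalise(sql):
    # replace literal values with placeholders so repeated
    # per-row lookups collapse to the same query shape
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"'[^']*'", "?", sql)

def n_plus_one_suspects(queries, threshold=10):
    counts = Counter(normalise(q) for q in queries)
    return {shape: n for shape, n in counts.items() if n >= threshold}

# Hypothetical request log: one list query, then a per-row lookup
log = ["SELECT * FROM orders WHERE status = 'pending'"]
log += [f"SELECT * FROM users WHERE id = {i}" for i in range(50)]
print(n_plus_one_suspects(log))
# the 50 user lookups collapse to one shape with count 50
```

Wire this into the development query logger and fail loudly when any shape exceeds the threshold in a single request.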

4. Network and I/O

# Find slow external calls
# Node.js: trace-events
node --trace-events-enabled --trace-event-categories node.async_hooks app.js

# strace — what syscalls is the process making?
strace -p $(pgrep -n php-fpm) -e trace=network,file 2>&1 | head -100

# tcpdump — actual network traffic
tcpdump -i lo -w /tmp/capture.pcap port 5432   # watch DB traffic
tcpdump -r /tmp/capture.pcap -A | grep -v "^$" | head -100

5. Load Testing

Establish the load profile before optimising for it:

# hey — simple HTTP load tester
hey -n 10000 -c 100 -m GET http://localhost:8080/api/orders

# wrk — advanced with Lua scripts
wrk -t4 -c100 -d30s --latency http://localhost:8080/api/orders

# k6 — scriptable, great for complex scenarios
k6 run --vus 50 --duration 60s load-test.js
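
When none of these tools is at hand, a crude concurrent load generator is easy to sketch in Python (the target here is a stub with a simulated 2 ms response; in practice you would issue real HTTP requests):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def hit_endpoint():
    # stub standing in for an HTTP request; returns latency in ms
    start = time.monotonic()
    time.sleep(0.002)  # simulate a 2 ms response
    return (time.monotonic() - start) * 1000

# 200 requests across 20 concurrent workers
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(lambda _: hit_endpoint(), range(200)))

latencies.sort()
print(f"P50: {latencies[len(latencies) // 2]:.1f}ms")
print(f"P95: {latencies[int(len(latencies) * 0.95)]:.1f}ms")
```

This is only good enough for a smoke test; for real numbers, prefer hey, wrk, or k6, which handle connection reuse, warm-up, and coordinated omission far better.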

6. Common Bottlenecks and Fixes

Symptom                          | Likely Cause                  | Fix
---------------------------------|-------------------------------|-----------------------------------------
High DB CPU, slow queries        | Missing index                 | CREATE INDEX CONCURRENTLY
Many DB queries per request      | N+1 query                     | Eager-load associations
High memory, growing over time   | Memory leak                   | Heap snapshot diff
Slow on first request after idle | Cold start / connection setup | Connection pool + warm-up
High P99, low P50                | Occasional GC pause or lock   | Profile with APM; check lock contention
Slow file I/O                    | Synchronous reads in hot path | Use async I/O or cache in memory
Slow external API calls          | No timeout / no caching       | Add timeout; cache idempotent responses
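
The external-API fix (timeout plus caching) can be combined in one small wrapper. A sketch with a simple TTL cache (the fetch function, rate value, and TTL are placeholders; with a real HTTP client, always pass an explicit timeout, e.g. `requests.get(url, timeout=2)`):

```python
import time

def ttl_cached(ttl_seconds):
    """Cache results of an idempotent call for ttl_seconds."""
    def wrap(fn):
        cache = {}
        def inner(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit and now - hit[0] < ttl_seconds:
                return hit[1]            # fresh enough: skip the call
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return inner
    return wrap

calls = {"n": 0}

@ttl_cached(ttl_seconds=60)
def fetch_rate(currency):
    # stand-in for a slow external API call
    calls["n"] += 1
    return {"currency": currency, "rate": 1.08}

fetch_rate("EUR")
fetch_rate("EUR")          # served from cache
print(calls["n"])          # prints 1: the external call ran once
```

Only cache responses that are safe to reuse (idempotent GETs), and size the TTL to how stale the data may be, not to how slow the upstream is.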

7. Documenting Results

After every optimisation, record in TODO.md or a PR description:

## Optimisation: Eager-load user associations in orders index

### Before
- P50: 320ms  P95: 890ms
- 52 SQL queries per request (N+1 on user.email)

### Change
Added `.includes(:user)` to the orders query.

### After
- P50: 45ms   P95: 62ms
- 2 SQL queries per request

### Benchmark
hey -n 1000 -c 20 http://localhost:3000/api/orders