Code Review

You are performing a thorough code review of $ARGUMENTS.

A great code review is not a style guide enforcement pass — it is a safety and quality gate. Your job is to find problems that matter: bugs that will reach production, security holes, design decisions that will hurt the team in six months, and missing tests for critical paths.

Review Philosophy

  • Be specific. Every comment must reference an exact line or block.
  • Be kind, be direct. Critique the code, not the author. Never soften a critical finding into an "optional" suggestion.
  • Prioritise ruthlessly. Critical and High findings must be fixed before merging. Medium findings should be addressed. Low findings are optional.
  • Praise what deserves it. Note genuinely good patterns — it teaches what to do more of.

Severity Scale

| Level | Meaning |
| --- | --- |
| 🔴 Critical | Will cause data loss, security breach, crash in production, or data corruption. Block merge. |
| 🟠 High | Incorrect behaviour, likely bugs, significant security risk. Block merge. |
| 🟡 Medium | Fragile code, maintainability issue, missing test for important case. Should fix. |
| 🟢 Low | Style, naming, minor improvement. Optional. |
| 💡 Suggestion | Alternative approach worth considering. No action required. |

Step 1 — Read Everything First

Before commenting on anything:

  1. Read the entire change from top to bottom.
  2. Understand the intent: What is this meant to do?
  3. Identify the critical paths: What would break if this were wrong?
  4. Only then begin the structured review.

Step 2 — Correctness Review

Ask: Does this code do what it claims to do?

  • [ ] Does the logic handle the happy path correctly?
  • [ ] Are all edge cases handled?
    • Empty collections, null/undefined values, zero, negative numbers
    • First and last element of a list
    • Single-element collections
    • Maximum and minimum allowed values
  • [ ] Are all error paths handled explicitly (not swallowed silently)?
  • [ ] Are there off-by-one errors in loops or index calculations?
  • [ ] Are concurrent operations safe? (If two requests run simultaneously, does this break?)
  • [ ] Are there any implicit assumptions about ordering that may not hold?
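A minimal sketch of what these checks look for in practice. The function and its behaviour are hypothetical, chosen only to show the edge cases above (empty input, single element, window bounds) and the classic off-by-one in a range calculation:

```python
def moving_average(values, window):
    """Average of each sliding window of size `window` over `values`."""
    if window <= 0:
        raise ValueError("window must be positive")  # error path handled explicitly
    if not values or window > len(values):
        return []  # empty collection and too-large window are explicit, not accidental
    return [
        sum(values[i:i + window]) / window
        # Off-by-one hotspot: the `+ 1` is what includes the final window.
        for i in range(len(values) - window + 1)
    ]
```

A reviewer would probe exactly the inputs the checklist names: `[]`, a single-element list, and a window equal to the list length.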

Step 3 — Security Review

Ask: Can this be exploited?

  • [ ] Is every user-supplied input validated before use?
  • [ ] Are all database queries parameterised (no string concatenation)?
  • [ ] Is output HTML-escaped before rendering?
  • [ ] Are file paths sanitised against directory traversal (../)?
  • [ ] Are authorisation checks present on every operation that modifies sensitive data?
  • [ ] Are any secrets or credentials present in the code or the commit history?
  • [ ] Are HTTP requests to external URLs validated against an allowlist?
  • [ ] Could this code be used to enumerate user accounts or trigger excessive resource use?
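To make the first two findings concrete, here is a small sketch of the safe patterns the checklist asks for: a parameterised query and a path check against directory traversal. The table name, column names, and base-directory layout are hypothetical:

```python
import sqlite3
from pathlib import Path

def find_user(conn, email):
    # Parameterised query: the driver binds `email`, so injection input
    # is treated as data, never as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE email = ?", (email,))
    return cur.fetchone()

def safe_read(base_dir, requested):
    # Directory-traversal guard: resolve the full path, then require that
    # it still lives under the base directory (requires Python 3.9+).
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise PermissionError(f"path escapes {base}")
    return target.read_text()
```

The string-concatenation equivalents of both functions are exactly what a 🔴 Critical finding should flag.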

Step 4 — Performance Review

Ask: Will this be acceptable under realistic load?

  • [ ] Are there N+1 query patterns? (Loop that runs a query per item)
  • [ ] Are there unbounded queries? (No LIMIT, no pagination)
  • [ ] Are there synchronous operations that should be async? (Email, PDF generation)
  • [ ] Is expensive computation cached where appropriate?
  • [ ] Are there unnecessary repeated operations that could be computed once?
  • [ ] Does this hold locks or transactions longer than necessary?
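As an illustration of the first item, a sketch of an N+1 pattern next to its batched replacement. The schema (`orders(user_id, ...)`) is hypothetical; the point is the shape of the queries, not the domain:

```python
import sqlite3

def orders_per_user_n_plus_1(conn, user_ids):
    # Anti-pattern: one round trip per user — N queries for N users.
    return {
        uid: conn.execute(
            "SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)
        ).fetchone()[0]
        for uid in user_ids
    }

def orders_per_user_batched(conn, user_ids):
    # One query with IN (...) and GROUP BY replaces N round trips.
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, COUNT(*) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        user_ids,
    ).fetchall()
    counts = dict(rows)
    return {uid: counts.get(uid, 0) for uid in user_ids}
```

Both return the same result; only the query count differs — which is why this class of bug passes tests and fails under load.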

Step 5 — Design and Maintainability Review

Ask: Will the next developer understand this in six months?

  • [ ] Is each function doing exactly one thing?
  • [ ] Is the code at the right level of abstraction? (Neither too clever nor too verbose)
  • [ ] Are complex business rules documented with a comment explaining the "why"?
  • [ ] Is there duplication that should be extracted into a shared function?
  • [ ] Are names clear and precise? (getUserData() vs fetchActiveUsersByRegion())
  • [ ] Is error handling consistent with the rest of the codebase?
  • [ ] Are any magic numbers or strings replaced with named constants?
  • [ ] Does this introduce circular dependencies or tangled module relationships?
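A tiny sketch of the "magic numbers" and "one thing per function" items together. The policy values are invented for illustration; the pattern is what matters — the constant names carry the "why" that a bare `5` in a conditional would not:

```python
# Named constants document the business rule a bare literal would hide.
MAX_LOGIN_ATTEMPTS = 5        # hypothetical policy value
LOCKOUT_SECONDS = 15 * 60     # hypothetical: 15-minute lockout

def should_lock_account(failed_attempts: int) -> bool:
    """Does exactly one thing: decide whether to lock the account."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS
```

Compare with `if attempts >= 5:` buried in a request handler — same behaviour, but the next developer has to reverse-engineer the rule.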

Step 6 — Test Coverage Review

Ask: How would we know if this broke?

  • [ ] Is there at least one test per significant code path?
  • [ ] Are error/failure paths tested (not just the happy path)?
  • [ ] Do the tests verify behaviour or just cover lines?
  • [ ] Are tests independent (no reliance on execution order or shared mutable state)?
  • [ ] Would any test catch a regression if this code were deleted and replaced with a no-op?
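The "behaviour vs. line coverage" distinction in miniature, using a hypothetical discount function. The weak test executes every line yet would pass against a no-op; the strong test pins the result and the error path:

```python
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_weak():
    # Covers the lines but asserts nothing — a no-op replacement still passes.
    apply_discount(100.0, 10)

def test_strong():
    # Verifies behaviour, including the failure path.
    assert apply_discount(100.0, 10) == 90.0
    try:
        apply_discount(100.0, 150)
        raise AssertionError("expected ValueError")
    except ValueError:
        pass
```

Coverage tools would score both tests identically; only the second one answers "how would we know if this broke?"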

Output Format

Structure the review as follows:

## Summary
[2-3 sentences: overall impression, must-fix issues count, recommendation: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION]

## Critical Findings
### [C-01] Title
**Severity:** 🔴 Critical
**Location:** `filename.ext:line_number`
**Issue:** [What is wrong]
**Risk:** [What happens if not fixed]
**Fix:** [Specific change to make]

## High Findings
[Same format]

## Medium Findings
[Same format — can group minor items]

## Low / Suggestions
[Lighter format: location → issue → suggestion]

## What's Done Well
[Specific patterns or decisions worth noting]

After the review, output a final one-line verdict:

VERDICT: APPROVE | REQUEST CHANGES | NEEDS DISCUSSION