Lesson 33: Project — Code Review Agent
Course: AI Code Agents | Duration: 55 minutes | Level: Intermediate
Learning Objectives
By the end of this lesson, you will be able to:
- Design a code review agent with clear output specifications
- Implement multi-file context gathering for comprehensive review
- Structure agent output for human consumption and downstream processing
- Handle the edge cases unique to code review workflows
Prerequisites
- Sections 1-4 complete
- Python 3.9+; Anthropic SDK installed
Part 1: What Makes a Good Code Review Agent
A code review agent is more nuanced than it first appears. The naive implementation — "read the file, ask Claude to find bugs" — works poorly in practice because:
Context matters. Claude finding a "missing null check" in a function is only meaningful if it knows whether null inputs are expected. The agent needs context: what does the project's error handling convention look like? What does the calling code look like?
Specificity matters. "This could be improved" is useless feedback. "Line 47: the email.split('@') operation will raise IndexError if the string has no '@' character. Replace with email.partition('@') which returns a tuple and never raises." is actionable.
Coverage matters. A review that misses the critical security issue while correctly noting the suboptimal variable name is worse than no review. Prioritize high-severity findings.
Structure matters. Reviewers need to quickly scan a review. Structure matters for usability: group by severity, use consistent format, provide line references.
The design principles for our code review agent:
- Read the file under review
- Read the test file for that module (if it exists) — tests reveal intent
- Run the linter to catch style issues automatically
- Ask Claude to focus on correctness, security, and missing error handling
- Output a structured markdown review, grouped by severity
Part 2: The Code Review Agent Implementation
#!/usr/bin/env python3
"""Code Review Agent — reviews Python source files and outputs a structured review."""
import anthropic
import subprocess
from pathlib import Path
import sys
import json
client = anthropic.Anthropic()
SYSTEM_PROMPT = """You are a senior Python engineer performing thorough code review.
Your review format must be:
# Code Review: {filename}
## Critical Issues
Issues that will cause bugs, security vulnerabilities, or data loss. MUST be fixed.
Format: **Line N:** [issue description] — [specific fix]
## Warnings
Code that may cause bugs in edge cases, missing error handling, performance problems.
Format: **Line N:** [issue description] — [suggestion]
## Style / Minor
PEP 8 violations, naming issues, unnecessary complexity, missing docstrings.
Format: **Line N:** [issue description]
## Summary
One paragraph: overall code quality assessment.
## Verdict
One of: APPROVE | NEEDS CHANGES | MAJOR ISSUES
Rules:
- Be specific: always include line numbers
- Be actionable: every issue must have a concrete fix suggestion
- Do NOT report issues that are pre-existing in the codebase unless they are security-critical
- If a section has no issues, write "None"
"""
TOOLS = [
{
"name": "read_file",
"description": "Read the contents of a source file.",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path relative to project root"}
},
"required": ["path"]
}
},
{
"name": "run_linter",
"description": "Run ruff linter on a file and return violations with line numbers.",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path to lint"}
},
"required": ["path"]
}
},
{
"name": "find_test_file",
"description": "Find the test file for a given source file (if it exists). Returns the path or 'not found'.",
"input_schema": {
"type": "object",
"properties": {
"source_path": {"type": "string", "description": "Source file path"}
},
"required": ["source_path"]
}
},
{
"name": "write_review",
"description": "Write the completed review to an output file.",
"input_schema": {
"type": "object",
"properties": {
"output_path": {"type": "string", "description": "Output file path"},
"content": {"type": "string", "description": "Review content in markdown"}
},
"required": ["output_path", "content"]
}
}
]
def execute_tool(name: str, args: dict, project_root: Path) -> str:
if name == "read_file":
p = project_root / args["path"]
if not p.exists():
return f"File not found: {args['path']}"
return p.read_text(encoding="utf-8")
elif name == "run_linter":
p = project_root / args["path"]
result = subprocess.run(
["ruff", "check", str(p), "--output-format", "text"],
capture_output=True, text=True
)
return result.stdout or "No lint issues found."
elif name == "find_test_file":
source = Path(args["source_path"])
stem = source.stem
candidates = [
project_root / "tests" / f"test_{stem}.py",
project_root / "tests" / source.parent / f"test_{stem}.py",
project_root / source.parent / f"test_{stem}.py",
]
for candidate in candidates:
if candidate.exists():
return str(candidate.relative_to(project_root))
return "not found"
elif name == "write_review":
p = project_root / args["output_path"]
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(args["content"], encoding="utf-8")
return f"Review written to {p}"
return f"Unknown tool: {name}"
def review_file(source_path: str, output_path: str, project_root: str = ".") -> str:
root = Path(project_root).resolve()
task = f"""Please perform a thorough code review of {source_path}.
Workflow:
1. Read the source file: {source_path}
2. Find and read the test file (if it exists) to understand expected behavior
3. Run the linter on the source file
4. Identify all issues: correctness problems, security issues, missing error handling, style violations
5. Write a comprehensive structured review to {output_path}
Focus especially on:
- Functions that don't validate their inputs
- Exception handling that swallows errors silently
- SQL queries or shell commands built with string concatenation
- Functions with unclear error returns (returning None vs raising exceptions inconsistently)
"""
messages = [{"role": "user", "content": task}]
for _ in range(25):
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=4096,
system=SYSTEM_PROMPT,
tools=TOOLS,
messages=messages
)
if response.stop_reason == "end_turn":
for block in response.content:
if hasattr(block, "text"):
return block.text
return "Review complete."
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = execute_tool(block.name, block.input, root)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result
})
messages.append({"role": "user", "content": tool_results})
return "Error: agent did not complete review within iteration limit"
if __name__ == "__main__":
if len(sys.argv) < 3:
print("Usage: python review_agent.py <source_file.py> <output_review.md> [project_root]")
sys.exit(1)
source = sys.argv[1]
output = sys.argv[2]
root = sys.argv[3] if len(sys.argv) > 3 else "."
print(f"Reviewing {source}...")
result = review_file(source, output, root)
print(f"Done. Review written to {output}")
print(result)Part 3: Running and Testing the Agent
# Install dependencies
uv pip install anthropic ruff
# Review a file
python review_agent.py src/auth.py reviews/auth-review.md
# Review multiple files (shell loop)
for f in src/*.py; do
stem=$(basename $f .py)
python review_agent.py "$f" "reviews/${stem}-review.md" && echo "Done: $f"
donePart 4: Edge Cases and Improvements
Handling very long files: Files over 500 lines can consume too much context. For these, pass the file in chunks — review a section at a time, then ask Claude to synthesize a final review.
Diff-only review: For CI review, you often want to review only the changed lines, not the entire file. Add a get_diff tool that calls git diff HEAD~1 -- {file} and have the agent focus review on the diff.
Severity calibration: Out of the box, Claude may over-report minor style issues relative to serious bugs. Add explicit guidance in the system prompt: "If in doubt between Warnings and Critical, prefer Warnings. Reserve Critical for issues that will definitely cause runtime errors or security vulnerabilities."
Key Takeaways
- Good code review agents need: contextual reading (source + tests), automated linting, focused AI analysis on high-value issues, structured output with line numbers
- The system prompt shapes output format — be explicit about the exact structure you want
- Tool selection matters:
find_test_fileandrun_linterprovide grounding that pure LLM analysis lacks - Edge cases to handle: very long files, diff-only review for CI, severity calibration
Next Lesson: In Lesson 34: Project — Test Generation Agent, we build an agent that reads source code and generates a comprehensive test suite — handling interface discovery, edge cases, and test framework conventions.
Back to Section Overview | Next Lesson: Test Generation Agent →