runloopai · vadim-rl · Jun 4, 2026
diff --git a/examples/README.md b/examples/README.md
@@ -15,6 +15,7 @@
 | Example | Description |
 |---------|-------------|
 | [deep_research](deep_research/) | Multi-step web research agent using Tavily for URL discovery, parallel sub-agents, and strategic reflection |
+| [code-reviewer](code-reviewer/) | Code review agent that analyzes source code for correctness, style, and security using static analysis and security auditing tools |
 | [content-builder-agent](content-builder-agent/) | Content writing agent that demonstrates memory (`AGENTS.md`), skills, and subagents for blog posts, LinkedIn posts, and tweets with generated images |
 | [text-to-sql-agent](text-to-sql-agent/) | Natural language to SQL agent with planning, skill-based workflows, and the Chinook demo database |
 | [ralph_mode](ralph_mode/) | Autonomous looping pattern that runs with fresh context each iteration, using the filesystem for persistence |

diff --git a/examples/code-reviewer/.env.example b/examples/code-reviewer/.env.example
@@ -0,0 +1,12 @@
+# API Keys for Code Reviewer Agent Example
+# Copy this file to .env and fill in your actual API keys
+
+# Anthropic API Key (for Claude Sonnet 4.5, the default model)
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
+
+# Google API Key (for Gemini as an alternative model)
+GOOGLE_API_KEY=your_google_api_key_here
+
+# LangSmith API Key (required for LangGraph local server)
+# Get your key at: https://smith.langchain.com/settings
+LANGSMITH_API_KEY=lsv2_pt_your_api_key_here
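Reviewer note: a quick way to confirm the keys from `.env.example` are set before starting the agent. This is a minimal sketch, not part of the PR. `missing_keys` is a hypothetical helper, and the list of required keys is an assumption (the default Claude model plus LangSmith for the local LangGraph server).

```python
import os

# Assumed required keys: the default Claude model plus LangSmith for `langgraph dev`.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "LANGSMITH_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real environment.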
diff --git a/examples/code-reviewer/README.md b/examples/code-reviewer/README.md
@@ -0,0 +1,152 @@
+# Code Reviewer
+
+## Quickstart
+
+**Prerequisites**: Install the [uv](https://docs.astral.sh/uv/) package manager:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+Ensure you are in the `code-reviewer` directory:
+
+```bash
+cd examples/code-reviewer
+```
+
+Install packages:
+
+```bash
+uv sync
+```
+
+Set your API keys in your environment:
+
+```bash
+export ANTHROPIC_API_KEY=your_anthropic_api_key_here  # Required for the Claude model
+export GOOGLE_API_KEY=your_google_api_key_here        # Required for the Gemini model; get a key at https://ai.google.dev/gemini-api/docs
+export LANGSMITH_API_KEY=your_langsmith_api_key_here  # Free to sign up; get a key at https://smith.langchain.com/settings
+```
+
+## Usage Options
+
+You can run this example in two ways:
+
+### Option 1: Direct Python Invocation
+
+Run the code reviewer directly from Python:
+
+```python
+from deepagents import create_deep_agent
+from langchain.chat_models import init_chat_model
+from code_reviewer.tools import analyze_code, security_audit, think_tool
+from code_reviewer.prompts import REVIEW_WORKFLOW_INSTRUCTIONS
+
+model = init_chat_model(model="anthropic:claude-sonnet-4-5-20250929", temperature=0.0)
+
+agent = create_deep_agent(
+    model=model,
+    tools=[analyze_code, security_audit, think_tool],
+    system_prompt=REVIEW_WORKFLOW_INSTRUCTIONS,
+)
+
+result = agent.invoke({
+    "messages": [{
+        "role": "user",
+        "content": """Review this Python code:
+
+def calculate_total(items):
+    total = 0
+    for i in range(len(items)):
+        total = total + items[i]
+    return total
+
+class order_processor:
+    def process(self, data, db_conn, user_id, auth_token, config, callback):
+        eval(data["query"])
+        return "done"
+""",
+    }]
+})
+
+print(result["messages"][-1].content)
+```
+
+### Option 2: LangGraph Server
+
+Run a local [LangGraph server](https://langchain-ai.github.io/langgraph/tutorials/langgraph-platform/local-server/) with a web interface:
+
+```bash
+langgraph dev
+```
+
+The LangGraph server opens a new browser window with the Studio interface, where you can submit code for review.
+
+You can also connect the LangGraph server to a [UI specifically designed for deepagents](https://github.com/langchain-ai/deep-agents-ui):
+
+```bash
+git clone https://github.com/langchain-ai/deep-agents-ui.git
+cd deep-agents-ui
+yarn install
+yarn dev
+```
+
+Then follow the instructions in the [deep-agents-ui README](https://github.com/langchain-ai/deep-agents-ui?tab=readme-ov-file#connecting-to-a-langgraph-server) to connect the UI to the running LangGraph server.
+
+This provides a user-friendly interface for submitting code and viewing review reports.
+
+## Custom Model
+
+By default, `deepagents` uses `"claude-sonnet-4-5-20250929"`. You can customize this by passing any [LangChain model object](https://python.langchain.com/docs/integrations/chat/). See the Deep Agents package [README](https://github.com/langchain-ai/deepagents?tab=readme-ov-file#model) for more details.
+
+```python
+from langchain.chat_models import init_chat_model
+from deepagents import create_deep_agent
+
+# Option 1: Claude
+model = init_chat_model(model="anthropic:claude-sonnet-4-5-20250929", temperature=0.0)
+
+# Option 2: Gemini (use instead of the Claude line above; needs GOOGLE_API_KEY)
+from langchain_google_genai import ChatGoogleGenerativeAI
+model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")
+
+agent = create_deep_agent(
+    model=model,
+)
+```
+
+## Custom Instructions
+
+The code reviewer agent uses custom instructions defined in `code_reviewer/prompts.py` that complement (rather than duplicate) the default middleware instructions. You can modify these in any way you want.
+
+| Instruction Set | Purpose |
+|----------------|---------|
+| `REVIEW_WORKFLOW_INSTRUCTIONS` | Defines the 6-step code review workflow: plan with TODOs → save request → analyze with sub-agents → synthesize → write report → verify. Includes the report format and severity-based issue classification (Critical, Major, Minor, Suggestion). |
+| `SUBAGENT_DELEGATION_INSTRUCTIONS` | Provides delegation strategies for multi-file reviews. Defaults to 1 sub-agent per review; parallelizes only for logically independent files. Sets limits on parallel execution (max 3 concurrent) and iteration rounds (max 3). |
+| `REVIEWER_INSTRUCTIONS` | Guides individual review sub-agents to analyze code for correctness, style, and security. Includes hard limits on tool calls (2-3 for simple scripts, max 5 for complex code) and emphasizes using `think_tool` after each analysis. |
+
+## Custom Tools
+
+The code reviewer agent adds the following custom tools beyond the built-in deepagent tools. You can also use your own tools, including via MCP servers. See the Deep Agents package [README](https://github.com/langchain-ai/deepagents?tab=readme-ov-file#mcp) for more details.
+
+| Tool Name | Description |
+|-----------|-------------|
+| `analyze_code` | Static analysis tool that uses Python's `ast` module to parse and inspect code structure. Checks for syntax errors, missing docstrings, naming convention violations (snake_case for functions, PascalCase for classes), excessive parameter counts (>5), overly long functions (>50 lines), long lines (>100 chars), and TODO/FIXME markers. Returns a categorized report with severity levels. |
+| `security_audit` | Security vulnerability scanner using regex pattern matching. Detects dangerous functions (eval, exec, pickle), SQL injection via f-strings or string formatting, shell injection (os.system, subprocess with shell=True), hardcoded credentials, unsafe yaml.load, requests without timeouts, and security-relevant comments. Reports findings with severity levels (CRITICAL, HIGH, MEDIUM, LOW). |
+| `think_tool` | Strategic reflection mechanism that helps the agent pause and assess progress between analysis calls, evaluate findings, identify gaps, and plan next steps. |
+
+### Example Analysis Output
+
+When run against code with common issues, the agent produces a structured report like this:
+
+**Correctness:**
+- [critical] Line 9: `eval(data["query"])` can execute arbitrary code; use `ast.literal_eval()` or a safer alternative
+
+**Style:**
+- [major] Line 8: Function `process` has 6 parameters (max recommended: 5)
+- [minor] Class `order_processor` (line 7) violates PascalCase naming convention
+- [minor] Function `calculate_total` is missing a docstring
+- [minor] Use `for item in items` instead of `for i in range(len(items))` for cleaner iteration
+
+**Security:**
+- [CRITICAL] Line 9: Use of eval(), which can execute arbitrary code
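Reviewer note: `code_reviewer/tools.py` is not shown in this part of the diff, so the following is only a rough sketch of the kind of `ast`-based check the README attributes to `analyze_code`. The function name `check_structure`, the regexes, and the choice to ignore `self`/`cls` when counting parameters are all assumptions for illustration, not the PR's implementation.

```python
import ast
import re

SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")
PASCAL_CASE = re.compile(r"^_*[A-Z][A-Za-z0-9]*$")
MAX_PARAMS = 5  # threshold taken from the README's tool table


def check_structure(source):
    """Return sorted (line, message) findings for naming and parameter-count issues."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [(exc.lineno or 0, f"syntax error: {exc.msg}")]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            if not PASCAL_CASE.match(node.name):
                findings.append((node.lineno, f"class `{node.name}` is not PascalCase"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                findings.append((node.lineno, f"function `{node.name}` is not snake_case"))
            # Count positional parameters, ignoring `self`/`cls` (an assumption).
            params = [
                arg.arg
                for arg in node.args.posonlyargs + node.args.args
                if arg.arg not in ("self", "cls")
            ]
            if len(params) > MAX_PARAMS:
                findings.append(
                    (node.lineno, f"function `{node.name}` has {len(params)} parameters")
                )
    return sorted(findings)


SAMPLE = '''def calculate_total(items):
    total = 0
    for i in range(len(items)):
        total = total + items[i]
    return total

class order_processor:
    def process(self, data, db_conn, user_id, auth_token, config, callback):
        eval(data["query"])
        return "done"
'''

for line, message in check_structure(SAMPLE):
    print(f"line {line}: {message}")
```

Line numbers here are relative to the first line of the snippet passed in, which is worth keeping in mind when comparing against line numbers in a report.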
diff --git a/examples/code-reviewer/agent.py b/examples/code-reviewer/agent.py
@@ -0,0 +1,53 @@
+"""Code Reviewer Agent - Standalone script for LangGraph deployment.
+
+This module creates a code review agent with custom tools and prompts
+for analyzing code correctness, style, and security.
+"""
+
+from deepagents import create_deep_agent
+from langchain.chat_models import init_chat_model
+
+from code_reviewer.prompts import (
+    REVIEWER_INSTRUCTIONS,
+    REVIEW_WORKFLOW_INSTRUCTIONS,
+    SUBAGENT_DELEGATION_INSTRUCTIONS,
+)
+from code_reviewer.tools import analyze_code, security_audit, think_tool
+
+# Limits passed into the delegation prompt
+MAX_CONCURRENT_REVIEW_UNITS = 3
+MAX_REVIEWER_ITERATIONS = 3
+
+# Combine orchestrator instructions (REVIEWER_INSTRUCTIONS is used only by the sub-agent)
+INSTRUCTIONS = (
+    REVIEW_WORKFLOW_INSTRUCTIONS
+    + "\n\n"
+    + "=" * 80
+    + "\n\n"
+    + SUBAGENT_DELEGATION_INSTRUCTIONS.format(
+        max_concurrent_review_units=MAX_CONCURRENT_REVIEW_UNITS,
+        max_reviewer_iterations=MAX_REVIEWER_ITERATIONS,
+    )
+)
+
+# Create code review sub-agent
+review_sub_agent = {
+    "name": "code-reviewer",
+    "description": "Delegate a file or code snippet for detailed review. Give this reviewer one file or snippet at a time.",
+    "system_prompt": REVIEWER_INSTRUCTIONS,
+    "tools": [analyze_code, security_audit, think_tool],
+}
+
+# Alternative model: Gemini 3 (needs `from langchain_google_genai import ChatGoogleGenerativeAI`)
+# model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview", temperature=0.0)
+
+# Default model: Claude Sonnet 4.5
+model = init_chat_model(model="anthropic:claude-sonnet-4-5-20250929", temperature=0.0)
+
+# Create the agent
+agent = create_deep_agent(
+    model=model,
+    tools=[analyze_code, security_audit, think_tool],
+    system_prompt=INSTRUCTIONS,
+    subagents=[review_sub_agent],
+)
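Reviewer note: the orchestrator prompt in `agent.py` is built by concatenating the workflow text with a `str.format`-filled delegation template. The sketch below reproduces that pattern with short stand-in templates so the behavior is easy to see in isolation. `WORKFLOW`, `DELEGATION_TEMPLATE`, and `build_instructions` are hypothetical names; only the placeholder names come from the PR.

```python
# Stand-in templates; the real ones live in code_reviewer/prompts.py.
WORKFLOW = "# Code Review Workflow"
DELEGATION_TEMPLATE = (
    "Use at most {max_concurrent_review_units} parallel sub-agents per iteration.\n"
    "Stop after {max_reviewer_iterations} delegation rounds."
)
SEPARATOR = "\n\n" + "=" * 80 + "\n\n"


def build_instructions(workflow, delegation_template, **limits):
    """Join the workflow text and the formatted delegation template."""
    # str.format raises KeyError if a placeholder has no matching keyword.
    return workflow + SEPARATOR + delegation_template.format(**limits)


prompt = build_instructions(
    WORKFLOW,
    DELEGATION_TEMPLATE,
    max_concurrent_review_units=3,
    max_reviewer_iterations=3,
)
print(prompt)
```

One thing to watch with this pattern: any literal `{` or `}` added to a template that goes through `.format()` has to be doubled (`{{`, `}}`), otherwise formatting fails. Only the delegation template is formatted here, so the workflow text is unaffected.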
diff --git a/examples/code-reviewer/code_reviewer/__init__.py b/examples/code-reviewer/code_reviewer/__init__.py
@@ -0,0 +1,21 @@
+"""Code Reviewer Agent Example.
+
+This module demonstrates building a code review agent using the deepagents package
+with custom tools for code analysis and security auditing.
+"""
+
+from code_reviewer.prompts import (
+    REVIEWER_INSTRUCTIONS,
+    REVIEW_WORKFLOW_INSTRUCTIONS,
+    SUBAGENT_DELEGATION_INSTRUCTIONS,
+)
+from code_reviewer.tools import analyze_code, security_audit, think_tool
+
+__all__ = [
+    "analyze_code",
+    "security_audit",
+    "think_tool",
+    "REVIEWER_INSTRUCTIONS",
+    "REVIEW_WORKFLOW_INSTRUCTIONS",
+    "SUBAGENT_DELEGATION_INSTRUCTIONS",
+]
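Reviewer note: the package exports `security_audit`, which the README describes as a regex-based scanner. Since `tools.py` is not visible in this part of the diff, here is a minimal sketch of that general technique. The pattern table, severities, and the `scan` function are assumptions made for illustration and are not the PR's actual rules.

```python
import re

# Hypothetical pattern table: (severity, description, compiled regex).
PATTERNS = [
    ("CRITICAL", "use of eval()/exec()", re.compile(r"\b(eval|exec)\s*\(")),
    ("HIGH", "shell command via os.system()", re.compile(r"\bos\.system\s*\(")),
    (
        "HIGH",
        "subprocess call with shell=True",
        re.compile(r"\bsubprocess\.\w+\(.*shell\s*=\s*True"),
    ),
    (
        "MEDIUM",
        "possible hardcoded credential",
        re.compile(r"(?i)\b(password|secret|api_key|token)\s*=\s*['\"][^'\"]+['\"]"),
    ),
]


def scan(source):
    """Return (line, severity, description) for each pattern hit, line by line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for severity, description, pattern in PATTERNS:
            if pattern.search(line):
                findings.append((lineno, severity, description))
    return findings


SAMPLE = 'import os\nos.system("ls")\neval(data["query"])\npassword = "hunter2"\n'

for lineno, severity, description in scan(SAMPLE):
    print(f"[{severity}] Line {lineno}: {description}")
```

Line-by-line regex matching is cheap but coarse: it also fires on matches inside comments and string literals, and it misses calls split across lines. The `\b` word boundary does keep `ast.literal_eval(` from being flagged, because `_` counts as a word character.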
diff --git a/examples/code-reviewer/code_reviewer/prompts.py b/examples/code-reviewer/code_reviewer/prompts.py
@@ -0,0 +1,133 @@
+"""Prompt templates and tool descriptions for the code review deepagent."""
+
+REVIEW_WORKFLOW_INSTRUCTIONS = """# Code Review Workflow
+
+Follow this workflow for all code review requests:
+
+1. **Plan**: Create a todo list with write_todos to break down the review into focused tasks
+2. **Save the request**: Use write_file() to save the code or reference to `/review_request.md`
+3. **Analyze**: Delegate individual files or functions to review sub-agents using the task() tool
+4. **Synthesize**: Review all sub-agent findings and consolidate the review
+5. **Write Report**: Write a comprehensive review report to `/review_report.md` (see Report Writing Guidelines below)
+6. **Verify**: Read `/review_request.md` and confirm you've addressed all aspects
+
+## Review Planning Guidelines
+- For single-file reviews, delegate the entire file to one sub-agent
+- For multi-file reviews, delegate one file per sub-agent
+- Each sub-agent should analyze one file or snippet and return findings
+- Focus on actionable, specific feedback rather than general observations
+
+## Report Writing Guidelines
+
+When writing the final report to `/review_report.md`, follow this structure:
+
+### Structure:
+1. **Overview**: Brief summary of what was reviewed
+2. **Correctness**: Logic errors, edge cases, error handling issues
+3. **Style**: Naming conventions, formatting, code organization
+4. **Security**: Vulnerabilities, unsafe patterns, hardcoded secrets
+5. **Recommendations**: Prioritized list of actionable changes
+6. **Summary**: Overall assessment and verdict
+
+### Issue severity levels:
+- **Critical**: Bugs, security vulnerabilities, data loss risks
+- **Major**: Significant code quality or maintainability concerns
+- **Minor**: Style violations, minor inefficiencies
+- **Suggestion**: Optional improvements or best practices
+
+### General guidelines:
+- Use clear section headings (## for sections, ### for subsections)
+- Be specific - include line numbers and code snippets where relevant
+- Prioritize issues: critical > major > minor > suggestion
+- Do NOT use self-referential language ("I found...", "I analyzed...")
+- Write as a professional review without meta-commentary
+- Each issue should include: severity, location, description, and recommendation
+"""
+
+REVIEWER_INSTRUCTIONS = """You are a code reviewer analyzing source code for correctness, style, and security.
+
+<Task>
+Your job is to use tools to analyze the provided code and return a detailed review.
+You can call these tools in series or in parallel; your analysis is conducted in a tool-calling loop.
+</Task>
+
+<Available Review Tools>
+You have access to the following tools:
+1. **analyze_code**: For static analysis of Python code (AST parsing, naming conventions, complexity)
+2. **security_audit**: For security vulnerability scanning
+3. **think_tool**: For reflection and strategic planning during review
+**CRITICAL: Use think_tool after each analysis to reflect on findings and plan next steps**
+</Available Review Tools>
+
+<Instructions>
+Think like an experienced code reviewer. Follow these steps:
+
+1. **Read the code carefully** - What does it do? What could go wrong?
+2. **Analyze correctness** - Check for logic errors, missing edge cases, exception handling
+3. **Analyze style** - Check naming, formatting, documentation, code organization
+4. **Analyze security** - Check for common vulnerabilities and unsafe patterns
+5. **Stop when you can provide a comprehensive review** - Don't keep analyzing for perfection
+</Instructions>
+
+<Hard Limits>
+**Analysis Budgets** (Prevent excessive tool calls):
+- **Simple scripts**: Use at most 2-3 analysis tool calls
+- **Complex code**: Use at most 5 analysis tool calls
+- **Always stop**: After 5 analysis tool calls, even if the review is incomplete
+
+**Stop Immediately When**:
+- You can provide a comprehensive review
+- You have identified all critical and major issues
+- Your last 2 analysis calls returned overlapping findings
+</Hard Limits>
+
+<Show Your Thinking>
+After each analysis tool call, use think_tool to reflect:
+- What key issues did I find?
+- What aspects still need review?
+- Do I have enough to provide a comprehensive review?
+- Should I do more analysis or compile my findings?
+</Show Your Thinking>
+
+<Final Response Format>
+When providing your findings back to the orchestrator:
+
+1. **Structure your response**: Organize findings with clear headings by category (Correctness, Style, Security)
+2. **Be specific**: Include line numbers, code snippets, and exact recommendations
+3. **Prioritize issues**: Group by severity (Critical, Major, Minor, Suggestion)
+
+The orchestrator will consolidate findings from all reviewers into the final report.
+</Final Response Format>
+"""
+
+TASK_DESCRIPTION_PREFIX = """Delegate a task to a specialized sub-agent with isolated context. Available agents for delegation are:
+{other_agents}
+"""
+
+SUBAGENT_DELEGATION_INSTRUCTIONS = """# Sub-Agent Code Review Coordination
+
+Your role is to coordinate code reviews by delegating tasks from your TODO list to specialized review sub-agents.
+
+## Delegation Strategy
+
+**DEFAULT: Start with 1 sub-agent** for most reviews:
+- "Review this single file" → 1 sub-agent (full review)
+- "Check this function for issues" → 1 sub-agent
+
+**Parallelize for multi-file projects:**
+- Multiple independent files → 1 sub-agent per file
+- Only parallelize when files are logically independent
+
+## Key Principles
+- **Bias towards single sub-agent**: One comprehensive review is more token-efficient than multiple narrow ones
+- **Avoid premature decomposition**: Don't break "review file X" into "check correctness", "check style", "check security" - just use 1 sub-agent for all aspects of X
+
+## Parallel Execution Limits
+- Use at most {max_concurrent_review_units} parallel sub-agents per iteration
+- Make multiple task() calls in a single response to enable parallel execution
+- Each sub-agent returns findings independently
+
+## Review Limits
+- Stop after {max_reviewer_iterations} delegation rounds if you haven't completed the review
+- Stop when you have sufficient analysis for a comprehensive review
+- Bias towards focused review over exhaustive analysis"""
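Reviewer note: the delegation limits above are enforced only through the prompt, so it helps to see what they imply numerically. The sketch below assumes one file per sub-agent and fully parallel rounds; `plan_rounds` is a hypothetical helper, not something in the PR.

```python
def plan_rounds(files, max_concurrent=3, max_rounds=3):
    """Split files into delegation rounds; return (rounds, leftover).

    Each round holds at most `max_concurrent` files, and only the first
    `max_rounds` rounds are kept. Files that do not fit are returned as leftover.
    """
    batches = [
        files[start:start + max_concurrent]
        for start in range(0, len(files), max_concurrent)
    ]
    kept = batches[:max_rounds]
    leftover = [name for batch in batches[max_rounds:] for name in batch]
    return kept, leftover


files = [f"module_{index}.py" for index in range(10)]
rounds, leftover = plan_rounds(files)

for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {batch}")
print(f"not reviewed within limits: {leftover}")
```

Under these assumptions, the defaults in `agent.py` (3 concurrent, 3 rounds) cap a single review at 9 files, so larger multi-file requests would need either higher limits or more than one file per sub-agent.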