A 5-step code review pipeline for AI coding assistants. Treats review as a state machine: three independent passes per cycle, three consecutive clean cycles required, any finding resets the counter. The minimum path to a commit is 9 static review passes plus a runtime smoke test.
AI coding assistants ship code that compiles, runs, and looks right. Single-pass review (Copilot, Cursor, CodeRabbit, etc.) catches the obvious defects but misses two failure modes:
- Author and reviewer collapse. When the same model writes and reviews the change, it inherits its own blind spots. code-forge runs three independent review perspectives (qodo, expert, adversarial) and treats their findings as untrusted claims that must be reproduced before any fix.
- Self-claimed completion. Hooks that gate on "I finished" markers are bypassable by any agent that can write a string. code-forge gates on actual state: a real `pre-commit` hook running the test suite, a mutation runner proving the tests catch regressions, and a coverage heuristic detecting drift across components.
pip install code-review-forge
code-forge install-skill

The first command installs the CLI (Python >=3.12). The second copies the 6 review skills into ~/.claude/skills/. Then in Claude Code, run the full pipeline:
/code-forge
Or invoke individual passes:
/qodo-review # change-aware pre-review (Pass 1)
/code-review-expert # SOLID, architecture, security (Pass 2)
/adversarial-qe # red-team QE, 12 attack dimensions (Pass 3)
/kernel-fp-verify # false-positive verification (Step 3.5)
/smoke-test # runtime verification (Step 4)
Other agent targets:
code-forge install-skill --target vscode # <cwd>/.claude/skills/
code-forge install-skill --target universal # <cwd>/.agents/skills/
code-forge install-skill --dest /path/to/dir # explicit location
code-forge install-skill --skill code-forge # one skill only
code-forge install-skill --force # overwrite existing

By default, code-forge uses the claude CLI in your PATH with the session model (no model pin). Three environment variables control the backend:
| Variable | Purpose | Default |
|---|---|---|
| `FORGE_BACKEND` | Select a named backend from backends.yaml | session-default |
| `FORGE_OUTLET` | Force outlet: cli or inline | auto-detected |
| `FORGE_LLM_MODEL` | Override model for CLI backends | claude-sonnet-4-6 |
Quick examples:
# Use the default (claude CLI, session model)
code-forge review
# Pin a specific model for this run
FORGE_LLM_MODEL=claude-opus-4-5 code-forge review
# Use a named API backend from backends.yaml
FORGE_BACKEND=claude-api code-forge review
# Force inline outlet (no CLI subprocess)
FORGE_OUTLET=inline code-forge review

Named backends (optional) are defined in ~/.config/code-forge/backends.yaml:
backends:
  - name: claude-api
    type: api
    format: anthropic
    base_url: https://api.anthropic.com
    api_key_env: ANTHROPIC_API_KEY
    default: true
  - name: openai
    type: api
    format: openai
    base_url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
  - name: local-claude
    type: cli
    model: claude-opus-4-5
    command: claude

Full reference: docs/configuration.md
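The interaction between the environment variables and the named backends can be sketched as follows. This is a minimal illustration, not code-forge's actual resolution logic: `resolve_backend`, `SESSION_DEFAULT`, and the in-memory `BACKENDS` list are hypothetical names, the YAML is represented as already-parsed data, and the role of the `default: true` flag is not specified here, so the sketch ignores it.

```python
from typing import Mapping

# The backends.yaml example, written out as already-parsed data (assumption:
# the real loader produces something of roughly this shape).
BACKENDS = [
    {"name": "claude-api", "type": "api", "format": "anthropic",
     "base_url": "https://api.anthropic.com", "api_key_env": "ANTHROPIC_API_KEY",
     "default": True},
    {"name": "openai", "type": "api", "format": "openai",
     "base_url": "https://api.openai.com/v1", "api_key_env": "OPENAI_API_KEY"},
    {"name": "local-claude", "type": "cli", "model": "claude-opus-4-5",
     "command": "claude"},
]

# Hypothetical stand-in for "the claude CLI in your PATH, no model pin".
SESSION_DEFAULT = {"name": "session-default", "type": "cli", "command": "claude"}


def resolve_backend(env: Mapping[str, str], backends: list[dict]) -> dict:
    """Pick a backend from FORGE_BACKEND, then apply FORGE_LLM_MODEL if it is a CLI backend."""
    name = env.get("FORGE_BACKEND", "session-default")
    if name == "session-default":
        backend = dict(SESSION_DEFAULT)
    else:
        matches = [b for b in backends if b["name"] == name]
        if not matches:
            raise KeyError(f"unknown backend: {name}")
        backend = dict(matches[0])  # copy so the caller's list is not mutated

    # The table describes FORGE_LLM_MODEL as an override for CLI backends only.
    model = env.get("FORGE_LLM_MODEL")
    if model and backend["type"] == "cli":
        backend["model"] = model
    return backend
```

Passing `os.environ` as `env` would resolve against the real environment; taking the mapping as a parameter keeps the function easy to test.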
Editor setup guides:
- VS Code: docs/setup-vscode.md
- Cursor: docs/setup-cursor.md
- PyCharm: docs/setup-pycharm.md
Code Change
|
v
[Step 0] Syntax (0a) + Lint (0b) + Non-ASCII (0c)
|
v
[Cycle 1] Pass 1: qodo-review
          Pass 2: code-review-expert
          Pass 3: adversarial-qe
|
| zero findings -> counter += 1
| any finding -> fix, counter = 0, restart Cycle 1
v
[Cycle 2] (same 3 passes)
|
v
[Cycle 3] (same 3 passes)
| counter = 3
v
[Step 3.5] kernel-fp-verify (if fixes were applied during cycles)
|
v
[Step 4] smoke-test (runtime verification)
|
v
[COMMIT GATE] # post-review-c3
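The convergence loop in the diagram can be sketched as a small state machine. This is an illustration under stated assumptions, not the orchestrator's real code: `run_until_converged`, `ConvergenceResult`, and the `run_pass` / `apply_fixes` callbacks are hypothetical, and the sketch assumes a cycle always completes all three passes before the counter is updated.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Pass names come from the diagram; everything else here is hypothetical.
PASSES = ("qodo-review", "code-review-expert", "adversarial-qe")
REQUIRED_CLEAN_CYCLES = 3


@dataclass
class ConvergenceResult:
    passes_run: int      # total static review passes executed
    cycles_run: int      # total cycles executed, including reset ones
    fixes_applied: bool  # True if any finding forced a fix (would trigger Step 3.5)


def run_until_converged(
    run_pass: Callable[[str], Sequence[str]],
    apply_fixes: Callable[[Sequence[str]], None],
    max_cycles: int = 50,
) -> ConvergenceResult:
    """Repeat three-pass cycles until three consecutive cycles report no findings."""
    clean = 0
    passes_run = 0
    cycles_run = 0
    fixes_applied = False
    while clean < REQUIRED_CLEAN_CYCLES:
        if cycles_run >= max_cycles:
            raise RuntimeError("review did not converge")
        cycles_run += 1
        findings: list[str] = []
        for name in PASSES:
            findings.extend(run_pass(name))
            passes_run += 1
        if findings:
            apply_fixes(findings)
            fixes_applied = True
            clean = 0  # any finding resets the counter
        else:
            clean += 1
    return ConvergenceResult(passes_run, cycles_run, fixes_applied)
```

With a `run_pass` that never reports a finding, the loop runs 3 cycles of 3 passes, which is the 9-pass minimum path. A single finding in the second cycle pushes the total to 5 cycles and 15 passes.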
| Skill | Step | Purpose |
|---|---|---|
| code-forge | Orchestrator | Runs the full 5-step pipeline |
| qodo-review | Pass 1 | Change-aware pre-review with feature-grouped walkthrough |
| code-review-expert | Pass 2 | SOLID, architecture, security analysis |
| adversarial-qe | Pass 3 | Red-team QE with 12 attack dimensions |
| kernel-fp-verify | Step 3.5 | 10-step false-positive verification protocol |
| smoke-test | Step 4 | Runtime verification with bash assertion primitives |
- Multi-pass convergence. Three consecutive clean cycles from three independent perspectives. Any finding resets the counter to zero. Copilot, CodeRabbit, Cursor, and Devin are single-pass.
- Anti-hallucination gates. code-forge treats LLM review output as untrusted claims. Parser-deterministic findings auto-confirm; LLM findings require falsification before disposition; Step 4 runs the actual code. Prompt-only mitigations cap at 15% hallucination reduction; tool grounding reaches 65-80% (CodeAnt and Suprmind data, 2026).
- Real commit gate (R1). A real `.git/hooks/pre-commit` that runs the test suite and blocks on NEW failures vs a baseline. Gates on diff content and test results, not a self-claimed marker. Closes the terminal-and-IDE bypass that PreToolUse hooks cannot reach.
- Mutation-gated review (R2). Diff-scoped mutation runs after static review and before the verdict. Each mutant introduced into the changed code is run against the test suite; a surviving mutant flags tests that cannot catch the change. Toothless tests block the same cycle that finds the defect.
- Cross-component coverage heuristic (R3). Detects diffs that span multiple source areas with a changed function signature. An opt-in components mapping raises an uncertain finding when a hub and a dependent both change in the same diff and no integration test under the dependent's paths matches the configured test patterns.
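To make the "surviving mutant" idea from R2 concrete, here is a minimal mutation-testing sketch. It is not code-forge's mutation runner: the `in_range` function, the comparison-swap mutation, and every helper name are hypothetical, and the sketch mutates a whole snippet rather than scoping itself to a diff.

```python
import ast
from typing import Callable

# Stand-in for "the changed code" under review.
SOURCE = """
def in_range(x, lo, hi):
    return lo <= x and x < hi
"""

# Each mutant flips one boundary comparison to its neighbour.
SWAPS = {ast.LtE: ast.Lt, ast.Lt: ast.LtE}


class SwapNthCompare(ast.NodeTransformer):
    """Swap the operator of the n-th swappable comparison, leaving the rest alone."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.seen = 0

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        op_type = type(node.ops[0])
        if op_type in SWAPS:
            if self.seen == self.target:
                node.ops[0] = SWAPS[op_type]()
            self.seen += 1
        return node


def count_swappable(tree: ast.AST) -> int:
    return sum(
        isinstance(n, ast.Compare) and type(n.ops[0]) in SWAPS
        for n in ast.walk(tree)
    )


def load(tree: ast.Module, func_name: str) -> Callable:
    namespace: dict = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace[func_name]


def surviving_mutants(source: str, func_name: str,
                      tests: Callable[[Callable], bool]) -> list[int]:
    """Return the indices of mutants that the test function fails to detect."""
    survivors = []
    for i in range(count_swappable(ast.parse(source))):
        tree = SwapNthCompare(i).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        if tests(load(tree, func_name)):  # tests still pass, so the mutant survived
            survivors.append(i)
    return survivors


def weak_tests(f: Callable) -> bool:
    # Only checks values far from the boundaries.
    return f(5, 0, 10) is True and f(-1, 0, 10) is False and f(11, 0, 10) is False


def strong_tests(f: Callable) -> bool:
    # Adds both boundary values.
    return weak_tests(f) and f(0, 0, 10) is True and f(10, 0, 10) is False
```

Here `surviving_mutants(SOURCE, "in_range", weak_tests)` reports both mutants as survivors, while `strong_tests` kills both. In R2's terms, the weak suite is the "toothless" one: it passes on the original code but cannot distinguish it from either boundary change.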
- No cross-repo impact. code-forge reviews a single repository. Multi-repo dependency analysis requires CodeRabbit-style tooling or Chromium's `Cq-Depend`.
- No feedback learning. code-forge does not adapt to dismissed findings or developer preferences. Each review is independent.
- No long-term maintainability scoring. code-forge does not assess technical debt accumulation. SonarQube's tech-debt tracking is the closest automated approximation.
- No performance regression suite. No benchmark harness equivalent to Rust's `perf.rust-lang.org`.
- R3 is artifact-presence, not coverage proof. The cross-component check confirms an integration test file exists under the expected path; it does not verify that the test exercises the specific code that changed. A present-but-stale test passes the gate.
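The artifact-presence limitation of R3 is easier to see in code. The following is a rough sketch only: the shape of the `components` mapping, the test pattern, the paths, and the function names are all assumptions, and the changed-function-signature condition is left out for brevity.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence

# Hypothetical mapping shape; the real opt-in `components` config may differ.
COMPONENTS = {
    "hub": ["src/core/"],
    "dependent": ["src/billing/"],
}
TEST_PATTERNS = ["*test_integration*.py"]


def touches(changed: Sequence[str], prefixes: Sequence[str]) -> bool:
    """True if any changed path lives under one of the given prefixes."""
    return any(path.startswith(p) for path in changed for p in prefixes)


def cross_component_finding(
    changed: Sequence[str],
    repo_files: Sequence[str],
    components: dict,
    patterns: Sequence[str],
) -> Optional[str]:
    """Raise an uncertain finding when hub and dependent change without a matching test file."""
    if not (touches(changed, components["hub"])
            and touches(changed, components["dependent"])):
        return None
    has_test_file = any(
        path.startswith(prefix) and fnmatchcase(path, pattern)
        for path in repo_files
        for prefix in components["dependent"]
        for pattern in patterns
    )
    if has_test_file:
        return None  # presence only: the file's contents are never inspected
    return "uncertain: hub and dependent changed without a matching integration test"
```

Because the check looks only at paths, adding any file such as `src/billing/tests/test_integration_invoice.py` silences the finding, whether or not that test touches the changed code. That is the gap the mutation gate and the runtime smoke test are meant to cover.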
Static review (3-cycle convergence) is one layer. code-forge learned from its own Phase 2 experience, where 9 static passes and 639 mock tests missed 3 bugs that dynamic verification caught. Verification grounding (test suite + mutation + e2e coverage check) is the thesis -- not a pass count.
- Python 3.12 or newer
- `jq` for the bash smoke primitives
- Claude Code or a compatible AI coding assistant for skill invocation
git clone https://github.com/HouMinXi/forge.git
cd forge
./install.sh

Symlinks each of the 6 skills from ~/.claude/skills/<name> to this repo's skills/<name>. Hook installation is manual -- see hooks/README.md and hooks/settings-snippet.json.
| Hook | Trigger | Purpose |
|---|---|---|
| `check_worktree.sh` | PreToolUse Edit/Write | Block edits in main worktree |
| `check_non_ascii.sh` | PreToolUse Write/Edit | Non-ASCII character detection |
| `check_read_before_edit.sh` | PreToolUse Edit | 1:1 read-before-edit ratio |
| `check_review_tracker.sh` | PostToolUse Bash | Review cycle state machine |
| `check_git_commit_review.sh` | PreToolUse Bash | Block unreviewed commits |
| `check_git_push_review.sh` | PreToolUse Bash | Block unreviewed pushes |
Some hooks contain environment-specific logic (Kerberos auth, pattern
matching) you will need to adapt. See hooks/README.md.
skills/smoke-test/test-library/shell/ ships 19 reusable bash assertion
functions with no dependencies beyond jq:
- `run_and_capture`, `run_concurrent`, `concurrent_wait`
- `assert_success`, `assert_failure`, `assert_exit_code`
- `assert_output_contains`, `assert_output_not_contains`
- `assert_stderr_contains`, `assert_stderr_empty`
- `assert_file_exists`, `assert_file_not_exists`, `assert_file_contains`
- `assert_json_valid`
- `assert_no_zombie`, `assert_temp_clean`
- `assert_no_command_exec`, `assert_no_command_exec_json`, `assert_no_path_traversal`
A backward-compatible symlink at test-library/ points to
skills/smoke-test/test-library/ for users migrating from
bash-smoke-primitives.
- `evidence/cross-model-complementarity.md` -- why 3 different review passes
- `evidence/design-iterations.md` -- how the pipeline evolved
- `evidence/ground-truth-verification.md` -- why smoke tests must inject bugs
- `evidence/shell-assertion-footguns.md` -- 5 bash-specific traps
- `evidence/v9-model-coverage-matrix.md` -- 4-model coverage data
- `hooks/README.md` -- hook installation and adaptation guide
Issues and discussion: https://github.com/HouMinXi/forge/issues.
Apache-2.0