RepoContextTrace is a lightweight CLI and GitHub Action for repository-aware patch review. It indexes local project context, builds task-focused context packs, audits unified diffs at hunk level, and produces review-ready evidence reports.
Use it when a review needs to answer one concrete question:
Which repository files and rules support each changed hunk in this patch?
RepoContextTrace is deterministic and dependency-light. It does not require a hosted service, database, or network call at runtime.
- Repository-local indexing for policies, README files, docs, tests, configs, and source code.
- Task-specific context_pack.json generation.
- Hunk-level patch_evidence.json for unified diffs.
- Deterministic risk tags such as unsupported-change, source-change-without-test, policy-touch, and api-contract-change.
- Markdown reports for pull request review.
- JSON Schema validation for generated artifacts.
- Quality gates for CI.
- Benchmark runners for controlled experiments and baseline comparison.
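To make "hunk-level" concrete, here is a minimal sketch of how a unified diff can be split into per-file hunks using only the standard library. It is an illustration, not RepoContextTrace's own parser: the `Hunk` class and `split_hunks` function are hypothetical names, and the sketch assumes git-style `+++ b/<path>` headers.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches a hunk header such as "@@ -10,2 +10,3 @@ def run():".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:  # hypothetical container, not the tool's schema
    path: str
    old_start: int
    new_start: int
    lines: List[str] = field(default_factory=list)


def split_hunks(diff_text: str) -> List[Hunk]:
    """Split unified diff text into hunks, using header line counts."""
    hunks: List[Hunk] = []
    path = ""
    current = None
    old_left = new_left = 0
    for line in diff_text.splitlines():
        # While a hunk still expects body lines, consume them by prefix.
        if current is not None and (old_left > 0 or new_left > 0):
            if line.startswith("\\"):
                continue  # "\ No newline at end of file" marker
            if line.startswith("+"):
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            current.lines.append(line)
            continue
        if line.startswith("+++ "):
            # "+++ b/src/app.py" -> "src/app.py"
            path = line[4:].split("\t")[0].strip()
            if path.startswith("b/"):
                path = path[2:]
            continue
        match = HUNK_HEADER.match(line)
        if match:
            # A missing count in the header means 1.
            old_left = int(match.group(2) or 1)
            new_left = int(match.group(4) or 1)
            current = Hunk(path, int(match.group(1)), int(match.group(3)))
            hunks.append(current)
    return hunks
```

Counting body lines from the header, rather than guessing from prefixes alone, avoids mistaking a removed line that begins with `-- ` for the next file's `--- ` header.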
python -m pip install -e .
For development and tests:
python -m pip install -e ".[test]"
Run inside a Git repository:
repoctx init
repoctx index
repoctx pack --task "Fix the billing timeout regression"
git diff --unified=80 > agent.patch
repoctx audit --diff agent.patch
repoctx report --format markdown
Generated artifacts are written to .repoctx/ and can be printed to CI logs or uploaded as workflow artifacts:
.repoctx/config.json
.repoctx/index.json
.repoctx/context_pack.json
.repoctx/patch_evidence.json
.repoctx/report.md
Validate and aggregate outputs:
repoctx validate .repoctx/context_pack.json
repoctx validate .repoctx/patch_evidence.json
repoctx aggregate .repoctx/patch_evidence.json --output .repoctx/metrics.csv
repoctx audit writes hunk-level evidence with:
- a short rationale for each changed hunk
- supporting repository context identifiers
- risk tags such as unsupported-change or source-change-without-test
- suggested test commands or coverage checks
- aggregate grounding and unsupported-change metrics
repoctx report renders the same information as Markdown for pull request review.
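As an illustration of what a deterministic risk tag can look like, the sketch below derives source-change-without-test from the list of changed paths. The rule shown (source files changed, no test files changed) is an assumption based on the tag's name, not the tool's documented logic; `is_test_path`, `tag_patch`, and `SOURCE_SUFFIXES` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import List

# Assumption: only Python files count as "source" in this sketch.
SOURCE_SUFFIXES = {".py"}


def is_test_path(path: str) -> bool:
    """Heuristic: a path is a test if it lives under tests/ or is named like one."""
    p = PurePosixPath(path)
    return (
        "tests" in p.parts
        or p.name.startswith("test_")
        or p.stem.endswith("_test")
    )


def tag_patch(changed_paths: List[str]) -> List[str]:
    """Return risk tags for a patch, given the paths it touches."""
    sources = [
        p for p in changed_paths
        if PurePosixPath(p).suffix in SOURCE_SUFFIXES and not is_test_path(p)
    ]
    tests = [p for p in changed_paths if is_test_path(p)]
    tags = []
    if sources and not tests:
        tags.append("source-change-without-test")
    return tags
```

Because the rule depends only on file paths, the same patch always receives the same tags, which is the property that makes such tags usable as CI gates.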
Create .repoctx/config.json.
repoctx init --repo .
Index repository context.
repoctx index --repo .
Build a task-specific context pack.
repoctx pack --task "Update billing behavior and regression tests"
Audit a unified diff.
repoctx audit --diff agent.patch
Read a diff from stdin:
git diff --unified=80 | repoctx audit --diff -
Use quality gates:
repoctx audit \
--diff agent.patch \
--max-unsupported-rate 0.10 \
--min-grounding-rate 0.80 \
--fail-on-risk unsupported-change,policy-touch
Render a Markdown report.
repoctx report --format markdown --output .repoctx/report.md
Validate generated JSON artifacts.
repoctx validate .repoctx/context_pack.json
repoctx validate .repoctx/patch_evidence.json
Aggregate evidence files into experiment tables.
repoctx aggregate runs/*/patch_evidence.json --output runs/metrics.csv
repoctx aggregate runs/*/patch_evidence.json --output runs/metrics.json --format json
Use the bundled composite action in pull requests:
name: RepoContextTrace
on:
  pull_request:
jobs:
  repoctx:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: ./
        with:
          task: ${{ github.event.pull_request.title }}
          max-unsupported-rate: "0.20"
          min-grounding-rate: "0.60"
          fail-on-risk: "policy-touch"
          comment-on-pr: "true"
          github-token: ${{ secrets.GITHUB_TOKEN }}
The experiment scripts are optional. They are useful for smoke tests, reproducibility checks, and comparing context conditions over normalized patch datasets.
Normalize public benchmark rows:
python experiments/fetch_benchmark_tasks.py --benchmark swebench --limit 500 --output-dir experiments/runs/benchmarks
Run a patch-only audit:
python experiments/run_benchmark_audit.py \
--tasks experiments/runs/benchmarks/swebench/tasks.jsonl \
--output-dir experiments/runs/benchmark-audit/swebench-source \
--patch-only \
--force
Run matched baselines:
python experiments/run_baseline_comparison.py \
--tasks experiments/runs/benchmarks/swebench/tasks.jsonl \
--output-dir experiments/runs/baseline-comparison/swebench \
--baselines no-context,naive-rag,repocontexttrace \
--patch-only \
--force
Supported baseline modes:
- no-context: audit with empty context artifacts.
- naive-rag: audit with top lexical snippets only.
- repocontexttrace: audit with repository-aware indexing and task-specific context packing.
For real checkouts, pass --repo-root-map repo_map.json or --checkout-dir ... --clone-missing and omit --patch-only.
RepoContextTrace reports:
- hunk_grounding_rate: fraction of hunks with supporting context.
- unsupported_change_rate: fraction of hunks tagged unsupported.
- context_pack_support_rate: fraction of hunks supported by context-pack units.
- context_pack_waste_ratio: fraction of selected context units unused by audited hunks.
- Per-risk-tag counts for aggregation and analysis.
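The metric definitions above, and the way the --max-unsupported-rate and --min-grounding-rate gates act on them, can be sketched as follows. The per-hunk record shape (`context_ids`, `risk_tags`) and the function names are assumptions made for this example; the real layout is defined by the JSON Schemas in schemas/.

```python
from typing import Dict, Iterable, List


def summarize(hunks: List[dict], pack_unit_ids: Iterable[str]) -> Dict[str, float]:
    """Compute the four rates from hypothetical per-hunk records."""
    pack = set(pack_unit_ids)
    used = set()  # pack units referenced by at least one hunk
    grounded = unsupported = pack_supported = 0
    for hunk in hunks:
        ids = set(hunk.get("context_ids", []))
        if ids:
            grounded += 1
        if "unsupported-change" in hunk.get("risk_tags", []):
            unsupported += 1
        if ids & pack:
            pack_supported += 1
        used |= ids & pack

    def frac(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    total = len(hunks)
    return {
        "hunk_grounding_rate": frac(grounded, total),
        "unsupported_change_rate": frac(unsupported, total),
        "context_pack_support_rate": frac(pack_supported, total),
        "context_pack_waste_ratio": frac(len(pack - used), len(pack)),
    }


def gate_failures(metrics: Dict[str, float],
                  max_unsupported_rate: float,
                  min_grounding_rate: float) -> List[str]:
    """Return a message for each violated threshold (empty list means pass)."""
    failures = []
    if metrics["unsupported_change_rate"] > max_unsupported_rate:
        failures.append("unsupported_change_rate above maximum")
    if metrics["hunk_grounding_rate"] < min_grounding_rate:
        failures.append("hunk_grounding_rate below minimum")
    return failures
```

In a CI wrapper, a non-empty list from `gate_failures` would map to a non-zero exit code.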
When --apply-diff and --test-command are used, benchmark runners also write outcomes.csv with patch application, CI/test status, and optional review labels.
src/repoctx/          Python package
schemas/              JSON Schemas
examples/demo-repo/   Minimal runnable demo
experiments/          Benchmark and baseline runners
docs/                 Research protocol and design notes
.github/workflows/    CI workflow
action.yml            Composite GitHub Action
python -m pip install -e ".[test]"
python -m pytest -q
python -m compileall -q src tests scripts
python experiments/run_demo_experiment.py
On Windows, avoid recursively compiling experiments/runs/ after downloading large upstream repositories.
The current implementation uses transparent lexical scoring and deterministic heuristics. It is intended as a practical review aid and a reproducible baseline. Future work can add semantic retrieval or optional evidence scoring backends while preserving the JSON artifact contracts.
MIT. See LICENSE.