Lin-Aurora/RepoContextTrace

RepoContextTrace

RepoContextTrace is a lightweight CLI and GitHub Action for repository-aware patch review. It indexes local project context, builds task-focused context packs, audits unified diffs at hunk level, and produces review-ready evidence reports.

Use it when a review needs to answer one concrete question:

Which repository files and rules support each changed hunk in this patch?

RepoContextTrace is deterministic and dependency-light. It does not require a hosted service, database, or network call at runtime.

Features

  • Repository-local indexing for policies, README files, docs, tests, configs, and source code.
  • Task-specific context_pack.json generation.
  • Hunk-level patch_evidence.json for unified diffs.
  • Deterministic risk tags such as unsupported-change, source-change-without-test, policy-touch, and api-contract-change.
  • Markdown reports for pull request review.
  • JSON Schema validation for generated artifacts.
  • Quality gates for CI.
  • Benchmark runners for controlled experiments and baseline comparison.

Install

python -m pip install -e .

For development and tests:

python -m pip install -e ".[test]"

Quick Start

Run inside a Git repository:

repoctx init
repoctx index
repoctx pack --task "Fix the billing timeout regression"
git diff --unified=80 > agent.patch
repoctx audit --diff agent.patch
repoctx report --format markdown

Generated artifacts are written to .repoctx/ and can be printed in CI logs or uploaded as workflow artifacts:

.repoctx/config.json
.repoctx/index.json
.repoctx/context_pack.json
.repoctx/patch_evidence.json
.repoctx/report.md

Validate and aggregate outputs:

repoctx validate .repoctx/context_pack.json
repoctx validate .repoctx/patch_evidence.json
repoctx aggregate .repoctx/patch_evidence.json --output .repoctx/metrics.csv

Review Output

repoctx audit writes hunk-level evidence with:

  • a short rationale for each changed hunk
  • supporting repository context identifiers
  • risk tags such as unsupported-change or source-change-without-test
  • suggested test commands or coverage checks
  • aggregate grounding and unsupported-change metrics
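
The risk tags listed above are described as deterministic. As a rough illustration of what one such rule could look like, here is a minimal sketch for source-change-without-test. The function names and the path conventions (tests/ directories, test_*.py files, .py as the only source suffix) are assumptions made for this example, not the tool's actual rules.

```python
from pathlib import PurePosixPath

# Hypothetical conventions; the real tool may classify files differently.
SOURCE_SUFFIXES = {".py"}


def is_test_path(path: str) -> bool:
    """Treat files under a tests/ directory or named test_* as tests."""
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def tag_source_change_without_test(changed_paths: list[str]) -> list[str]:
    """Return the tag when a patch edits source files but no test files."""
    touches_source = any(
        PurePosixPath(p).suffix in SOURCE_SUFFIXES and not is_test_path(p)
        for p in changed_paths
    )
    touches_test = any(is_test_path(p) for p in changed_paths)
    if touches_source and not touches_test:
        return ["source-change-without-test"]
    return []
```

Because the rule depends only on the list of changed paths, the same patch always yields the same tag, which is the property a CI gate needs.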

repoctx report renders the same information as Markdown for pull request review.

CLI Reference

repoctx init

Create .repoctx/config.json.

repoctx init --repo .

repoctx index

Index repository context.

repoctx index --repo .

repoctx pack

Build a task-specific context pack.

repoctx pack --task "Update billing behavior and regression tests"

repoctx audit

Audit a unified diff.

repoctx audit --diff agent.patch

Read a diff from stdin:

git diff --unified=80 | repoctx audit --diff -
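
Hunk-level auditing starts from splitting a unified diff into hunks. The sketch below shows one way to do that in plain Python, using the line counts in each `@@` header to decide where a hunk body ends. The `Hunk` class and `split_hunks` function are hypothetical names for this example and are not part of the repoctx package.

```python
import re
from dataclasses import dataclass, field

# Matches a unified-diff hunk header such as "@@ -10,7 +10,8 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    path: str
    old_start: int
    new_start: int
    lines: list[str] = field(default_factory=list)


def split_hunks(diff_text: str) -> list[Hunk]:
    """Split unified diff text into hunks, one record per @@ header."""
    hunks: list[Hunk] = []
    path = ""
    current = None
    old_left = new_left = 0
    for line in diff_text.splitlines():
        if current is not None and (old_left > 0 or new_left > 0):
            # Inside a hunk body: the header counts say how many lines remain.
            if line.startswith("\\"):
                continue  # "\ No newline at end of file" marker
            current.lines.append(line)
            if line.startswith("+"):
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            else:  # context line counts against both sides
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # "+++ b/src/app.py" -> "src/app.py"
            target = line[4:].split("\t")[0].strip()
            path = target[2:] if target.startswith("b/") else target
            continue
        match = HUNK_HEADER.match(line)
        if match:
            # A missing count in the header means a count of 1.
            old_left = int(match.group(2)) if match.group(2) is not None else 1
            new_left = int(match.group(4)) if match.group(4) is not None else 1
            current = Hunk(path, int(match.group(1)), int(match.group(3)))
            hunks.append(current)
    return hunks
```

Counting lines from the header, rather than looking for the next `---` or `+++` prefix, keeps a removed line whose content begins with `--` from being mistaken for a file header.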

Use quality gates:

repoctx audit \
  --diff agent.patch \
  --max-unsupported-rate 0.10 \
  --min-grounding-rate 0.80 \
  --fail-on-risk unsupported-change,policy-touch
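
The gate flags above can be read as thresholds over the reported metrics plus a list of risk tags that must not appear. The following is a minimal sketch of that logic, assuming inclusive thresholds (a rate exactly at the limit passes) and a hypothetical `evaluate_gates` helper. The metric key names come from the Metrics section; everything else is an assumption about how such a check could be written, not the tool's own code.

```python
from typing import Optional


def evaluate_gates(
    metrics: dict,
    risk_counts: dict,
    max_unsupported_rate: Optional[float] = None,
    min_grounding_rate: Optional[float] = None,
    fail_on_risk: tuple = (),
) -> list:
    """Return a list of human-readable gate failures (empty means pass)."""
    failures = []
    unsupported = metrics["unsupported_change_rate"]
    grounding = metrics["hunk_grounding_rate"]
    if max_unsupported_rate is not None and unsupported > max_unsupported_rate:
        failures.append(
            f"unsupported_change_rate {unsupported:.2f} > {max_unsupported_rate:.2f}"
        )
    if min_grounding_rate is not None and grounding < min_grounding_rate:
        failures.append(
            f"hunk_grounding_rate {grounding:.2f} < {min_grounding_rate:.2f}"
        )
    for tag in fail_on_risk:
        # Any occurrence of a listed tag fails the gate.
        if risk_counts.get(tag, 0) > 0:
            failures.append(f"risk tag present: {tag}")
    return failures
```

A CI wrapper built on this idea would exit non-zero whenever the returned list is non-empty.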

repoctx report

Render a Markdown report.

repoctx report --format markdown --output .repoctx/report.md

repoctx validate

Validate generated JSON artifacts.

repoctx validate .repoctx/context_pack.json
repoctx validate .repoctx/patch_evidence.json

repoctx aggregate

Aggregate evidence files into experiment tables.

repoctx aggregate runs/*/patch_evidence.json --output runs/metrics.csv
repoctx aggregate runs/*/patch_evidence.json --output runs/metrics.json --format json
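
For readers who want to post-process evidence files themselves, aggregation can be approximated in a few lines of standard-library Python. This sketch assumes each evidence file is a JSON object with a top-level "metrics" mapping; that key, and the `aggregate_to_csv` function itself, are hypothetical and should be checked against the schemas in schemas/ before use.

```python
import csv
import json
from pathlib import Path


def aggregate_to_csv(evidence_paths, output_path, metrics_key="metrics"):
    """Write one CSV row per evidence file and return the row count."""
    rows = []
    for path in sorted(Path(p) for p in evidence_paths):
        data = json.loads(path.read_text(encoding="utf-8"))
        row = {"source": path.as_posix()}
        row.update(data.get(metrics_key, {}))  # assumed location of metrics
        rows.append(row)
    # Union of columns in first-seen order, so files with extra metrics still fit.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing values are written as empty cells
    return len(rows)
```

Sorting the input paths keeps the row order stable across runs, which matters when the resulting table is diffed or checked into an experiment log.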

GitHub Action

Use the bundled composite action in pull requests:

name: RepoContextTrace

on:
  pull_request:

jobs:
  repoctx:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: ./
        with:
          task: ${{ github.event.pull_request.title }}
          max-unsupported-rate: "0.20"
          min-grounding-rate: "0.60"
          fail-on-risk: "policy-touch"
          comment-on-pr: "true"
          github-token: ${{ secrets.GITHUB_TOKEN }}

Benchmark Runners

The experiment scripts are optional. They are useful for smoke tests, reproducibility checks, and comparing context conditions over normalized patch datasets.

Normalize public benchmark rows:

python experiments/fetch_benchmark_tasks.py --benchmark swebench --limit 500 --output-dir experiments/runs/benchmarks

Run a patch-only audit:

python experiments/run_benchmark_audit.py \
  --tasks experiments/runs/benchmarks/swebench/tasks.jsonl \
  --output-dir experiments/runs/benchmark-audit/swebench-source \
  --patch-only \
  --force

Run matched baselines:

python experiments/run_baseline_comparison.py \
  --tasks experiments/runs/benchmarks/swebench/tasks.jsonl \
  --output-dir experiments/runs/baseline-comparison/swebench \
  --baselines no-context,naive-rag,repocontexttrace \
  --patch-only \
  --force

Supported baseline modes:

  • no-context: audit with empty context artifacts.
  • naive-rag: audit with top lexical snippets only.
  • repocontexttrace: audit with repository-aware indexing and task-specific context packing.

For real checkouts, omit --patch-only and pass either --repo-root-map repo_map.json or --checkout-dir ... --clone-missing.

Metrics

RepoContextTrace reports:

  • hunk_grounding_rate: fraction of hunks with supporting context.
  • unsupported_change_rate: fraction of hunks tagged unsupported-change.
  • context_pack_support_rate: fraction of hunks supported by context-pack units.
  • context_pack_waste_ratio: fraction of selected context units unused by audited hunks.
  • Per-risk-tag counts for aggregation and analysis.
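
The four ratios above can be made concrete with a small worked example. The sketch below computes them from in-memory hunk records. The record keys ("support", "risk_tags") and the `compute_metrics` function are assumptions chosen for illustration and may not match the actual patch_evidence.json layout.

```python
def compute_metrics(hunks: list, pack_unit_ids: list) -> dict:
    """Compute the four ratios from hunk records and selected context units.

    Each hunk is a dict with hypothetical keys:
      "support":   list of context identifiers cited for the hunk
      "risk_tags": list of risk tag strings
    """
    pack = set(pack_unit_ids)
    used = set()
    grounded = unsupported = pack_supported = 0
    for hunk in hunks:
        support = set(hunk.get("support", []))
        if support:
            grounded += 1
        if "unsupported-change" in hunk.get("risk_tags", []):
            unsupported += 1
        hits = support & pack  # context-pack units this hunk relies on
        if hits:
            pack_supported += 1
        used |= hits

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    total = len(hunks)
    return {
        "hunk_grounding_rate": ratio(grounded, total),
        "unsupported_change_rate": ratio(unsupported, total),
        "context_pack_support_rate": ratio(pack_supported, total),
        "context_pack_waste_ratio": ratio(len(pack - used), len(pack)),
    }
```

With four hunks, of which three cite some context and two cite units from a four-unit pack (using two distinct units), this gives a grounding rate of 0.75, a pack support rate of 0.5, and a waste ratio of 0.5.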

When --apply-diff and --test-command are used, benchmark runners also write outcomes.csv, which records patch-application status, CI/test status, and optional review labels.

Repository Layout

src/repoctx/          Python package
schemas/              JSON Schemas
examples/demo-repo/   Minimal runnable demo
experiments/          Benchmark and baseline runners
docs/                 Research protocol and design notes
.github/workflows/    CI workflow
action.yml            Composite GitHub Action

Development

python -m pip install -e ".[test]"
python -m pytest -q
python -m compileall -q src tests scripts
python experiments/run_demo_experiment.py

On Windows, avoid recursively compiling experiments/runs/ after downloading large upstream repositories.

Current Scope

The current implementation uses transparent lexical scoring and deterministic heuristics. It is intended as a practical review aid and a reproducible baseline. Future work can add semantic retrieval or optional evidence scoring backends while preserving the JSON artifact contracts.

License

MIT. See LICENSE.
