bench(shootout): add codegraph backend to shootout.py by justrach · Pull Request #487 · justrach/codedb

justrach · 2026-05-21T04:23:57Z

Summary

Wires the codegraph 0.7.10 backend into the upstream multi-session shootout.py so the next release can run the cross-corpus head-to-head from the canonical script (instead of the sibling harness used for PR #483's bench data).

New surface

```
--codegraph-bin default: $(which codegraph)
--skip-codegraph skip codegraph entirely
--clean-codegraph wipe matching .codegraph/ before indexing (forces cold build)
```
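As a rough illustration of the three flags above, here is a minimal `argparse` sketch. The helper names `build_parser` and `forwarded_codegraph_args` are hypothetical and not taken from shootout.py; only the flag names and the `$(which codegraph)` default come from this PR.

```python
import argparse
import shutil


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the three codegraph flags from this PR."""
    parser = argparse.ArgumentParser(prog="shootout.py")
    parser.add_argument(
        "--codegraph-bin",
        default=shutil.which("codegraph"),  # mirrors "default: $(which codegraph)"
        help="path to the codegraph binary",
    )
    parser.add_argument(
        "--skip-codegraph",
        action="store_true",
        help="skip codegraph entirely",
    )
    parser.add_argument(
        "--clean-codegraph",
        action="store_true",
        help="wipe matching .codegraph/ before indexing (forces cold build)",
    )
    return parser


def forwarded_codegraph_args(args: argparse.Namespace) -> list[str]:
    """Rebuild the codegraph flags so a launcher can pass them to a child process."""
    out: list[str] = []
    if args.codegraph_bin:  # None when codegraph is not on PATH
        out += ["--codegraph-bin", args.codegraph_bin]
    if args.skip_codegraph:
        out.append("--skip-codegraph")
    if args.clean_codegraph:
        out.append("--clean-codegraph")
    return out
```

A multi-session launcher could append the returned list to each per-session command line, which is one plausible way to get the forwarding described in this PR.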

`codegraph serve --mcp` runs as a long-lived stdio child; queries call `codegraph_search` directly — the same way codedb_search is exercised. Multi-session launcher forwards the new flags to per-session subprocesses.
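The long-lived stdio child can be pictured with the sketch below. It is not the PR's `CodegraphMCP` class: `StdioMCPClient` is a hypothetical name, and the code assumes only the standard MCP stdio framing (newline-delimited JSON-RPC 2.0 with `initialize`, `notifications/initialized`, and `tools/call`). The argument keys that `codegraph_search` accepts are not shown in this PR, so any arguments passed to it are an assumption.

```python
import json
import subprocess


class StdioMCPClient:
    """Hypothetical minimal MCP client speaking JSON-RPC 2.0 over a child's stdio."""

    def __init__(self, cmd):
        # One process for the whole benchmark run; queries reuse it.
        self.proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered: each message is a single line
        )
        self._next_id = 0

    def _send(self, payload):
        self.proc.stdin.write(json.dumps(payload) + "\n")
        self.proc.stdin.flush()

    def notify(self, method, params=None):
        msg = {"jsonrpc": "2.0", "method": method}
        if params is not None:
            msg["params"] = params
        self._send(msg)

    def request(self, method, params):
        self._next_id += 1
        req_id = self._next_id
        self._send({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
        while True:  # skip server notifications until our reply arrives
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("MCP server closed stdout")
            msg = json.loads(line)
            if msg.get("id") == req_id:
                return msg

    def initialize(self):
        reply = self.request("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "shootout-sketch", "version": "0"},
        })
        self.notify("notifications/initialized")
        return reply

    def call_tool(self, name, arguments):
        return self.request("tools/call", {"name": name, "arguments": arguments})

    def close(self):
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait(timeout=5)
```

Constructing the client with `["codegraph", "serve", "--mcp"]`, calling `initialize()` once, and then timing repeated `call_tool("codegraph_search", ...)` calls would approximate the warm-query measurement, subject to the assumptions above.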

Smoke test

Codegraph-only on flask (5 iters):
```
[build] codegraph ...
0.57 s, ~3.7 MB
[query]
useState | 1.62/ 1.99/ 2.63/ 2.75 ms ( 0)
function | 1.30/ 1.32/ 1.48/ 1.48 ms ( 105)
set | 0.37/ 0.40/ 0.41/ 0.41 ms ( 70)
```

Numbers match what PR #483 collected via the sibling harness — so the integration is consistent.

What this does NOT change

- No behavioral change to the codedb / fts5_* / lean-ctx paths
- No new dependencies
- `stats()` / `pct()` / multi-session aggregation work unchanged
- shootout.py grows by ~135 LOC (CodegraphMCP class + 3 helpers + 3 args + 3 wire-up spots)
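The PR does not show how `stats()` and `pct()` are implemented, so the following is only a sketch of one common way to summarize per-query latencies. The names `percentile` and `summarize` are hypothetical, the nearest-rank method is an assumption, and the chosen columns are illustrative rather than the ones shootout.py prints.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]; an assumed method, not shootout.py's."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Collapse raw per-iteration latencies into a small summary row."""
    return {
        "p50": percentile(samples_ms, 50),
        "p90": percentile(samples_ms, 90),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```

With only 5 iterations, as in the smoke test, the upper percentiles collapse onto the slowest sample, so tail figures from such a short run are best read as rough indicators.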

Test plan

- `python3 -c 'import ast; ast.parse(open("shootout.py").read())'` clean
- `--skip-codedb --skip-leanctx --skip-fts5` codegraph-only run completes on flask
- Full 5-backend run on react/regex/flask (deferred to next release-bench)

🤖 Generated with Claude Code

Wires the codegraph 0.7.10 backend into the single-session + multi-session launcher alongside codedb / fts5_tri / fts5_uni / lean-ctx. Uses `codegraph serve --mcp` as a long-lived stdio child and invokes `codegraph_search` as the default symbol-lookup tool (apples-to-apples with codedb_search).

New CLI flags:
  --codegraph-bin <path>   default: $(which codegraph)
  --skip-codegraph         skip the backend entirely
  --clean-codegraph        wipe matching .codegraph/ before indexing

Cold-index helper `codegraph_cold_index` invokes `codegraph init` then `codegraph index` and measures wall-clock + .codegraph/ on-disk size.

Smoke-tested codegraph-only on flask:
  cold build:   0.57 s, ~3.7 MB
  warm queries: 0.2–2 ms p50 (matches the bench numbers from the v0.2.5815 cross-corpus run committed in PR #483)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T04:26:14Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | ---: | ---: | ---: | ---: | --- |
| `codedb_bundle` | 309530 | 320637 | +3.59% | +11107 | OK |
| `codedb_changes` | 32283 | 32906 | +1.93% | +623 | OK |
| `codedb_deps` | 4938 | 5105 | +3.38% | +167 | OK |
| `codedb_edit` | 5795 | 5945 | +2.59% | +150 | OK |
| `codedb_find` | 42084 | 42455 | +0.88% | +371 | OK |
| `codedb_hot` | 56264 | 57451 | +2.11% | +1187 | OK |
| `codedb_outline` | 189139 | 199581 | +5.52% | +10442 | OK |
| `codedb_read` | 60575 | 70916 | +17.07% | +10341 | NOISE |
| `codedb_search` | 110047 | 115146 | +4.63% | +5099 | OK |
| `codedb_snapshot` | 206882 | 226621 | +9.54% | +19739 | OK |
| `codedb_status` | 10152 | 10879 | +7.16% | +727 | OK |
| `codedb_symbol` | 37214 | 37558 | +0.92% | +344 | OK |
| `codedb_tree` | 45147 | 45858 | +1.57% | +711 | OK |
| `codedb_word` | 50208 | 49009 | -2.39% | -1199 | OK |
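The two-threshold rule behind the Status column can be sketched as follows. This is an illustration of the stated rule (10.00% and 50,000 ns), not the CI script itself: the function name `classify`, the `FAIL` label, and the use of strict comparisons are assumptions, since the report only shows `OK` and `NOISE`.

```python
def classify(base_ns, head_ns, pct_threshold=10.0, abs_threshold_ns=50_000):
    """Return (pct_delta, abs_delta_ns, status) for one benchmarked tool.

    Assumed rule: a regression fails only when it exceeds BOTH thresholds;
    exceeding just the percentage threshold is reported as NOISE.
    """
    abs_delta = head_ns - base_ns
    pct_delta = abs_delta / base_ns * 100.0
    if pct_delta <= pct_threshold:
        status = "OK"
    elif abs_delta <= abs_threshold_ns:
        status = "NOISE"  # large relative change, tiny absolute change
    else:
        status = "FAIL"   # hypothetical label; not shown in the report
    return round(pct_delta, 2), abs_delta, status
```

Applied to the `codedb_read` row, `classify(60575, 70916)` lands in the NOISE branch because the 10,341 ns delta is well under the 50,000 ns floor even though the relative change is above 10%.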

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e05a3b4210


chatgpt-codex-connector · 2026-05-21T04:27:02Z

 HERE = Path(__file__).resolve().parent
 REPO_ROOT = HERE.parent.parent
-DEFAULT_CODEDB = REPO_ROOT / "zig-out/bin/codedb"
+DEFAULT_CODEDB = shutil.which("codedb") or str(REPO_ROOT / "zig-out/bin/codedb")


Keep codedb default pinned to repo build

Do not prefer PATH for the default codedb binary here: this makes shootout silently benchmark whichever codedb is installed globally instead of the repo’s zig-out/bin/codedb. On machines with an older/newer global install, codedb latency/build numbers can shift significantly and no longer reflect the commit under test, which undermines benchmark regression tracking.


chatgpt-codex-connector · 2026-05-21T04:27:02Z

+    if cg_dir.exists():
+        shutil.rmtree(cg_dir)


Honor --clean-codegraph before deleting index data

The .codegraph directory is deleted unconditionally, so --clean-codegraph is effectively ignored and every run is forced into a cold rebuild. This changes default benchmark behavior and can materially inflate codegraph build timings even when the caller did not opt into a clean run.


Addresses Codex P1+P2 review on PR #487:

- **P1** Pin DEFAULT_CODEDB to repo build (`REPO_ROOT/zig-out/bin/codedb`). Pre-fix used `shutil.which("codedb") or REPO_ROOT/...`, which made the shootout silently benchmark whichever `codedb` was installed in PATH (e.g. an older homebrew bottle) instead of the freshly-built repo binary the user expected.
- **P2** Honor --clean-codegraph. Pre-fix `codegraph_cold_index` wiped `.codegraph/` unconditionally, so the flag was a no-op and every run was forced cold. Now wipes only when `clean=True`, passed through from `args.clean_codegraph`.

Verified:
  --skip-codegraph   → no codegraph activity (unchanged)
  --clean-codegraph  → wipes + cold rebuild (now works)
  (no clean flag)    → reuses existing .codegraph/ for incremental

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
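The P2 behavior described above could look roughly like the sketch below. It is not the actual `codegraph_cold_index` from shootout.py: the name `cold_index_sketch`, the injectable `run` parameter, and running both commands with `cwd=repo` and no extra arguments are assumptions. Only the `codegraph init` then `codegraph index` sequence, the `clean` switch, and the wall-clock plus on-disk-size measurements come from this PR.

```python
import shutil
import subprocess
import time
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def cold_index_sketch(repo, codegraph_bin="codegraph", clean=False, run=subprocess.run):
    """Index repo with codegraph and return (elapsed_seconds, index_size_bytes)."""
    repo = Path(repo)
    cg_dir = repo / ".codegraph"
    # Wipe only when the caller opted in; otherwise reuse the existing index.
    if clean and cg_dir.exists():
        shutil.rmtree(cg_dir)
    start = time.perf_counter()
    run([codegraph_bin, "init"], cwd=repo, check=True)   # assumed invocation
    run([codegraph_bin, "index"], cwd=repo, check=True)  # assumed invocation
    elapsed_s = time.perf_counter() - start
    size = dir_size_bytes(cg_dir) if cg_dir.exists() else 0
    return elapsed_s, size
```

Passing `clean=args.clean_codegraph` keeps the default run incremental, so a reported build time reflects a cold build only when the flag was given.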

github-actions · 2026-05-21T05:14:34Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | ---: | ---: | ---: | ---: | --- |
| `codedb_bundle` | 568502 | 550128 | -3.23% | -18374 | OK |
| `codedb_changes` | 61228 | 58911 | -3.78% | -2317 | OK |
| `codedb_deps` | 10134 | 9845 | -2.85% | -289 | OK |
| `codedb_edit` | 7554 | 7387 | -2.21% | -167 | OK |
| `codedb_find` | 70917 | 66322 | -6.48% | -4595 | OK |
| `codedb_hot` | 110881 | 109962 | -0.83% | -919 | OK |
| `codedb_outline` | 343432 | 338825 | -1.34% | -4607 | OK |
| `codedb_read` | 105943 | 106235 | +0.28% | +292 | OK |
| `codedb_search` | 161287 | 162253 | +0.60% | +966 | OK |
| `codedb_snapshot` | 316765 | 308629 | -2.57% | -8136 | OK |
| `codedb_status` | 15366 | 17470 | +13.69% | +2104 | NOISE |
| `codedb_symbol` | 69901 | 67961 | -2.78% | -1940 | OK |
| `codedb_tree` | 86753 | 87124 | +0.43% | +371 | OK |
| `codedb_word` | 95311 | 94487 | -0.86% | -824 | OK |

Adds RESULTS-VS-MAIN.md comparing experiment+reader.md against the released v0.2.5815 main-lineage binary. Same 3 tasks, fresh sub-agents.

Per-task deltas (experiment + reader.md vs main):

  T1 flask:    0 calls /   0% wall / +11% tokens  ← honest regression
  T2 regex:  -77 calls / -70% wall / -54% tokens  ← big win
  T3 react:  -46 calls / -21% wall /  +4% tokens  ← mixed
  ────────────────────────────────────────────────
  Average:   -41% / -30% / -13%

9/9 correct, no quality regressions.

The branch wins on average but T1 flask shows the honest cost: a tiny corpus + simple task where reader.md adds ~2 KB of overhead for no call savings. Recommendation in the doc: reader.md is opt-in, not a default — install only where you've measured it helping.

Beyond reader.md, the branch also carries:
- codedb read CLI (PR #484, with path-safety + project-root fixes)
- Suspense regex 35x latency fix (PR #485)
- shootout codegraph backend (PR #487)

…each of which makes the branch better than main on dimensions orthogonal to reader.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… security

Bumps semver to 0.2.5817. Bundles the v0.2.5816 perf+security release (PRs #484, #485, #483, #486, #487) with the experiment/reader-md feature that auto-prepends a hash-verified codebase map to codedb_context.

Highlights vs v0.2.5815:

Performance (PR #485, deterministic microbenchmarks):
  Suspense regex p50: 2.82 ms → 0.18 ms (15.6× faster)
  useState regex p99: 16.57 ms → 2.04 ms (8.1× p99 reduction)

CLI surface (PR #484):
  + codedb read <path> [-L FROM-TO] [--compact]
  + path-safety + sensitive-file guards
  + project-root anchoring (uses configured root, not cwd)

codedb_context (NEW in 0.2.5817):
  + auto-prepends .codedb/reader.md when source_hash matches
  + inline ~6 lines of body for ≤3 symbol_definitions
  + new "## Callers" section pre-surfaces execution sites
  + skip-on-short-task gate (≤80 chars) to avoid overhead on narrow lookups

reader.md security (this branch):
  + path-traversal blocked (no absolute / .. in source_files)
  + source_files capped at 20 (DoS guard)
  + loc_actual capped at 240 (body bloat guard)
  + golden blake2b roundtrip test

Eval (Sonnet 4.6, n=3 per task, vs v0.2.5815 main lineage):
  T1 flask median: 5 → 4 (-1)
  T2 regex median: 13 → 7 (-6)
  T3 react median: 13 → 10 (-3)

All 9 runs across the matrix returned correct answers. Branch wins on median, mode, and best-case for every task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
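The reader.md guards listed in this commit message (no absolute or `..` paths, at most 20 `source_files`, `loc_actual` at most 240) can be illustrated with a short sketch. This is not codedb's implementation: the function names are hypothetical, and whether the real code rejects or truncates oversized inputs is not stated, so truncation and clamping here are assumptions.

```python
from pathlib import PurePosixPath

MAX_SOURCE_FILES = 20   # cap taken from the commit message
MAX_LOC_ACTUAL = 240    # cap taken from the commit message


def sanitize_source_files(paths):
    """Reject traversal-prone entries, then keep at most MAX_SOURCE_FILES of them."""
    for raw in paths:
        p = PurePosixPath(raw)
        # POSIX-style check only; Windows drive paths would need extra handling.
        if p.is_absolute() or ".." in p.parts:
            raise ValueError(f"unsafe path in source_files: {raw!r}")
    return list(paths)[:MAX_SOURCE_FILES]  # assumed: truncate rather than reject


def clamp_loc(loc_actual):
    """Bound the number of body lines that may be inlined."""
    return min(loc_actual, MAX_LOC_ACTUAL)
```

Checking path components rather than searching for the substring `..` avoids rejecting legitimate names such as `notes..md` while still catching `a/../b`.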

chatgpt-codex-connector Bot reviewed May 21, 2026


justrach added a commit that referenced this pull request May 21, 2026

Merge PR #487: codegraph backend in shootout.py

b21c5e7

justrach mentioned this pull request May 21, 2026

release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph #488

Closed

6 tasks

justrach mentioned this pull request May 21, 2026

release: v0.2.5817 — reader.md auto-prepend + perf + security #490

Merged

7 tasks

justrach merged commit d72066a into main May 21, 2026
1 check passed

justrach deleted the bench/codegraph-backend branch May 21, 2026 06:31

justrach mentioned this pull request May 21, 2026

feat: bidirectional bridge to codegraff tools #241

Closed

justrach merged 2 commits into main from bench/codegraph-backend
