bench(shootout): add codegraph backend to shootout.py#487

Merged
justrach merged 2 commits into main from bench/codegraph-backend
May 21, 2026
Conversation

@justrach
Owner

Summary

Wires the codegraph 0.7.10 backend into the upstream multi-session shootout.py so the next release can run the cross-corpus head-to-head from the canonical script (instead of the sibling harness used for PR #483's bench data).

New surface

```
--codegraph-bin <path>   default: $(which codegraph)
--skip-codegraph         skip codegraph entirely
--clean-codegraph        wipe matching .codegraph/ before indexing (forces cold build)
```

`codegraph serve --mcp` runs as a long-lived stdio child; queries call `codegraph_search` directly, the same way `codedb_search` is exercised. The multi-session launcher forwards the new flags to the per-session subprocesses.
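As a rough illustration of the flag surface above, the three options and their forwarding to per-session subprocesses could be declared along these lines. This is a minimal sketch, not the code in shootout.py: `build_parser` and `forwarded_flags` are hypothetical names, and the real launcher may rebuild its child command line differently.

```python
import argparse
import shutil


def build_parser() -> argparse.ArgumentParser:
    """Declare the three codegraph flags described in this PR (sketch only)."""
    parser = argparse.ArgumentParser(prog="shootout.py")
    parser.add_argument(
        "--codegraph-bin",
        # shutil.which returns None when codegraph is not on PATH.
        default=shutil.which("codegraph"),
        help="path to the codegraph binary (default: $(which codegraph))",
    )
    parser.add_argument(
        "--skip-codegraph",
        action="store_true",
        help="skip codegraph entirely",
    )
    parser.add_argument(
        "--clean-codegraph",
        action="store_true",
        help="wipe matching .codegraph/ before indexing (forces cold build)",
    )
    return parser


def forwarded_flags(args: argparse.Namespace) -> list[str]:
    """Rebuild the codegraph flags for a per-session subprocess (hypothetical helper)."""
    flags: list[str] = []
    if args.codegraph_bin:
        flags += ["--codegraph-bin", args.codegraph_bin]
    if args.skip_codegraph:
        flags.append("--skip-codegraph")
    if args.clean_codegraph:
        flags.append("--clean-codegraph")
    return flags
```

Using `store_true` for the two switches keeps the default behaviour unchanged when neither flag is passed.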

Smoke test

Codegraph-only on flask (5 iters):
```
[build] codegraph ...
0.57 s, ~3.7 MB
[query]
useState | 1.62/ 1.99/ 2.63/ 2.75 ms ( 0)
function | 1.30/ 1.32/ 1.48/ 1.48 ms ( 105)
set | 0.37/ 0.40/ 0.41/ 0.41 ms ( 70)
```

The numbers match what PR #483 collected via the sibling harness, so the integrated backend reproduces the earlier measurements.

What this does NOT change

  • No behavioral change to the codedb / fts5_* / lean-ctx paths
  • No new dependencies
  • `stats()` / `pct()` / multi-session aggregation work unchanged
  • shootout.py grows by ~135 LOC (CodegraphMCP class + 3 helpers + 3 args + 3 wire-up spots)

Test plan

  • `python3 -c 'import ast; ast.parse(open("shootout.py").read())'` clean
  • `--skip-codedb --skip-leanctx --skip-fts5` codegraph-only run completes on flask
  • Full 5-backend run on react/regex/flask (deferred to next release-bench)

🤖 Generated with Claude Code

Wires the codegraph 0.7.10 backend into the single-session + multi-session
launcher alongside codedb / fts5_tri / fts5_uni / lean-ctx. Uses
`codegraph serve --mcp` as a long-lived stdio child and invokes
`codegraph_search` as the default symbol-lookup tool — apples-to-apples
with codedb_search.
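For readers unfamiliar with the long-lived stdio pattern described above, here is a minimal sketch of a client that speaks newline-delimited JSON-RPC to an MCP server over a child's stdin/stdout. It is not the `CodegraphMCP` class from this PR: `StdioMCPClient` is a hypothetical name, the protocol version string is one published MCP revision chosen for illustration, and the argument shape passed to `codegraph_search` (a `query` key) is an assumption.

```python
import json
import subprocess


class StdioMCPClient:
    """Minimal newline-delimited JSON-RPC client for an MCP stdio server (sketch)."""

    def __init__(self, argv):
        # One child process is kept alive for the whole benchmark run.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )
        self.next_id = 0
        # MCP handshake: initialize request, then the initialized notification.
        self.request("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "shootout-sketch", "version": "0"},
        })
        self.notify("notifications/initialized")

    def _send(self, message):
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()

    def notify(self, method, params=None):
        message = {"jsonrpc": "2.0", "method": method}
        if params is not None:
            message["params"] = params
        self._send(message)

    def request(self, method, params):
        self.next_id += 1
        self._send({"jsonrpc": "2.0", "id": self.next_id,
                    "method": method, "params": params})
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("server closed stdout")
            if not line.strip():
                continue
            reply = json.loads(line)
            # Skip server notifications; return the reply to our request.
            if reply.get("id") == self.next_id:
                return reply

    def call_tool(self, name, arguments):
        return self.request("tools/call", {"name": name, "arguments": arguments})

    def close(self):
        self.proc.stdin.close()
        self.proc.wait(timeout=5)
        self.proc.stdout.close()
```

A query would then look like `StdioMCPClient(["codegraph", "serve", "--mcp"]).call_tool("codegraph_search", {"query": "useState"})`, with the `query` key being the assumed part. Timing only the `call_tool` round trip excludes process start-up from the warm-query numbers.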

New CLI flags:
  --codegraph-bin <path>   default: $(which codegraph)
  --skip-codegraph         skip the backend entirely
  --clean-codegraph        wipe matching .codegraph/ before indexing

Cold-index helper `codegraph_cold_index` invokes `codegraph init` then
`codegraph index` and measures wall-clock + .codegraph/ on-disk size.
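The measurement described above (run `codegraph init`, then `codegraph index`, and record wall-clock time plus on-disk size) might be sketched as follows. This is an illustration, not the helper from the PR: `measure_cold_index` and `dir_size_bytes` are hypothetical names, running both subcommands with the repository as the working directory is an assumption about the CLI, and the injectable `run` parameter exists only so the sketch can be exercised without codegraph installed.

```python
import subprocess
import time
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def measure_cold_index(repo: Path, codegraph_bin: str, run=subprocess.run):
    """Return (elapsed_seconds, index_size_bytes) for init + index (sketch)."""
    cg_dir = repo / ".codegraph"
    start = time.perf_counter()
    for subcommand in ("init", "index"):
        # Assumption: both subcommands operate on the current directory.
        run([codegraph_bin, subcommand], cwd=repo, check=True, capture_output=True)
    elapsed = time.perf_counter() - start
    size = dir_size_bytes(cg_dir) if cg_dir.exists() else 0
    return elapsed, size
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and intended for interval timing.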

Smoke-tested codegraph-only on flask:
  cold build: 0.57 s, ~3.7 MB
  warm queries: 0.2–2 ms p50 (matches the bench numbers from the
  v0.2.5815 cross-corpus run committed in PR #483)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---:|---:|---:|---:|---|
| codedb_bundle | 309530 | 320637 | +3.59% | +11107 | OK |
| codedb_changes | 32283 | 32906 | +1.93% | +623 | OK |
| codedb_deps | 4938 | 5105 | +3.38% | +167 | OK |
| codedb_edit | 5795 | 5945 | +2.59% | +150 | OK |
| codedb_find | 42084 | 42455 | +0.88% | +371 | OK |
| codedb_hot | 56264 | 57451 | +2.11% | +1187 | OK |
| codedb_outline | 189139 | 199581 | +5.52% | +10442 | OK |
| codedb_read | 60575 | 70916 | +17.07% | +10341 | NOISE |
| codedb_search | 110047 | 115146 | +4.63% | +5099 | OK |
| codedb_snapshot | 206882 | 226621 | +9.54% | +19739 | OK |
| codedb_status | 10152 | 10879 | +7.16% | +727 | OK |
| codedb_symbol | 37214 | 37558 | +0.92% | +344 | OK |
| codedb_tree | 45147 | 45858 | +1.57% | +711 | OK |
| codedb_word | 50208 | 49009 | -2.39% | -1199 | OK |
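The two-threshold rule stated in the report (10.00% relative and 50,000 ns absolute) can be expressed as a small classifier. This is a sketch of the rule as described, not the CI script itself: the function name `classify`, the `FAIL` label for a real regression, and the exact boundary comparisons are assumptions.

```python
def classify(base_ns: int, head_ns: int,
             pct_threshold: float = 10.0,
             abs_threshold_ns: int = 50_000) -> str:
    """Label a benchmark delta as OK, NOISE, or FAIL (sketch of the stated rule)."""
    delta_ns = head_ns - base_ns
    delta_pct = 100.0 * delta_ns / base_ns
    if delta_pct <= pct_threshold:
        # Within the relative budget (or faster than base).
        return "OK"
    if delta_ns < abs_threshold_ns:
        # Relative threshold exceeded, but too small in absolute terms to fail CI.
        return "NOISE"
    return "FAIL"
```

Applied to the `codedb_read` row, `classify(60575, 70916)` exceeds 10% but moves only 10,341 ns, so it lands in `NOISE`, matching the report.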


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e05a3b4210


Comment thread benchmarks/search-shootout/shootout.py Outdated
  HERE = Path(__file__).resolve().parent
  REPO_ROOT = HERE.parent.parent
- DEFAULT_CODEDB = REPO_ROOT / "zig-out/bin/codedb"
+ DEFAULT_CODEDB = shutil.which("codedb") or str(REPO_ROOT / "zig-out/bin/codedb")

P1: Keep codedb default pinned to repo build

Do not prefer PATH for the default codedb binary here: this makes shootout silently benchmark whichever codedb is installed globally instead of the repo’s zig-out/bin/codedb. On machines with an older/newer global install, codedb latency/build numbers can shift significantly and no longer reflect the commit under test, which undermines benchmark regression tracking.


Comment thread benchmarks/search-shootout/shootout.py Outdated
Comment on lines +496 to +497
if cg_dir.exists():
    shutil.rmtree(cg_dir)

P2: Honor --clean-codegraph before deleting index data

The .codegraph directory is deleted unconditionally, so --clean-codegraph is effectively ignored and every run is forced into a cold rebuild. This changes default benchmark behavior and can materially inflate codegraph build timings even when the caller did not opt into a clean run.


Addresses Codex P1+P2 review on PR #487:

- **P1** Pin DEFAULT_CODEDB to repo build (`REPO_ROOT/zig-out/bin/codedb`).
  Pre-fix used `shutil.which("codedb") or REPO_ROOT/...`, which made the
  shootout silently benchmark whichever `codedb` was installed in PATH
  (e.g. an older homebrew bottle) instead of the freshly-built repo
  binary the user expected.

- **P2** Honor --clean-codegraph. Pre-fix `codegraph_cold_index` wiped
  `.codegraph/` unconditionally, so the flag was a no-op and every
  run was forced cold. Now wipes only when `clean=True`, passed
  through from `args.clean_codegraph`.

Verified:
  --skip-codegraph     → no codegraph activity (unchanged)
  --clean-codegraph    → wipes + cold rebuild (now works)
  (no clean flag)      → reuses existing .codegraph/ for an incremental index
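The P2 fix amounts to gating the directory wipe on the flag. A minimal sketch of that gate, assuming a hypothetical helper name `prepare_codegraph_dir` (the PR itself passes `clean=` into `codegraph_cold_index`):

```python
import shutil
from pathlib import Path


def prepare_codegraph_dir(repo: Path, clean: bool) -> bool:
    """Wipe repo/.codegraph only when clean is True; return True if it was removed."""
    cg_dir = repo / ".codegraph"
    if clean and cg_dir.exists():
        shutil.rmtree(cg_dir)
        return True
    # Without the flag, the existing index is left in place for reuse.
    return False
```

Called as `prepare_codegraph_dir(repo, args.clean_codegraph)`, a default run leaves the index untouched and only an explicit `--clean-codegraph` forces a cold build.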

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---:|---:|---:|---:|---|
| codedb_bundle | 568502 | 550128 | -3.23% | -18374 | OK |
| codedb_changes | 61228 | 58911 | -3.78% | -2317 | OK |
| codedb_deps | 10134 | 9845 | -2.85% | -289 | OK |
| codedb_edit | 7554 | 7387 | -2.21% | -167 | OK |
| codedb_find | 70917 | 66322 | -6.48% | -4595 | OK |
| codedb_hot | 110881 | 109962 | -0.83% | -919 | OK |
| codedb_outline | 343432 | 338825 | -1.34% | -4607 | OK |
| codedb_read | 105943 | 106235 | +0.28% | +292 | OK |
| codedb_search | 161287 | 162253 | +0.60% | +966 | OK |
| codedb_snapshot | 316765 | 308629 | -2.57% | -8136 | OK |
| codedb_status | 15366 | 17470 | +13.69% | +2104 | NOISE |
| codedb_symbol | 69901 | 67961 | -2.78% | -1940 | OK |
| codedb_tree | 86753 | 87124 | +0.43% | +371 | OK |
| codedb_word | 95311 | 94487 | -0.86% | -824 | OK |

justrach added a commit that referenced this pull request May 21, 2026
Adds RESULTS-VS-MAIN.md comparing experiment+reader.md against the
released v0.2.5815 main-lineage binary. Same 3 tasks, fresh sub-agents.

Per-task deltas (experiment + reader.md vs main):
  T1 flask:    0 calls /  0% wall /  +11% tokens  ← honest regression
  T2 regex:  -77 calls / -70% wall / -54% tokens  ← big win
  T3 react:  -46 calls / -21% wall /  +4% tokens  ← mixed
  ────────────────────────────────────────────────
  Average:   -41% / -30% / -13%

9/9 correct, no quality regressions.

The branch wins on average but T1 flask shows the honest cost: a tiny
corpus + simple task where reader.md adds ~2 KB of overhead for no
call savings. Recommendation in the doc: reader.md is opt-in, not a
default — install only where you've measured it helping.

Beyond reader.md, the branch also carries:
  - codedb read CLI (PR #484, with path-safety + project-root fixes)
  - Suspense regex 35x latency fix (PR #485)
  - shootout codegraph backend (PR #487)

…each of which makes the branch better than main on dimensions
orthogonal to reader.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justrach justrach merged commit d72066a into main May 21, 2026
1 check passed
justrach added a commit that referenced this pull request May 21, 2026
… security

Bumps semver to 0.2.5817. Bundles the v0.2.5816 perf+security release
(PRs #484, #485, #483, #486, #487) with the experiment/reader-md feature
that auto-prepends a hash-verified codebase map to codedb_context.

Highlights vs v0.2.5815:

  Performance (PR #485, deterministic microbenchmarks):
    Suspense regex p50:    2.82 ms → 0.18 ms  (15.6× faster)
    useState regex p99:   16.57 ms → 2.04 ms  (8.1× p99 reduction)

  CLI surface (PR #484):
    + codedb read <path> [-L FROM-TO] [--compact]
    + path-safety + sensitive-file guards
    + project-root anchoring (uses configured root, not cwd)

  codedb_context (NEW in 0.2.5817):
    + auto-prepends .codedb/reader.md when source_hash matches
    + inline ~6 lines of body for ≤3 symbol_definitions
    + new "## Callers" section pre-surfaces execution sites
    + skip-on-short-task gate (≤80 chars) to avoid overhead on narrow lookups

  reader.md security (this branch):
    + path-traversal blocked (no absolute / .. in source_files)
    + source_files capped at 20 (DoS guard)
    + loc_actual capped at 240 (body bloat guard)
    + golden blake2b roundtrip test

Eval (Sonnet 4.6, n=3 per task, vs v0.2.5815 main lineage):
  T1 flask median:   5 → 4  (-1)
  T2 regex median:  13 → 7  (-6)
  T3 react median:  13 → 10 (-3)

All 9 runs across the matrix returned correct answers. Branch wins on
median, mode, and best-case for every task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justrach justrach deleted the bench/codegraph-backend branch May 21, 2026 06:31