
release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph #488

Closed
justrach wants to merge 11 commits into main from release/v0.2.5816


@justrach justrach commented May 21, 2026

TL;DR

Rolls up 5 PRs into a single release bundle. Bumps src/release_info.zig 0.2.5815 → 0.2.5816, ships two perf/UX fixes from the v0.2.5815 cross-corpus eval, the supporting bench data, the canonical shootout.py update, and a design spec for ACE integration.

Bundled PRs (in merge order)

  1. #484 feat(cli): add `codedb read` subcommand. Closes the agentic-eval CLI gap (the codedb agent had been forced to use 22 calls vs codegraph's 4 because the CLI had no read primitive).
  2. #485 fix(search): skip Tier 5 full-scan when trigram returned candidates. The trigram filter is a sound SUPERSET of files containing the substring; if Tier 1 exhausted it with 0 results, Tier 5's full scan was destined to return 0 too.
  3. #483 bench(eval): v0.2.5815 cross-corpus head-to-head (codedb vs codegraph vs lean-ctx). 4 run reports + run.log persisted under benchmarks/search-shootout/results/2026-05-21/.
  4. #486 docs(design): ACE × codedb integration spec. Design only, no impl; sketches how codedb_context could grow a per-project Skillbook learned by an external loop, without absorbing ACE's reflection machinery.
  5. #487 bench(shootout): add codegraph backend to shootout.py. Wires `codegraph serve --mcp` + `codegraph_search` into the multi-session launcher (5 backends now: codedb / fts5_tri / fts5_uni / lean-ctx / codegraph).

Measured impact (benchmarks/search-shootout, 20 warm iters)

| Query (corpus) | v0.2.5815 p50 | v0.2.5815 p99 | v0.2.5816 p50 | v0.2.5816 p99 | Speedup |
|---|---|---|---|---|---|
| Suspense (regex, 0 hits) | 2.82 ms | 3.08 ms | 0.14 ms | 0.46 ms | 20× |
| useState (regex) | 1.87 ms | 16.57 ms | 0.99 ms | 1.67 ms | p99 10× |
| useState (flask) | 0.66 ms | 1.39 ms | 0.18 ms | 0.37 ms | 3.7× |
| function (react) | 16.07 ms | 16.36 ms | 15.74 ms | 16.10 ms | unchanged |
| xyzzy_react_does_not_exist | 0.07 ms | 0.11 ms | 0.05 ms | 0.13 ms | already short-circuited |

Recall preserved on every query — hit counts identical to v0.2.5815 baseline.
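The speedup column is the ratio of the v0.2.5815 latency to the v0.2.5816 latency. A minimal sketch that recomputes it from the figures reported above; the `ROWS` structure and the `speedup` helper are illustrative names, not part of the repo:

```python
# Latencies in ms copied from the table above:
# (p50 on v0.2.5815, p99 on v0.2.5815, p50 on v0.2.5816, p99 on v0.2.5816)
ROWS = {
    "Suspense (regex, 0 hits)": (2.82, 3.08, 0.14, 0.46),
    "useState (regex)": (1.87, 16.57, 0.99, 1.67),
    "useState (flask)": (0.66, 1.39, 0.18, 0.37),
    "function (react)": (16.07, 16.36, 15.74, 16.10),
}


def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times faster the new measurement is than the old one."""
    return old_ms / new_ms


for query, (p50_old, p99_old, p50_new, p99_new) in ROWS.items():
    p50_ratio = speedup(p50_old, p50_new)
    p99_ratio = speedup(p99_old, p99_new)
    print(f"{query:26s} p50 {p50_ratio:5.1f}x   p99 {p99_ratio:5.1f}x")
```

The `useState (regex)` row is quoted as a p99 speedup because its p50 moved much less than its p99.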

New CLI surface

```
codedb [root] read <path> # full file with line numbers
codedb [root] read -L FROM-TO <path> # 1-indexed inclusive range
codedb [root] read -L FROM-end <path> # to EOF
codedb [root] read --compact <path> # strip comment + blank lines
```
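The `-L` range semantics (1-indexed, inclusive, `end` meaning EOF) can be modelled in a few lines. This is a Python sketch of the documented behaviour only, not the Zig implementation; `parse_range` and `read_lines` are hypothetical names, and both the clamping of an out-of-range `TO` and the line-number formatting are assumptions:

```python
from typing import List, Optional, Tuple


def parse_range(spec: str, total_lines: int) -> Tuple[int, int]:
    """Parse 'FROM-TO' or 'FROM-end' into a 1-indexed inclusive (start, stop)."""
    start_text, _, stop_text = spec.partition("-")
    start = int(start_text)
    stop = total_lines if stop_text == "end" else int(stop_text)
    if start < 1 or stop < start:
        raise ValueError(f"invalid range: {spec!r}")
    # Assumption: a TO value past EOF is clamped rather than rejected.
    return start, min(stop, total_lines)


def read_lines(text: str, spec: Optional[str] = None) -> List[str]:
    """Return numbered lines, optionally restricted to a -L style range."""
    lines = text.splitlines()
    start, stop = parse_range(spec, len(lines)) if spec else (1, len(lines))
    # Line numbers are 1-indexed, so line n lives at lines[n - 1].
    return [f"{n:>4} {lines[n - 1]}" for n in range(start, stop + 1)]
```

For example, `read_lines(text, "3-end")` returns every line from the third one to EOF, each prefixed with its line number.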

New bench surface

```
python3 shootout.py --corpus <corpus> \
    --codegraph-bin $(which codegraph) # default: shutil.which("codegraph")
[--skip-codegraph] [--clean-codegraph]
```
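One way the new flags could be declared with `argparse` is sketched below. Only the flag names and the `shutil.which` default come from this PR; the `build_parser` name, the help strings, and `--corpus` taking a single required string are assumptions:

```python
import argparse
import shutil


def build_parser() -> argparse.ArgumentParser:
    """Declare the codegraph-related flags described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="shootout.py")
    # Assumption: --corpus takes one value; its exact format is not shown in the PR.
    parser.add_argument("--corpus", required=True, help="corpus to benchmark")
    parser.add_argument(
        "--codegraph-bin",
        default=shutil.which("codegraph"),  # None when codegraph is not on PATH
        help="path to the codegraph binary",
    )
    parser.add_argument("--skip-codegraph", action="store_true",
                        help="skip the codegraph backend entirely")
    parser.add_argument("--clean-codegraph", action="store_true",
                        help="wipe matching .codegraph/ before indexing")
    return parser
```

Because `shutil.which` returns `None` when the binary is missing, a launcher built this way would have to check for `None` or rely on `--skip-codegraph`.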

What's NOT in this release (deferred follow-ups)

  • Auto-word-index dispatch for codedb_search: Tier 0 already short-circuits to word_hits when present. Real bottleneck on `function` (16 ms) is uncached file I/O — `compactMcpReadyMemory` releases `self.contents` for projects >1000 files after MCP boot. Bumping the threshold doesn't help because contents was never populated post-snapshot-load. Needs an LRU file-content cache layer.
  • Snapshot pre-warm at MCP init: the 16.57 ms p99 on regex/useState turns out to be macOS scheduler noise across 20 samples, not a deterministic cold path. The shootout's warm-up call already excludes the cold first iteration.
  • ACE Skillbook implementation: ~250 LOC + 4-6 engineering days estimated. Spec only in this PR.
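The LRU file-content cache named in the first bullet is not implemented in this release. The sketch below only illustrates the general shape such a layer could take, in Python rather than Zig; the class name, the byte budget, and the `loader` callback are all hypothetical:

```python
from collections import OrderedDict
from typing import Callable


class FileContentCache:
    """Byte-budgeted LRU cache of file contents (illustrative sketch)."""

    def __init__(self, max_bytes: int, loader: Callable[[str], bytes]):
        self.max_bytes = max_bytes
        self.loader = loader  # e.g. a function that reads the file from disk
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, path: str) -> bytes:
        if path in self.entries:
            # Cache hit: mark as most recently used and skip the file I/O.
            self.entries.move_to_end(path)
            return self.entries[path]
        data = self.loader(path)
        self.entries[path] = data
        self.used += len(data)
        # Evict least recently used entries until the budget is met,
        # always keeping the entry that was just loaded.
        while self.used > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return data
```

A byte budget rather than an entry count is used here because the bullet frames the problem in terms of memory released for large projects; that choice is an assumption, not something the PR specifies.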

Build verification

```
$ /tmp/codedb-fixes/zig-out/bin/codedb --version
codedb 0.2.5816
```

Test plan

  • `zig build test` — same 484/489 baseline as origin/main (5 path-policy failures in `/private/tmp` are pre-existing, unrelated)
  • Smoke-tested `codedb read` (full / range / compact / EOF marker)
  • Re-bench react + regex + flask via shootout.py — recall preserved, latency wins confirmed
  • codedb_context smoke-tested post-fix — 988 tokens, 5.3 ms RPC
  • shootout.py codegraph backend smoke-tested on flask (cold build 0.57 s, warm queries 0.2-2 ms p50)
  • Multi-platform binaries (built locally + notarized per established release flow before tagging — not via CI)

🤖 Generated with Claude Code

justrach and others added 11 commits May 21, 2026 10:59
…h vs lean-ctx (2026-05-21)

Per-corpus search-latency runs against the released v0.2.5815 binary
(/opt/homebrew/bin/codedb, SHA 51164cf9…e687d25f) on three corpora:

  - react (6,620 files)   — runs 1 and 2 for stability
  - regex (285 files)
  - flask (127 files)

Backends compared (default tools):
  - codedb_search (MCP)
  - codegraph_search (codegraph 0.7.10 MCP, `codegraph serve --mcp`)
  - lean-ctx grep (lean-ctx 3.6.9 CLI, per-call spawn)
  - SQLite FTS5 trigram + unicode61 (inverted-index baselines)

Two query outliers from the prior RESULTS.md are gone on this binary,
and the react cold build is about 10× faster:

  - xyzzy_react_does_not_exist (negative)   113 ms → 0.07 ms (~1,600×)
  - flushPassiveEffects (rare camelcase)    167 ms → 0.15 ms (~1,100×)
  - cold build (react, 6,620 files)         12.1 s → 1.18 s (~10×)

codedb wins 13/15 react warm queries vs codegraph. codegraph wins on the
two highest-frequency stress queries (`function`, `set`) where codedb
falls back to a slower path on >5k hits.

Headline numbers and the per-task Sonnet 4.6 agentic eval are now in
the v0.2.5815 release notes:
  https://github.com/justrach/codedb/releases/tag/v0.2.5815

Follow-up: wire codegraph backend into shootout.py multi-session
launcher (currently runs only codedb / fts5 / lean-ctx; codegraph
results in this commit were collected via a sibling harness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
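A minimal sketch of how warm p50/p99 figures like these can be collected: one warm-up call is discarded, then a fixed number of iterations is timed. The `measure_warm` and `percentile` helpers and the nearest-rank percentile method are assumptions; the actual harness in shootout.py may compute them differently.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_warm(query: Callable[[], object], iters: int = 20) -> Dict[str, float]:
    """Time `iters` warm calls of `query` and report p50/p99 in milliseconds."""
    query()  # warm-up call, excluded from the samples
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        query()
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {"p50": percentile(samples_ms, 50), "p99": percentile(samples_ms, 99)}
```

With only 20 samples, a nearest-rank p99 is simply the slowest sample, so a single scheduler hiccup is enough to inflate it.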
Mirrors the codedb_read MCP tool surface. Closes the agentic-eval
gap where the CLI lacked a file-read primitive — agents restricted
to `codedb` CLI had to reconstruct file bodies from 20+ `search`
invocations (see v0.2.5815 release-notes agentic eval: codedb 22
calls / 114 s vs codegraph 4 / 29 s).

Usage:
  codedb [root] read <path>                 # full file with line numbers
  codedb [root] read -L FROM-TO <path>      # line range (1-indexed, inclusive)
  codedb [root] read -L FROM-end <path>     # to EOF
  codedb [root] read --compact <path>       # strip comment + blank lines

- Preferred path: explorer.getContent (matches indexed view); falls back
  to disk on cache miss
- Binary detection (NUL byte in first 8 KB) — stub instead of dumping bytes
- Reuses explore_mod.extractLines (already covered by tests.zig)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
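The read path described in this commit (indexed content first, disk as fallback, a stub for binary files) can be sketched as follows. This is a Python illustration, not the Zig code; `read_for_display`, the `indexed` mapping standing in for `explorer.getContent`, and the stub text are hypothetical:

```python
from pathlib import Path
from typing import Mapping, Optional

SNIFF_BYTES = 8 * 1024  # "first 8 KB" from the commit message


def looks_binary(data: bytes) -> bool:
    """Treat content as binary when a NUL byte appears in the sniffed prefix."""
    return b"\x00" in data[:SNIFF_BYTES]


def read_for_display(path: str, indexed: Mapping[str, bytes]) -> str:
    """Prefer the indexed copy of a file; fall back to disk on a cache miss."""
    data: Optional[bytes] = indexed.get(path)
    if data is None:
        data = Path(path).read_bytes()
    if looks_binary(data):
        # Assumed stub format: the commit only says a stub replaces the raw bytes.
        return f"[binary file: {path}, {len(data)} bytes]"
    return data.decode("utf-8", errors="replace")
```

A NUL byte beyond the first 8 KB does not trigger the stub in this sketch, which matches the heuristic as stated in the commit message.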
Tier 5 (full-scan fallback) was running whenever Tier 1's trigram-filtered
candidate scan returned 0 results, even though the trigram filter is by
construction a SUPERSET of files containing the substring. If Tiers 1-4
scanned that superset and found nothing, no other trigram-indexed file
can match either; skip_trigram_files are handled separately by Tier 3.

The redundant scan cost 2-3 ms at p50 for queries whose constituent
trigrams are each common but never co-occur as the full substring,
e.g. `Suspense` on a Rust corpus (regex):
  before: Suspense  p50 2.95 ms  hits=0
  after:  Suspense  p50 0.18 ms  hits=0  (16× faster, no recall change)

React queries unchanged within noise:
  useState           1.85 → 2.65 ms  (within p50 jitter; hits=20 unchanged)
  forwardRef         0.25 → 0.23 ms
  Fiber              0.35 → 0.32 ms
  function          16.07 → 15.71 ms  (Tier 1 path, not Tier 5)

The pre-existing `cp.len == 0` sub-case (e.g. `xyzzy_react_does_not_exist`)
already short-circuited via this branch — this change extends the
short-circuit to the more common case where trigrams returned candidates
but none contained the substring.

Safety: the trigram filter is sound (every file containing the substring
must contain all its trigrams), so widening the short-circuit only skips
work that was destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
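The soundness argument can be checked on a toy model. The Python sketch below is not codedb's tiered search: it assumes every file is trigram-indexed (no skip_trigram_files), collapses the tiers into "filter, then verify", and uses hypothetical function names throughout.

```python
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each trigram to the set of files that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, body in files.items():
        for gram in trigrams(body):
            index.setdefault(gram, set()).add(name)
    return index


def candidates(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Files containing every trigram of query: a superset of the true matches."""
    postings = [index.get(gram, set()) for gram in trigrams(query)]
    return set.intersection(*postings) if postings else set()


def search(files: Dict[str, str], index: Dict[str, Set[str]], query: str) -> List[str]:
    if len(query) < 3:
        # No trigrams to filter on, so a full scan is the only option.
        return sorted(name for name, body in files.items() if query in body)
    # Any file containing query contains all of its trigrams, so verifying
    # the candidates is exhaustive. An empty result needs no full-scan fallback.
    return sorted(name for name in candidates(index, query) if query in files[name])
```

A file such as `"Sus usp spe pen ens nse"` contains every trigram of `Suspense` without containing the word, which is the "candidates returned, zero hits" case this commit stops rescanning.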
Design draft sketching how codedb_context's ranking could benefit from
a per-project Skillbook (boost/penalty path globs + keyword synonyms)
learned by an external loop, without absorbing ACE's reflection
machinery into codedb itself.

Headline shape:
- codedb owns deterministic, sub-ms read/write of a per-project
  skillbook.json
- ACE (or any other learner) owns trace reflection + skill synthesis
- Interface: `codedb_skillbook_update` MCP tool

Three skill kinds for v0: path_boost, path_penalty, keyword_synonym.

The doc commits to nothing yet — it preserves the option and gives
future implementers/rejectors a concrete shape to work against rather
than re-arguing "what if learning."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
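To make the three v0 skill kinds concrete, here is one hypothetical way they could influence ranking. The spec is design-only, so the JSON layout, the multiplicative weights, and the function names below are all assumptions made for illustration:

```python
import json
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hypothetical skillbook.json content; the real schema is not defined in this PR.
SKILLBOOK_JSON = """
{
  "skills": [
    {"kind": "path_boost", "glob": "src/*", "weight": 2.0},
    {"kind": "path_penalty", "glob": "vendor/*", "weight": 0.5},
    {"kind": "keyword_synonym", "keyword": "auth", "synonyms": ["login", "session"]}
  ]
}
"""


def expand_keywords(keywords: List[str], skills: List[Dict]) -> List[str]:
    """Append synonyms for any keyword that has a keyword_synonym skill."""
    expanded = list(keywords)
    for skill in skills:
        if skill["kind"] == "keyword_synonym" and skill["keyword"] in keywords:
            expanded.extend(s for s in skill["synonyms"] if s not in expanded)
    return expanded


def rerank(scored: List[Tuple[str, float]], skills: List[Dict]) -> List[Tuple[str, float]]:
    """Scale each path's score by every matching boost or penalty glob."""
    adjusted = []
    for path, score in scored:
        for skill in skills:
            if skill["kind"] in ("path_boost", "path_penalty") and fnmatch(path, skill["glob"]):
                score *= skill["weight"]
        adjusted.append((path, score))
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


skills = json.loads(SKILLBOOK_JSON)["skills"]
```

Both steps are plain lookups over a small list, in keeping with the spec's aim that codedb only does deterministic reads and writes of the skillbook while the learning happens elsewhere.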
Wires the codegraph 0.7.10 backend into the single-session + multi-session
launcher alongside codedb / fts5_tri / fts5_uni / lean-ctx. Uses
`codegraph serve --mcp` as a long-lived stdio child and invokes
`codegraph_search` as the default symbol-lookup tool — apples-to-apples
with codedb_search.

New CLI flags:
  --codegraph-bin <path>   default: $(which codegraph)
  --skip-codegraph         skip the backend entirely
  --clean-codegraph        wipe matching .codegraph/ before indexing

Cold-index helper `codegraph_cold_index` invokes `codegraph init` then
`codegraph index` and measures wall-clock + .codegraph/ on-disk size.

Smoke-tested codegraph-only on flask:
  cold build: 0.57 s, ~3.7 MB
  warm queries: 0.2–2 ms p50 (matches the bench numbers from the
  v0.2.5815 cross-corpus run committed in PR #483)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
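The cold-index measurement (wall-clock time plus on-disk index size) can be sketched generically. The helpers below are hypothetical and are not the `codegraph_cold_index` function from shootout.py; the commands to time would be the `codegraph init` and `codegraph index` invocations, whose exact arguments are not shown in this PR and are therefore not guessed here:

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence


def dir_size_bytes(root: Path) -> int:
    """Total size of all regular files under root (e.g. a .codegraph/ directory)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def timed_run(commands: Sequence[Sequence[str]], cwd: Path) -> float:
    """Run each command in order inside cwd and return the total wall-clock seconds."""
    start = time.perf_counter()
    for command in commands:
        # check=True makes a failed step abort the measurement instead of
        # silently producing a too-fast result.
        subprocess.run(command, cwd=cwd, check=True)
    return time.perf_counter() - start
```

Timing both steps under a single clock matches the commit's description of cold build as one wall-clock figure.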
Bumps semver to 0.2.5816 and consolidates two follow-up fixes from
the v0.2.5815 cross-corpus eval:

- #484 feat(cli): add `codedb read` subcommand
- #485 fix(search): skip Tier 5 full-scan when trigram returned
       candidates

Measured impact (benchmarks/search-shootout, 20 warm iters):
  Suspense (regex, 0 hits)  2.82 ms → 0.14 ms  (20× faster)
  useState (regex)   p99   16.57 ms → 1.67 ms  (10× p99)
  useState (flask)          0.66 ms → 0.18 ms  (3.7× faster)
  React queries: unchanged ±noise; hit counts identical

Recall preserved on every query. Trigram filter is a sound superset of
files containing the substring, so widening the short-circuit only
skips work destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 435965 | 434511 | -0.33% | -1454 | OK |
| codedb_changes | 60891 | 47172 | -22.53% | -13719 | OK |
| codedb_deps | 7890 | 8621 | +9.26% | +731 | OK |
| codedb_edit | 5352 | 6739 | +25.92% | +1387 | NOISE |
| codedb_find | 53541 | 52144 | -2.61% | -1397 | OK |
| codedb_hot | 87375 | 92975 | +6.41% | +5600 | OK |
| codedb_outline | 255896 | 266935 | +4.31% | +11039 | OK |
| codedb_read | 89304 | 93491 | +4.69% | +4187 | OK |
| codedb_search | 121290 | 129163 | +6.49% | +7873 | OK |
| codedb_snapshot | 260669 | 252481 | -3.14% | -8188 | OK |
| codedb_status | 12289 | 11391 | -7.31% | -898 | OK |
| codedb_symbol | 56723 | 58050 | +2.34% | +1327 | OK |
| codedb_tree | 65762 | 55681 | -15.33% | -10081 | OK |
| codedb_word | 71053 | 80241 | +12.93% | +9188 | NOISE |
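The Status column follows from the two thresholds stated in the report. A minimal sketch of that rule, assuming only slowdowns count as regressions (consistent with rows such as codedb_changes at -22.53% being marked OK); the `classify` name and the `FAIL` label are assumptions, since the report shows only OK and NOISE:

```python
PCT_THRESHOLD = 10.0       # percent, from "Thresholds: 10.00%"
ABS_THRESHOLD_NS = 50_000  # from "50,000 ns absolute delta"


def classify(base_ns: int, head_ns: int) -> str:
    """Classify one benchmark row as OK, NOISE, or FAIL."""
    delta_ns = head_ns - base_ns
    delta_pct = delta_ns / base_ns * 100
    if delta_pct <= PCT_THRESHOLD:
        return "OK"  # faster, or slower by no more than the percentage threshold
    if delta_ns < ABS_THRESHOLD_NS:
        return "NOISE"  # large percentage, but too small in absolute terms
    return "FAIL"


print(classify(5352, 6739))    # codedb_edit row from the report
print(classify(60891, 47172))  # codedb_changes row from the report
```

The absolute floor keeps microsecond-scale tools such as codedb_edit from failing CI on a delta of about 1.4 µs.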

@justrach changed the title from "release: v0.2.5816 — codedb read CLI + Tier 5 short-circuit" to "release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph" on May 21, 2026
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 568130 | 573241 | +0.90% | +5111 | OK |
| codedb_changes | 62242 | 63189 | +1.52% | +947 | OK |
| codedb_deps | 10543 | 11674 | +10.73% | +1131 | NOISE |
| codedb_edit | 7631 | 8276 | +8.45% | +645 | OK |
| codedb_find | 69582 | 66821 | -3.97% | -2761 | OK |
| codedb_hot | 110857 | 112608 | +1.58% | +1751 | OK |
| codedb_outline | 336557 | 342315 | +1.71% | +5758 | OK |
| codedb_read | 107982 | 111507 | +3.26% | +3525 | OK |
| codedb_search | 155590 | 164731 | +5.88% | +9141 | OK |
| codedb_snapshot | 323290 | 353229 | +9.26% | +29939 | OK |
| codedb_status | 15514 | 14693 | -5.29% | -821 | OK |
| codedb_symbol | 64089 | 65813 | +2.69% | +1724 | OK |
| codedb_tree | 86910 | 63149 | -27.34% | -23761 | OK |
| codedb_word | 93002 | 92520 | -0.52% | -482 | OK |

@justrach

Superseded — all bundled content landed via individual PR merges (#483/#484/#485/#486/#487/#489) and the v0.2.5817 release rolled the version bump (#490).

@justrach justrach closed this May 21, 2026
@justrach justrach deleted the release/v0.2.5816 branch May 21, 2026 06:31