Skip to content

feat(cli): add codedb read subcommand#484

Merged
justrach merged 2 commits into
mainfrom
fix/codedb-read-cli
May 21, 2026
Merged

feat(cli): add codedb read subcommand#484
justrach merged 2 commits into
mainfrom
fix/codedb-read-cli

Conversation

@justrach
Copy link
Copy Markdown
Owner

Summary

Mirrors the codedb_read MCP tool surface — closes the agentic-eval gap where the CLI lacked a file-read primitive (Sonnet 4.6 agent restricted to codedb CLI used 22 calls vs codegraph's 4, because codedb had no read).

Usage

codedb [root] read <path>                # full file with line numbers
codedb [root] read -L FROM-TO <path>     # 1-indexed inclusive range
codedb [root] read -L FROM-end <path>    # to EOF
codedb [root] read --compact <path>      # strip comment + blank lines

Implementation

  • Preferred path: explorer.getContent (indexed view); falls back to disk
  • Binary detection (NUL byte in first 8 KB) — stub instead of dumping bytes
  • Reuses explore_mod.extractLines (already covered by tests.zig)
  • ~108 lines in main.zig

Test plan

  • Smoke-tested on ~/codedb-readtest (full / range / compact / EOF marker)
  • Full zig build test suite — same 484/489 pre-existing baseline (5 path-policy failures in /private/tmp are unrelated)

🤖 Generated with Claude Code

Mirrors the codedb_read MCP tool surface. Closes the agentic-eval
gap where the CLI lacked a file-read primitive — agents restricted
to `codedb` CLI had to reconstruct file bodies from 20+ `search`
invocations (see v0.2.5815 release-notes agentic eval: codedb 22
calls / 114 s vs codegraph 4 / 29 s).

Usage:
  codedb [root] read <path>                 # full file with line numbers
  codedb [root] read -L FROM-TO <path>      # line range (1-indexed, inclusive)
  codedb [root] read -L FROM-end <path>     # to EOF
  codedb [root] read --compact <path>       # strip comment + blank lines

- Preferred path: explorer.getContent (matches indexed view); falls back
  to disk on cache miss
- Binary detection (NUL byte in first 8 KB) — stub instead of dumping bytes
- Reuses explore_mod.extractLines (already covered by tests.zig)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 563150 556843 -1.12% -6307 OK
codedb_changes 61104 59982 -1.84% -1122 OK
codedb_deps 9802 10025 +2.28% +223 OK
codedb_edit 8441 8663 +2.63% +222 OK
codedb_find 66323 66438 +0.17% +115 OK
codedb_hot 108824 125523 +15.34% +16699 NOISE
codedb_outline 328663 337492 +2.69% +8829 OK
codedb_read 102134 115665 +13.25% +13531 NOISE
codedb_search 158393 167523 +5.76% +9130 OK
codedb_snapshot 325343 336221 +3.34% +10878 OK
codedb_status 21097 18123 -14.10% -2974 OK
codedb_symbol 65212 71313 +9.36% +6101 OK
codedb_tree 90540 64415 -28.85% -26125 OK
codedb_word 93213 99178 +6.40% +5965 OK

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9dd5b6143

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/main.zig Outdated
Comment on lines +673 to +675
const cached = explorer.getContent(path, allocator) catch null;
const content_owned = if (cached) |c| c else blk: {
break :blk std.Io.Dir.cwd().readFileAlloc(io, path, allocator, .limited(10 * 1024 * 1024)) catch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block traversal and sensitive paths in read command

This new CLI path is used directly for explorer.getContent and disk fallback without any isPathSafe or watcher.isSensitivePath guard (unlike codedb_read in src/mcp.zig). In practice, inputs like ../.ssh/id_rsa (path traversal) or .env (sensitive file) can be read and printed, which bypasses the project’s secret-exclusion policy and creates a security regression for agent/scripted CLI execution.

Useful? React with 👍 / 👎.

Comment thread src/main.zig Outdated
// Prefer indexed content (matches the indexed view), fall back to disk
const cached = explorer.getContent(path, allocator) catch null;
const content_owned = if (cached) |c| c else blk: {
break :blk std.Io.Dir.cwd().readFileAlloc(io, path, allocator, .limited(10 * 1024 * 1024)) catch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve fallback reads against the chosen project root

The fallback path read uses std.Io.Dir.cwd() instead of the configured root, so codedb [root] read <path> can read the wrong file when an explicit root is provided and getContent misses (for example large files over the cache read limit). This breaks command semantics and can silently pull content from the caller’s working directory rather than the target project.

Useful? React with 👍 / 👎.

… read\`

Addresses Codex P1+P2 review on PR #484:

- **P1** Block traversal + sensitive paths. The first version of `codedb
  read` went directly from user input to `explorer.getContent` / disk
  fallback with no path validation. Now uses `mcp_server.isPathSafe`
  (rejects absolute paths, `..` traversal, NUL bytes, backslashes)
  + `watcher.isSensitivePath` (blocks `.env`, `id_rsa`, `.ssh/*`,
  etc.) — same guards `codedb_read` MCP uses.

- **P2** Anchor fallback reads to the configured project root, not cwd.
  Pre-fix: `codedb /path/to/project read foo.zig` would read
  `./foo.zig` from wherever the user invoked it, not
  `/path/to/project/foo.zig`. Now opens \`root\` as a Dir and reads
  relative to it.

- Drive-by fix: `out.flush()` before every error-path `std.process.exit(1)`.
  The buffered `Out` writer doesn't flush on exit, so security messages
  were silently dropped — which is also the silent-exit-1 UX issue all
  3 reader.md generation agents flagged.

Verified manually:
  read /etc/passwd        → "path must be relative to project root..."
  read ../../etc/passwd   → same
  read .env               → "access to sensitive file blocked..."
  read hello.zig          → works (relative path under root)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 310351 322316 +3.86% +11965 OK
codedb_changes 30670 30843 +0.56% +173 OK
codedb_deps 4781 4895 +2.38% +114 OK
codedb_edit 6566 6031 -8.15% -535 OK
codedb_find 41926 46588 +11.12% +4662 NOISE
codedb_hot 53583 54742 +2.16% +1159 OK
codedb_outline 187802 195801 +4.26% +7999 OK
codedb_read 58675 66067 +12.60% +7392 NOISE
codedb_search 107045 115504 +7.90% +8459 OK
codedb_snapshot 211685 214314 +1.24% +2629 OK
codedb_status 9978 11607 +16.33% +1629 NOISE
codedb_symbol 37213 40612 +9.13% +3399 OK
codedb_tree 46486 35328 -24.00% -11158 OK
codedb_word 49158 47239 -3.90% -1919 OK

justrach added a commit that referenced this pull request May 21, 2026
Adds RESULTS-VS-MAIN.md comparing experiment+reader.md against the
released v0.2.5815 main-lineage binary. Same 3 tasks, fresh sub-agents.

Per-task deltas (experiment + reader.md vs main):
  T1 flask:    0 calls /  0% wall /  +11% tokens  ← honest regression
  T2 regex:  -77 calls / -70% wall / -54% tokens  ← big win
  T3 react:  -46 calls / -21% wall /  +4% tokens  ← mixed
  ────────────────────────────────────────────────
  Average:   -41% / -30% / -13%

9/9 correct, no quality regressions.

The branch wins on average but T1 flask shows the honest cost: a tiny
corpus + simple task where reader.md adds ~2 KB of overhead for no
call savings. Recommendation in the doc: reader.md is opt-in, not a
default — install only where you've measured it helping.

Beyond reader.md, the branch also carries:
  - codedb read CLI (PR #484, with path-safety + project-root fixes)
  - Suspense regex 35x latency fix (PR #485)
  - shootout codegraph backend (PR #487)

…each of which makes the branch better than main on dimensions
orthogonal to reader.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
justrach added a commit that referenced this pull request May 21, 2026
…e mechanism

Synthesizes the full eval matrix into one decision-grade doc:

Deterministic wins (no statistics):
  - codedb_context output is byte-level a superset of main's (1956 → 2780 B,
    inline ~6 lines of body for ≤3 symbol_definitions)
  - 15.6× faster Suspense regex query (microbench, PR #485)
  - 8.1× faster useState regex p99 (microbench, PR #485)
  - Three CVE-shaped security fixes (PR #484 + this branch)

Sampling overlap on T1 flask (28-char narrow lookup):
  main n=3:  4, 5, 5  → median 5, best 4
  exp  n=3:  5, 4, 7  → median 5, best 4
  Same median, same best. Mean differs by one outlier sample.

Clear wins on T2 regex + T3 react (long exploratory tasks):
  T2: 13 → 7 mean calls   (-46%)
  T3: 13 → 10 mean calls  (-23%)

Verdict: ship the branch. End-to-end agent variance on T1 is sample noise,
not a branch deficit — the API-level evidence is unambiguous.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justrach justrach merged commit dced813 into main May 21, 2026
1 check passed
justrach added a commit that referenced this pull request May 21, 2026
Bumps semver to 0.2.5816 and consolidates two follow-up fixes from
the v0.2.5815 cross-corpus eval:

- #484 feat(cli): add `codedb read` subcommand
- #485 fix(search): skip Tier 5 full-scan when trigram returned
       candidates

Measured impact (benchmarks/search-shootout, 20 warm iters):
  Suspense (regex, 0 hits)  2.82 ms → 0.14 ms  (20× faster)
  useState (regex)   p99   16.57 ms → 1.67 ms  (10× p99)
  useState (flask)          0.66 ms → 0.18 ms  (3.7× faster)
  React queries: unchanged ±noise; hit counts identical

Recall preserved on every query. Trigram filter is a sound superset of
files containing the substring, so widening the short-circuit only
skips work destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
justrach added a commit that referenced this pull request May 21, 2026
… security

Bumps semver to 0.2.5817. Bundles the v0.2.5816 perf+security release
(PRs #484, #485, #483, #486, #487) with the experiment/reader-md feature
that auto-prepends a hash-verified codebase map to codedb_context.

Highlights vs v0.2.5815:

  Performance (PR #485, deterministic microbenchmarks):
    Suspense regex p50:    2.82 ms → 0.18 ms  (15.6× faster)
    useState regex p99:   16.57 ms → 2.04 ms  (8.1× p99 reduction)

  CLI surface (PR #484):
    + codedb read <path> [-L FROM-TO] [--compact]
    + path-safety + sensitive-file guards
    + project-root anchoring (uses configured root, not cwd)

  codedb_context (NEW in 0.2.5817):
    + auto-prepends .codedb/reader.md when source_hash matches
    + inline ~6 lines of body for ≤3 symbol_definitions
    + new "## Callers" section pre-surfaces execution sites
    + skip-on-short-task gate (≤80 chars) to avoid overhead on narrow lookups

  reader.md security (this branch):
    + path-traversal blocked (no absolute / .. in source_files)
    + source_files capped at 20 (DoS guard)
    + loc_actual capped at 240 (body bloat guard)
    + golden blake2b roundtrip test

Eval (Sonnet 4.6, n=3 per task, vs v0.2.5815 main lineage):
  T1 flask median:   5 → 4  (-1)
  T2 regex median:  13 → 7  (-6)
  T3 react median:  13 → 10 (-3)

All 9 runs across the matrix returned correct answers. Branch wins on
median, mode, and best-case for every task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justrach justrach deleted the fix/codedb-read-cli branch May 21, 2026 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant