perf: search + snapshot I/O optimizations by justrach · Pull Request #498 · justrach/codedb

justrach · 2026-05-24T15:36:38Z

Summary

- Pre-lowercase the query needle in searchInContent: eliminates redundant per-byte case conversion in the matchAtCaseInsensitive inner loop
- Reuse Tier 0 word_hits in Tier 4 instead of re-running word_index.search(query) (an identical call, so the second run was wasted work)
- Pre-compute the is_doc flag in the Tier 0 sort: detectLanguage is called once per file during collection, not O(n log n) times in the sort comparator
- Bulk freq table I/O: read/write the 128KB frequency table in 1 syscall instead of 256 (both snapshot read paths and the write path)
- Add test_bench.zig with fuzzyScore and detectLanguage microbenchmarks (zig build test-bench -Doptimize=ReleaseFast)
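The is_doc item can be pictured with a minimal sketch. It assumes a recent Zig (0.11 or later); the `Hit` struct, `makeHit`, and `isDocPath` (a stand-in for detectLanguage) are hypothetical names for illustration, and the ordering rule (code files before docs, then higher score first) is an assumption rather than the project's actual comparator.

```zig
const std = @import("std");

// Hypothetical stand-in for detectLanguage: treats .md and .txt files as docs.
fn isDocPath(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md") or std.mem.endsWith(u8, path, ".txt");
}

const Hit = struct {
    path: []const u8,
    score: u32,
    is_doc: bool, // computed once, when the hit is collected
};

fn makeHit(path: []const u8, score: u32) Hit {
    return .{ .path = path, .score = score, .is_doc = isDocPath(path) };
}

// The comparator only reads cached fields; it never inspects the path again.
fn lessThan(_: void, a: Hit, b: Hit) bool {
    if (a.is_doc != b.is_doc) return !a.is_doc; // assumed: code sorts before docs
    return a.score > b.score; // assumed: higher score first
}

test "sort reads the cached is_doc flag" {
    var hits = [_]Hit{
        makeHit("README.md", 9),
        makeHit("src/main.zig", 3),
        makeHit("src/index.zig", 7),
    };
    std.mem.sort(Hit, &hits, {}, lessThan);
    try std.testing.expectEqualStrings("src/index.zig", hits[0].path);
    try std.testing.expectEqualStrings("README.md", hits[2].path);
}
```

The classification runs n times during collection, while the comparator, which a sort invokes on the order of n log n times, is reduced to two field comparisons.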

Each optimization was checked against the actual code and measured rather than assumed. A pointer-swap variant of fuzzyScore was benchmarked and rejected: the existing @memcpy version is faster in ReleaseFast (362ms vs 670ms for 300K calls) because the compiler optimizes the copy to SIMD moves.
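For readers unfamiliar with the two strategies, here is a minimal sketch on a generic two-row dynamic-programming loop. It computes a longest-common-subsequence length, not the project's fuzzyScore, and the names `lcsCopy`, `lcsSwap`, and `max_len` are hypothetical. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

const max_len = 64; // illustrative cap on the second input's length

// Variant A: copy the finished row into `prev` after each pass.
fn lcsCopy(a: []const u8, b: []const u8) u16 {
    std.debug.assert(b.len < max_len);
    var prev = [_]u16{0} ** max_len;
    var curr = [_]u16{0} ** max_len;
    for (a) |ca| {
        for (b, 0..) |cb, j| {
            curr[j + 1] = if (ca == cb) prev[j] + 1 else @max(prev[j + 1], curr[j]);
        }
        @memcpy(prev[0 .. b.len + 1], curr[0 .. b.len + 1]);
    }
    return prev[b.len];
}

// Variant B: keep both rows in place and swap the slices that point at them.
fn lcsSwap(a: []const u8, b: []const u8) u16 {
    std.debug.assert(b.len < max_len);
    var row0 = [_]u16{0} ** max_len;
    var row1 = [_]u16{0} ** max_len;
    var prev: []u16 = &row0;
    var curr: []u16 = &row1;
    for (a) |ca| {
        for (b, 0..) |cb, j| {
            curr[j + 1] = if (ca == cb) prev[j] + 1 else @max(prev[j + 1], curr[j]);
        }
        std.mem.swap([]u16, &prev, &curr);
    }
    return prev[b.len];
}

test "both variants compute the same result" {
    try std.testing.expectEqual(@as(u16, 3), lcsCopy("abcde", "ace"));
    try std.testing.expectEqual(lcsCopy("snapshot", "search"), lcsSwap("snapshot", "search"));
}
```

Both variants return the same values; which one is faster depends on row size and on what the optimizer does with each form, so it has to be measured, as was done for fuzzyScore here.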

Test plan

- zig build test: all 8 test binaries pass
- zig build -Doptimize=ReleaseFast: clean build
- zig build test-bench -Doptimize=ReleaseFast: benchmarks run
- No accuracy regressions (same search results, same snapshot format)

🤖 Generated with Claude Code

- Pre-lowercase query needle in searchInContent: eliminates redundant per-byte case conversion in matchAtCaseInsensitive inner loop
- Reuse Tier 0 word_hits in Tier 4 instead of re-running word_index.search()
- Pre-compute is_doc language flag in Tier 0 sort: detectLanguage called once per file during collection, not O(n log n) times during sort
- Bulk freq table I/O: read/write 128KB frequency table in 1 syscall instead of 256 (both loadSnapshotValidated and loadSnapshotFast paths, plus writeSnapshot)
- Add test_bench.zig with fuzzyScore and detectLanguage microbenchmarks
- Note: fuzzyScore pointer-swap was benchmarked and disproven; @memcpy is faster in ReleaseFast (362ms vs 670ms for 300K calls) because the compiler optimizes it to SIMD moves

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
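The bulk freq table I/O item can be sketched as follows. The table layout is an assumption (256 rows of 256 u16 counters, which adds up to 128KB), and `CountingSink`, `writeRowByRow`, and `writeBulk` are hypothetical names; the sink only counts calls, standing in for a file where each call would cost one syscall. Assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Assumed layout: 256 rows x 256 u16 counters = 131072 bytes.
const FreqTable = [256][256]u16;

// Hypothetical sink that records how many write calls it receives.
const CountingSink = struct {
    buf: [@sizeOf(FreqTable)]u8 = undefined,
    len: usize = 0,
    calls: usize = 0,

    fn writeAll(self: *CountingSink, bytes: []const u8) void {
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
        self.calls += 1;
    }
};

// One call per row: 256 calls in total.
fn writeRowByRow(sink: *CountingSink, table: *const FreqTable) void {
    for (0..table.len) |i| sink.writeAll(std.mem.asBytes(&table[i]));
}

// The rows are contiguous in memory, so the whole table is one byte range.
fn writeBulk(sink: *CountingSink, table: *const FreqTable) void {
    sink.writeAll(std.mem.asBytes(table));
}

test "bulk write emits identical bytes in one call" {
    var table: FreqTable = undefined;
    for (0..256) |i| {
        for (0..256) |j| table[i][j] = @intCast((i * 31 + j) % 65536);
    }
    var a = CountingSink{};
    var b = CountingSink{};
    writeRowByRow(&a, &table);
    writeBulk(&b, &table);
    try std.testing.expectEqual(@as(usize, 256), a.calls);
    try std.testing.expectEqual(@as(usize, 1), b.calls);
    try std.testing.expectEqualSlices(u8, a.buf[0..a.len], b.buf[0..b.len]);
}
```

Because both paths emit the same bytes in the same order, the on-disk format is unchanged. Note that this only holds if the row-by-row path was already writing raw native-endian bytes, which is an assumption here.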

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d628a0756a

chatgpt-codex-connector · 2026-05-24T15:39:00Z

-    const first_lower: u8 = if (query[0] >= 'A' and query[0] <= 'Z') query[0] + 32 else query[0];
-    const first_upper: u8 = if (query[0] >= 'a' and query[0] <= 'z') query[0] - 32 else query[0];
+    var query_lower_buf: [4096]u8 = undefined;
+    if (query.len > query_lower_buf.len) return;


Remove fixed 4096-byte cutoff in content search

The new query_lower_buf guard returns immediately when query.len > 4096, which drops valid matches instead of searching. searchContent is used by CLI and server paths without a 4096-byte query cap, so long queries that previously worked (bounded only by content.len) now always return no results. This is a functional regression in search accuracy for long-input scenarios introduced by the optimization.
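One way to keep the pre-lowercased needle without rejecting long queries is to use the stack buffer when the query fits and fall back to a heap allocation otherwise. The sketch below illustrates that idea only; `countMatches` and `matchAtLowered` are hypothetical helpers, not the project's searchInContent or matchAtCaseInsensitive, and passing an allocator into the search path is an assumption. Assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Compares content at `pos` against a needle that is already lowercase.
fn matchAtLowered(content: []const u8, pos: usize, needle_lower: []const u8) bool {
    if (pos + needle_lower.len > content.len) return false;
    for (needle_lower, 0..) |n, i| {
        if (std.ascii.toLower(content[pos + i]) != n) return false;
    }
    return true;
}

// Hypothetical helper: counts ASCII case-insensitive occurrences of query.
fn countMatches(allocator: std.mem.Allocator, content: []const u8, query: []const u8) !usize {
    if (query.len == 0 or query.len > content.len) return 0;

    var stack_buf: [4096]u8 = undefined;
    var heap_buf: ?[]u8 = null;
    defer if (heap_buf) |b| allocator.free(b);

    // Short queries use the stack buffer; long ones are lowered into a heap
    // buffer instead of being rejected.
    const buf: []u8 = if (query.len <= stack_buf.len)
        stack_buf[0..query.len]
    else blk: {
        heap_buf = try allocator.alloc(u8, query.len);
        break :blk heap_buf.?;
    };
    for (query, 0..) |c, i| buf[i] = std.ascii.toLower(c);

    var count: usize = 0;
    var pos: usize = 0;
    while (pos + buf.len <= content.len) : (pos += 1) {
        if (matchAtLowered(content, pos, buf)) count += 1;
    }
    return count;
}

test "queries longer than 4096 bytes still match" {
    const long_query = [_]u8{'A'} ** 5000;
    const long_content = [_]u8{'a'} ** 5000;
    const n = try countMatches(std.testing.allocator, &long_content, &long_query);
    try std.testing.expectEqual(@as(usize, 1), n);
}
```

An alternative that avoids allocation is to keep the original per-byte comparison as the slow path for oversized queries; either way the common short-query case retains the optimization.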

github-actions · 2026-05-24T15:39:08Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	572188	573129	+0.16%	+941	OK
`codedb_changes`	60643	62076	+2.36%	+1433	OK
`codedb_deps`	10836	9574	-11.65%	-1262	OK
`codedb_edit`	43233	6866	-84.12%	-36367	OK
`codedb_find`	68753	66679	-3.02%	-2074	OK
`codedb_hot`	113049	106542	-5.76%	-6507	OK
`codedb_outline`	340806	334414	-1.88%	-6392	OK
`codedb_read`	113000	110502	-2.21%	-2498	OK
`codedb_search`	176391	162308	-7.98%	-14083	OK
`codedb_snapshot`	315288	308203	-2.25%	-7085	OK
`codedb_status`	14563	14645	+0.56%	+82	OK
`codedb_symbol`	65060	66422	+2.09%	+1362	OK
`codedb_tree`	88139	83430	-5.34%	-4709	OK
`codedb_word`	91011	92841	+2.01%	+1830	OK
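The status column can be read with a small sketch of the two-threshold rule. This is an assumed reconstruction from the report text, not the CI script itself: it treats only slowdowns as candidates, fails when both the percentage and absolute thresholds are exceeded, and reports NOISE when only the percentage is. The names `Status` and `classify` are hypothetical. Assumes Zig 0.11 or later.

```zig
const std = @import("std");

const Status = enum { ok, noise, fail };

const pct_threshold: f64 = 10.0; // percent
const abs_threshold_ns: i64 = 50_000;

// Assumed rule: speedups are always OK; a failure needs both thresholds exceeded.
fn classify(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    if (delta <= 0) return .ok;
    const pct = @as(f64, @floatFromInt(delta)) / @as(f64, @floatFromInt(base_ns)) * 100.0;
    if (pct <= pct_threshold) return .ok;
    return if (delta > abs_threshold_ns) .fail else .noise;
}

test "rows from the report classify as OK" {
    try std.testing.expectEqual(Status.ok, classify(572188, 573129)); // codedb_bundle
    try std.testing.expectEqual(Status.ok, classify(43233, 6866)); // codedb_edit
}
```

Under this reading, codedb_edit's -84.12% delta is OK because it is a speedup, and no row in the table reaches the 10% slowdown needed for NOISE or a failure.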

chatgpt-codex-connector Bot reviewed May 24, 2026

justrach closed this May 25, 2026

justrach deleted the perf/search-and-snapshot-optimizations branch May 25, 2026 16:29

justrach wants to merge 1 commit into release/v0.2.5818 from perf/search-and-snapshot-optimizations
