docs(mcp): sharpen codedb_search / codedb_word / codedb_glob / codedb_ls descriptions by justrach · Pull Request #493 · justrach/codedb

justrach · 2026-05-21T09:11:59Z

Summary

The descriptions of these four MCP tools were overselling their overlap with each other and making them look duplicative when they're not:

| Tool | What it actually does |
| --- | --- |
| `codedb_search` | Ranked top-K (BM25 + word-index hits), capped by `max_results`. |
| `codedb_word` | Exhaustive: every occurrence of one identifier, unranked, no cap. |
| `codedb_glob` | Pattern match across the whole tree, flat paths, no metadata. |
| `codedb_ls` | One directory's immediate children, with per-file metadata. |

The prior descriptions made codedb_search and codedb_word sound interchangeable (they aren't: their return shapes differ), and made codedb_glob and codedb_ls sound interchangeable (they aren't: their scope and metadata differ). A fresh agent reading tools/list would conclude that codedb has redundancy that doesn't actually exist.
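To make the "return shape differs" point concrete, here is a minimal Python sketch of the two shapes over a toy in-memory corpus. It is not codedb's implementation: the corpus, the function names `word_lookup` and `ranked_search`, and the scoring (a plain substring count standing in for whatever ranking the server really uses) are all assumptions made for illustration.

```python
import re

# Toy corpus standing in for an index; file names and contents are made up.
CORPUS = {
    "src/a.zig": ["const cap = 8;", "fn grow(cap: usize) void {}", "// cap the cap"],
    "src/b.zig": ["var capacity = cap;"],
    "src/c.zig": ["fn main() void {}"],
}


def word_lookup(word):
    """Exhaustive shape: every (path, line) where `word` is a whole token. Unranked, no cap."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [
        (path, line_no)
        for path, lines in CORPUS.items()
        for line_no, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def ranked_search(query, max_results=20):
    """Ranked shape: score each matching line, sort by score, keep only the top max_results.

    The score here is a substring count, a hypothetical stand-in for real ranking.
    """
    scored = []
    for path, lines in CORPUS.items():
        for line_no, line in enumerate(lines, start=1):
            score = line.count(query)
            if score:
                scored.append((score, path, line_no))
    # Highest score first; ties broken by path then line for a stable order.
    scored.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return scored[:max_results]


if __name__ == "__main__":
    print(word_lookup("cap"))                   # all four whole-word occurrences
    print(ranked_search("cap", max_results=2))  # only the two best-scoring lines
```

With `max_results=2` the ranked call drops two lines that the exhaustive lookup still reports, which is the distinction an agent needs when it must not miss an occurrence.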

Each new description leads with the distinguishing axis (ranked vs exhaustive; whole-tree vs single-directory) and points explicitly at the sibling tool for the other shape, so an agent picking a tool can choose by return shape rather than by guessing.
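The whole-tree vs single-directory axis can be sketched the same way with Python's standard `pathlib`. This is an illustration only: `glob_tree`, `list_dir`, `make_demo_tree`, and the metadata fields (`is_dir`, `size`) are hypothetical, since the PR does not say which per-file metadata codedb_ls returns.

```python
import tempfile
from pathlib import Path


def glob_tree(root, pattern):
    """Whole-tree shape: flat relative paths matching `pattern`, no metadata."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


def list_dir(root, rel="."):
    """Single-directory shape: immediate children only, each with some metadata.

    The metadata fields are placeholders chosen for this sketch.
    """
    entries = []
    for child in sorted((root / rel).iterdir()):
        entries.append({
            "name": child.name,
            "is_dir": child.is_dir(),
            "size": child.stat().st_size if child.is_file() else None,
        })
    return entries


def make_demo_tree(root):
    """Create a tiny throwaway tree: two .zig files at different depths plus a README."""
    (root / "src" / "nested").mkdir(parents=True)
    (root / "src" / "mcp.zig").write_text("// a")
    (root / "src" / "nested" / "util.zig").write_text("// bb")
    (root / "README.md").write_text("hi")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_demo_tree(root)
        print(glob_tree(root, "**/*.zig"))  # reaches into src/nested
        print(list_dir(root, "src"))        # stops at src's direct children
```

The glob call finds `src/nested/util.zig`, while the listing of `src` reports `nested` only as a directory entry and never descends into it.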

What changed

src/mcp.zig tools_list const: 4 description strings updated. No code paths touched.
No schema changes, no breaking changes for callers.

Why no failing test?

This is a docs-only edit to MCP tools/list strings (not a behavior change), so the CLAUDE.md "every issue must include a failing test" rule doesn't apply — there's no behavior to test. Full zig build test was run before/after the edit: same result both times (one pre-existing unrelated OOM in test.issue-44, 539/540 elsewhere).

Test plan

zig build test — diff isolated to the 4 description strings; pre-existing snapshot OOM unchanged.
Manual: codedb_status then call tools/list from an MCP client, eyeball the four descriptions for sense.
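For the manual step, a small helper can pull just the four descriptions out of a tools/list response instead of scanning the full payload by eye. The sketch below assumes the standard MCP JSON-RPC shape (a `tools/list` method whose result carries a `tools` array of objects with `name` and `description`). How the request is delivered to the codedb server is left out, and the sample response uses placeholder text rather than the real description strings.

```python
import json

# The four tools this PR touches.
WANTED = ("codedb_search", "codedb_word", "codedb_glob", "codedb_ls")


def tools_list_request(request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def pick_descriptions(response, names=WANTED):
    """Return {tool name: description or None} for the tools of interest."""
    tools = response.get("result", {}).get("tools", [])
    by_name = {tool["name"]: tool.get("description") for tool in tools}
    return {name: by_name.get(name) for name in names}


if __name__ == "__main__":
    print(json.dumps(tools_list_request()))
    # Placeholder response; a real one would come from the MCP client.
    sample = {
        "jsonrpc": "2.0",
        "id": 1,
        "result": {"tools": [
            {"name": "codedb_search", "description": "<description text>"},
            {"name": "codedb_status", "description": "<description text>"},
        ]},
    }
    for name, description in pick_descriptions(sample).items():
        print(f"{name}: {description if description is not None else 'MISSING'}")
```

A `None` entry flags a tool that the server did not list at all, which is worth catching before reading the wording.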

🤖 Generated with Claude Code

…_ls descriptions

The four descriptions oversold overlap and made the tools look like duplicates of each other. They aren't:

- codedb_search: ranked top-K (BM25 + word-index hits), capped by max_results.
- codedb_word: exhaustive (every occurrence of one identifier), unranked, no cap.
- codedb_glob: pattern match across the whole tree, flat paths, no metadata.
- codedb_ls: ONE directory's immediate children, with per-file metadata.

Each rewritten description now names the distinguishing axis up front (ranked vs exhaustive; whole-tree vs single-directory) and points explicitly at the sibling tool for the other shape.

No code changes; only the tools_list JSON descriptions in src/mcp.zig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ac4f32c0c

chatgpt-codex-connector · 2026-05-21T09:14:30Z

    \\{"name":"codedb_symbol","description":"Find where a named symbol is defined across the index. Returns file, line, and kind. Pass body=true for source. Pick this over codedb_search when you have an exact identifier.","inputSchema":{"type":"object","properties":{"name":{"type":"string","description":"Symbol name to search for (exact match)"},"body":{"type":"boolean","description":"Include source body for each symbol (default: false)"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["name"]}},
-    \\{"name":"codedb_search","description":"Substring full-text search across the index (regex if regex=true). For one identifier prefer codedb_word; for a definition prefer codedb_symbol. Scope with path_glob to filter by language.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},
-    \\{"name":"codedb_word","description":"Exact-identifier lookup via inverted index — every occurrence of one word, O(1). Use for single identifiers; use codedb_search for substrings or phrases.","inputSchema":{"type":"object","properties":{"word":{"type":"string","description":"Exact word/identifier to look up"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["word"]}},
+    \\{"name":"codedb_search","description":"Ranked full-text search across the index — returns the top max_results scored by BM25 + word-index hits. Use regex=true for patterns, scope=true to annotate with enclosing symbol, path_glob to filter by language. Different return shape from codedb_word: ranked + capped, not exhaustive. For definition sites see codedb_symbol; for filename matches see codedb_find.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},


Correct codedb_search scoring description

codedb_search now advertises that results are “scored by BM25 + word-index hits,” but the MCP path calls handleSearch → Explorer.searchContent/searchContentWithScope, and searchContent ranks via rerankSignalScore heuristics rather than the BM25 pipeline (searchContentRanked is a separate function and is not invoked here). This mismatch can mislead MCP clients/agents that depend on tools/list semantics when choosing retrieval strategies and evaluating ranking behavior.

github-actions · 2026-05-21T09:14:33Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| `codedb_bundle` | 571203 | 575556 | +0.76% | +4353 | OK |
| `codedb_changes` | 66569 | 65267 | -1.96% | -1302 | OK |
| `codedb_deps` | 11237 | 12197 | +8.54% | +960 | OK |
| `codedb_edit` | 9132 | 7438 | -18.55% | -1694 | OK |
| `codedb_find` | 71410 | 69366 | -2.86% | -2044 | OK |
| `codedb_hot` | 116698 | 116027 | -0.57% | -671 | OK |
| `codedb_outline` | 334046 | 339952 | +1.77% | +5906 | OK |
| `codedb_read` | 111322 | 115892 | +4.11% | +4570 | OK |
| `codedb_search` | 162518 | 167233 | +2.90% | +4715 | OK |
| `codedb_snapshot` | 314783 | 311589 | -1.01% | -3194 | OK |
| `codedb_status` | 14916 | 14899 | -0.11% | -17 | OK |
| `codedb_symbol` | 65117 | 67705 | +3.97% | +2588 | OK |
| `codedb_tree` | 64886 | 69423 | +6.99% | +4537 | OK |
| `codedb_word` | 91858 | 93212 | +1.47% | +1354 | OK |
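The status column can be reproduced with a few lines of arithmetic. This Python sketch is a reading of the report rather than the CI script itself. It assumes only slowdowns count against the thresholds (inferred from `codedb_edit`, which is OK at -18.55%) and that both comparisons are strict. The `FAIL` label is a hypothetical name for the failing status, which this report never shows.

```python
PCT_THRESHOLD = 10.0       # percent, from the report header
ABS_THRESHOLD_NS = 50_000  # nanoseconds, from the report header


def delta_pct(base_ns, head_ns):
    """Relative change of head against base, in percent."""
    return 100.0 * (head_ns - base_ns) / base_ns


def classify(base_ns, head_ns):
    """Classify one benchmark row.

    OK    : slowdown within the percentage threshold (or a speedup).
    NOISE : percentage threshold exceeded, but absolute delta too small to fail CI.
    FAIL  : both thresholds exceeded (label assumed for this sketch).
    """
    abs_delta = head_ns - base_ns
    if delta_pct(base_ns, head_ns) <= PCT_THRESHOLD:
        return "OK"
    if abs_delta <= ABS_THRESHOLD_NS:
        return "NOISE"
    return "FAIL"


if __name__ == "__main__":
    # Rows taken from the report above.
    print(round(delta_pct(9132, 7438), 2), classify(9132, 7438))      # codedb_edit
    print(round(delta_pct(11237, 12197), 2), classify(11237, 12197))  # codedb_deps
    # Made-up rows to exercise the other two branches.
    print(classify(10_000, 12_000))    # +20% but only +2000 ns
    print(classify(100_000, 200_000))  # +100% and +100000 ns
```

Pairing the two thresholds keeps very fast tools such as `codedb_edit` or `codedb_deps`, where a few hundred nanoseconds is a large percentage, from failing CI on measurement jitter.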

justrach closed this May 25, 2026

justrach deleted the chore/sharpen-tool-descriptions branch May 25, 2026 16:30

justrach wants to merge 1 commit into main from chore/sharpen-tool-descriptions
