docs(mcp): sharpen codedb_search / codedb_word / codedb_glob / codedb_ls descriptions #493
justrach wants to merge 1 commit into
…_ls descriptions

The four descriptions oversold overlap and made the tools look like duplicates of each other. They aren't:

- codedb_search: ranked top-K (BM25 + word-index hits), capped by max_results.
- codedb_word: exhaustive; every occurrence of one identifier, unranked, no cap.
- codedb_glob: pattern match across the whole tree, flat paths, no metadata.
- codedb_ls: ONE directory's immediate children, with per-file metadata.

Each rewritten description now names the distinguishing axis up front (ranked vs exhaustive; whole-tree vs single-directory) and points explicitly at the sibling tool for the other shape.

No code changes; only the tools_list JSON descriptions in src/mcp.zig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
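The whole-tree versus single-directory axis can be illustrated with a toy in-memory file list. This is a minimal Python sketch, not codedb's implementation: the paths, the `size` metadata field, and the helper names `glob_tree` and `ls_dir` are hypothetical, and the standard-library `fnmatch` stands in for whatever glob matcher codedb actually uses.

```python
from fnmatch import fnmatch

# Hypothetical indexed files: path -> byte size (stand-in for per-file metadata).
FILES = {
    "build.zig": 900,
    "src/mcp.zig": 42_000,
    "src/main.zig": 8_000,
    "src/index/word.zig": 5_500,
}


def glob_tree(pattern: str) -> list[str]:
    """Whole-tree pattern match: a flat list of paths, no metadata."""
    return sorted(path for path in FILES if fnmatch(path, pattern))


def ls_dir(directory: str) -> list[dict]:
    """One directory's immediate children, each with metadata."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    children: dict[str, dict] = {}
    for path, size in FILES.items():
        if not path.startswith(prefix):
            continue
        head, _, rest = path[len(prefix):].partition("/")
        if rest:
            # Deeper entry: surface only the child directory, do not recurse.
            children[head] = {"name": head, "kind": "dir"}
        else:
            children[head] = {"name": head, "kind": "file", "size": size}
    return [children[name] for name in sorted(children)]
```

In this sketch `glob_tree("*.zig")` reaches nested files such as `src/index/word.zig`, while `ls_dir("src")` stops at the `index` directory entry and reports sizes only for the files directly under `src`.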
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ac4f32c0c
\\{"name":"codedb_symbol","description":"Find where a named symbol is defined across the index. Returns file, line, and kind. Pass body=true for source. Pick this over codedb_search when you have an exact identifier.","inputSchema":{"type":"object","properties":{"name":{"type":"string","description":"Symbol name to search for (exact match)"},"body":{"type":"boolean","description":"Include source body for each symbol (default: false)"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["name"]}},
\\{"name":"codedb_search","description":"Substring full-text search across the index (regex if regex=true). For one identifier prefer codedb_word; for a definition prefer codedb_symbol. Scope with path_glob to filter by language.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},
\\{"name":"codedb_word","description":"Exact-identifier lookup via inverted index — every occurrence of one word, O(1). Use for single identifiers; use codedb_search for substrings or phrases.","inputSchema":{"type":"object","properties":{"word":{"type":"string","description":"Exact word/identifier to look up"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["word"]}},
\\{"name":"codedb_search","description":"Ranked full-text search across the index — returns the top max_results scored by BM25 + word-index hits. Use regex=true for patterns, scope=true to annotate with enclosing symbol, path_glob to filter by language. Different return shape from codedb_word: ranked + capped, not exhaustive. For definition sites see codedb_symbol; for filename matches see codedb_find.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},
Correct codedb_search scoring description
codedb_search now advertises that results are “scored by BM25 + word-index hits,” but the MCP path calls handleSearch → Explorer.searchContent/searchContentWithScope, and searchContent ranks via rerankSignalScore heuristics rather than the BM25 pipeline (searchContentRanked is a separate function and is not invoked here). This mismatch can mislead MCP clients/agents that depend on tools/list semantics when choosing retrieval strategies and evaluating ranking behavior.
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
Summary
The descriptions of these four MCP tools were overselling their overlap with each other and making them look duplicative when they're not:
- codedb_search: ranked top-K (BM25 + word-index hits), capped by max_results.
- codedb_word: exhaustive; every occurrence of one identifier, unranked, no cap.
- codedb_glob: pattern match across the whole tree, flat paths, no metadata.
- codedb_ls: ONE directory's immediate children, with per-file metadata.
The prior descriptions made codedb_search and codedb_word sound interchangeable (they aren't: return shape differs), and made codedb_glob and codedb_ls sound interchangeable (they aren't: scope and metadata differ). A fresh agent reading tools/list would conclude codedb has redundancy that doesn't actually exist.
Each new description leads with the distinguishing axis (ranked vs exhaustive; whole-tree vs single-directory) and points explicitly at the sibling tool for the other shape, so an agent picking a tool can decide on shape, not on guess.
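The difference in return shape between a ranked, capped search and an exhaustive word lookup can be sketched on a toy corpus. This is a minimal Python illustration under stated assumptions: the corpus, the function names `search` and `word`, and the scoring (a plain occurrence count) are all hypothetical and are not codedb's ranking or index.

```python
import re

# Hypothetical corpus: (path, line number, line text).
LINES = [
    ("src/a.zig", 1, "const limit = parseLimit(args);"),
    ("src/a.zig", 2, "if (limit > max) limit = max;"),
    ("src/b.zig", 7, "// limit limit limit"),
    ("src/b.zig", 9, "unlimited();"),
]


def search(query: str, max_results: int = 20) -> list[tuple[str, int]]:
    """Ranked and capped: best-scoring substring matches first."""
    scored = [
        (text.count(query), path, line)
        for path, line, text in LINES
        if query in text
    ]
    # Highest score first; ties broken by path and line for determinism.
    scored.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(path, line) for _, path, line in scored[:max_results]]


def word(identifier: str) -> list[tuple[str, int]]:
    """Exhaustive and unranked: one entry per whole-word occurrence."""
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    return [
        (path, line)
        for path, line, text in LINES
        for _ in pattern.finditer(text)
    ]
```

Here `search("limit", max_results=2)` returns only the two best-scoring lines and also matches the substring inside `unlimited`, whereas `word("limit")` returns every whole-word occurrence in corpus order with no cap.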
What changed
src/mcp.zig tools_list const: 4 description strings updated. No code paths touched.
Why no failing test?
This is a docs-only edit to MCP tools/list strings (not a behavior change), so the CLAUDE.md "every issue must include a failing test" rule doesn't apply; there's no behavior to test. Full zig build test was run before/after the edit: same result both times (one pre-existing unrelated OOM in test.issue-44, 539/540 elsewhere).
Test plan
- zig build test: diff isolated to the 4 description strings; pre-existing snapshot OOM unchanged.
- codedb_status, then call tools/list from an MCP client and eyeball the four descriptions for sense.
🤖 Generated with Claude Code
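The manual tools/list check in the test plan could be scripted roughly as follows. This is a minimal sketch, assuming the standard MCP JSON-RPC 2.0 shape for tools/list (a `result.tools` array of objects with `name` and `description`). The transport to the server is omitted, and `SAMPLE_RESPONSE` with its placeholder descriptions is hypothetical, not output captured from codedb.

```python
import json

TOOLS_UNDER_REVIEW = ("codedb_search", "codedb_word", "codedb_glob", "codedb_ls")

# Hypothetical response body; the description strings are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [
        {"name": name, "description": f"(description of {name})"}
        for name in TOOLS_UNDER_REVIEW + ("codedb_symbol",)
    ]},
})


def tools_list_request(request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 tools/list request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def descriptions_under_review(response_text: str) -> dict[str, str]:
    """Pick out the four edited descriptions from a tools/list response."""
    tools = json.loads(response_text)["result"]["tools"]
    by_name = {tool["name"]: tool.get("description", "") for tool in tools}
    missing = [name for name in TOOLS_UNDER_REVIEW if name not in by_name]
    if missing:
        raise KeyError(f"tools/list response is missing: {missing}")
    return {name: by_name[name] for name in TOOLS_UNDER_REVIEW}


if __name__ == "__main__":
    # Print the four descriptions side by side for a quick read-through.
    for name, text in descriptions_under_review(SAMPLE_RESPONSE).items():
        print(f"{name}: {text}")
```

Printing the four descriptions next to each other makes it easier to confirm that each one names its distinguishing axis and points at its sibling tool.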