docs(mcp): sharpen codedb_search / codedb_word / codedb_glob / codedb_ls descriptions #493

Closed
justrach wants to merge 1 commit into main from chore/sharpen-tool-descriptions

@justrach
Owner

Summary

The descriptions of these four MCP tools were overselling their overlap with each other and making them look duplicative when they're not:

| Tool | What it actually does |
|---|---|
| codedb_search | Ranked top-K (BM25 + word-index hits), capped by max_results. |
| codedb_word | Exhaustive: every occurrence of one identifier, unranked, no cap. |
| codedb_glob | Pattern match across the whole tree, flat paths, no metadata. |
| codedb_ls | One directory's immediate children, with per-file metadata. |

The prior descriptions made codedb_search and codedb_word sound interchangeable (they aren't — return shape differs), and made codedb_glob and codedb_ls sound interchangeable (they aren't — scope and metadata differ). A fresh agent reading tools/list would conclude codedb has redundancy that doesn't actually exist.

Each new description leads with the distinguishing axis (ranked vs exhaustive; whole-tree vs single-directory) and points explicitly at the sibling tool for the other shape, so an agent picking a tool can decide on shape, not on guess.
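
To make the two axes concrete, here is a minimal Python sketch over a toy in-memory index. It is not codedb's implementation: the function names, the sample files, the substring-count score, and the line-count "metadata" are all assumptions chosen for illustration. It only shows how the return shapes differ (ranked and capped vs exhaustive; whole-tree flat paths vs one directory's children with metadata).

```python
import re
from fnmatch import fnmatchcase
from posixpath import split

# Toy in-memory "index": path -> lines. Illustrative data only.
FILES = {
    "src/mcp.zig": ["fn handleSearch() void {}", "// search entry point", "const search = 1;"],
    "src/explore.zig": ["fn searchContent() void {}"],
    "README.md": ["search the index"],
}

def search_ranked(query, max_results=20):
    """Ranked + capped: best-scoring hits first, at most max_results."""
    hits = []
    for path, lines in FILES.items():
        for line_no, line in enumerate(lines, 1):
            score = line.count(query)  # stand-in score, NOT codedb's ranking
            if score:
                hits.append((score, path, line_no))
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits[:max_results]

def word_all(word):
    """Exhaustive: every line containing the whole word, unranked, no cap."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [(path, line_no)
            for path, lines in FILES.items()
            for line_no, line in enumerate(lines, 1)
            if pattern.search(line)]

def glob_tree(pattern):
    """Whole tree: flat list of matching paths, no metadata."""
    return sorted(p for p in FILES if fnmatchcase(p, pattern))

def ls_dir(directory):
    """One directory: immediate children only, with metadata (line count here)."""
    children = []
    for path, lines in sorted(FILES.items()):
        parent, name = split(path)
        if parent == directory:
            children.append({"name": name, "lines": len(lines)})
    return children
```

In this toy, `search_ranked("search", max_results=2)` returns two hits even though four lines contain the substring, while `word_all("search")` returns all three lines where `search` appears as a whole word. Likewise `glob_tree("*.zig")` returns bare paths from anywhere in the tree, and `ls_dir("src")` returns only the entries directly under `src`, each with its metadata.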

What changed

  • src/mcp.zig tools_list const: 4 description strings updated. No code paths touched.
  • No schema changes, no breaking changes for callers.
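
One way to back the "descriptions only" claim is to compare the tools list before and after the change and report which top-level fields differ per tool. The sketch below assumes both lists are already parsed into Python dicts with `name`, `description`, and `inputSchema` keys (the shape visible in the tools_list entries); the helper name and the sample entries are hypothetical.

```python
def changed_fields(old_tools, new_tools):
    """Map tool name -> sorted list of top-level keys whose values differ."""
    old_by_name = {tool["name"]: tool for tool in old_tools}
    diff = {}
    for tool in new_tools:
        before = old_by_name.get(tool["name"], {})
        keys = sorted(k for k in set(before) | set(tool)
                      if before.get(k) != tool.get(k))
        if keys:
            diff[tool["name"]] = keys
    return diff

# Hypothetical before/after entries; the description text is a placeholder.
schema = {"type": "object",
          "properties": {"word": {"type": "string"}},
          "required": ["word"]}
old = [{"name": "codedb_word", "description": "old text", "inputSchema": schema}]
new = [{"name": "codedb_word", "description": "new text", "inputSchema": schema}]

print(changed_fields(old, new))
```

For a docs-only change, every reported tool should list `description` and nothing else; an `inputSchema` entry in the output would mean the schema moved as well.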

Why no failing test?

This is a docs-only edit to MCP tools/list strings (not a behavior change), so the CLAUDE.md "every issue must include a failing test" rule doesn't apply — there's no behavior to test. Full zig build test was run before/after the edit: same result both times (one pre-existing unrelated OOM in test.issue-44, 539/540 elsewhere).

Test plan

  • zig build test — diff isolated to the 4 description strings; pre-existing snapshot OOM unchanged.
  • Manual: codedb_status then call tools/list from an MCP client, eyeball the four descriptions for sense.
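
The manual check can be partly scripted. The sketch below builds the JSON-RPC `tools/list` request an MCP client sends and pulls the four descriptions out of a response. How the message is transported to the codedb server is left out, and the sample response uses placeholder description text, not the real strings; the helper names are hypothetical.

```python
import json

WANTED = ("codedb_search", "codedb_word", "codedb_glob", "codedb_ls")

def tools_list_request(request_id=1):
    """Serialize a JSON-RPC 2.0 request for the MCP tools/list method."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})

def pick_descriptions(response_text, wanted=WANTED):
    """Return {tool name: description} for the tools under review."""
    response = json.loads(response_text)
    tools = response["result"]["tools"]
    return {t["name"]: t["description"] for t in tools if t["name"] in wanted}

# Placeholder response: descriptions are stand-ins for illustration.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [
        {"name": "codedb_search", "description": "(placeholder) ranked, capped"},
        {"name": "codedb_word", "description": "(placeholder) exhaustive"},
        {"name": "codedb_status", "description": "(placeholder) not under review"},
    ]},
})

for name, text in pick_descriptions(sample_response).items():
    print(f"{name}: {text}")
```

Printing the four descriptions side by side makes it easier to judge whether each one names its distinguishing axis and its sibling tool.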

🤖 Generated with Claude Code

…_ls descriptions

The four descriptions oversold overlap and made the tools look like
duplicates of each other. They aren't:

  - codedb_search:  ranked top-K (BM25 + word-index hits), capped by max_results.
  - codedb_word:    exhaustive — every occurrence of one identifier, unranked, no cap.
  - codedb_glob:    pattern match across the whole tree, flat paths, no metadata.
  - codedb_ls:      ONE directory's immediate children, with per-file metadata.

Each rewritten description now names the distinguishing axis up front
(ranked vs exhaustive; whole-tree vs single-directory) and points
explicitly at the sibling tool for the other shape. No code changes;
only the tools_list JSON descriptions in src/mcp.zig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ac4f32c0c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp.zig
\\{"name":"codedb_symbol","description":"Find where a named symbol is defined across the index. Returns file, line, and kind. Pass body=true for source. Pick this over codedb_search when you have an exact identifier.","inputSchema":{"type":"object","properties":{"name":{"type":"string","description":"Symbol name to search for (exact match)"},"body":{"type":"boolean","description":"Include source body for each symbol (default: false)"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["name"]}},
\\{"name":"codedb_search","description":"Substring full-text search across the index (regex if regex=true). For one identifier prefer codedb_word; for a definition prefer codedb_symbol. Scope with path_glob to filter by language.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},
\\{"name":"codedb_word","description":"Exact-identifier lookup via inverted index — every occurrence of one word, O(1). Use for single identifiers; use codedb_search for substrings or phrases.","inputSchema":{"type":"object","properties":{"word":{"type":"string","description":"Exact word/identifier to look up"},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["word"]}},
\\{"name":"codedb_search","description":"Ranked full-text search across the index — returns the top max_results scored by BM25 + word-index hits. Use regex=true for patterns, scope=true to annotate with enclosing symbol, path_glob to filter by language. Different return shape from codedb_word: ranked + capped, not exhaustive. For definition sites see codedb_symbol; for filename matches see codedb_find.","inputSchema":{"type":"object","properties":{"query":{"type":"string","description":"Text to search for (substring match, or regex if regex=true)"},"max_results":{"type":"integer","description":"Maximum results to return (default: 20, raise to 50 for broad surveys)"},"scope":{"type":"boolean","description":"Annotate results with enclosing symbol scope (default: false)"},"compact":{"type":"boolean","description":"Skip comment and blank lines in results (default: false)"},"paths_only":{"type":"boolean","description":"Return path:line per result without the matching line text — ~50% fewer tokens per call, useful for broad surveys or for budget-conscious agents (default: false)"},"regex":{"type":"boolean","description":"Treat query as regex pattern (default: false)"},"path_glob":{"type":"string","description":"Filter results to paths matching this glob, e.g. '*.zig' or 'src/**/*.zig'. Bare patterns like '*.zig' are auto-promoted to '**/*.zig' to match nested files."},"project":{"type":"string","description":"Optional absolute path to a different project (must have codedb.snapshot)"}},"required":["query"]}},

P2: Correct codedb_search scoring description

codedb_search now advertises that results are “scored by BM25 + word-index hits,” but the MCP path calls handleSearch → Explorer.searchContent/searchContentWithScope, and searchContent ranks via rerankSignalScore heuristics rather than the BM25 pipeline (searchContentRanked is a separate function and is not invoked here). This mismatch can mislead MCP clients and agents that rely on tools/list semantics when choosing a retrieval strategy or evaluating ranking behavior.

@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 571203 | 575556 | +0.76% | +4353 | OK |
| codedb_changes | 66569 | 65267 | -1.96% | -1302 | OK |
| codedb_deps | 11237 | 12197 | +8.54% | +960 | OK |
| codedb_edit | 9132 | 7438 | -18.55% | -1694 | OK |
| codedb_find | 71410 | 69366 | -2.86% | -2044 | OK |
| codedb_hot | 116698 | 116027 | -0.57% | -671 | OK |
| codedb_outline | 334046 | 339952 | +1.77% | +5906 | OK |
| codedb_read | 111322 | 115892 | +4.11% | +4570 | OK |
| codedb_search | 162518 | 167233 | +2.90% | +4715 | OK |
| codedb_snapshot | 314783 | 311589 | -1.01% | -3194 | OK |
| codedb_status | 14916 | 14899 | -0.11% | -17 | OK |
| codedb_symbol | 65117 | 67705 | +3.97% | +2588 | OK |
| codedb_tree | 64886 | 69423 | +6.99% | +4537 | OK |
| codedb_word | 91858 | 93212 | +1.47% | +1354 | OK |
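
The two-threshold rule in this report can be sketched as a small classifier. This is an interpretation of the report text, not the CI script itself. It assumes only slowdowns count (consistent with codedb_edit at -18.55% being marked OK), that the comparisons are strict, and that the failing status is called `FAIL`, a name the report does not show.

```python
def classify(base_ns, head_ns, pct_threshold=10.0, abs_threshold_ns=50_000):
    """Classify one benchmark row as OK, NOISE, or FAIL (assumed label)."""
    delta_ns = head_ns - base_ns
    delta_pct = 100.0 * delta_ns / base_ns
    if delta_pct <= pct_threshold:
        return "OK"      # faster, or slower by no more than the percentage threshold
    if delta_ns <= abs_threshold_ns:
        return "NOISE"   # percentage exceeded, absolute delta too small to fail CI
    return "FAIL"        # both thresholds exceeded

# Two rows taken from the table above.
print(classify(9132, 7438))    # codedb_edit
print(classify(11237, 12197))  # codedb_deps
```

Requiring both thresholds keeps very fast tools from failing CI on a few hundred nanoseconds of jitter: a hypothetical 10,000 ns tool that slows to 12,000 ns is 20% slower but only 2,000 ns in absolute terms, so it lands in `NOISE`.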

@justrach justrach closed this May 25, 2026
@justrach justrach deleted the chore/sharpen-tool-descriptions branch May 25, 2026 16:30