Add blog post: Solving MCP Tool Overload #9695
Draft · digitarald wants to merge 1 commit into `main` from
Conversation
**ntrogh** (Collaborator) reviewed on Apr 24, 2026:

> @digitarald Nice blog post! Left a few suggestions.
April 24, 2026 by [Bhavya U](https://github.com/bhavyaus), [Connor Peet](https://github.com/connor4312), [Harald Kirschner](https://github.com/digitarald)
We just got back from [MCP Dev Summit](https://events.linuxfoundation.org/mcp-dev-summit-north-america/) in New York City, where we presented on [MCP Apps support in VS Code](https://code.visualstudio.com/blogs/2026/01/26/mcp-apps-support). One theme came up in nearly every session: too many tools. As the MCP ecosystem grows, servers and extensions each bring their own capabilities, and it adds up fast. Install a few [MCP servers](https://code.visualstudio.com/docs/copilot/customization/mcp-servers), enable some extensions, and your agent session suddenly has access to over 200 tools. Including all of their schemas in every request adds over 12,000 tokens on top of the base prompt, pushing a single turn past 18,000 tokens before you've even said "hello."
> **Suggested change (ntrogh):** We just got back from [MCP Dev Summit](https://events.linuxfoundation.org/mcp-dev-summit-north-america/) in New York City, where we presented on [MCP Apps support in VS Code](https://code.visualstudio.com/blogs/2026/01/26/mcp-apps-support). One theme that consistently came up was that **there are too many tools**. As the MCP ecosystem grows, servers and extensions each bring their own capabilities, which adds up quickly. Install a few [MCP servers](https://code.visualstudio.com/docs/copilot/customization/mcp-servers), enable some extensions, and your agent session suddenly has access to over 200 tools. Including all of their schemas in every request adds over 12,000 tokens on top of the base prompt, pushing a single turn past 18,000 tokens before you've even said "hello."
That creates real problems. The model gets slower. It picks the wrong tool more often. And because the tool list changes between sessions (different MCP servers, different extensions), [prompt caching](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching), one of the biggest performance wins in production LLM systems, becomes unreliable.
> **Suggested change (ntrogh):** That creates real problems: the **model gets slower and it picks the wrong tool more often**. And because the tool list changes between sessions (different MCP servers, different extensions), [prompt caching](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching), one of the biggest performance wins in production LLM systems, becomes unreliable.
We believe strongly that **the client should handle this**, not the user. Developers shouldn't have to think about which tools to enable, worry about token budgets, or pay penalties in performance and cost just because their ecosystem is rich. The hard parts of agentic optimization, including how tools are provided to the model, should be invisible. This post walks through how we built that in VS Code: progressive tool discovery.
> **Suggested change (ntrogh):** We strongly believe that **the client should handle tool management**, not the user. Developers shouldn't have to think about which tools to enable, worry about token budgets, or pay penalties in performance and cost just because their ecosystem is rich. The hard parts of agentic optimization, including how tools are provided to the model, should be invisible. This post walks through how we built that into VS Code: **progressive tool discovery**.
## A core toolkit and a search index
When we looked at telemetry, a clear pattern emerged. On any given turn, the agent uses two to five tools. The same ~30 tools cover nearly 88% of all invocations: reading files, editing code, searching the codebase, running terminal commands. The remaining 170+ tools are specialized: open a browser page, run a notebook cell, create a GitHub pull request, drag an element on a page. Important, but not on every turn.
> **Suggested change (ntrogh):** When we looked at telemetry, a clear pattern emerged. On any given turn, the agent typically uses two to five tools. And the same ~30 tools cover nearly 88% of all invocations: reading files, editing code, searching the codebase, running terminal commands. The remaining 170+ tools are only used for specialized scenarios: open a browser page, run a notebook cell, create a GitHub pull request, drag an element on a page. These tools are important, but not on every turn.
This suggested a natural split. Keep a **core set** of tools always available, and make everything else **discoverable on demand**.
> **Suggested change (ntrogh):** The pattern pointed to a natural split. Keep a **core set** of tools always available, and make everything else **discoverable on demand**.
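To make the idea concrete, here's a minimal sketch of that split. The `ToolInfo` shape and the telemetry field are illustrative stand-ins, not VS Code's actual internals:

```typescript
// A minimal sketch of partitioning tools by observed usage.
// `invocationShare` stands in for whatever telemetry signal ranks tools.
interface ToolInfo {
  name: string;
  description: string;
  invocationShare: number; // fraction of observed calls, from telemetry
}

function partitionTools(tools: ToolInfo[], coreSize = 30) {
  const byUsage = [...tools].sort(
    (a, b) => b.invocationShare - a.invocationShare,
  );
  return {
    core: byUsage.slice(0, coreSize), // always included in the prompt
    deferred: byUsage.slice(coreSize), // surfaced only via tool search
  };
}
```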
Here's where our approach diverges from what you might expect. Anthropic's API offers a built-in server-side tool search. You can include `tool_search_tool_regex` or `tool_search_tool_bm25` in your tools array, and the server handles search automatically using regex pattern matching or BM25 text search.
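In request terms, that looks roughly like the following sketch, based on Anthropic's public tool-search documentation; the versioned type string and the `create_pull_request` tool are illustrative and may not match your API version:

```typescript
// Sketch of an Anthropic API tools array using server-side tool search.
// `defer_loading` marks a tool whose full schema is withheld until a
// search surfaces it. The type string follows the docs at the time of
// writing and may have changed.
const tools = [
  // The server-side search tool itself (regex variant).
  { type: "tool_search_tool_regex_20251119", name: "tool_search_tool_regex" },
  // A deferred tool: only its name is visible until discovered.
  {
    name: "create_pull_request",
    description: "Create a GitHub pull request from the current branch",
    input_schema: {
      type: "object",
      properties: { title: { type: "string" } },
      required: ["title"],
    },
    defer_loading: true,
  },
];
```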
We actually shipped that first. In late 2025, we rolled out Anthropic's server-side search behind a feature flag, tested it via A/B experiment, and enabled it for Opus 4.5. It worked, but we ran into three issues:
> **Suggested change (ntrogh):** Initially, we used this approach. In late 2025, we rolled out Anthropic's server-side search behind a feature flag, tested it via A/B experiment, and enabled it for Opus 4.5. It worked, but we ran into three issues:
2. **Dynamic tool discovery.** MCP servers support [dynamic tool discovery](https://modelcontextprotocol.io/specification/2025-06-18/server/tools#tool-discovery), where the set of available tools can change mid-session. Server-side search has no awareness of these changes, but a client-side implementation can re-index tools as they appear (see the sketch after this list).
3. **Enterprise compliance.** Anthropic's server-side tool search was not Zero Data Retention compliant, a blocker for enterprise customers.
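Here's what reacting to those mid-session changes could look like with the MCP TypeScript SDK; `reindexTools` is a hypothetical hook standing in for whatever rebuilds the search index:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import {
  Tool,
  ToolListChangedNotificationSchema,
} from "@modelcontextprotocol/sdk/types.js";

// When a server announces that its tool list changed, re-fetch the
// tools and rebuild the search index over them.
function watchToolChanges(
  client: Client,
  reindexTools: (tools: Tool[]) => void, // hypothetical re-indexing hook
): void {
  client.setNotificationHandler(ToolListChangedNotificationSchema, async () => {
    const { tools } = await client.listTools();
    reindexTools(tools);
  });
}
```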
So we pivoted. We replaced server-side search with a **client-side implementation** running entirely within VS Code, using **embedding-based semantic similarity**. When the model calls `tool_search`, VS Code computes an embedding of the query, compares it against pre-computed embeddings of every tool's name and description, and returns the closest matches by cosine similarity.
> **Suggested change (ntrogh):** So we pivoted. We replaced server-side search with a **client-side implementation** that runs entirely within VS Code and that uses **embedding-based semantic similarity**. When the model calls `tool_search`, VS Code computes an embedding of the query, compares it against pre-computed embeddings of every tool's name and description, and returns the closest matches by cosine similarity.
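A simplified sketch of that lookup follows; the embedding provider is abstracted behind a hypothetical `embed` function, and a production version would add caching and score thresholds:

```typescript
// Each deferred tool carries a precomputed embedding of its name and
// description; a query is embedded once and compared against all of them.
interface IndexedTool {
  name: string;
  embedding: number[]; // precomputed from name + description
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0,
    normA = 0,
    normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function searchTools(
  query: string,
  index: IndexedTool[],
  embed: (text: string) => Promise<number[]>, // hypothetical embedding provider
  topK = 5,
): Promise<string[]> {
  const q = await embed(query);
  return index
    .map((t) => ({ name: t.name, score: cosineSimilarity(q, t.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((t) => t.name);
}
```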
Embeddings beat an LLM at its own game, with a 24 percentage point improvement over defaults. And the client-side implementation needed fewer search invocations than the server-side regex approach to find the right tools in our evals.
The key insight: Anthropic's API doesn't care *who* does the search. It supports a "custom tool search" pattern where the client implements its own search tool and returns `tool_reference` content blocks, the same format the server-side search would produce. The API handles the rest, expanding those references into full tool schemas for the model. We get the best of both worlds: Anthropic's native deferred-tool infrastructure with our own search quality.
> **Suggested change (ntrogh):** The key insight was that Anthropic's API doesn't care *who* does the search. It supports a "custom tool search" pattern where the client implements its own search tool and returns `tool_reference` content blocks, the same format the server-side search would produce. The API handles the rest, expanding those references into full tool schemas for the model. As a result, we get the best of both worlds: Anthropic's native deferred-tool infrastructure with our own search quality.
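Concretely, the client's answer to a `tool_search` call looks roughly like this sketch; the field names follow Anthropic's documented `tool_reference` format at the time of writing:

```typescript
// Build the tool_result for a tool_search call: one tool_reference
// block per match. The API expands each reference into the full tool
// schema before the model sees it.
function toolSearchResult(toolUseId: string, matchedToolNames: string[]) {
  return {
    type: "tool_result" as const,
    tool_use_id: toolUseId,
    content: matchedToolNames.map((name) => ({
      type: "tool_reference" as const,
      tool_name: name,
    })),
  };
}
```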
## Teaching the model the protocol
Progressive discovery only works if the model knows the rules. We invest significant prompt engineering in making this seamless:
> **Suggested change (ntrogh):** Progressive discovery only works if the model knows the rules. We invest significantly in prompt engineering to make this seamless for you:
- We explicitly warn against anti-patterns: don't call a deferred tool without searching first, don't re-search for tools already found.
- A reminder at the end of the prompt reinforces the protocol. Models follow instructions well, but critical behaviors deserve repetition. (See the illustrative sketch below.)
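Put together, the guidance reads something like the following. This is an illustrative paraphrase, not the literal prompt VS Code ships:

```text
Some of your tools are deferred: only their names are listed. To use one:
1. Call tool_search with a short description of the capability you need.
2. Call the matched tool directly; its full schema is now available.
Never call a deferred tool without searching first. Never re-search
for a tool you have already found.
```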
We also learned the hard way that some tools can't be deferred. Early on, we deferred `view_image`. The model saw the name in a flat list but never its description, and it simply didn't think to search for an image-viewing tool when encountering `.png` files. Benchmark failures led us to a simple rule: tools that represent core capabilities the model might not think to look for need to stay always-available. Telemetry, not guesswork, drives which tools make the cut.
> **Suggested change (ntrogh):** We also learned the hard way that some tools can't be deferred. Early on, we deferred `view_image` and the model only saw the name in a flat list but never its description. So the model simply didn't think to search for an image-viewing tool when it encountered `.png` files. Benchmark failures led us to a simple rule: tools that represent core capabilities and that the model might not think to look for, need to stay always-available. Telemetry, not guesswork, drives which tools make the cut.
Engineering blog post on how VS Code handles MCP tool overload with progressive tool discovery.
- … (`defer_loading`)
- `tool_reference` and OpenAI `tool_search` integration

TODOs before publish:

- `tool-discovery-hero.png` social image