Add blog post: Solving MCP Tool Overload #9695

Draft

digitarald wants to merge 1 commit into main from users/digitarald/blog-mcp-tool-overload-v2

Conversation

@digitarald
Contributor

Engineering blog post on how VS Code handles MCP tool overload with progressive tool discovery.

  • Explains the core/deferred tool split (~30 always-available, 170+ deferred with defer_loading)
  • Client-side embedding search replacing Anthropic server-side regex/BM25
  • Two-layer embedding cache (pre-computed + local persistent LRU)
  • Prompt caching optimization via stable tool prefix
  • Covers Anthropic tool_reference and OpenAI tool_search integration
  • Co-authored by Bhavya U, Connor Peet, Harald Kirschner

TODOs before publish:

  • Replace ASCII art diagrams with designed visuals
  • Create tool-discovery-hero.png social image
  • Verify MCP spec URL anchor

Collaborator

@ntrogh left a comment


@digitarald Nice blog post! Left a few suggestions.


April 24, 2026 by [Bhavya U](https://github.com/bhavyaus), [Connor Peet](https://github.com/connor4312), [Harald Kirschner](https://github.com/digitarald)

We just got back from [MCP Dev Summit](https://events.linuxfoundation.org/mcp-dev-summit-north-america/) in New York City, where we presented on [MCP Apps support in VS Code](https://code.visualstudio.com/blogs/2026/01/26/mcp-apps-support). One theme came up in nearly every session: too many tools. As the MCP ecosystem grows, servers and extensions each bring their own capabilities, and it adds up fast. Install a few [MCP servers](https://code.visualstudio.com/docs/copilot/customization/mcp-servers), enable some extensions, and your agent session suddenly has access to over 200 tools. Including all of their schemas in every request adds over 12,000 tokens on top of the base prompt, pushing a single turn past 18,000 tokens before you've even said "hello."

Suggested change
We just got back from [MCP Dev Summit](https://events.linuxfoundation.org/mcp-dev-summit-north-america/) in New York City, where we presented on [MCP Apps support in VS Code](https://code.visualstudio.com/blogs/2026/01/26/mcp-apps-support). One theme came up in nearly every session: too many tools. As the MCP ecosystem grows, servers and extensions each bring their own capabilities, and it adds up fast. Install a few [MCP servers](https://code.visualstudio.com/docs/copilot/customization/mcp-servers), enable some extensions, and your agent session suddenly has access to over 200 tools. Including all of their schemas in every request adds over 12,000 tokens on top of the base prompt, pushing a single turn past 18,000 tokens before you've even said "hello."
We just got back from [MCP Dev Summit](https://events.linuxfoundation.org/mcp-dev-summit-north-america/) in New York City, where we presented on [MCP Apps support in VS Code](https://code.visualstudio.com/blogs/2026/01/26/mcp-apps-support). One theme that consistently came up was that **there are too many tools**. As the MCP ecosystem grows, servers and extensions each bring their own capabilities, which adds up quickly. Install a few [MCP servers](https://code.visualstudio.com/docs/copilot/customization/mcp-servers), enable some extensions, and your agent session suddenly has access to over 200 tools. Including all of their schemas in every request adds over 12,000 tokens on top of the base prompt, pushing a single turn past 18,000 tokens before you've even said "hello."



That creates real problems. The model gets slower. It picks the wrong tool more often. And because the tool list changes between sessions (different MCP servers, different extensions), [prompt caching](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching), one of the biggest performance wins in production LLM systems, becomes unreliable.

Suggested change
That creates real problems. The model gets slower. It picks the wrong tool more often. And because the tool list changes between sessions (different MCP servers, different extensions), [prompt caching](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching), one of the biggest performance wins in production LLM systems, becomes unreliable.
That creates real problems: the **model gets slower and it picks the wrong tool more often**. And because the tool list changes between sessions (different MCP servers, different extensions), [prompt caching](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching), one of the biggest performance wins in production LLM systems, becomes unreliable.



We believe strongly that **the client should handle this**, not the user. Developers shouldn't have to think about which tools to enable, worry about token budgets, or pay penalties in performance and cost just because their ecosystem is rich. The hard parts of agentic optimization, including how tools are provided to the model, should be invisible. This post walks through how we built that in VS Code: progressive tool discovery.

Suggested change
We believe strongly that **the client should handle this**, not the user. Developers shouldn't have to think about which tools to enable, worry about token budgets, or pay penalties in performance and cost just because their ecosystem is rich. The hard parts of agentic optimization, including how tools are provided to the model, should be invisible. This post walks through how we built that in VS Code: progressive tool discovery.
We strongly believe that **the client should handle tool management**, not the user. Developers shouldn't have to think about which tools to enable, worry about token budgets, or pay penalties in performance and cost just because their ecosystem is rich. The hard parts of agentic optimization, including how tools are provided to the model, should be invisible. This post walks through how we built that into VS Code: **progressive tool discovery**.


## A core toolkit and a search index

When we looked at telemetry, a clear pattern emerged. On any given turn, the agent uses two to five tools. The same ~30 tools cover nearly 88% of all invocations: reading files, editing code, searching the codebase, running terminal commands. The remaining 170+ tools are specialized: open a browser page, run a notebook cell, create a GitHub pull request, drag an element on a page. Important, but not on every turn.

Suggested change
When we looked at telemetry, a clear pattern emerged. On any given turn, the agent uses two to five tools. The same ~30 tools cover nearly 88% of all invocations: reading files, editing code, searching the codebase, running terminal commands. The remaining 170+ tools are specialized: open a browser page, run a notebook cell, create a GitHub pull request, drag an element on a page. Important, but not on every turn.
When we looked at telemetry, a clear pattern emerged. On any given turn, the agent typically uses two to five tools. And the same ~30 tools cover nearly 88% of all invocations: reading files, editing code, searching the codebase, running terminal commands. The remaining 170+ tools are only used for specialized scenarios: open a browser page, run a notebook cell, create a GitHub pull request, drag an element on a page. These tools are important, but not on every turn.



This suggested a natural split. Keep a **core set** of tools always available, and make everything else **discoverable on demand**.

Suggested change
This suggested a natural split. Keep a **core set** of tools always available, and make everything else **discoverable on demand**.
The pattern pointed to a natural split. Keep a **core set** of tools always available, and make everything else **discoverable on demand**.


Here's where our approach diverges from what you might expect. Anthropic's API offers a built-in server-side tool search. You can include `tool_search_tool_regex` or `tool_search_tool_bm25` in your tools array, and the server handles search automatically using regex pattern matching or BM25 text search.
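
To make that concrete, here's a rough sketch of what such a tools array can look like, mixing the built-in search tool, a core tool with its full schema, and a deferred tool. The exact type strings and the `defer_loading` flag follow the pattern described in this post; treat the field names as illustrative rather than a verified API contract.

```typescript
// Illustrative Anthropic tools array for server-side tool search.
// Field names and type strings are assumptions based on the pattern above.
const tools = [
  // Built-in server-side search tool (regex variant; BM25 is the alternative).
  { type: "tool_search_tool_regex", name: "tool_search_tool_regex" },

  // Core tool: full schema included in every request.
  {
    name: "read_file",
    description: "Read a file from the workspace",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },

  // Deferred tool: not expanded into the prompt until search surfaces it.
  {
    name: "create_pull_request",
    description: "Create a GitHub pull request",
    input_schema: { type: "object", properties: {} },
    defer_loading: true,
  },
];
```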

We actually shipped that first. In late 2025, we rolled out Anthropic's server-side search behind a feature flag, tested it via A/B experiment, and enabled it for Opus 4.5. It worked, but we ran into three issues:

Suggested change
We actually shipped that first. In late 2025, we rolled out Anthropic's server-side search behind a feature flag, tested it via A/B experiment, and enabled it for Opus 4.5. It worked, but we ran into three issues:
Initially, we used this approach. In late 2025, we rolled out Anthropic's server-side search behind a feature flag, tested it via A/B experiment, and enabled it for Opus 4.5. It worked, but we ran into three issues:

2. **Dynamic tool discovery.** MCP servers support [dynamic tool discovery](https://modelcontextprotocol.io/specification/2025-06-18/server/tools#tool-discovery), where the set of available tools can change mid-session. Server-side search has no awareness of these changes, but a client-side implementation can re-index tools as they appear (see the sketch after this list).
3. **Enterprise compliance.** Anthropic's server-side tool search was not Zero Data Retention compliant, a blocker for enterprise customers.
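
On the second point, a minimal sketch of what client-side re-indexing can look like when a server changes its tool list mid-session. The notification hook and embedding function here are hypothetical stand-ins for the client's MCP plumbing and embedding cache, not actual VS Code APIs.

```typescript
// Sketch: rebuild the embedding index when an MCP server's tool list changes.
// `onToolListChanged` and `embed` are hypothetical hooks for illustration.
type McpTool = { name: string; description: string };
type Embed = (text: string) => Promise<number[]>;

function watchToolChanges(
  onToolListChanged: (handler: (tools: McpTool[]) => Promise<void>) => void,
  embed: Embed,
  index: Map<string, number[]>
): void {
  // A tools/list_changed notification means tools were added or removed;
  // re-embed the new list so the tools become searchable immediately.
  onToolListChanged(async tools => {
    index.clear();
    for (const tool of tools) {
      index.set(tool.name, await embed(`${tool.name}: ${tool.description}`));
    }
  });
}
```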

So we pivoted. We replaced server-side search with a **client-side implementation** running entirely within VS Code, using **embedding-based semantic similarity**. When the model calls `tool_search`, VS Code computes an embedding of the query, compares it against pre-computed embeddings of every tool's name and description, and returns the closest matches by cosine similarity.

Suggested change
So we pivoted. We replaced server-side search with a **client-side implementation** running entirely within VS Code, using **embedding-based semantic similarity**. When the model calls `tool_search`, VS Code computes an embedding of the query, compares it against pre-computed embeddings of every tool's name and description, and returns the closest matches by cosine similarity.
So we pivoted. We replaced server-side search with a **client-side implementation** that runs entirely within VS Code and that uses **embedding-based semantic similarity**. When the model calls `tool_search`, VS Code computes an embedding of the query, compares it against pre-computed embeddings of every tool's name and description, and returns the closest matches by cosine similarity.
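
A minimal sketch of that search step: embed the query, rank every tool's pre-computed embedding by cosine similarity, and return the top matches. `embed` and `ToolEmbedding` are illustrative names standing in for the client's embedding provider and cache, not VS Code APIs.

```typescript
// Rank tools by cosine similarity between the query embedding and each
// tool's pre-computed embedding, returning the closest matches.
interface ToolEmbedding {
  name: string;
  vector: number[]; // pre-computed embedding of "name: description"
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA * normB) || 1);
}

async function searchTools(
  query: string,
  index: ToolEmbedding[],
  embed: (text: string) => Promise<number[]>,
  topK = 10
): Promise<string[]> {
  const queryVector = await embed(query);
  return index
    .map(tool => ({ name: tool.name, score: cosineSimilarity(queryVector, tool.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(result => result.name);
}
```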


Embeddings beat an LLM at its own game, with a 24 percentage point improvement over defaults. And the client-side implementation needed fewer search invocations than the server-side regex approach to find the right tools in our evals.

The key insight: Anthropic's API doesn't care *who* does the search. It supports a "custom tool search" pattern where the client implements its own search tool and returns `tool_reference` content blocks, the same format the server-side search would produce. The API handles the rest, expanding those references into full tool schemas for the model. We get the best of both worlds: Anthropic's native deferred-tool infrastructure with our own search quality.

Suggested change
The key insight: Anthropic's API doesn't care *who* does the search. It supports a "custom tool search" pattern where the client implements its own search tool and returns `tool_reference` content blocks, the same format the server-side search would produce. The API handles the rest, expanding those references into full tool schemas for the model. We get the best of both worlds: Anthropic's native deferred-tool infrastructure with our own search quality.
The key insight was that Anthropic's API doesn't care *who* does the search. It supports a "custom tool search" pattern where the client implements its own search tool and returns `tool_reference` content blocks, the same format the server-side search would produce. The API handles the rest, expanding those references into full tool schemas for the model. As a result, we get the best of both worlds: Anthropic's native deferred-tool infrastructure with our own search quality.
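
For illustration, returning those references from the client-implemented search tool might look roughly like this. The field names follow the pattern described above and may differ from the current API; treat them as assumptions.

```typescript
// Sketch: wrap the client-side search matches as tool_reference content
// blocks, the same shape the server-side search would produce.
// Field names here are our reading of the pattern, not a verified contract.
function buildSearchToolResult(searchCallId: string, matchedToolNames: string[]) {
  return {
    type: "tool_result",
    tool_use_id: searchCallId, // id of the model's tool_search call
    content: matchedToolNames.map(name => ({
      type: "tool_reference",
      tool_name: name, // the API expands this into the full tool schema
    })),
  };
}
```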


## Teaching the model the protocol

Progressive discovery only works if the model knows the rules. We invest significant prompt engineering in making this seamless:

Suggested change
Progressive discovery only works if the model knows the rules. We invest significant prompt engineering in making this seamless:
Progressive discovery only works if the model knows the rules. We invest significantly in prompt engineering to make this seamless for you:

- We explicitly warn against anti-patterns: don't call a deferred tool without searching first, don't re-search for tools already found.
- A reminder at the end of the prompt reinforces the protocol. Models follow instructions well, but critical behaviors deserve repetition.

We also learned the hard way that some tools can't be deferred. Early on, we deferred `view_image`. The model saw the name in a flat list but never its description, and it simply didn't think to search for an image-viewing tool when encountering `.png` files. Benchmark failures led us to a simple rule: tools that represent core capabilities the model might not think to look for need to stay always-available. Telemetry, not guesswork, drives which tools make the cut.

Suggested change
We also learned the hard way that some tools can't be deferred. Early on, we deferred `view_image`. The model saw the name in a flat list but never its description, and it simply didn't think to search for an image-viewing tool when encountering `.png` files. Benchmark failures led us to a simple rule: tools that represent core capabilities the model might not think to look for need to stay always-available. Telemetry, not guesswork, drives which tools make the cut.
We also learned the hard way that some tools can't be deferred. Early on, we deferred `view_image`, so the model saw only its name in a flat list and never its description. The model simply didn't think to search for an image-viewing tool when it encountered `.png` files. Benchmark failures led us to a simple rule: tools that represent core capabilities the model might not think to look for need to stay always-available. Telemetry, not guesswork, drives which tools make the cut.
