feat(mcp): suppress inter-tool-call prose — final answer is what people actually read#38
Merged
Merged
Conversation
…thing read User feedback: "even during investigations it tells a fuck ton of stuff that no one reads. everyone only looks at the final result." Real waste in a typical Claude Code investigation: ● Found seeder. Now check HOLDS write path. Searched for 2 patterns (ctrl+o to expand) ● Confirmed dupe vector. Different MERGE shapes per writer: ... ● Now check why whales missed. Each "Found X. Now Y." is 5-15 output tokens × N tool calls. On a multi-turn investigation that's 20-40% of the keeba-arm output budget, all of it spent on prose nobody reads — users scroll to the final consolidated answer block. Tightens the existing keeba style + CLAUDE.md template with a "Silence between tool calls" rule: Tool call → tool result → next tool call → final consolidated answer. No interleaved prose. The single exception: pause to ask if a tool call genuinely needs user input to continue. Two new test phrase pins (output-style + CLAUDE.md) so future edits can't quietly soften the rule. This is layered on top of the existing keeba style, not a separate strict preset — the existing style already commits to terseness and inter-tool silence is the next logical rule. Users who want progress markers can /output-style default to revert to baseline behavior. No bench validation in this PR: claude --print doesn't activate output styles without explicit --append-system-prompt-file plumbing, so the headless bench would measure a different thing than what the user gets in interactive mode after /output-style keeba. Validate post-merge manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
User feedback: "even during investigations it tells a fuck ton of stuff that no one reads. everyone only looks at the final result."
Tightens the existing keeba style (
~/.claude/output-styles/keeba.md) + CLAUDE.md template with a "Silence between tool calls" rule:Why
Real keeba-arm output from a recent investigation:
Each "Found X. Now Y." line is 5-15 output tokens × N tool calls. On a multi-turn investigation (5-8 tool calls) that's 20-40% of the output budget burned on prose nobody reads — users scroll past it to the final consolidated answer block.
Existing keeba style already commits to terseness ("no preamble", "no closing summary", "quote don't restate"). This PR adds the next logical rule: also no prose between tool calls.
Why one rule, not a separate strict preset
Two reasons:
--with-output-styleopted into terseness; inter-tool silence is the next step on the same axis, not a different policy.keeba-strict.mdpreset doubles maintenance and forces the user to remember to switch styles per task. Single style with one logical default is simpler.Users who want progress markers can
/output-style defaultto revert.Test plan
go test ./internal/cli/...— green (test pins for both output-style and CLAUDE.md added)gofumpt -lcleangolangci-lint run— 0 issues/output-style keebain interactive session, re-run any multi-turn investigation prompt, compare/costagainst the prior run. Expected: output drops 15-30% further on top of current keeba-arm baseline. If it doesn't move, the model is ignoring the directive and the next lever is API-side proxy interception (separate, expensive, not in this PR).Why no headless bench in this PR
claude --printdoesn't activate output styles without explicit--append-system-prompt-fileplumbing, so a headless bench would measure "appended system prompt content" not "/output-style keeba active". Different shape than what the user actually gets. Manual validation in interactive Claude Code is the right test.🤖 Generated with Claude Code