feat(mcp): suppress inter-tool-call prose — final answer is what people actually read by aomerk · Pull Request #38 · aomerk/keeba

aomerk · 2026-05-01T11:42:04Z

Summary

User feedback: "even during investigations it tells a fuck ton of stuff that no one reads. everyone only looks at the final result."

Tightens the existing keeba style (~/.claude/output-styles/keeba.md) + CLAUDE.md template with a "Silence between tool calls" rule:

Tool call → tool result → next tool call → final consolidated answer. No interleaved prose. The single exception: pause to ask if a tool call genuinely requires user input to continue.

Why

Real keeba-arm output from a recent investigation:

● Found seeder. Now check HOLDS write path.
  Searched for 2 patterns (ctrl+o to expand)
● Confirmed dupe vector. Different MERGE shapes per writer:
  ...
● Now check why whales missed.

Each "Found X. Now Y." line is 5-15 output tokens × N tool calls. On a multi-turn investigation (5-8 tool calls) that's 20-40% of the output budget burned on prose nobody reads — users scroll past it to the final consolidated answer block.
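The arithmetic behind that estimate can be sketched in a few lines of Go. This is only an illustration: the per-line token range and tool-call counts come from the paragraph above, while `narrationTokens`, `narrationShare`, and the 200-token final answer are hypothetical names and numbers chosen for the example, not measurements from keeba.

```go
package main

import "fmt"

// narrationTokens estimates the output tokens spent on "Found X. Now Y." lines:
// one short line per tool call.
func narrationTokens(tokensPerLine, toolCalls int) int {
	return tokensPerLine * toolCalls
}

// narrationShare returns narration as a fraction of total output
// (narration plus the final consolidated answer).
func narrationShare(narration, finalAnswer int) float64 {
	return float64(narration) / float64(narration+finalAnswer)
}

func main() {
	low := narrationTokens(5, 5)   // 5 tokens x 5 tool calls = 25
	high := narrationTokens(15, 8) // 15 tokens x 8 tool calls = 120

	// ASSUMPTION: a 200-token final answer, picked only for illustration.
	const finalAnswer = 200

	fmt.Printf("narration tokens: %d to %d\n", low, high)
	fmt.Printf("share of output:  %.0f%% to %.0f%%\n",
		100*narrationShare(low, finalAnswer),
		100*narrationShare(high, finalAnswer))
}
```

The share depends heavily on how long the final answer is, which is why the PR validates against `/cost` on a real run rather than on an estimate like this one.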

Existing keeba style already commits to terseness ("no preamble", "no closing summary", "quote don't restate"). This PR adds the next logical rule: also no prose between tool calls.

Why one rule, not a separate strict preset

Two reasons:

The "default keeba" style is already aggressive. Users who installed --with-output-style opted into terseness; inter-tool silence is the next step on the same axis, not a different policy.
Splitting into a separate keeba-strict.md preset doubles maintenance and forces the user to remember to switch styles per task. Single style with one logical default is simpler.

Users who want progress markers can /output-style default to revert.

Test plan

go test ./internal/cli/... — green (test pins for both output-style and CLAUDE.md added)
gofumpt -l clean
golangci-lint run — 0 issues
Post-merge manual A/B (run by user): rebuild keeba, re-install, /output-style keeba in interactive session, re-run any multi-turn investigation prompt, compare /cost against the prior run. Expected: output drops 15-30% further on top of current keeba-arm baseline. If it doesn't move, the model is ignoring the directive and the next lever is API-side proxy interception (separate, expensive, not in this PR).
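The "test pins" in the first item are phrase checks that fail if a later edit drops or softens the rule. A minimal sketch of the idea follows, assuming the pinned phrases are plain substrings of the template text. The actual keeba tests are not shown in this PR, so `styleTemplate`, `missingPins`, and the chosen phrases are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// styleTemplate stands in for the real output-style content.
// HYPOTHETICAL excerpt: the actual keeba.md text is not reproduced here.
const styleTemplate = `Silence between tool calls
Tool call -> tool result -> next tool call -> final consolidated answer.
No interleaved prose.`

// missingPins returns every pinned phrase that does not appear in content.
// An empty result means the rule is still stated in full.
func missingPins(content string, pins []string) []string {
	var missing []string
	for _, p := range pins {
		if !strings.Contains(content, p) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	pins := []string{"Silence between tool calls", "No interleaved prose"}
	if missing := missingPins(styleTemplate, pins); len(missing) > 0 {
		fmt.Println("rule softened, missing phrases:", missing)
		return
	}
	fmt.Println("all pinned phrases present")
}
```

In a real test file the same check would live in a `go test` function and call `t.Errorf` for each missing phrase, once for the output style and once for the CLAUDE.md template.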

Why no headless bench in this PR

claude --print doesn't activate output styles without explicit --append-system-prompt-file plumbing, so a headless bench would measure "appended system prompt content", not "/output-style keeba active". That is a different setup from what the user actually gets. Manual validation in interactive Claude Code is the right test.

🤖 Generated with Claude Code

…thing read

User feedback: "even during investigations it tells a fuck ton of stuff that no one reads. everyone only looks at the final result."

Real waste in a typical Claude Code investigation:

● Found seeder. Now check HOLDS write path.
  Searched for 2 patterns (ctrl+o to expand)
● Confirmed dupe vector. Different MERGE shapes per writer:
  ...
● Now check why whales missed.

Each "Found X. Now Y." is 5-15 output tokens × N tool calls. On a multi-turn investigation that's 20-40% of the keeba-arm output budget, all of it spent on prose nobody reads: users scroll to the final consolidated answer block.

Tightens the existing keeba style + CLAUDE.md template with a "Silence between tool calls" rule:

Tool call → tool result → next tool call → final consolidated answer. No interleaved prose. The single exception: pause to ask if a tool call genuinely needs user input to continue.

Two new test phrase pins (output-style + CLAUDE.md) so future edits can't quietly soften the rule.

This is layered on top of the existing keeba style, not a separate strict preset: the existing style already commits to terseness and inter-tool silence is the next logical rule. Users who want progress markers can /output-style default to revert to baseline behavior.

No bench validation in this PR: claude --print doesn't activate output styles without explicit --append-system-prompt-file plumbing, so the headless bench would measure a different thing than what the user gets in interactive mode after /output-style keeba. Validate post-merge manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aomerk merged commit bcf2916 into main May 1, 2026
2 checks passed

aomerk deleted the feat-suppress-intertool-prose branch May 1, 2026 11:49

aomerk mentioned this pull request May 1, 2026

feat(mcp): --with-output-style auto-activates the style (no manual /style needed) #39

Merged

3 tasks

aomerk merged 1 commit into main from feat-suppress-intertool-prose
