feat(distill): cost/scale preflight before O(n²) semantic judgments (#76) #119
Merged
watchmen distill (semantic mode) judges every n*(n-1)/2 skill pair with an LLM call and gave no warning before the loop started. A 13-skill project is 78 calls; larger ones run into the hundreds.

Add a preflight that runs before any judgment HTTP calls and asks to continue:

    13 skills → 78 pair judgments (78 model calls to <model>)
    ~217k input tokens
    estimated cost ~$0.65 (input tokens only; output not counted)

Design notes:
- The headline is the call count: exact, and meaningful no matter how the run is billed. That's the real surprise the issue describes.
- Dollars are gated on a metered (pay-per-token) provider. Under a claude-pro / chatgpt OAuth subscription there's no per-token charge, so we drop the price and warn about quota burn instead.
- We do NOT estimate output tokens (unknowable pre-run), so the dollar figure is an explicit input-only floor rather than a fabricated total. Input tokens are a metadata-size approximation validated at ~1.05x actual on a real bundle.
- Skipped for --json, non-interactive stdin, and the new --yes/-y flag, so automated callers see no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
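The arithmetic behind the preflight can be sketched as below. This is a minimal illustration, not the code from this PR: the function names and the 4-characters-per-token ratio are assumptions (the PR only says the estimate is a metadata-size approximation), while the 8k-char excerpt cap and "billed once per pair" rule come from the design notes.

```python
from itertools import combinations

# Assumption: roughly 4 characters per token. The PR does not show its actual ratio.
CHARS_PER_TOKEN = 4
# From the PR description: each SKILL.md excerpt is capped at 8k chars.
EXCERPT_CAP = 8_000


def pair_count(n: int) -> int:
    """Unordered skill pairs judged in semantic mode: n*(n-1)/2, zero below two skills."""
    return n * (n - 1) // 2 if n >= 2 else 0


def estimate_input_tokens(scaffold: str, excerpts: list[str]) -> int:
    """Approximate input tokens summed over every pair's prompt.

    Each prompt is the fixed scaffold plus both capped excerpts, so an excerpt
    is counted once for every pair it appears in (n - 1 times for n skills).
    """
    capped = [e[:EXCERPT_CAP] for e in excerpts]
    total_chars = sum(len(scaffold) + len(a) + len(b) for a, b in combinations(capped, 2))
    return total_chars // CHARS_PER_TOKEN


def input_cost_floor(input_tokens: int, usd_per_mtok: float) -> float:
    """Dollar floor from input tokens only; output tokens are deliberately not estimated."""
    return input_tokens / 1_000_000 * usd_per_mtok
```

`pair_count(13)` gives 78, matching `len(list(combinations(range(13), 2)))`. With an assumed input price of $3 per million tokens, `input_cost_floor(217_000, 3.0)` is about 0.651, which rounds to the ~$0.65 in the sample output.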
Closes #76.
`watchmen distill` (semantic mode) judges every `n*(n-1)/2` skill pair with an LLM call and gave no warning before the loop started. A 13-skill project is 78 calls; larger ones run into the hundreds, which can be a real surprise on a metered provider.

What it does
A preflight that runs before any judgment HTTP calls and asks whether to continue.
Under an OAuth subscription, where per-token cost is meaningless, it drops the price and warns about quota burn instead.
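The metered-versus-subscription split can be sketched as a small rendering function. The `Provider` dataclass, its `metered` and `usd_per_mtok` fields, and the quota-warning wording are hypothetical stand-ins; in the PR the decision comes from `config.active_provider()`, whose return type is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    # Hypothetical stand-in for whatever config.active_provider() returns.
    name: str
    model: str
    metered: bool  # True for pay-per-token, False for an OAuth subscription
    usd_per_mtok: Optional[float] = None  # input price; only meaningful when metered


def render_preflight(n_skills: int, input_tokens: int, provider: Provider) -> str:
    pairs = n_skills * (n_skills - 1) // 2 if n_skills >= 2 else 0
    lines = [
        # The call count is exact, so it leads regardless of how the run is billed.
        f"{n_skills} skills → {pairs} pair judgments "
        f"({pairs} model calls to {provider.model})",
        f"~{round(input_tokens / 1000)}k input tokens",
    ]
    if provider.metered and provider.usd_per_mtok is not None:
        cost = input_tokens / 1_000_000 * provider.usd_per_mtok
        lines.append(f"estimated cost ~${cost:.2f} (input tokens only; output not counted)")
    else:
        # Subscription: no per-token charge, so warn about quota instead of pricing.
        lines.append(f"{pairs} calls will count against your {provider.name} quota")
    return "\n".join(lines)
```

With a metered provider priced (hypothetically) at $3 per million input tokens, `render_preflight(13, 217_000, ...)` reproduces the three sample lines; a subscription provider gets the same call count and token lines but no dollar figure.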
Design notes
- Dollars are gated on a metered provider: `config.active_provider()` decides; OAuth providers (claude-pro/chatgpt) get the quota warning, not a price.
- Input tokens are estimated from the size of `_semantic_prompt` (fixed scaffold + each SKILL.md excerpt capped at 8k chars, billed once per pair it appears in).
- Skipped for `--json`, non-interactive stdin, and the new `--yes`/`-y` flag.

Acceptance criteria (from #76)
- `--yes` flag skips the prompt
- `--json` and non-interactive runs skip the prompt (no behaviour change)
- `coder-pro` bundle (13 skills, 78 pairs)

Testing
`tests/test_skillmesh.py`: pair count, metered cost, subscription has no dollar cost, input-only floor, pricing scaling, single-skill (zero pairs), and the skip/proceed/cancel + subscription-render paths.

🤖 Generated with Claude Code
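The skip/proceed/cancel paths exercised by these tests can be sketched as a single decision function. This is an illustration under assumptions, not the PR's code: the name `confirm_preflight`, its parameters, and the `[y/N]` prompt text are hypothetical. `ask` is injectable so a test can answer the prompt without a terminal.

```python
from typing import Callable


def confirm_preflight(
    message: str,
    *,
    json_output: bool,
    assume_yes: bool,
    interactive: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Decide whether the semantic pair judgments should run.

    Returns True to proceed and False to cancel. The prompt is skipped, and the
    run proceeds unchanged, for --json output, --yes/-y, and non-interactive stdin.
    """
    if json_output or assume_yes or not interactive:
        return True  # skip path: automated callers see no new prompt
    print(message)  # e.g. the rendered preflight summary
    answer = ask("Continue? [y/N] ").strip().lower()
    return answer in ("y", "yes")  # anything else, including a bare Enter, cancels
```

A CLI would pass `interactive=sys.stdin.isatty()`; a test can pass `ask=lambda _prompt: "n"` to drive the cancel path.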