
feat(distill): cost/scale preflight before O(n²) semantic judgments (#76) #119

Merged: aktasbatuhan merged 2 commits into main from worktree-distill-cost-preflight on Jun 8, 2026


@aktasbatuhan


Closes #76.

In semantic mode, watchmen distill judges every skill pair with an LLM call, n*(n-1)/2 calls in total, and until now gave no warning before the loop started. A 13-skill project is 78 calls; larger ones run into the hundreds, which can be an unwelcome surprise on a metered provider.
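To make the quadratic growth concrete, here is a minimal sketch of the pair-count arithmetic. The function name `pair_count` is hypothetical and is not taken from the watchmen codebase; only the formula n*(n-1)/2 comes from the description above.

```python
def pair_count(n_skills: int) -> int:
    """Number of unordered skill pairs, i.e. n*(n-1)/2 (hypothetical helper)."""
    if n_skills < 0:
        raise ValueError("n_skills must be non-negative")
    # Integer division is exact here because n*(n-1) is always even.
    return n_skills * (n_skills - 1) // 2


if __name__ == "__main__":
    for n in (1, 13, 30, 50):
        print(f"{n} skills -> {pair_count(n)} pair judgments")
```

With 13 skills this gives the 78 calls quoted above; 30 skills would already mean 435 calls, and 50 skills 1225.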

What it does

Adds a preflight that runs before any judgment HTTP call and asks whether to continue:

distill coder-pro
─────────────
13 skills → 78 pair judgments (78 model calls to qwen/qwen3.7-plus)
~217k input tokens
estimated cost ~$0.65 (input tokens only; output not counted)

Under an OAuth subscription, where per-token cost is meaningless, it drops the price and warns about quota burn instead:

13 skills → 78 pair judgments (78 model calls to claude-opus-4-8)
~217k input tokens
billed under your claude-pro subscription — no per-token charge, but this spends 78 calls against your quota
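The two outputs above differ only in their last line. A rough sketch of how such a summary could be assembled follows; `render_preflight` and its parameters are hypothetical and do not reflect the actual watchmen implementation. The $3.00 per million input tokens is a placeholder chosen only because it reproduces the sample figure, not the provider's confirmed rate, and the subscription wording is paraphrased.

```python
from typing import List, Optional


def render_preflight(
    n_skills: int,
    model: str,
    input_tokens: int,
    usd_per_mtok: Optional[float] = None,   # assumed: set only for metered providers
    subscription: Optional[str] = None,     # assumed: set only for OAuth subscriptions
) -> List[str]:
    """Build the preflight summary lines (illustrative sketch only)."""
    pairs = n_skills * (n_skills - 1) // 2
    lines = [
        f"{n_skills} skills → {pairs} pair judgments ({pairs} model calls to {model})",
        f"~{round(input_tokens / 1000)}k input tokens",
    ]
    if subscription is not None:
        # Subscription billing: no dollar figure, warn about quota instead.
        lines.append(
            f"billed under your {subscription} subscription; no per-token charge, "
            f"but this spends {pairs} calls against your quota"
        )
    elif usd_per_mtok is not None:
        # Metered billing: input-only floor, output tokens deliberately excluded.
        cost = input_tokens / 1_000_000 * usd_per_mtok
        lines.append(f"estimated cost ~${cost:.2f} (input tokens only; output not counted)")
    return lines


if __name__ == "__main__":
    print("\n".join(render_preflight(13, "qwen/qwen3.7-plus", 217_000, usd_per_mtok=3.0)))
    print("\n".join(render_preflight(13, "claude-opus-4-8", 217_000, subscription="claude-pro")))
```

The call count sits in the first line in both branches, so it stays visible whichever way the run is billed.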

Design notes

  • Headline is the call count. It's exact and meaningful regardless of billing — that's the real surprise the issue is about. The price is a secondary, conditional line.
  • Dollars gated on a metered provider. config.active_provider() decides; OAuth providers (claude-pro / chatgpt) get the quota warning, not a price.
  • No invented output tokens. The judge's reply length isn't knowable pre-run, so the dollar figure is an explicit input-only floor, not a fabricated total. Input tokens are a metadata-size approximation that mirrors the real _semantic_prompt (fixed scaffold + each SKILL.md excerpt capped at 8k chars, billed once per pair it appears in).
  • No behaviour change for automation. Skipped for --json, non-interactive stdin, and the new --yes/-y flag.
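The input-token approximation described in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the 8k-character excerpt cap comes from the description, while `SCAFFOLD_CHARS`, the 4-characters-per-token ratio, and both function names are assumptions introduced here.

```python
from itertools import combinations
from typing import List

EXCERPT_CAP_CHARS = 8_000  # from the PR description: each SKILL.md excerpt is capped at 8k chars
SCAFFOLD_CHARS = 1_200     # ASSUMPTION: size of the fixed prompt scaffold
CHARS_PER_TOKEN = 4        # ASSUMPTION: rough characters-per-token heuristic


def estimate_input_tokens(skill_sizes: List[int]) -> int:
    """Approximate total input tokens across all pair prompts (hypothetical helper)."""
    capped = [min(size, EXCERPT_CAP_CHARS) for size in skill_sizes]
    total_chars = 0
    # Each pair prompt carries the scaffold plus both excerpts, so an excerpt
    # is counted once for every pair it appears in.
    for a, b in combinations(capped, 2):
        total_chars += SCAFFOLD_CHARS + a + b
    return total_chars // CHARS_PER_TOKEN


def input_cost_floor(input_tokens: int, usd_per_mtok: float) -> float:
    """Input-only lower bound on cost; output tokens are not estimated."""
    return input_tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    sizes = [4_000, 12_000, 2_000]  # made-up SKILL.md sizes in characters
    tokens = estimate_input_tokens(sizes)
    print(tokens, round(input_cost_floor(tokens, 3.0), 4))
```

Because every excerpt appears in n-1 pairs, the same total can be computed in closed form as pairs * scaffold + (n-1) * sum(capped sizes), which avoids the explicit loop for large skill sets.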

Acceptance criteria (from #76)

  • TTY semantic distill shows the preflight with pair count + ballpark cost
  • --yes flag skips the prompt
  • --json and non-interactive runs skip the prompt (no behaviour change)
  • Estimate within ~2x of actual on a tested project — 1.05x on the real coder-pro bundle (13 skills, 78 pairs)
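The skip rules in the criteria above amount to a small gate in front of the prompt. The sketch below uses the standard-library `argparse`; the parser layout and the `should_prompt` helper are hypothetical, and only the `--json` and `--yes`/`-y` flag names come from this PR.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser exposing only the flags relevant to the preflight."""
    parser = argparse.ArgumentParser(prog="distill-sketch")
    parser.add_argument("--json", action="store_true", help="machine-readable output")
    parser.add_argument("-y", "--yes", action="store_true", help="skip the confirmation prompt")
    return parser


def should_prompt(args: argparse.Namespace, stdin_is_tty: bool) -> bool:
    """Prompt only for interactive runs that did not opt out (hypothetical helper)."""
    return stdin_is_tty and not args.json and not args.yes


if __name__ == "__main__":
    args = build_parser().parse_args()
    if should_prompt(args, sys.stdin.isatty()):
        answer = input("Continue? [y/N] ").strip().lower()
        if answer not in ("y", "yes"):
            sys.exit("cancelled")
    print("proceeding with pair judgments")
```

Passing the TTY state in as a parameter rather than reading it inside the helper keeps the gate easy to cover in tests without a real terminal.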

Testing

  • 8 new tests in tests/test_skillmesh.py: pair count, metered cost, subscription has no dollar cost, input-only floor, pricing scaling, single-skill (zero pairs), and the skip/proceed/cancel + subscription-render paths.
  • Full suite green: 518 passed, no regressions.

🤖 Generated with Claude Code

aktasbatuhan and others added 2 commits June 4, 2026 16:19

watchmen distill (semantic mode) judges every n*(n-1)/2 skill pair with an
LLM call and gave no warning before the loop started. A 13-skill project is
78 calls; larger ones run into the hundreds.

Add a preflight that runs before any judgment HTTP calls and asks to continue:

  13 skills → 78 pair judgments (78 model calls to <model>)
  ~217k input tokens
  estimated cost ~$0.65 (input tokens only; output not counted)

Design notes:
- The headline is the call count — exact, and meaningful no matter how the run
  is billed. That's the real surprise the issue describes.
- Dollars are gated on a metered (pay-per-token) provider. Under a claude-pro /
  chatgpt OAuth subscription there's no per-token charge, so we drop the price
  and warn about quota burn instead.
- We do NOT estimate output tokens (unknowable pre-run), so the dollar figure is
  an explicit input-only floor rather than a fabricated total. Input tokens are
  a metadata-size approximation validated at ~1.05x actual on a real bundle.
- Skipped for --json, non-interactive stdin, and the new --yes/-y flag, so
  automated callers see no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aktasbatuhan aktasbatuhan merged commit 2272677 into main Jun 8, 2026
7 checks passed
@aktasbatuhan aktasbatuhan deleted the worktree-distill-cost-preflight branch June 8, 2026 12:22

Development

Successfully merging this pull request may close these issues.

distill: cost preflight before O(n²) LLM pair judgments
