feat(distill): cost/scale preflight before O(n²) semantic judgments (#76) by aktasbatuhan · Pull Request #119 · firstbatchxyz/watchmen

aktasbatuhan · 2026-06-04T15:19:36Z

Closes #76.

In semantic mode, watchmen distill judges each of the n*(n-1)/2 skill pairs with a separate LLM call, and until now it gave no warning before the loop started. A 13-skill project needs 78 calls; larger ones run into the hundreds, which can be an unwelcome surprise on a metered provider.
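For reference, the pair arithmetic behind those numbers can be checked with a short sketch. `pair_count` is a hypothetical helper name used only for illustration, not necessarily what the code in this PR calls it:

```python
from itertools import combinations


def pair_count(n_skills: int) -> int:
    """Number of unordered skill pairs: n * (n - 1) / 2."""
    return n_skills * (n_skills - 1) // 2


# Cross-check the closed form against an explicit enumeration of pairs.
skills = [f"skill-{i}" for i in range(13)]
enumerated = len(list(combinations(skills, 2)))

print(pair_count(13), enumerated)  # 78 78
print(pair_count(30))              # 435
```

The growth is quadratic: roughly doubling the skill count from 13 to 30 takes the run from 78 calls to 435.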

What it does

Adds a preflight that runs before any judgment HTTP call is made, prints the scale of the run, and asks for confirmation to continue:

distill coder-pro
─────────────
13 skills → 78 pair judgments (78 model calls to qwen/qwen3.7-plus)
~217k input tokens
estimated cost ~$0.65 (input tokens only; output not counted)

Under an OAuth subscription, where per-token cost is meaningless, it drops the price and warns about quota burn instead:

13 skills → 78 pair judgments (78 model calls to claude-opus-4-8)
~217k input tokens
billed under your claude-pro subscription — no per-token charge, but this spends 78 calls against your quota

Design notes

Headline is the call count. It's exact and meaningful regardless of billing, and it is the real surprise the issue is about. The price is a secondary, conditional line.
Dollars are gated on a metered provider. config.active_provider() decides; OAuth providers (claude-pro / chatgpt) get the quota warning, not a price.
No invented output tokens. The judge's reply length isn't knowable pre-run, so the dollar figure is an explicit input-only floor, not a fabricated total. Input tokens are a metadata-size approximation that mirrors the real _semantic_prompt (fixed scaffold + each SKILL.md excerpt capped at 8k chars, billed once per pair it appears in).
No behaviour change for automation. Skipped for --json, non-interactive stdin, and the new --yes/-y flag.
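A minimal sketch of the estimate described in these notes, under stated assumptions. The 8k-char excerpt cap and the two subscription provider names come from this PR; the scaffold size, the chars-per-token ratio, the price parameter, the `metered-example` provider name, and the names `Preflight` and `estimate` are all hypothetical and are not taken from the actual implementation:

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional, Sequence

EXCERPT_CAP_CHARS = 8_000   # from the PR: each SKILL.md excerpt is capped at 8k chars
SCAFFOLD_CHARS = 1_200      # assumption: size of the fixed prompt scaffold
CHARS_PER_TOKEN = 4         # assumption: rough chars-per-token heuristic
SUBSCRIPTION_PROVIDERS = {"claude-pro", "chatgpt"}  # OAuth providers named in the PR


@dataclass
class Preflight:
    pairs: int
    input_tokens: int
    cost_usd: Optional[float]  # None when billed under a subscription


def estimate(skill_sizes: Sequence[int], provider: str, usd_per_mtok_in: float) -> Preflight:
    """Input-only estimate: every pair pays the scaffold plus both capped excerpts."""
    capped = [min(size, EXCERPT_CAP_CHARS) for size in skill_sizes]
    pairs = list(combinations(capped, 2))
    # Each excerpt is counted once for every pair it appears in.
    chars = sum(SCAFFOLD_CHARS + a + b for a, b in pairs)
    tokens = chars // CHARS_PER_TOKEN
    if provider in SUBSCRIPTION_PROVIDERS:
        cost = None  # no per-token charge, so no dollar figure is shown
    else:
        cost = tokens / 1_000_000 * usd_per_mtok_in  # floor: output tokens excluded
    return Preflight(len(pairs), tokens, cost)


metered = estimate([10_000] * 13, "metered-example", usd_per_mtok_in=3.0)
subscription = estimate([10_000] * 13, "claude-pro", usd_per_mtok_in=3.0)
print(metered.pairs, metered.input_tokens)  # 78 335400
print(subscription.cost_usd)                # None
```

Returning `None` for the cost, rather than `0.0`, keeps "no per-token charge" distinguishable from "costs nothing", which is what lets the renderer swap in the quota warning.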

Acceptance criteria (from #76)

TTY semantic distill shows the preflight with pair count + ballpark cost
--yes flag skips the prompt
--json and non-interactive runs skip the prompt (no behaviour change)
Estimate within ~2x of actual on a tested project: measured at 1.05x on the real coder-pro bundle (13 skills, 78 pairs)
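The skip conditions in the criteria above reduce to a small predicate. This is an illustrative sketch only: `should_prompt` and its parameters are hypothetical names, and the real CLI may wire the flags differently:

```python
import io
import sys
from typing import Optional, TextIO


def should_prompt(json_mode: bool, assume_yes: bool, stdin: Optional[TextIO] = None) -> bool:
    """Show the preflight prompt only for interactive runs without --json or --yes."""
    stream = sys.stdin if stdin is None else stdin
    if json_mode or assume_yes:
        return False
    # A piped or redirected stdin cannot answer a prompt, so skip it there too.
    return stream.isatty()


class FakeTTY(io.StringIO):
    """Test double that reports itself as an interactive terminal."""

    def isatty(self) -> bool:
        return True


print(should_prompt(False, False, FakeTTY()))      # True: interactive, no flags
print(should_prompt(False, True, FakeTTY()))       # False: --yes given
print(should_prompt(False, False, io.StringIO()))  # False: non-interactive stdin
```

Passing the stream in as a parameter keeps the skip/proceed paths testable without a real terminal.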

Testing

8 new tests in tests/test_skillmesh.py: pair count, metered cost, subscription has no dollar cost, input-only floor, pricing scaling, single-skill (zero pairs), and the skip/proceed/cancel + subscription-render paths.
Full suite green: 518 passed, no regressions.

🤖 Generated with Claude Code

watchmen distill (semantic mode) judges every n*(n-1)/2 skill pair with an LLM call and gave no warning before the loop started. A 13-skill project is 78 calls; larger ones run into the hundreds.

Add a preflight that runs before any judgment HTTP calls and asks to continue:

13 skills → 78 pair judgments (78 model calls to <model>)
~217k input tokens
estimated cost ~$0.65 (input tokens only; output not counted)

Design notes:
- The headline is the call count: exact, and meaningful no matter how the run is billed. That's the real surprise the issue describes.
- Dollars are gated on a metered (pay-per-token) provider. Under a claude-pro / chatgpt OAuth subscription there's no per-token charge, so we drop the price and warn about quota burn instead.
- We do NOT estimate output tokens (unknowable pre-run), so the dollar figure is an explicit input-only floor rather than a fabricated total. Input tokens are a metadata-size approximation validated at ~1.05x actual on a real bundle.
- Skipped for --json, non-interactive stdin, and the new --yes/-y flag, so automated callers see no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-preflight

aktasbatuhan and others added 2 commits June 4, 2026 16:19

Merge remote-tracking branch 'origin/main' into worktree-distill-cost…

f693402

…-preflight

aktasbatuhan merged commit 2272677 into main Jun 8, 2026
7 checks passed

aktasbatuhan deleted the worktree-distill-cost-preflight branch June 8, 2026 12:22
