feat(models): cost metering, model routing & budget gating (#464) by markhayden · Pull Request #500 · markhayden/bakin

markhayden · 2026-06-13T22:39:00Z

Summary

Implements per-run cost metering, per-task-type model routing, and budget gating for agent turns — widening issue #464 (budget gating only) into the full cost-optimization story. Design record: .claude/specs/models-cost-optimization.md (+ -plan.md).

Built in four phases, each commit green on its own; full suite 5054 pass / 0 fail.

Phase 0 — cleanup

Deleted the dead taskProfiles + showUsageMetrics (wired to nothing at dispatch time).

Phase 1 — Metering (foundation)

Adapter surfaces per-turn token usage (from the trajectory model.completed event) on MessageResult.usage — previously discarded on the happy path.
Structured pricing on the 9 cloud LLM catalog entries; display string derived via formatCostRange. computeCostUsdMicros returns null when pricing/usage is absent — honest "$ unavailable", never a fabricated $0.
Durable run_costs ledger table (migration v3, keyed by run_id, first-write-wins) + spendTotal/spendByAgent/spendByModel verbs.
Dispatch records cost on settle (via the models.priceTurn hook, so core stays pricing-agnostic) and feeds the live usage recorder (tokensIn/tokensOut/costUsdMicros).
Spend tab + GET /spend?window=.

Phase 2 — Routing (model + thinking per turn)

Verified OpenClaw's gateway agent RPC accepts per-turn model/thinking (P2.0), then threaded them through MessageArgs → gateway params.
Origin-based policy (scheduled|workflow|adhoc|recovery|decomposition) + per-task tag overrides, resolved at dispatch (src/core/model-routing.ts). Empty config = unchanged behavior (regression-guarded).
Routing tab (per-origin model + thinking-level selectors, tag-override list).

Phase 3 — Budget gating (#464)

Pure budget evaluator (src/core/budget.ts): warn at warnPct (default 0.8), defer at 100% — defer-not-pause (diverges from paperclip, no lost work). Daily + monthly windows.
budgetGate consulted before every claimDispatchRun; fail-closed (unreadable ledger defers); warn/defer audits debounce per window.
budget health check (utilization + deferred-run count) + a global-caps editor on the Spend tab.

Phase 4 — Docs

Updated .claude/knowledge/{models-plugin,execution-ledger,usage-recording,dispatch}.md; spec marked implemented.

Design notes / deviations

Kept costRange for image/video/local models (token-per-1M pricing can't express "$0.055 per image"); only cloud LLMs moved to structured pricing.
Routing/budget types live in core (src/core/{model-routing,budget}.ts); the models plugin depends on core (correct direction) and exposes policy to dispatch via hooks (models.getRoutingConfig, models.getBudgetPolicy, models.priceTurn).
Cost is an estimate — cached-token discounts aren't modeled (trajectory usage doesn't break them out), so totals read slightly high under prompt caching. Labeled in the UI.

Scope flagged for follow-up

Budget editor UI exposes global caps; per-agent caps are fully supported by the data model + gating + API, just not in the UI yet.

Testing

TDD throughout (RED→GREEN per task). New/updated tests cover: trajectory usage parse, pricing math + null paths, run_costs idempotency + windowed rollups, dispatch settle cost (metered + unmetered), classifyOrigin/resolveTurnModel + cascade, adapter model/thinking params, dispatch routing application + empty-config regression, budget evaluator (allow/warn/defer), budgetGate fail-closed defer, budget defer + no-policy regression in dispatch, budget health check (ok/warn/error/ledger-unreachable), and all the new routes.

🤖 Generated with Claude Code

Recommendation phase for issue #464, widened from budget gating to the full cost story (metering → routing → gating). Captures the Models-plugin audit, paperclip competitor analysis, locked design decisions, phased roadmap, and per-task commit strategy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The taskProfiles feature (free-text task-type → recommended-model with full CRUD UI) was wired to nothing at dispatch time, and the showUsageMetrics setting was declared but never read. Both are removed to clear the surface for origin-based routing (replacement lands in Phase 2).
- types: drop TaskProfile + the two dead settings fields
- index: delete DEFAULT_TASK_PROFILES, the profile zod schemas, and the GET/PUT /profiles routes; drop showUsageMetrics from settingsSchema
- models-page: remove the Task Profiles tab, state, handlers, render block
- tests: drop /profiles route + settings-field assertions and the fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

OpenClaw records per-run usage (input/output/total) in the trajectory model.completed event, but the happy path returned only the assistant text and discarded it. Surface it on MessageResult so dispatch can meter cost (Phase 1).
- concepts: add MessageUsage + MessageResult.usage (re-exported)
- trajectory-forensics: expose usage on the success outcome (reuse the existing model.completed parser); thread it through TrajectoryRecoveredTurn
- runtime: runOpenClawAgentGateway returns { content, usage }; readTurnUsage reads the success trajectory tail; messaging.send attaches usage. Usage is omitted (never zero-filled) when the runtime recorded none or the send was unthreaded (no trajectory).
- tests: parser usage cases + end-to-end send usage (present + absent)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

costRange was a hand-typed display string never usable for cost math. Add structured ModelPricing (inputPer1M/outputPer1M, micro-dollar cost helper) to the 9 cloud LLM entries, the single source of truth for LLM cost; the display string is now derived via formatCostRange. Non-token models (image/video/local) keep their literal costRange, which token pricing can't express.
- known-models: ModelPricing type; pricing{} on cloud LLMs; formatCostRange + computeCostUsdMicros (returns null when pricing/usage is absent: an honest '$ unavailable', never a fabricated zero)
- index: enrichment derives costRange from pricing when no literal
- tests: pricing shape, derived display, micro-dollar math + null paths
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
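
The micro-dollar math described in this commit can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the `inputPer1M`/`outputPer1M` fields and the null contract come from the commit message, while the `TurnUsage` shape, the `formatUsdMicros` helper, and the example rates are assumptions made for the sketch.

```typescript
// Sketch only: TurnUsage, formatUsdMicros and the rates below are assumptions.
interface ModelPricing {
  inputPer1M: number; // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

interface TurnUsage {
  input?: number;
  output?: number;
  total?: number;
}

function computeCostUsdMicros(
  pricing: ModelPricing | undefined,
  usage: TurnUsage | undefined,
): number | null {
  if (!pricing || !usage) return null;
  // A total-only usage block cannot be split into input/output, so do not guess.
  if (usage.input === undefined || usage.output === undefined) return null;
  // USD per 1M tokens equals micro-dollars per token, so no extra scaling is needed.
  return Math.round(usage.input * pricing.inputPer1M + usage.output * pricing.outputPer1M);
}

// Hypothetical display helper: null renders as "$ unavailable", never "$0".
function formatUsdMicros(micros: number | null): string {
  return micros === null ? "$ unavailable" : `$${(micros / 1e6).toFixed(4)}`;
}

const examplePricing: ModelPricing = { inputPer1M: 3, outputPer1M: 15 }; // made-up rates
console.log(formatUsdMicros(computeCostUsdMicros(examplePricing, { input: 1000, output: 500 })));
console.log(formatUsdMicros(computeCostUsdMicros(examplePricing, { total: 1500 })));
```

Returning `null` rather than `0` keeps "unmetered" distinguishable from "free" all the way to the UI.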

Durable per-run cost attribution (ledger migration v3), keyed by run_id, the same key dispatch settles on. A billing fact, not content: token counts + an estimated micro-dollar cost (null = unmetered).
- recordRunCost: first-write-wins (INSERT OR IGNORE) so a transport retry of the same run can't double-count
- spendTotal/spendByAgent/spendByModel: windowed rollups; null cost rows count as runs with zero dollars (COALESCE), never dropped
- purgeTaskRows cascades to run_costs
- facade re-exports the verbs + types
- tests: rollups, idempotency, window/agent filtering, unmetered row
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
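
The first-write-wins and COALESCE semantics above can be illustrated with an in-memory stand-in for the table. This is a sketch under stated assumptions: the real ledger is a SQL table, and the `InMemoryRunCosts` class and the row field names other than the `run_id` key are hypothetical.

```typescript
// In-memory stand-in for run_costs; the class and most field names are assumptions.
interface RunCostRow {
  runId: string;
  agentId: string;
  tokensIn: number;
  tokensOut: number;
  costUsdMicros: number | null; // null = unmetered
  recordedAtMs: number;
}

class InMemoryRunCosts {
  private readonly rows = new Map<string, RunCostRow>();

  // Mirrors INSERT OR IGNORE: a second write for the same run id is dropped.
  recordRunCost(row: RunCostRow): boolean {
    if (this.rows.has(row.runId)) return false;
    this.rows.set(row.runId, row);
    return true;
  }

  // Windowed rollup: unmetered rows still count as runs, with zero dollars.
  spendTotal(sinceMs: number): { runs: number; totalUsdMicros: number } {
    let runs = 0;
    let totalUsdMicros = 0;
    this.rows.forEach((row) => {
      if (row.recordedAtMs < sinceMs) return;
      runs += 1;
      totalUsdMicros += row.costUsdMicros ?? 0; // like COALESCE(cost, 0)
    });
    return { runs, totalUsdMicros };
  }
}

const ledger = new InMemoryRunCosts();
const turn: RunCostRow = {
  runId: "run-1", agentId: "agent-a", tokensIn: 1000, tokensOut: 500,
  costUsdMicros: 10500, recordedAtMs: 1000,
};
ledger.recordRunCost(turn);
const retryAccepted = ledger.recordRunCost(turn); // transport retry of the same run
ledger.recordRunCost({ ...turn, runId: "run-2", costUsdMicros: null });
const rollup = ledger.spendTotal(0);
console.log(retryAccepted, rollup);
```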

On a settled task turn, attribute its cost: the threadId IS the ledger run id, so recordRunCost is first-write-wins idempotent. Pricing is delegated to the models plugin via a new models.priceTurn hook (core stays pricing-agnostic); the same data feeds the live usage recorder.
- usage: UsageEntry gains tokensIn/tokensOut/costUsdMicros (recordUsage now passes them through)
- dispatch: sendDispatchMessage returns MessageResult; recordTurnCost runs in the success settle: it invokes models.priceTurn, writes run_costs, and feeds the recorder. Never throws into the settle path. Absent plugin → tokens recorded, cost null (unmetered), never a fabricated zero.
- models: models.priceTurn hook resolves the effective model (explicit → agent config) and returns estimated micro-dollar cost from catalog pricing
- tests: priceTurn hook (priced/unpriced/no-tokens); dispatch settle writes a costed row and an unmetered row
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

GET /spend?window=24h|7d|30d|all returns windowed rollups from the run_costs ledger (total + by-agent + by-model). A reporting read, so it degrades gracefully (returns zeros) when the ledger is unavailable rather than crashing the page.
- index: /spend route + spend-window parsing; '' model id → 'unknown'
- models-page: Spend tab with total card, by-agent + by-model tables, window toggle (URL-backed), 'estimated' caveat, and '$ unavailable' for unmetered (zero-cost) rows instead of '$0.00'
- tests: route rollups + window default; component spend fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dispatch.ts now imports recordRunCost from the execution-ledger facade (P1.4), so any test mocking that module must export it or the import throws at load. Add it to the in-memory fake. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verified against the installed OpenClaw 2026.6.5 source (dist/agent-via-gateway-*.js): the gateway 'agent' method accepts `model` and `thinking` as top-level params, alongside the agentId/message/sessionId/idempotencyKey params Bakin's adapter already sends. The routing phase's one blocking unknown is cleared: per-turn model/thinking override is a clean adapter addition, no agent-config mutation needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Bakin-owned per-turn routing policy (the values; the adapter serves them). The routing key is the deterministic dispatch ORIGIN with a per-task TAG override.
- classifyOrigin: recovery (dispatch-context) → workflow → scheduled → decomposition → adhoc
- resolveTurnModel: model and thinking resolve independently across tag → origin → inherit. No per-agent/global tail: returning nothing means 'inherit', i.e. the adapter passes no model and the runtime uses the agent's configured default (unchanged pre-routing behavior).
- types (Origin/ThinkingLevel/RoutingPolicy/TagOverride/RoutingConfig) live in core so the resolver is self-contained; the models plugin will depend on these (correct direction).
- tests: origin truth table, cascade precedence, independent model/thinking, 'inherit', empty-config no-op
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
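
The origin classification and the tag → origin → inherit cascade can be sketched as below. The type names and the precedence order come from the commit message; the field names on `TaskShape` (in particular `workflowId`), the mapping of `parentId` to the decomposition origin, the concrete thinking levels, and the layout of `RoutingConfig` are assumptions for illustration.

```typescript
// Sketch only: field names and thinking levels are assumptions.
type Origin = "scheduled" | "workflow" | "adhoc" | "recovery" | "decomposition";
type ThinkingLevel = "inherit" | "low" | "medium" | "high";

interface RoutingPolicy { model?: string; thinking?: ThinkingLevel; }
interface TagOverride extends RoutingPolicy { tag: string; }
interface RoutingConfig {
  origins?: Partial<Record<Origin, RoutingPolicy>>;
  tags?: TagOverride[];
}
interface TaskShape { workflowId?: string; scheduleJobId?: string; parentId?: string; }

// Precedence: recovery → workflow → scheduled → decomposition → adhoc.
function classifyOrigin(task: TaskShape, isRecovery: boolean): Origin {
  if (isRecovery) return "recovery";
  if (task.workflowId) return "workflow";
  if (task.scheduleJobId) return "scheduled";
  if (task.parentId) return "decomposition";
  return "adhoc";
}

// Model and thinking resolve independently: tag override first, then origin.
// An empty result means "inherit": the caller passes nothing to the runtime.
function resolveTurnModel(
  config: RoutingConfig,
  origin: Origin,
  tags: string[] = [],
): { model?: string; thinking?: ThinkingLevel } {
  const tagHits = (config.tags ?? []).filter((t) => tags.includes(t.tag));
  const originPolicy = config.origins?.[origin];
  const model = tagHits.find((t) => t.model !== undefined)?.model ?? originPolicy?.model;
  const rawThinking =
    tagHits.find((t) => t.thinking !== undefined && t.thinking !== "inherit")?.thinking ??
    originPolicy?.thinking;
  const out: { model?: string; thinking?: ThinkingLevel } = {};
  if (model) out.model = model;
  if (rawThinking && rawThinking !== "inherit") out.thinking = rawThinking;
  return out;
}

const exampleConfig: RoutingConfig = {
  origins: { scheduled: { model: "small-model", thinking: "low" } },
  tags: [{ tag: "deep", model: "large-model" }],
};
const routed = resolveTurnModel(exampleConfig, "scheduled", ["deep"]);
console.log(routed);
```

In the usage above, the tag override supplies the model while the thinking level still comes from the origin policy, which is the independent resolution the commit describes.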

MessageArgs gains optional model + thinking; the adapter threads them through the turn options onto the gateway 'agent' RPC params (verified in P2.0 to accept both). Omitted when unset, so a turn with no routing override behaves exactly as before: the runtime uses the agent's configured model/default.
- concepts: MessageArgs.model/thinking
- runtime: OpenClawAgentTurnOptions threads them; send/stream forward them; gateway params set model/thinking only when defined
- tests: params carry model+thinking when set; absent when unset
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
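
The "set only when defined" rule can be sketched as a small param builder. The param names match those listed in the P2.0 verification note; the `AgentTurnOptions` interface and the `buildGatewayAgentParams` function are hypothetical names, not the adapter's real ones.

```typescript
// Hypothetical builder; only the param names are taken from the commit messages.
interface AgentTurnOptions {
  agentId: string;
  message: string;
  sessionId?: string;
  idempotencyKey?: string;
  model?: string;
  thinking?: string;
}

function buildGatewayAgentParams(opts: AgentTurnOptions): Record<string, string> {
  const params: Record<string, string> = { agentId: opts.agentId, message: opts.message };
  if (opts.sessionId !== undefined) params.sessionId = opts.sessionId;
  if (opts.idempotencyKey !== undefined) params.idempotencyKey = opts.idempotencyKey;
  // Routing overrides are added only when defined, so an unrouted turn sends
  // the same params it sent before routing existed.
  if (opts.model !== undefined) params.model = opts.model;
  if (opts.thinking !== undefined) params.thinking = opts.thinking;
  return params;
}

const unrouted = buildGatewayAgentParams({ agentId: "agent-a", message: "hello" });
const routedParams = buildGatewayAgentParams({
  agentId: "agent-a", message: "hello", model: "small-model", thinking: "low",
});
console.log(unrouted, routedParams);
```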

fireDispatchTurn resolves the per-turn model/thinking from the Bakin routing policy (read via the new models.getRoutingConfig hook) before sending, threads it onto the gateway, and prices the run against the resolved model. All three dispatch paths (cycle, single, workflow) flow through fireDispatchTurn so they inherit routing; only dispatchSingleTask flags isRecovery (source==='recovery') for the recovery-origin policy.
- dispatch: DispatchTask carries scheduleJobId/parentId/tags; resolveDispatchRouting (fail-soft → {} inherit); send + recordTurnCost use the resolved model; audits task.routed when an override applies
- models: models.getRoutingConfig hook reads settings.routing (empty default); ModelsPluginSettings.routing
- tests: origin policy reaches the turn; empty-config regression (no model/thinking → unchanged dispatch); 7-hook registration
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

GET/PUT /routing persist the per-turn routing policy to plugin settings, validated by a zod schema (origin enum from core ORIGINS, thinking levels incl. 'inherit'). A Routing tab edits it: a row per dispatch origin with a model + thinking selector, plus an add/remove tag-override list. Blank rows and 'inherit' thinking are dropped on save to keep storage clean.
- index: /routing GET+PUT + RoutingConfig zod schema
- models-page: Routing tab (origins table + tag overrides), pending/save
- tests: GET default + PUT persist + unknown-origin rejection; component routing fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Pure spend-ceiling logic dispatch will consult before claiming a run (P3.2 wires it). Given a policy and the ledger-sourced spend per scope × window, decide allow / warn / defer.
- BudgetPolicy: global caps + warnPct, per-agent caps (USD; unset = unlimited)
- dayStartMs/monthStartMs: local calendar-day + calendar-month boundaries (daily catches a runaway night; monthly aligns with the provider invoice)
- evaluateBudget: defer (cap met/exceeded) > warn (>= warnPct, default 0.8) > allow; returns the worst breach for the audit reason
- defer (not pause) is the deliberate divergence from paperclip: no lost work, spend just throttles until the window rolls over
- tests: boundaries, allow/warn/defer, defer-beats-warn, per-agent, warnPct
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
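
A minimal sketch of the evaluator, assuming each scope × window pair arrives as one flat `BudgetCheck` record. The decision order (defer > warn > allow), the 0.8 default, and the local-time boundaries come from the commit message; the `BudgetCheck`/`BudgetResult` shapes and the reason string are assumptions, and the real `evaluateBudget` signature may differ.

```typescript
// Sketch only: BudgetCheck, BudgetResult and the reason text are assumptions.
type Decision = "allow" | "warn" | "defer";

interface BudgetCheck {
  scope: string; // e.g. "global" or an agent id
  window: "daily" | "monthly";
  spentUsd: number;
  capUsd?: number; // unset = unlimited
}

interface BudgetResult { decision: Decision; reason?: string; }

const RANK: Record<Decision, number> = { allow: 0, warn: 1, defer: 2 };

// Local calendar boundaries, so the window follows the operator's clock.
function dayStartMs(nowMs: number): number {
  const d = new Date(nowMs);
  return new Date(d.getFullYear(), d.getMonth(), d.getDate()).getTime();
}
function monthStartMs(nowMs: number): number {
  const d = new Date(nowMs);
  return new Date(d.getFullYear(), d.getMonth(), 1).getTime();
}

function evaluateBudget(checks: BudgetCheck[], warnPct = 0.8): BudgetResult {
  let worst: BudgetResult = { decision: "allow" };
  let worstUtilization = -1;
  for (const check of checks) {
    if (check.capUsd === undefined) continue; // unlimited scope
    const utilization = check.spentUsd / check.capUsd; // caps are assumed positive
    const decision: Decision =
      utilization >= 1 ? "defer" : utilization >= warnPct ? "warn" : "allow";
    if (decision === "allow") continue;
    const beatsWorst =
      RANK[decision] > RANK[worst.decision] ||
      (RANK[decision] === RANK[worst.decision] && utilization > worstUtilization);
    if (beatsWorst) {
      const pct = Math.round(utilization * 100);
      worst = { decision, reason: `${check.scope} ${check.window} spend at ${pct}% of cap` };
      worstUtilization = utilization;
    }
  }
  return worst;
}

const verdict = evaluateBudget([
  { scope: "global", window: "daily", spentUsd: 8.5, capUsd: 10 },
  { scope: "global", window: "monthly", spentUsd: 100, capUsd: 100 },
]);
console.log(verdict, dayStartMs(Date.now()), monthStartMs(Date.now()));
```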

Before each dispatch claims a run, budgetGate consults the budget policy (read via the new models.getBudgetPolicy hook) against ledger spend for the day + month windows, global + per-agent. On defer the task stays in todo (no claim, no send) and resumes when the window rolls over or the cap is raised; no work is lost. Warn/defer audits debounce per window so a cap at 85%, or a task deferred every cycle, doesn't spam the log.
- dispatch: budgetGate + auditBudgetOnce; wired before all three claim sites (cycle, single, workflow). FAIL-CLOSED: a failed spend read defers (consistent with the ledger's posture). No policy → allow.
- models: models.getBudgetPolicy hook + ModelsPluginSettings.budget
- tests: over-cap defers (no send); no-policy regression (dispatch proceeds)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
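
The fail-closed and debounce behavior can be sketched with a single cap and injected dependencies. This is a simplified, synchronous illustration: `createBudgetGate`, `GateDeps`, and the `windowKey` callback are hypothetical, and the real budgetGate reads multiple windows and scopes through hooks.

```typescript
// Sketch only: createBudgetGate, GateDeps and windowKey are hypothetical.
type GateDecision = "allow" | "warn" | "defer";

interface GatePolicy { capUsd: number; warnPct?: number; }

interface GateDeps {
  getPolicy: () => GatePolicy | undefined;
  readSpendUsd: () => number; // may throw when the ledger is unreadable
  audit: (event: string) => void;
  windowKey: () => string; // e.g. the current local day
}

function readOrNull(read: () => number): number | null {
  try {
    return read();
  } catch {
    return null;
  }
}

function createBudgetGate(deps: GateDeps): () => GateDecision {
  const audited = new Set<string>();
  // Debounce: at most one audit per decision kind per window.
  const auditOnce = (kind: string, detail: string): void => {
    const key = `${kind}:${deps.windowKey()}`;
    if (audited.has(key)) return;
    audited.add(key);
    deps.audit(`${kind}: ${detail}`);
  };

  return function budgetGate(): GateDecision {
    const policy = deps.getPolicy();
    if (!policy) return "allow"; // no policy configured
    const spent = readOrNull(deps.readSpendUsd);
    if (spent === null) {
      // Fail closed: an unreadable ledger defers rather than allowing spend.
      auditOnce("defer", "spend ledger unreadable");
      return "defer";
    }
    const utilization = spent / policy.capUsd;
    if (utilization >= 1) {
      auditOnce("defer", "cap reached");
      return "defer";
    }
    if (utilization >= (policy.warnPct ?? 0.8)) {
      auditOnce("warn", "approaching cap");
      return "warn";
    }
    return "allow";
  };
}

const auditLog: string[] = [];
const failingGate = createBudgetGate({
  getPolicy: () => ({ capUsd: 10 }),
  readSpendUsd: () => { throw new Error("ledger down"); },
  audit: (event) => { auditLog.push(event); },
  windowKey: () => "2026-06-13",
});
const firstDecision = failingGate();
const secondDecision = failingGate();
console.log(firstDecision, secondDecision, auditLog);
```

Calling the gate twice in the same window defers both times but writes a single audit entry, which is the debounce the commit describes.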

Closes the gating loop with config + visibility.
- models: GET/PUT /budget (zod-validated BudgetPolicy: positive caps, warnPct in (0,1]); global-caps editor + utilization on the Spend tab
- health: 'budget' check reports ok under caps, warn approaching warnPct or when runs were deferred in 24h, and error at/over a cap (dispatch blocked) or when the spend ledger is unreachable (gating fails closed)
- tests: budget GET/PUT + negative-cap rejection; health check across ok/warn/error + ledger-unreachable; component budget fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dispatch.ts now imports spendTotal from the execution-ledger facade for budget gating (P3.2); the usage-wiring test mocks that module, so it must export spendTotal or the import throws at load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- models-plugin: pricing, metering, routing, budget gating, Spend view
- execution-ledger: run_costs table + spend verbs
- usage-recording: tokensIn/tokensOut/costUsdMicros fields
- dispatch: model routing + budget gate around turn fire
- spec: mark implemented
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Export budgetGate as a test seam and unit-test the paths the pure evaluator can't reach: no-policy allow, caps-with-zero-spend allow, and the FAIL-CLOSED defer-with-audit when the spend ledger read throws. The shared-db dispatch harness can't force a ledger throw without corrupting later tests, so this isolates the gate with a mocked ledger. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The session-death ladder's dispatchSingleTask(...,'recovery') path set isRecovery, but the main dispatchTasks cycle also re-dispatches tasks carrying a persisted sessionDeath record (when the ladder timer is lost to a restart or the immediate re-dispatch was budget-deferred), and pushed the turn without isRecovery, so it routed by task shape instead of 'recovery'. Thread isRecovery=!!recovery into the pendingTurns push so both paths route identically.
- test: a sessionDeath task re-dispatched via the main cycle routes to the configured 'recovery' model
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…iew #2,#3)
The budget cap previously only saw dispatched task turns: watchdog nudges, doctor notifications, agent-to-agent sends, and orchestrator completion pings billed tokens but were invisible to spendTotal, so a runaway non-dispatch loop went uncapped (the exact #464 failure mode).
- new src/core/agent-cost.ts: shared meterAgentTurn used by dispatch's settle path AND the four non-dispatch send sites. Synthetic runId + null task_id for non-dispatch turns (run_costs.task_id now nullable, v3).
- review #3: price against the model the runtime ACTUALLY ran (usage.model) before the requested override, so a rejected/fell-back per-turn model is billed correctly.
- agent-cost imports its ledger/usage/hook deps DYNAMICALLY so the newly metering modules don't drag the ledger into their static graph; existing partial-mock tests keep working, and a missing export just no-ops in the existing try/catch.
- tests: meterAgentTurn (synthetic id, null task, actual-model pricing, never-throws); ledger null-task_id row counts in spend.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
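
Two of the properties listed above, actual-model attribution and never-throws metering, can be sketched in isolation. Both helper names (`pickPricedModel`, `meterSafely`) are hypothetical; only the precedence (model actually run, then requested override, then agent config) and the never-throws contract are taken from the commit messages.

```typescript
// Sketch only: both helpers are hypothetical names for illustration.
interface MeteredUsage { model?: string; input?: number; output?: number; }

// Prefer the model the runtime reports it actually ran, then the requested
// per-turn override, then the agent's configured default.
function pickPricedModel(
  usage: MeteredUsage | undefined,
  requestedModel?: string,
  agentDefaultModel?: string,
): string | undefined {
  return usage?.model ?? requestedModel ?? agentDefaultModel;
}

// Metering is best-effort: a failed ledger write must not fail the send.
function meterSafely(write: () => void): boolean {
  try {
    write();
    return true;
  } catch {
    return false;
  }
}

// The runtime fell back to a different model than the one requested.
const billedModel = pickPricedModel({ model: "fallback-model" }, "requested-model", "agent-default");
const recorded = meterSafely(() => { throw new Error("ledger export missing"); });
console.log(billedModel, recorded);
```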

,#10)
- #6: per-cycle spend memoization. budgetGate accepts a cache so the two global spend aggregates aren't recomputed for every queued task in a cycle (was 4 SQL sums/task; globals are cycle-constant since costs only land on settle, after the loop).
- #7: dedup the three identical gate sites behind deferForBudget().
- #9: the budget health check now calls evaluateBudget for the global scope instead of re-implementing the cap-vs-spend math, so the doctor can't drift from what dispatch enforces.
- #10: reword the budget window doc; it's local-time and an estimate, not invoice-exact (was overclaiming provider-invoice alignment).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
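
The per-cycle memoization from #6 can be sketched as a cache created once per dispatch cycle and passed into each gate check. The `memoizedSpend` helper, the cache keys, and the `sumSpend` stand-in for the SQL aggregate are assumptions; the sketch also assumes per-agent sums are still computed per task, which the commit message implies but does not state.

```typescript
// Sketch only: memoizedSpend, the key format and sumSpend are assumptions.
type SpendCache = Map<string, number>;

function memoizedSpend(cache: SpendCache | undefined, key: string, compute: () => number): number {
  if (!cache) return compute();
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(key, value);
  return value;
}

// Stand-in for a SQL SUM over run_costs; logs each query so calls can be counted.
const queries: string[] = [];
function sumSpend(scope: string, window: string): number {
  queries.push(`${scope}:${window}`);
  return 0;
}

function runCycle(agentIds: string[]): void {
  const cache: SpendCache = new Map(); // lives for exactly one cycle
  for (const agentId of agentIds) {
    // Global aggregates are cycle-constant, so they are computed once.
    memoizedSpend(cache, "global:day", () => sumSpend("global", "day"));
    memoizedSpend(cache, "global:month", () => sumSpend("global", "month"));
    // Per-agent sums are still read for every queued task.
    sumSpend(agentId, "day");
    sumSpend(agentId, "month");
  }
}

runCycle(["agent-a", "agent-b", "agent-c"]);
console.log(queries.length); // 2 global + 6 per-agent sums instead of 12
```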

…eview #8) The SDK's plugin-facing runtime message types were independent duplicates of core's and hadn't gained the per-turn model/thinking args or the usage result field — so a plugin author using ctx.runtime.messaging.send couldn't pass routing overrides or read token usage the runtime supports. Add model?/thinking? to RuntimeMessageArgs and a RuntimeMessageUsage on RuntimeMessageResult. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#5)
- #4: document readTurnUsage's reliance on OpenClaw's model.completed → session.ended → gateway-frame write ordering, and that a broken ordering only yields a silently-unmetered turn (no crash/wrong cost).
- #5: note that computeCostUsdMicros returns null for total-only usage by design: the input/output split is required for accurate per-1M pricing, so we don't guess.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lable Reflects the review fixes — meter-all (not dispatch-only) via agent-cost, nullable task_id for non-dispatch turns, actual-model attribution, and the per-cycle budget spend cache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

markhayden · 2026-06-13T23:44:18Z

Self-review applied (high-effort, 7 finder angles + verify)

Ran a full review of the branch and fixed everything actionable before manual testing. Summary:

🔴 Correctness — fixed

#1 Recovery routing on the main cycle: the dispatchTasks cycle re-dispatched session-death tasks without isRecovery, so they routed by task shape instead of the recovery origin (only the immediate ladder path was correct). Threaded isRecovery=!!recovery.
#2 Budget cap under-counted spend: only dispatched task turns were metered; watchdog/doctor/orchestrator/agent sends billed tokens invisibly. Added a shared meterAgentTurn (src/core/agent-cost.ts) used by all 5 send sites; run_costs.task_id is now nullable for non-dispatch turns. The cap now bounds true total spend.
#3 Wrong-model cost attribution: now prices against the model the runtime actually ran (usage.model) before any requested override.

🟠 Accuracy — addressed

#4 / #5: documented the trajectory write-ordering reliance (a broken ordering only yields a silently-unmetered turn, never a crash) and the total-only→null pricing fallback.

🟡 Cleanups — fixed

#6 per-cycle spend memoization (was 4 SQL sums/task; globals are cycle-constant).
#7 deduped the 3 gate sites behind deferForBudget.
#8 synced the SDK RuntimeMessageArgs/Result types with model/thinking/usage.
#9 the budget health check now reuses evaluateBudget (can't drift from the dispatch gate).
#10 reworded the budget-window doc (local-time estimate, not invoice-exact).

Deliberately left (with rationale)

Dollar-formatter duplication — the client (models-page) vs server (health/budget) import boundary + differing semantics (null-for-unmetered vs always-show) make a clean shared helper scope-creep; both are small and local.
thinking as string at the adapter — already validated by the zod route + the gateway; tightening the type would couple the adapter contract to core routing.
budgetAuditedWindows Set growth (~8 keys/day, resets on restart) and the React pending ?? base copy-paste — cosmetic.

Full suite: 5060 pass / 0 fail. agent-cost imports its ledger/usage/hook deps dynamically so the wider metering surface didn't break existing partial-mock tests. Ready for manual testing.

Manual testing showed non-dispatch sends recorded $0 (no trajectory → no usage) and that dispatched turns ignored cache reads, undercounting cost badly (a turn with 34k cache reads priced ~5x low).
- adapter: extractOpenClawAgentUsage reads result.meta.agentMeta.usage (input/output/total + cacheRead/cacheWrite), preferred over the trajectory re-read. Works for UNTHREADED sends too → non-dispatch turns are now priced (fixes the gap), and removes the happy-path trajectory read (review #4). Trajectory stays the fallback.
- MessageUsage / SDK RuntimeMessageUsage gain cacheRead/cacheWrite.
- computeCostUsdMicros prices cacheRead at cachedReadPer1M, defaulting to 0.1x input (common cross-provider rate) when unspecified: estimate-grade but far better than pricing cache reads at $0. Threaded through the models.priceTurn hook + meterAgentTurn.
- tests: payload-sourced usage incl. cache (no trajectory); cache pricing (default 0.1x + explicit override)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a 'Bakin Metered Spend (24h, estimated)' card fed by the run_costs ledger (via the models /spend route) — distinct from the existing runtime-reported 'Runtime Cost Estimate' card. Puts our cost where the operator naturally looks, not only in Models → Spend. Fetched best-effort (optional if the models plugin is disabled), zero-cost rows show '$ n/a'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Image inference is a separate billed path from chat-turn tokens: pixel's image task only recorded its text turn, not the (often larger) image cost. Now persistImageResult meters each generate/edit via meterImageTurn → a run_costs spend event attributed to the invoking agent, so image spend shows in the Spend tab/Health and counts toward the budget cap.
- known-models: imagePerUsd (flat per-image rate; flux-pro=0.055) + computeImageCostUsdMicros. Provider-priced/ranged models stay unpriced (run recorded, '$ unavailable'), never guessed.
- models.priceImage hook mirrors priceTurn; meterImageTurn (agent-cost) records the image event (no tokens; synthetic image: runId).
- images/tools: meter on the shared persist chokepoint (generate + edit).
- tests: image cost math, priceImage hook, meterImageTurn (priced + null).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
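
The flat per-image math can be sketched as follows. The 0.055 rate and the null-when-unpriced rule come from the commit message; the function signature (a rate plus an image count) is an assumption and may not match the real computeImageCostUsdMicros.

```typescript
// Sketch only: the signature is an assumption; 0.055 is the flux-pro rate quoted above.
function computeImageCostUsdMicros(
  usdPerImage: number | undefined,
  imageCount = 1,
): number | null {
  // Provider-priced or ranged models have no flat rate: record the run, price nothing.
  if (usdPerImage === undefined) return null;
  return Math.round(usdPerImage * imageCount * 1e6);
}

const oneImage = computeImageCostUsdMicros(0.055);
const twoImages = computeImageCostUsdMicros(0.055, 2);
const unpricedImage = computeImageCostUsdMicros(undefined);
console.log(oneImage, twoImages, unpricedImage);
```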

…, health card Reflects the manual-testing fixes: usage now read from the gateway payload (incl. cache tokens), cache-read pricing, image-generation spend events, and the Bakin Metered Spend health card. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t Usage
End-user confusion: two cards both reading as 'cost'. Resolve by giving each one job (review of the cost UX):
- Rename 'Runtime Cost Estimate' → 'Runtime Usage' (tokens only, dollars dropped; badge shows total tokens). It's runtime-reported token usage.
- 'Bakin Metered Spend' → 'Bakin Spend', the single dollar/budget card (the figure the cap gates on), with a one-line clarifier.
- Layout: Context Usage now full-width; Runtime Usage + Bakin Spend sit half-width side by side.
- test: updated to assert the usage-only runtime card (no $ rendered).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per UX request:
- Order: summary tiles → Estimated Token Usage + Estimated Cost → Tool Usage → Context Usage → Search → Active Plugins → Diagnostics.
- Rename 'Runtime Usage' → 'Estimated Token Usage', 'Bakin Spend' → 'Estimated Cost'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bustness
Last-review findings on the manual-test work:
- computeCostUsdMicros no longer drops a CACHE-ONLY turn (the input==0 && output==0 guard returned null before adding cacheRead, the exact 34k cache-read case); and now prices cacheWrite (was shown but billed at $0). 0.1×/1.25× defaults are named constants, documented as Anthropic-exact / approximate elsewhere.
- adapter: a total-only gateway-payload usage block no longer masks the trajectory's priceable input/output split (extractOpenClawAgentUsage requires input/output, else falls back).
- health dashboard: the cross-plugin /spend fetch can't reject the core Promise.all (→ .catch null) and is trusted only on 2xx (the 500 error path returns {totalUsdMicros:0}, which must not render as a real $0); the two cards drop to full-width when only one is present.
- agent-cost: meterAgentTurn/meterImageTurn now share one recordSpend writer so the budget-cap spend contract is single-sourced.
- docs + stale /spend description corrected (cache IS modeled now).
- tests: cache-only + cacheWrite pricing; total-only payload fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
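
The corrected cache-aware pricing can be sketched as below. The 0.1× and 1.25× defaults, the `cachedReadPer1M` field, and the cache-only fix come from the commit messages; the constant names, the `cacheWritePer1M` override, and the rule that an all-zero usage block returns null are assumptions for this sketch.

```typescript
// Sketch only: constant names, cacheWritePer1M and the all-zero rule are assumptions.
const DEFAULT_CACHE_READ_MULTIPLIER = 0.1; // cache reads default to 0.1x the input rate
const DEFAULT_CACHE_WRITE_MULTIPLIER = 1.25; // cache writes default to 1.25x the input rate

interface CachePricing {
  inputPer1M: number;
  outputPer1M: number;
  cachedReadPer1M?: number;
  cacheWritePer1M?: number; // hypothetical explicit override
}

interface CacheUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

function computeCostWithCacheUsdMicros(
  pricing: CachePricing | undefined,
  usage: CacheUsage | undefined,
): number | null {
  if (!pricing || !usage) return null;
  const input = usage.input ?? 0;
  const output = usage.output ?? 0;
  const cacheRead = usage.cacheRead ?? 0;
  const cacheWrite = usage.cacheWrite ?? 0;
  // The emptiness guard runs after cache tokens are counted, so a cache-only
  // turn is still priced instead of being dropped as unmetered.
  if (input + output + cacheRead + cacheWrite === 0) return null;
  const readRate = pricing.cachedReadPer1M ?? pricing.inputPer1M * DEFAULT_CACHE_READ_MULTIPLIER;
  const writeRate = pricing.cacheWritePer1M ?? pricing.inputPer1M * DEFAULT_CACHE_WRITE_MULTIPLIER;
  return Math.round(
    input * pricing.inputPer1M +
      output * pricing.outputPer1M +
      cacheRead * readRate +
      cacheWrite * writeRate,
  );
}

const cacheAwarePricing: CachePricing = { inputPer1M: 3, outputPer1M: 15 }; // made-up rates
const cacheOnlyCost = computeCostWithCacheUsdMicros(cacheAwarePricing, { cacheRead: 34000 });
const cacheWriteCost = computeCostWithCacheUsdMicros(cacheAwarePricing, { cacheWrite: 1000 });
console.log(cacheOnlyCost, cacheWriteCost);
```

With these made-up rates, a turn consisting only of 34,000 cache-read tokens is priced at 0.3 micro-dollars per token rather than returning null.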

…mization # Conflicts: # .claude/knowledge/execution-ledger.md # packages/sdk/src/types/index.ts # plugins/models/types.ts

The models plugin manifest still listed the deleted /profiles routes and lacked the new /spend, /routing (GET+PUT), and /budget (GET+PUT) routes, so docs:check failed (openapi.json out of sync). Update contributes.apiRoutes + description and regenerate the docs artifacts (openapi.json, hooks, exec-tools, cli, reference pages). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

main advanced past WS2's branch point (WS3 #501 usePluginEvent, #500 models cost). Conflicts + semantic fixes resolved:
- tasks/plan.md + tasks/todo.md: textual conflicts (each workstream rewrites these). Archived WS2's as tasks/{plan,todo}-ws2-core-extractions.md (matches the plan-ws1-contract-types.md convention); kept main's active WS3 plan/todo.
- #500 added two NEW getHookRegistry consumers on the OLD import path that WS2's K1 moved to the leaf module: plugins/health/lib/system-checks/budget.ts and src/core/agent-cost.ts → repointed both to @bakin/core/hooks/hook-registry-singleton (relative for the plugin file).
- #500's tests (agent-cost, budget-gate, health/budget) mocked getHookRegistry only on the legacy facade; added the leaf mock (K1 partial-mock sweep) so all three exercise the real import site.
Verified: bun run typecheck clean; bun run test 5072 pass / 0 fail; madge shows 6 type-only cycles (the 4 WS2 documented + 2 docs/ cycles inherited from main, all erased at compile); none route through scripts/lib/registry, so WS2's runtime cycle break holds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

markhayden and others added 25 commits June 13, 2026 09:12

markhayden and others added 4 commits June 13, 2026 18:46

markhayden and others added 5 commits June 13, 2026 19:31

Merge remote-tracking branch 'origin/main' into feat/models-cost-opti…

776e51f


markhayden merged commit e6582db into main Jun 14, 2026
1 check passed

markhayden deleted the feat/models-cost-optimization branch June 14, 2026 02:28

markhayden mentioned this pull request Jun 15, 2026

feat(sdk): WS3b — remaining SDK client primitives (useJsonFetch, ConfirmDialog, useAvailableModels, formatters, EmptyState, toneBadge) #502

Merged

markhayden merged 34 commits into main from feat/models-cost-optimization
