feat(models): cost metering, model routing & budget gating (#464) #500
Merged
Recommendation phase for issue #464, widened from budget gating to the full cost story (metering → routing → gating). Captures the Models-plugin audit, paperclip competitor analysis, locked design decisions, phased roadmap, and per-task commit strategy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The taskProfiles feature (free-text task-type → recommended-model with full CRUD UI) was wired to nothing at dispatch time, and the showUsageMetrics setting was declared but never read. Both are removed to clear the surface for origin-based routing (replacement lands in Phase 2).
- types: drop TaskProfile + the two dead settings fields
- index: delete DEFAULT_TASK_PROFILES, the profile zod schemas, and the GET/PUT /profiles routes; drop showUsageMetrics from settingsSchema
- models-page: remove the Task Profiles tab, state, handlers, render block
- tests: drop /profiles route + settings-field assertions and the fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OpenClaw records per-run usage (input/output/total) in the trajectory
model.completed event, but the happy path returned only the assistant
text and discarded it. Surface it on MessageResult so dispatch can meter
cost (Phase 1).
- concepts: add MessageUsage + MessageResult.usage (re-exported)
- trajectory-forensics: expose usage on the success outcome (reuse the
existing model.completed parser); thread it through TrajectoryRecoveredTurn
- runtime: runOpenClawAgentGateway returns { content, usage }; readTurnUsage
reads the success trajectory tail; messaging.send attaches usage. Usage is
omitted (never zero-filled) when the runtime recorded none or the send was
unthreaded (no trajectory).
- tests: parser usage cases + end-to-end send usage (present + absent)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
costRange was a hand-typed display string never usable for cost math.
Add structured ModelPricing (inputPer1M/outputPer1M, micro-dollar cost
helper) to the 9 cloud LLM entries — the single source of truth for LLM
cost; the display string is now derived via formatCostRange. Non-token
models (image/video/local) keep their literal costRange, which token
pricing can't express.
- known-models: ModelPricing type; pricing{} on cloud LLMs; formatCostRange
+ computeCostUsdMicros (returns null when pricing/usage absent — honest
'$ unavailable', never a fabricated zero)
- index: enrichment derives costRange from pricing when no literal
- tests: pricing shape, derived display, micro-dollar math + null paths
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
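The micro-dollar math described in this commit can be illustrated with a minimal sketch. Only `inputPer1M`, `outputPer1M`, and the name `computeCostUsdMicros` come from the commit message; the `TokenUsage` shape and the `formatUsd` helper are assumptions made for the example, not the plugin's actual code.

```typescript
// Hypothetical shapes: only inputPer1M/outputPer1M are named in the commit.
interface ModelPricing {
  inputPer1M: number; // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

interface TokenUsage {
  input?: number;
  output?: number;
  total?: number;
}

// Returns null (not 0) when pricing or the input/output split is missing,
// so an unmetered run is never reported as a real $0.
function computeCostUsdMicros(
  pricing: ModelPricing | undefined,
  usage: TokenUsage | undefined,
): number | null {
  if (!pricing || !usage) return null;
  if (usage.input === undefined || usage.output === undefined) return null;
  // USD per 1M tokens is numerically the same as micro-dollars per token.
  const micros =
    usage.input * pricing.inputPer1M + usage.output * pricing.outputPer1M;
  return Math.round(micros);
}

// Assumed display helper: null renders as '$ unavailable'.
function formatUsd(micros: number | null): string {
  return micros === null ? "$ unavailable" : "$" + (micros / 1e6).toFixed(4);
}
```

Working in integer micro-dollars keeps rollups free of accumulated floating-point drift, and the null return keeps "unknown" distinct from "free".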
Durable per-run cost attribution (ledger migration v3), keyed by run_id, the same key dispatch settles on. A billing fact, not content: token counts + an estimated micro-dollar cost (null = unmetered).
- recordRunCost: first-write-wins (INSERT OR IGNORE) so a transport retry of the same run can't double-count
- spendTotal/spendByAgent/spendByModel: windowed rollups; null cost rows count as runs with zero dollars (COALESCE), never dropped
- purgeTaskRows cascades to run_costs
- facade re-exports the verbs + types
- tests: rollups, idempotency, window/agent filtering, unmetered row
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
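The first-write-wins and null-counts-as-zero semantics above can be sketched with an in-memory stand-in for the ledger table. The class, the row fields, and the return shape are hypothetical; the real implementation is a SQL table using INSERT OR IGNORE and COALESCE.

```typescript
// Hypothetical row shape for illustration only.
interface RunCostRow {
  runId: string;
  agentId: string;
  costUsdMicros: number | null; // null = unmetered
  recordedAtMs: number;
}

class InMemoryRunCosts {
  private rows = new Map<string, RunCostRow>();

  // Mirrors INSERT OR IGNORE: a retry with the same runId is dropped.
  recordRunCost(row: RunCostRow): boolean {
    if (this.rows.has(row.runId)) return false;
    this.rows.set(row.runId, row);
    return true;
  }

  // Windowed rollup: unmetered rows still count as runs, with zero dollars
  // (the COALESCE(cost, 0) behavior), and are never dropped.
  spendTotal(sinceMs: number): { runs: number; totalUsdMicros: number } {
    let runs = 0;
    let totalUsdMicros = 0;
    this.rows.forEach((row) => {
      if (row.recordedAtMs < sinceMs) return;
      runs += 1;
      totalUsdMicros += row.costUsdMicros ?? 0;
    });
    return { runs, totalUsdMicros };
  }
}
```

Keying on the run id means the caller does not need its own retry bookkeeping: writing the same run twice is harmless by construction.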
On a settled task turn, attribute its cost: the threadId IS the ledger run id, so recordRunCost is first-write-wins idempotent. Pricing is delegated to the models plugin via a new models.priceTurn hook (core stays pricing-agnostic); the same data feeds the live usage recorder.
- usage: UsageEntry gains tokensIn/tokensOut/costUsdMicros (recordUsage now passes them through)
- dispatch: sendDispatchMessage returns MessageResult; recordTurnCost runs in the success settle: it invokes models.priceTurn, writes run_costs, and feeds the recorder. Never throws into the settle path. Absent plugin → tokens recorded, cost null (unmetered), never a fabricated zero.
- models: models.priceTurn hook resolves the effective model (explicit → agent config) and returns estimated micro-dollar cost from catalog pricing
- tests: priceTurn hook (priced/unpriced/no-tokens); dispatch settle writes a costed row and an unmetered row
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GET /spend?window=24h|7d|30d|all returns windowed rollups from the run_costs ledger (total + by-agent + by-model). A reporting read, so it degrades gracefully (returns zeros) when the ledger is unavailable rather than crashing the page.
- index: /spend route + spend-window parsing; '' model id → 'unknown'
- models-page: Spend tab with total card, by-agent + by-model tables, window toggle (URL-backed), 'estimated' caveat, '$ unavailable' for unmetered (zero-cost) rows instead of '$0.00'
- tests: route rollups + window default; component spend fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dispatch.ts now imports recordRunCost from the execution-ledger facade (P1.4), so any test mocking that module must export it or the import throws at load. Add it to the in-memory fake. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified against the installed OpenClaw 2026.6.5 source (dist/agent-via-gateway-*.js): the gateway 'agent' method accepts `model` and `thinking` as top-level params, alongside the agentId/message/ sessionId/idempotencyKey params Bakin's adapter already sends. The routing phase's one blocking unknown is cleared — per-turn model/thinking override is a clean adapter addition, no agent-config mutation needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bakin-owned per-turn routing policy (the values; the adapter serves them). The routing key is the deterministic dispatch ORIGIN with a per-task TAG override.
- classifyOrigin: recovery (dispatch-context) → workflow → scheduled → decomposition → adhoc
- resolveTurnModel: model and thinking resolve independently across tag → origin → inherit. No per-agent/global tail: returning nothing means 'inherit', i.e. the adapter passes no model and the runtime uses the agent's configured default (unchanged pre-routing behavior).
- types (Origin/ThinkingLevel/RoutingPolicy/TagOverride/RoutingConfig) live in core so the resolver is self-contained; the models plugin will depend on these (correct direction).
- tests: origin truth table, cascade precedence, independent model/thinking, 'inherit', empty-config no-op
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
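A rough sketch of the tag → origin → inherit cascade follows. The origin names and the precedence order are taken from this commit; the task fields (`workflowId` in particular), the concrete thinking levels, and the config layout are assumptions for illustration and may not match `src/core/model-routing.ts`.

```typescript
type Origin = "recovery" | "workflow" | "scheduled" | "decomposition" | "adhoc";
type ThinkingLevel = "low" | "medium" | "high"; // assumed levels

interface RoutingPolicy { model?: string; thinking?: ThinkingLevel }
interface RoutingConfig {
  origins?: Partial<Record<Origin, RoutingPolicy>>;
  tags?: Record<string, RoutingPolicy>;
}
// Hypothetical task shape; workflowId is an assumed field name.
interface TaskShape {
  isRecovery?: boolean;
  workflowId?: string;
  scheduleJobId?: string;
  parentId?: string;
  tags?: string[];
}

// Deterministic precedence: recovery → workflow → scheduled → decomposition → adhoc.
function classifyOrigin(task: TaskShape): Origin {
  if (task.isRecovery) return "recovery";
  if (task.workflowId) return "workflow";
  if (task.scheduleJobId) return "scheduled";
  if (task.parentId) return "decomposition";
  return "adhoc";
}

// model and thinking resolve independently; an empty result means 'inherit'.
function resolveTurnModel(task: TaskShape, cfg: RoutingConfig): RoutingPolicy {
  const tagPolicies: RoutingPolicy[] = [];
  for (const tag of task.tags ?? []) {
    const policy = cfg.tags?.[tag];
    if (policy) tagPolicies.push(policy);
  }
  const originPolicy = cfg.origins?.[classifyOrigin(task)];
  const model =
    tagPolicies.find((p) => p.model !== undefined)?.model ?? originPolicy?.model;
  const thinking =
    tagPolicies.find((p) => p.thinking !== undefined)?.thinking ??
    originPolicy?.thinking;
  const out: RoutingPolicy = {};
  if (model !== undefined) out.model = model;
  if (thinking !== undefined) out.thinking = thinking;
  return out;
}
```

Returning an object with no keys, rather than a default model, is what lets an empty config leave pre-routing behavior untouched.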
MessageArgs gains optional model + thinking; the adapter threads them through the turn options onto the gateway 'agent' RPC params (verified in P2.0 to accept both). Omitted when unset, so a turn with no routing override behaves exactly as before: the runtime uses the agent's configured model/default.
- concepts: MessageArgs.model/thinking
- runtime: OpenClawAgentTurnOptions threads them; send/stream forward them; gateway params set model/thinking only when defined
- tests: params carry model+thinking when set; absent when unset
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
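The "set only when defined" rule can be sketched as below. The parameter names (`agentId`, `message`, `sessionId`, `idempotencyKey`, `model`, `thinking`) are the ones the P2.0 verification lists; the `TurnOptions` interface and `buildAgentParams` function are hypothetical stand-ins for the adapter code.

```typescript
// Hypothetical options shape for illustration.
interface TurnOptions {
  agentId: string;
  message: string;
  idempotencyKey: string;
  sessionId?: string;
  model?: string;
  thinking?: string;
}

// Builds the params object for the gateway 'agent' call. Optional keys are
// added only when defined, so an unrouted turn sends exactly what it sent
// before routing existed (no `model: undefined` key on the wire).
function buildAgentParams(opts: TurnOptions): Record<string, string> {
  const params: Record<string, string> = {
    agentId: opts.agentId,
    message: opts.message,
    idempotencyKey: opts.idempotencyKey,
  };
  if (opts.sessionId !== undefined) params.sessionId = opts.sessionId;
  if (opts.model !== undefined) params.model = opts.model;
  if (opts.thinking !== undefined) params.thinking = opts.thinking;
  return params;
}
```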
fireDispatchTurn resolves the per-turn model/thinking from the Bakin
routing policy (read via the new models.getRoutingConfig hook) before
sending, threads it onto the gateway, and prices the run against the
resolved model. All three dispatch paths (cycle, single, workflow) flow
through fireDispatchTurn so they inherit routing; only dispatchSingleTask
flags isRecovery (source==='recovery') for the recovery-origin policy.
- dispatch: DispatchTask carries scheduleJobId/parentId/tags; resolveDispatch
Routing (fail-soft → {} inherit); send + recordTurnCost use the resolved
model; audits task.routed when an override applies
- models: models.getRoutingConfig hook reads settings.routing (empty default);
ModelsPluginSettings.routing
- tests: origin policy reaches the turn; empty-config regression (no model/
thinking → unchanged dispatch); 7-hook registration
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GET/PUT /routing persist the per-turn routing policy to plugin settings, validated by a zod schema (origin enum from core ORIGINS, thinking levels incl. 'inherit'). A Routing tab edits it: a row per dispatch origin with a model + thinking selector, plus an add/remove tag-override list. Blank rows and 'inherit' thinking are dropped on save to keep storage clean.
- index: /routing GET+PUT + RoutingConfig zod schema
- models-page: Routing tab (origins table + tag overrides), pending/save
- tests: GET default + PUT persist + unknown-origin rejection; component routing fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure spend-ceiling logic dispatch will consult before claiming a run (P3.2 wires it). Given a policy and the ledger-sourced spend per scope × window, decide allow / warn / defer.
- BudgetPolicy: global caps + warnPct, per-agent caps (USD; unset = unlimited)
- dayStartMs/monthStartMs: local calendar-day + calendar-month boundaries (daily catches a runaway night; monthly aligns with the provider invoice)
- evaluateBudget: defer (cap met/exceeded) > warn (>= warnPct, default 0.8) > allow; returns the worst breach for the audit reason
- defer (not pause) is the deliberate divergence from paperclip: no lost work, spend just throttles until the window rolls over
- tests: boundaries, allow/warn/defer, defer-beats-warn, per-agent, warnPct
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
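The evaluator's ordering (defer beats warn beats allow) and the local-time window boundaries can be sketched for a single scope as follows. The function names match the commit; the `Caps` and `WindowSpend` shapes are assumed, and the real evaluator also covers per-agent scopes and returns a reason, which this sketch omits.

```typescript
type Verdict = "allow" | "warn" | "defer";

// Hypothetical shapes; an unset cap means unlimited.
interface Caps { dailyUsd?: number; monthlyUsd?: number }
interface WindowSpend { dailyUsd: number; monthlyUsd: number }

// Local calendar-day boundary (midnight in the host's time zone).
function dayStartMs(now: Date): number {
  return new Date(now.getFullYear(), now.getMonth(), now.getDate()).getTime();
}

// Local calendar-month boundary (first of the month, local midnight).
function monthStartMs(now: Date): number {
  return new Date(now.getFullYear(), now.getMonth(), 1).getTime();
}

// defer (cap met or exceeded) > warn (>= warnPct of cap) > allow.
function evaluateBudget(caps: Caps, spend: WindowSpend, warnPct = 0.8): Verdict {
  const checks: Array<[number | undefined, number]> = [
    [caps.dailyUsd, spend.dailyUsd],
    [caps.monthlyUsd, spend.monthlyUsd],
  ];
  let worst: Verdict = "allow";
  for (const [cap, spent] of checks) {
    if (cap === undefined) continue; // unlimited
    if (spent >= cap) return "defer"; // worst possible, stop early
    if (spent >= cap * warnPct) worst = "warn";
  }
  return worst;
}
```

Keeping this pure (spend is passed in, not read) is what makes the boundary and precedence cases cheap to unit-test.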
Before each dispatch claims a run, budgetGate consults the budget policy (read via the new models.getBudgetPolicy hook) against ledger spend for the day + month windows, global + per-agent. On defer the task stays in todo (no claim, no send) and resumes when the window rolls over or the cap is raised; no work lost. Warn/defer audits debounce per window so a cap at 85%, or a task deferred every cycle, doesn't spam the log.
- dispatch: budgetGate + auditBudgetOnce; wired before all three claim sites (cycle, single, workflow). FAIL-CLOSED: a failed spend read defers (consistent with the ledger's posture). No policy → allow.
- models: models.getBudgetPolicy hook + ModelsPluginSettings.budget
- tests: over-cap defers (no send); no-policy regression (dispatch proceeds)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
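The fail-closed posture and the once-per-window audit debounce might look roughly like this. It is a simplified, synchronous, single-cap sketch: `makeBudgetGate`, the `GateDeps` shape, the reason strings, and the window key are all hypothetical, and the real gate reads policy through a hook and checks several scopes.

```typescript
type Verdict = "allow" | "warn" | "defer";

// Hypothetical dependencies, injected so the gate can be tested in isolation.
interface GateDeps {
  capUsd: number | undefined; // undefined = no policy
  warnPct: number;
  readSpendUsd: () => number; // may throw if the ledger is unreachable
  audit: (reason: string) => void;
}

function makeBudgetGate(deps: GateDeps): (windowKey: string) => Verdict {
  const audited = new Set<string>();
  // Emit each (window, reason) pair at most once, so a task deferred on
  // every cycle does not flood the audit log.
  const auditOnce = (windowKey: string, reason: string): void => {
    const key = windowKey + "|" + reason;
    if (audited.has(key)) return;
    audited.add(key);
    deps.audit(reason);
  };

  return (windowKey: string): Verdict => {
    if (deps.capUsd === undefined) return "allow"; // no policy → allow
    let spent: number;
    try {
      spent = deps.readSpendUsd();
    } catch {
      // FAIL-CLOSED: an unreadable ledger defers rather than allowing spend.
      auditOnce(windowKey, "spend-unreadable");
      return "defer";
    }
    if (spent >= deps.capUsd) {
      auditOnce(windowKey, "cap-reached");
      return "defer";
    }
    if (spent >= deps.capUsd * deps.warnPct) {
      auditOnce(windowKey, "approaching-cap");
      return "warn";
    }
    return "allow";
  };
}
```

Because the window key is part of the debounce key, the audit fires again naturally once the day or month rolls over.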
Closes the gating loop with config + visibility.
- models: GET/PUT /budget (zod-validated BudgetPolicy: positive caps, warnPct in (0,1]); global-caps editor + utilization on the Spend tab
- health: 'budget' check: ok under caps, warn approaching warnPct or when runs were deferred in 24h, error at/over a cap (dispatch blocked) or when the spend ledger is unreachable (gating fails closed)
- tests: budget GET/PUT + negative-cap rejection; health check across ok/warn/error + ledger-unreachable; component budget fetch mock
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dispatch.ts now imports spendTotal from the execution-ledger facade for budget gating (P3.2); the usage-wiring test mocks that module, so it must export spendTotal or the import throws at load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- models-plugin: pricing, metering, routing, budget gating, Spend view
- execution-ledger: run_costs table + spend verbs
- usage-recording: tokensIn/tokensOut/costUsdMicros fields
- dispatch: model routing + budget gate around turn fire
- spec: mark implemented
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Export budgetGate as a test seam and unit-test the paths the pure evaluator can't reach: no-policy allow, caps-with-zero-spend allow, and the FAIL-CLOSED defer-with-audit when the spend ledger read throws. The shared-db dispatch harness can't force a ledger throw without corrupting later tests, so this isolates the gate with a mocked ledger. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The session-death ladder's dispatchSingleTask(...,'recovery') path set isRecovery, but the main dispatchTasks cycle also re-dispatches tasks carrying a persisted sessionDeath record (when the ladder timer is lost to a restart or the immediate re-dispatch was budget-deferred) — and pushed the turn without isRecovery, so it routed by task shape instead of 'recovery'. Thread isRecovery=!!recovery into the pendingTurns push so both paths route identically. - test: a sessionDeath task re-dispatched via the main cycle routes to the configured 'recovery' model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iew #2,#3)
The budget cap previously only saw dispatched task turns. Watchdog nudges, doctor notifications, agent-to-agent sends, and orchestrator completion pings billed tokens but were invisible to spendTotal, so a runaway non-dispatch loop went uncapped (the exact #464 failure mode).
- new src/core/agent-cost.ts: shared meterAgentTurn used by dispatch's settle path AND the four non-dispatch send sites. Synthetic runId + null task_id for non-dispatch turns (run_costs.task_id now nullable, v3).
- review #3: price against the model the runtime ACTUALLY ran (usage.model) before the requested override, so a rejected/fell-back per-turn model is billed correctly.
- agent-cost imports its ledger/usage/hook deps DYNAMICALLY so the newly metering modules don't drag the ledger into their static graph; existing partial-mock tests keep working, and a missing export just no-ops in the existing try/catch.
- tests: meterAgentTurn (synthetic id, null task, actual-model pricing, never-throws); ledger null-task_id row counts in spend.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
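Two of the decisions above, pricing against the model that actually ran and giving non-dispatch turns a synthetic run id, can be sketched as small pure helpers. Both function names and the synthetic id format are hypothetical; the commit does not state how `meterAgentTurn` builds its ids.

```typescript
// Hypothetical usage shape: `model` is the model the runtime reports it ran.
interface TurnUsage {
  model?: string;
  input?: number;
  output?: number;
}

// Bill the model that actually ran first, then the requested override,
// then the agent's configured default.
function billingModel(
  usage: TurnUsage | undefined,
  requestedModel: string | undefined,
  agentDefault: string | undefined,
): string | undefined {
  return usage?.model ?? requestedModel ?? agentDefault;
}

// Dispatch turns reuse the threadId as the ledger run id; other sends get a
// synthetic id. The "synthetic:<kind>:<ms>:<seq>" format is an assumption.
function runIdFor(
  threadId: string | undefined,
  kind: string,
  nowMs: number,
  seq: number,
): string {
  return threadId ?? "synthetic:" + kind + ":" + nowMs + ":" + seq;
}
```

Preferring the reported model matters when the gateway rejects or falls back from a per-turn override: the requested model is then the wrong price list.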
,#10)
- #6: per-cycle spend memoization. budgetGate accepts a cache so the two global spend aggregates aren't recomputed for every queued task in a cycle (was 4 SQL sums/task; globals are cycle-constant since costs only land on settle, after the loop).
- #7: dedup the three identical gate sites behind deferForBudget().
- #9: the budget health check now calls evaluateBudget for the global scope instead of re-implementing the cap-vs-spend math, so the doctor can't drift from what dispatch enforces.
- #10: reword the budget window doc: it's local-time and an estimate, not invoice-exact (was overclaiming provider-invoice alignment).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
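The per-cycle memoization in #6 amounts to a cache that lives for one dispatch cycle. A minimal sketch, assuming a plain `Map` keyed by scope and window; the key strings and the `cachedSpend` helper are hypothetical.

```typescript
// One cache instance per dispatch cycle; discarded when the cycle ends.
type SpendCache = Map<string, number>;

// Returns the cached aggregate for `key`, computing it at most once per cycle.
function cachedSpend(
  cache: SpendCache,
  key: string,
  compute: () => number,
): number {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(key, value);
  return value;
}

// Example: three queued tasks share one cycle cache, so the two global
// aggregates are each computed once instead of once per task.
function globalSpendForCycle(
  taskCount: number,
  sumDay: () => number,
  sumMonth: () => number,
): number[] {
  const cache: SpendCache = new Map();
  const seen: number[] = [];
  for (let i = 0; i < taskCount; i++) {
    seen.push(cachedSpend(cache, "global:day", sumDay));
    seen.push(cachedSpend(cache, "global:month", sumMonth));
  }
  return seen;
}
```

This is safe only because, as the commit notes, costs land on settle after the loop, so the global totals cannot change mid-cycle.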
…eview #8) The SDK's plugin-facing runtime message types were independent duplicates of core's and hadn't gained the per-turn model/thinking args or the usage result field — so a plugin author using ctx.runtime.messaging.send couldn't pass routing overrides or read token usage the runtime supports. Add model?/thinking? to RuntimeMessageArgs and a RuntimeMessageUsage on RuntimeMessageResult. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#5)
- #4: document readTurnUsage's reliance on OpenClaw's model.completed → session.ended → gateway-frame write ordering, and that a broken ordering only yields a silently-unmetered turn (no crash/wrong cost).
- #5: note that computeCostUsdMicros returns null for total-only usage by design: the input/output split is required for accurate per-1M pricing, so we don't guess.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lable Reflects the review fixes — meter-all (not dispatch-only) via agent-cost, nullable task_id for non-dispatch turns, actual-model attribution, and the per-cycle budget spend cache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Self-review applied (high-effort, 7 finder angles + verify)
Ran a full review of the branch and fixed everything actionable before manual testing. Summary:
🔴 Correctness: fixed
🟠 Accuracy: addressed
🟡 Cleanups: fixed
Deliberately left (with rationale)
Full suite: 5060 pass / 0 fail.
Manual testing showed non-dispatch sends recorded $0 (no trajectory → no usage) and that dispatched turns ignored cache reads, undercounting cost badly (a turn with 34k cache reads priced ~5x low).
- adapter: extractOpenClawAgentUsage reads result.meta.agentMeta.usage (input/output/total + cacheRead/cacheWrite), preferred over the trajectory re-read. Works for UNTHREADED sends too → non-dispatch turns are now priced (fixes the gap), and removes the happy-path trajectory read (review #4). Trajectory stays the fallback.
- MessageUsage / SDK RuntimeMessageUsage gain cacheRead/cacheWrite.
- computeCostUsdMicros prices cacheRead at cachedReadPer1M, defaulting to 0.1x input (common cross-provider rate) when unspecified; estimate-grade but far better than pricing cache reads at $0. Threaded through the models.priceTurn hook + meterAgentTurn.
- tests: payload-sourced usage incl. cache (no trajectory); cache pricing (default 0.1x + explicit override)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a 'Bakin Metered Spend (24h, estimated)' card fed by the run_costs ledger (via the models /spend route) — distinct from the existing runtime-reported 'Runtime Cost Estimate' card. Puts our cost where the operator naturally looks, not only in Models → Spend. Fetched best-effort (optional if the models plugin is disabled), zero-cost rows show '$ n/a'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Image inference is a separate billed path from chat-turn tokens: pixel's image task only recorded its text turn, not the (often larger) image cost. Now persistImageResult meters each generate/edit via meterImageTurn → a run_costs spend event attributed to the invoking agent, so image spend shows in the Spend tab/Health and counts toward the budget cap.
- known-models: imagePerUsd (flat per-image rate; flux-pro=0.055) + computeImageCostUsdMicros. Provider-priced/ranged models stay unpriced (run recorded, '$ unavailable'), never guessed.
- models.priceImage hook mirrors priceTurn; meterImageTurn (agent-cost) records the image event (no tokens; synthetic image: runId).
- images/tools: meter on the shared persist chokepoint (generate + edit).
- tests: image cost math, priceImage hook, meterImageTurn (priced + null).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
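The flat per-image math is simple enough to show directly. This sketch assumes the rate is expressed in USD per image (0.055 for the flux-pro entry mentioned above); the parameter names and the `imageCount` argument are illustrative rather than the function's confirmed signature.

```typescript
// Flat per-image pricing in micro-dollars. Returns null when the model has no
// flat rate (provider-priced or ranged), so the run is recorded as unmetered
// instead of being assigned a guessed cost.
function computeImageCostUsdMicros(
  usdPerImage: number | undefined,
  imageCount: number,
): number | null {
  if (usdPerImage === undefined) return null;
  if (imageCount <= 0) return null;
  return Math.round(usdPerImage * imageCount * 1e6);
}
```

For example, one image at a 0.055 USD flat rate comes to 55,000 micro-dollars, and a model without a flat rate yields null.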
…, health card Reflects the manual-testing fixes: usage now read from the gateway payload (incl. cache tokens), cache-read pricing, image-generation spend events, and the Bakin Metered Spend health card. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t Usage
End-user confusion: two cards both reading as 'cost'. Resolve by giving each one job (review of the cost UX):
- Rename 'Runtime Cost Estimate' → 'Runtime Usage' (tokens only, dollars dropped; badge shows total tokens). It's runtime-reported token usage.
- 'Bakin Metered Spend' → 'Bakin Spend': the single dollar/budget card (the figure the cap gates on), with a one-line clarifier.
- Layout: Context Usage now full-width; Runtime Usage + Bakin Spend sit half-width side by side.
- test: updated to assert the usage-only runtime card (no $ rendered).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per UX request:
- Order: summary tiles → Estimated Token Usage + Estimated Cost → Tool Usage → Context Usage → Search → Active Plugins → Diagnostics.
- Rename 'Runtime Usage' → 'Estimated Token Usage', 'Bakin Spend' → 'Estimated Cost'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bustness
Last-review findings on the manual-test work:
- computeCostUsdMicros no longer drops a CACHE-ONLY turn (the input==0 &&
output==0 guard returned null before adding cacheRead — the exact 34k
cache-read case); and now prices cacheWrite (was shown but billed at $0).
0.1×/1.25× defaults are named constants, documented as Anthropic-exact /
approximate elsewhere.
- adapter: a total-only gateway-payload usage block no longer masks the
trajectory's priceable input/output split (extractOpenClawAgentUsage
requires input/output, else falls back).
- health dashboard: the cross-plugin /spend fetch can't reject the core
Promise.all (→ .catch null) and is trusted only on 2xx (the 500 error
path returns {totalUsdMicros:0}, which must not render as a real $0);
the two cards drop to full-width when only one is present.
- agent-cost: meterAgentTurn/meterImageTurn now share one recordSpend
writer so the budget-cap spend contract is single-sourced.
- docs + stale /spend description corrected (cache IS modeled now).
- tests: cache-only + cacheWrite pricing; total-only payload fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
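The cache-aware pricing after these fixes can be sketched as below. The 0.1× and 1.25× defaults and the `cachedReadPer1M` field are from the commits; `cacheWritePer1M`, the constant names, and the usage shape are assumptions for the example.

```typescript
// Defaults named in the commit: cache reads at 0.1x input, writes at 1.25x.
const DEFAULT_CACHE_READ_MULTIPLIER = 0.1;
const DEFAULT_CACHE_WRITE_MULTIPLIER = 1.25;

interface CachePricing {
  inputPer1M: number;
  outputPer1M: number;
  cachedReadPer1M?: number; // explicit override for cache reads
  cacheWritePer1M?: number; // hypothetical field name for cache writes
}

interface CacheUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

function computeCostWithCache(p: CachePricing, u: CacheUsage): number | null {
  const { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } = u;
  // Only a turn with nothing billable is unmetered. A cache-only turn
  // (input and output both 0) must still be priced.
  if (input === 0 && output === 0 && cacheRead === 0 && cacheWrite === 0) {
    return null;
  }
  const readRate =
    p.cachedReadPer1M ?? p.inputPer1M * DEFAULT_CACHE_READ_MULTIPLIER;
  const writeRate =
    p.cacheWritePer1M ?? p.inputPer1M * DEFAULT_CACHE_WRITE_MULTIPLIER;
  // USD per 1M tokens is numerically micro-dollars per token.
  return Math.round(
    input * p.inputPer1M +
      output * p.outputPer1M +
      cacheRead * readRate +
      cacheWrite * writeRate,
  );
}
```

The important detail is the position of the emptiness guard: it checks all four token counts, so the cache terms are considered before deciding a turn is unmetered.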
…mization # Conflicts: # .claude/knowledge/execution-ledger.md # packages/sdk/src/types/index.ts # plugins/models/types.ts
The models plugin manifest still listed the deleted /profiles routes and lacked the new /spend, /routing (GET+PUT), and /budget (GET+PUT) routes, so docs:check failed (openapi.json out of sync). Update contributes.apiRoutes + description and regenerate the docs artifacts (openapi.json, hooks, exec-tools, cli, reference pages). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
markhayden added a commit that referenced this pull request on Jun 15, 2026
main advanced past WS2's branch point (WS3 #501 usePluginEvent, #500 models cost). Conflicts + semantic fixes resolved:
- tasks/plan.md + tasks/todo.md: textual conflicts (each workstream rewrites these). Archived WS2's as tasks/{plan,todo}-ws2-core-extractions.md (matches the plan-ws1-contract-types.md convention); kept main's active WS3 plan/todo.
- #500 added two NEW getHookRegistry consumers on the OLD import path that WS2's K1 moved to the leaf module: plugins/health/lib/system-checks/budget.ts and src/core/agent-cost.ts → repointed both to @bakin/core/hooks/hook-registry-singleton (relative for the plugin file).
- #500's tests (agent-cost, budget-gate, health/budget) mocked getHookRegistry only on the legacy facade; added the leaf mock (K1 partial-mock sweep) so all three exercise the real import site.
Verified: bun run typecheck clean; bun run test 5072 pass / 0 fail; madge shows 6 type-only cycles (the 4 WS2 documented + 2 docs/ cycles inherited from main, all erased at compile); none route through scripts/lib/registry, so WS2's runtime cycle break holds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Implements per-run cost metering, per-task-type model routing, and budget gating for agent turns, widening issue #464 (budget gating only) into the full cost-optimization story. Design record: `.claude/specs/models-cost-optimization.md` (+ `-plan.md`).

Built in four phases, each commit green on its own; full suite 5054 pass / 0 fail.

Phase 0: cleanup
- Removed `taskProfiles` + `showUsageMetrics` (wired to nothing at dispatch time).

Phase 1: Metering (foundation)
- Surfaces `usage` (from the trajectory `model.completed` event) on `MessageResult.usage`; it was previously discarded on the happy path.
- Structured `pricing` on the 9 cloud LLM catalog entries; display string derived via `formatCostRange`. `computeCostUsdMicros` returns null when pricing/usage is absent: an honest "$ unavailable", never a fabricated `$0`.
- `run_costs` ledger table (migration v3, keyed by `run_id`, first-write-wins) + `spendTotal`/`spendByAgent`/`spendByModel` verbs.
- Dispatch settle prices each turn (via the `models.priceTurn` hook, so core stays pricing-agnostic) and feeds the live usage recorder (`tokensIn`/`tokensOut`/`costUsdMicros`).
- Spend tab + `GET /spend?window=`.

Phase 2: Routing (model + thinking per turn)
- Verified the gateway `agent` RPC accepts per-turn `model`/`thinking` (P2.0), then threaded them through `MessageArgs` → gateway params.
- Routing policy keyed by dispatch origin (`scheduled|workflow|adhoc|recovery|decomposition`) + per-task tag overrides, resolved at dispatch (`src/core/model-routing.ts`). Empty config = unchanged behavior (regression-guarded).

Phase 3: Budget gating (#464)
- Pure spend-ceiling evaluator (`src/core/budget.ts`): warn at `warnPct` (default 0.8), defer at 100%; defer-not-pause (diverges from paperclip, no lost work). Daily + monthly windows.
- `budgetGate` consulted before every `claimDispatchRun`; fail-closed (unreadable ledger defers); warn/defer audits debounce per window.
- `budget` health check (utilization + deferred-run count) + a global-caps editor on the Spend tab.

Phase 4: Docs
- Updated `.claude/knowledge/{models-plugin,execution-ledger,usage-recording,dispatch}.md`; spec marked implemented.

Design notes / deviations
- Kept the literal `costRange` for image/video/local models (token-per-1M pricing can't express "$0.055 per image"); only cloud LLMs moved to structured `pricing`.
- Routing and budget logic live in core (`src/core/{model-routing,budget}.ts`); the models plugin depends on core (correct direction) and exposes policy to dispatch via hooks (`models.getRoutingConfig`, `models.getBudgetPolicy`, `models.priceTurn`).

Scope flagged for follow-up

Testing
TDD throughout (RED→GREEN per task). New/updated tests cover: trajectory usage parse, pricing math + null paths, `run_costs` idempotency + windowed rollups, dispatch settle cost (metered + unmetered), `classifyOrigin`/`resolveTurnModel` + cascade, adapter model/thinking params, dispatch routing application + empty-config regression, budget evaluator (allow/warn/defer), `budgetGate` fail-closed defer, budget defer + no-policy regression in dispatch, budget health check (ok/warn/error/ledger-unreachable), and all the new routes.

🤖 Generated with Claude Code