feat(models): cost metering, model routing & budget gating (#464) #500

Merged
markhayden merged 34 commits into main from feat/models-cost-optimization on Jun 14, 2026

Conversation

@markhayden

Owner

Summary

Implements per-run cost metering, per-task-type model routing, and budget gating for agent turns — widening issue #464 (budget gating only) into the full cost-optimization story. Design record: .claude/specs/models-cost-optimization.md (+ -plan.md).

Built in four phases, each commit green on its own; full suite 5054 pass / 0 fail.

Phase 0 — cleanup

  • Deleted the dead taskProfiles + showUsageMetrics (wired to nothing at dispatch time).

Phase 1 — Metering (foundation)

  • Adapter surfaces per-turn token usage (from the trajectory model.completed event) on MessageResult.usage — previously discarded on the happy path.
  • Structured pricing on the 9 cloud LLM catalog entries; display string derived via formatCostRange. computeCostUsdMicros returns null when pricing/usage is absent — honest "$ unavailable", never a fabricated $0.
  • Durable run_costs ledger table (migration v3, keyed by run_id, first-write-wins) + spendTotal/spendByAgent/spendByModel verbs.
  • Dispatch records cost on settle (via the models.priceTurn hook, so core stays pricing-agnostic) and feeds the live usage recorder (tokensIn/tokensOut/costUsdMicros).
  • Spend tab + GET /spend?window=.

Phase 2 — Routing (model + thinking per turn)

  • Verified OpenClaw's gateway agent RPC accepts per-turn model/thinking (P2.0), then threaded them through MessageArgs → gateway params.
  • Origin-based policy (scheduled|workflow|adhoc|recovery|decomposition) + per-task tag overrides, resolved at dispatch (src/core/model-routing.ts). Empty config = unchanged behavior (regression-guarded).
  • Routing tab (per-origin model + thinking-level selectors, tag-override list).

Phase 3 — Budget gating (#464)

  • Pure budget evaluator (src/core/budget.ts): warn at warnPct (default 0.8), defer at 100% — defer-not-pause (diverges from paperclip, no lost work). Daily + monthly windows.
  • budgetGate consulted before every claimDispatchRun; fail-closed (unreadable ledger defers); warn/defer audits debounce per window.
  • budget health check (utilization + deferred-run count) + a global-caps editor on the Spend tab.

Phase 4 — Docs

  • Updated .claude/knowledge/{models-plugin,execution-ledger,usage-recording,dispatch}.md; spec marked implemented.

Design notes / deviations

  • Kept costRange for image/video/local models (token-per-1M pricing can't express "$0.055 per image"); only cloud LLMs moved to structured pricing.
  • Routing/budget types live in core (src/core/{model-routing,budget}.ts); the models plugin depends on core (correct direction) and exposes policy to dispatch via hooks (models.getRoutingConfig, models.getBudgetPolicy, models.priceTurn).
  • Cost is an estimate — cached-token discounts aren't modeled (trajectory usage doesn't break them out), so totals read slightly high under prompt caching. Labeled in the UI.

Scope flagged for follow-up

  • Budget editor UI exposes global caps; per-agent caps are fully supported by the data model + gating + API, just not in the UI yet.

Testing

TDD throughout (RED→GREEN per task). New/updated tests cover: trajectory usage parse, pricing math + null paths, run_costs idempotency + windowed rollups, dispatch settle cost (metered + unmetered), classifyOrigin/resolveTurnModel + cascade, adapter model/thinking params, dispatch routing application + empty-config regression, budget evaluator (allow/warn/defer), budgetGate fail-closed defer, budget defer + no-policy regression in dispatch, budget health check (ok/warn/error/ledger-unreachable), and all the new routes.

🤖 Generated with Claude Code

markhayden and others added 25 commits June 13, 2026 09:12
Recommendation phase for issue #464, widened from budget gating to the
full cost story (metering → routing → gating). Captures the Models-plugin
audit, paperclip competitor analysis, locked design decisions, phased
roadmap, and per-task commit strategy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The taskProfiles feature (free-text task-type → recommended-model with
full CRUD UI) was wired to nothing at dispatch time, and the
showUsageMetrics setting was declared but never read. Both are removed to
clear the surface for origin-based routing (replacement lands in Phase 2).

- types: drop TaskProfile + the two dead settings fields
- index: delete DEFAULT_TASK_PROFILES, the profile zod schemas, and the
  GET/PUT /profiles routes; drop showUsageMetrics from settingsSchema
- models-page: remove the Task Profiles tab, state, handlers, render block
- tests: drop /profiles route + settings-field assertions and the fetch mock

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OpenClaw records per-run usage (input/output/total) in the trajectory
model.completed event, but the happy path returned only the assistant
text and discarded it. Surface it on MessageResult so dispatch can meter
cost (Phase 1).

- concepts: add MessageUsage + MessageResult.usage (re-exported)
- trajectory-forensics: expose usage on the success outcome (reuse the
  existing model.completed parser); thread it through TrajectoryRecoveredTurn
- runtime: runOpenClawAgentGateway returns { content, usage }; readTurnUsage
  reads the success trajectory tail; messaging.send attaches usage. Usage is
  omitted (never zero-filled) when the runtime recorded none or the send was
  unthreaded (no trajectory).
- tests: parser usage cases + end-to-end send usage (present + absent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
costRange was a hand-typed display string never usable for cost math.
Add structured ModelPricing (inputPer1M/outputPer1M, micro-dollar cost
helper) to the 9 cloud LLM entries — the single source of truth for LLM
cost; the display string is now derived via formatCostRange. Non-token
models (image/video/local) keep their literal costRange, which token
pricing can't express.

- known-models: ModelPricing type; pricing{} on cloud LLMs; formatCostRange
  + computeCostUsdMicros (returns null when pricing/usage absent — honest
  '$ unavailable', never a fabricated zero)
- index: enrichment derives costRange from pricing when no literal
- tests: pricing shape, derived display, micro-dollar math + null paths

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
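
A minimal sketch of the micro-dollar math this commit describes, assuming prices are quoted in USD per 1M tokens. Only `inputPer1M`, `outputPer1M`, and the name `computeCostUsdMicros` come from the commit message; the `TurnUsage` shape and the sample prices are hypothetical. Because 1 USD is 1,000,000 micro-dollars, tokens multiplied by a per-1M price already yields micro-dollars.

```typescript
// Hypothetical shapes: only inputPer1M/outputPer1M appear in the commit message.
interface ModelPricing {
  inputPer1M: number;  // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

interface TurnUsage {
  input?: number;
  output?: number;
  total?: number;
}

// Returns the estimated cost in micro-dollars (1 USD = 1_000_000 micros),
// or null when the turn cannot be priced. null means "unmetered", not "free".
function computeCostUsdMicros(
  pricing: ModelPricing | undefined,
  usage: TurnUsage | undefined,
): number | null {
  if (!pricing || !usage) return null;
  // A total-only usage block has no input/output split, so it is not priced.
  if (usage.input === undefined || usage.output === undefined) return null;
  // tokens * (USD per 1M tokens) is numerically the cost in micro-dollars.
  return Math.round(usage.input * pricing.inputPer1M + usage.output * pricing.outputPer1M);
}

// Made-up prices for illustration: $3 in / $15 out per 1M tokens.
const samplePricing: ModelPricing = { inputPer1M: 3, outputPer1M: 15 };
const sampleCost = computeCostUsdMicros(samplePricing, { input: 1000, output: 500 });
// 1000 * 3 + 500 * 15 = 10500 micro-dollars, i.e. $0.0105
```

Returning `null` rather than `0` lets the UI distinguish an unpriced turn from a turn that really cost nothing.
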
Durable per-run cost attribution (ledger migration v3), keyed by run_id —
the same key dispatch settles on. A billing fact, not content: token
counts + an estimated micro-dollar cost (null = unmetered).

- recordRunCost: first-write-wins (INSERT OR IGNORE) so a transport retry
  of the same run can't double-count
- spendTotal/spendByAgent/spendByModel: windowed rollups; null cost rows
  count as runs with zero dollars (COALESCE), never dropped
- purgeTaskRows cascades to run_costs
- facade re-exports the verbs + types
- tests: rollups, idempotency, window/agent filtering, unmetered row

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
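
The ledger semantics above (first-write-wins on `run_id`, unmetered rows counted as runs with zero dollars) can be illustrated with an in-memory stand-in. This is not the SQLite-backed implementation: the class, the row fields, and the return shape are assumptions chosen to mirror what `INSERT OR IGNORE` and `COALESCE(cost, 0)` would do.

```typescript
// Hypothetical row shape; the real run_costs columns are not listed in the PR.
interface RunCostRow {
  runId: string;
  agentId: string;
  tokensIn: number;
  tokensOut: number;
  costUsdMicros: number | null; // null = unmetered
  recordedAtMs: number;
}

class RunCostLedgerSketch {
  private rows = new Map<string, RunCostRow>();

  // First-write-wins, like INSERT OR IGNORE on a run_id primary key:
  // a retried write for the same run is dropped instead of double-counted.
  recordRunCost(row: RunCostRow): boolean {
    if (this.rows.has(row.runId)) return false;
    this.rows.set(row.runId, row);
    return true;
  }

  // Windowed rollup. Unmetered rows still count as runs and contribute
  // zero dollars, mirroring COALESCE(cost, 0) in a SQL sum.
  spendTotal(sinceMs: number): { runs: number; totalUsdMicros: number } {
    let runs = 0;
    let totalUsdMicros = 0;
    this.rows.forEach((row) => {
      if (row.recordedAtMs < sinceMs) return;
      runs += 1;
      totalUsdMicros += row.costUsdMicros ?? 0;
    });
    return { runs, totalUsdMicros };
  }
}

const ledger = new RunCostLedgerSketch();
const base = { agentId: "agent-a", tokensIn: 100, tokensOut: 50, recordedAtMs: 1000 };
const firstWrite = ledger.recordRunCost({ ...base, runId: "run-1", costUsdMicros: 10500 });
const retryWrite = ledger.recordRunCost({ ...base, runId: "run-1", costUsdMicros: 99999 });
ledger.recordRunCost({ ...base, runId: "run-2", costUsdMicros: null });
const rollup = ledger.spendTotal(0);
```
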
On a settled task turn, attribute its cost: the threadId IS the ledger
run id, so recordRunCost is first-write-wins idempotent. Pricing is
delegated to the models plugin via a new models.priceTurn hook (core
stays pricing-agnostic); the same data feeds the live usage recorder.

- usage: UsageEntry gains tokensIn/tokensOut/costUsdMicros (recordUsage
  now passes them through)
- dispatch: sendDispatchMessage returns MessageResult; recordTurnCost runs
  in the success settle — invokes models.priceTurn, writes run_costs, feeds
  the recorder. Never throws into the settle path. Absent plugin → tokens
  recorded, cost null (unmetered), never a fabricated zero.
- models: models.priceTurn hook resolves the effective model (explicit →
  agent config) and returns estimated micro-dollar cost from catalog pricing
- tests: priceTurn hook (priced/unpriced/no-tokens); dispatch settle writes
  a costed row and an unmetered row

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GET /spend?window=24h|7d|30d|all returns windowed rollups from the
run_costs ledger (total + by-agent + by-model). A reporting read, so it
degrades gracefully (returns zeros) when the ledger is unavailable rather
than crashing the page.

- index: /spend route + spend-window parsing; '' model id → 'unknown'
- models-page: Spend tab — total card, by-agent + by-model tables, window
  toggle (URL-backed), 'estimated' caveat, '$ unavailable' for unmetered
  (zero-cost) rows instead of '$0.00'
- tests: route rollups + window default; component spend fetch mock

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
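
One way the `window` query parameter could be turned into a lower time bound is sketched here. The four accepted values come from the commit message; the function names and the fallback to `24h` for missing or unknown values are assumptions (the PR tests a default but does not name it).

```typescript
type SpendWindow = "24h" | "7d" | "30d" | "all";

const HOUR_MS = 60 * 60 * 1000;
const WINDOW_MS: Record<Exclude<SpendWindow, "all">, number> = {
  "24h": 24 * HOUR_MS,
  "7d": 7 * 24 * HOUR_MS,
  "30d": 30 * 24 * HOUR_MS,
};

// ASSUMPTION: anything unrecognized falls back to "24h".
function parseSpendWindow(raw: string | null | undefined): SpendWindow {
  switch (raw) {
    case "24h":
    case "7d":
    case "30d":
    case "all":
      return raw;
    default:
      return "24h";
  }
}

// Lower bound (epoch ms) for ledger rows included in the rollup; "all" has none.
function windowSinceMs(win: SpendWindow, nowMs: number): number {
  return win === "all" ? 0 : nowMs - WINDOW_MS[win];
}
```
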
dispatch.ts now imports recordRunCost from the execution-ledger facade
(P1.4), so any test mocking that module must export it or the import
throws at load. Add it to the in-memory fake.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified against the installed OpenClaw 2026.6.5 source
(dist/agent-via-gateway-*.js): the gateway 'agent' method accepts `model`
and `thinking` as top-level params, alongside the agentId/message/
sessionId/idempotencyKey params Bakin's adapter already sends. The
routing phase's one blocking unknown is cleared — per-turn model/thinking
override is a clean adapter addition, no agent-config mutation needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bakin-owned per-turn routing policy (the values; the adapter serves them).
The routing key is the deterministic dispatch ORIGIN with a per-task TAG
override.

- classifyOrigin: recovery (dispatch-context) → workflow → scheduled →
  decomposition → adhoc
- resolveTurnModel: model and thinking resolve independently across
  tag → origin → inherit. No per-agent/global tail — returning nothing
  means 'inherit', i.e. the adapter passes no model and the runtime uses
  the agent's configured default (unchanged pre-routing behavior).
- types (Origin/ThinkingLevel/RoutingPolicy/TagOverride/RoutingConfig) live
  in core so the resolver is self-contained; the models plugin will depend
  on these (correct direction).
- tests: origin truth table, cascade precedence, independent model/thinking,
  'inherit', empty-config no-op

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
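
The precedence rules in this commit can be sketched as follows. The origin names, the classification order, and the tag → origin → inherit cascade come from the commit message; the task fields used to detect each origin (`workflowId` in particular), the config layout, and the model ids are assumptions made for illustration.

```typescript
type Origin = "scheduled" | "workflow" | "adhoc" | "recovery" | "decomposition";

interface RoutingPolicy { model?: string; thinking?: string }
interface TagOverride extends RoutingPolicy { tag: string }
interface RoutingConfig {
  origins?: Partial<Record<Origin, RoutingPolicy>>;
  tagOverrides?: TagOverride[];
}

// Hypothetical task shape; which fields mark each origin is assumed.
interface TaskShape {
  isRecovery?: boolean;
  workflowId?: string;
  scheduleJobId?: string;
  parentId?: string;
  tags?: string[];
}

// Order from the commit: recovery → workflow → scheduled → decomposition → adhoc.
function classifyOrigin(t: TaskShape): Origin {
  if (t.isRecovery) return "recovery";
  if (t.workflowId) return "workflow";
  if (t.scheduleJobId) return "scheduled";
  if (t.parentId) return "decomposition";
  return "adhoc";
}

// Model and thinking resolve independently: first matching tag override,
// then the origin policy; a missing key means "inherit the agent default".
function resolveTurnModel(cfg: RoutingConfig, t: TaskShape): RoutingPolicy {
  const tags = t.tags ?? [];
  const matches = (cfg.tagOverrides ?? []).filter((o) => tags.indexOf(o.tag) !== -1);
  const originPolicy = cfg.origins?.[classifyOrigin(t)] ?? {};
  let model: string | undefined;
  let thinking: string | undefined;
  for (const o of matches) {
    if (model === undefined) model = o.model;
    if (thinking === undefined) thinking = o.thinking;
  }
  const resolved: RoutingPolicy = {};
  const finalModel = model ?? originPolicy.model;
  const finalThinking = thinking ?? originPolicy.thinking;
  if (finalModel !== undefined) resolved.model = finalModel;
  if (finalThinking !== undefined) resolved.thinking = finalThinking;
  return resolved;
}
```

With an empty config the resolver returns an empty object, so the adapter passes neither parameter and pre-routing behavior is preserved.
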
MessageArgs gains optional model + thinking; the adapter threads them
through the turn options onto the gateway 'agent' RPC params (verified in
P2.0 to accept both). Omitted when unset, so a turn with no routing
override behaves exactly as before — the runtime uses the agent's
configured model/default.

- concepts: MessageArgs.model/thinking
- runtime: OpenClawAgentTurnOptions threads them; send/stream forward them;
  gateway params set model/thinking only when defined
- tests: params carry model+thinking when set; absent when unset

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
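
A short sketch of the "omitted when unset" behavior. The parameter names are the ones listed in the P2.0 verification; `buildAgentParams` and the argument interface are hypothetical, not the adapter's actual code.

```typescript
// Hypothetical argument shape for illustration.
interface TurnArgs {
  agentId: string;
  message: string;
  sessionId?: string;
  idempotencyKey?: string;
  model?: string;
  thinking?: string;
}

// Builds the gateway 'agent' RPC params. Optional keys are set only when
// defined, so an unrouted turn sends the same payload as before routing existed.
function buildAgentParams(args: TurnArgs): Record<string, unknown> {
  const params: Record<string, unknown> = {
    agentId: args.agentId,
    message: args.message,
  };
  if (args.sessionId !== undefined) params.sessionId = args.sessionId;
  if (args.idempotencyKey !== undefined) params.idempotencyKey = args.idempotencyKey;
  if (args.model !== undefined) params.model = args.model;
  if (args.thinking !== undefined) params.thinking = args.thinking;
  return params;
}
```
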
fireDispatchTurn resolves the per-turn model/thinking from the Bakin
routing policy (read via the new models.getRoutingConfig hook) before
sending, threads it onto the gateway, and prices the run against the
resolved model. All three dispatch paths (cycle, single, workflow) flow
through fireDispatchTurn so they inherit routing; only dispatchSingleTask
flags isRecovery (source==='recovery') for the recovery-origin policy.

- dispatch: DispatchTask carries scheduleJobId/parentId/tags; resolveDispatch
  Routing (fail-soft → {} inherit); send + recordTurnCost use the resolved
  model; audits task.routed when an override applies
- models: models.getRoutingConfig hook reads settings.routing (empty default);
  ModelsPluginSettings.routing
- tests: origin policy reaches the turn; empty-config regression (no model/
  thinking → unchanged dispatch); 7-hook registration

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GET/PUT /routing persist the per-turn routing policy to plugin settings,
validated by a zod schema (origin enum from core ORIGINS, thinking levels
incl. 'inherit'). A Routing tab edits it: a row per dispatch origin with a
model + thinking selector, plus an add/remove tag-override list. Blank
rows and 'inherit' thinking are dropped on save to keep storage clean.

- index: /routing GET+PUT + RoutingConfig zod schema
- models-page: Routing tab (origins table + tag overrides), pending/save
- tests: GET default + PUT persist + unknown-origin rejection; component
  routing fetch mock

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure spend-ceiling logic dispatch will consult before claiming a run
(P3.2 wires it). Given a policy and the ledger-sourced spend per scope ×
window, decide allow / warn / defer.

- BudgetPolicy: global caps + warnPct, per-agent caps (USD; unset = unlimited)
- dayStartMs/monthStartMs: local calendar-day + calendar-month boundaries
  (daily catches a runaway night; monthly aligns with the provider invoice)
- evaluateBudget: defer (cap met/exceeded) > warn (>= warnPct, default 0.8)
  > allow; returns the worst breach for the audit reason
- defer (not pause) is the deliberate divergence from paperclip — no lost
  work, spend just throttles until the window rolls over
- tests: boundaries, allow/warn/defer, defer-beats-warn, per-agent, warnPct

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
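
The evaluator's decision order can be sketched as a pure function. The defer > warn > allow ranking, the 0.8 default, the "unset cap = unlimited" rule, and the local calendar boundaries follow the commit message; the `SpendSample` input shape and the reason string are assumptions.

```typescript
type BudgetDecision = "allow" | "warn" | "defer";

// Hypothetical input: one entry per scope × window, with ledger-sourced spend.
interface SpendSample {
  scope: string;            // e.g. "global" or "agent:<id>"
  window: "day" | "month";
  spentUsd: number;
  capUsd?: number;          // unset = unlimited
}

interface BudgetVerdict { decision: BudgetDecision; reason?: string }

const RANK: Record<BudgetDecision, number> = { allow: 0, warn: 1, defer: 2 };

// Local calendar-day and calendar-month boundaries.
function dayStartMs(now: Date): number {
  return new Date(now.getFullYear(), now.getMonth(), now.getDate()).getTime();
}
function monthStartMs(now: Date): number {
  return new Date(now.getFullYear(), now.getMonth(), 1).getTime();
}

// Returns the worst breach across all samples: defer beats warn beats allow.
function evaluateBudget(samples: SpendSample[], warnPct: number = 0.8): BudgetVerdict {
  let worst: BudgetVerdict = { decision: "allow" };
  for (const s of samples) {
    if (s.capUsd === undefined) continue; // no cap configured for this scope/window
    const ratio = s.spentUsd / s.capUsd;  // caps are validated as positive upstream
    const decision: BudgetDecision = ratio >= 1 ? "defer" : ratio >= warnPct ? "warn" : "allow";
    if (RANK[decision] > RANK[worst.decision]) {
      worst = { decision, reason: `${s.scope} ${s.window} at ${Math.round(ratio * 100)}% of cap` };
    }
  }
  return worst;
}
```
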
Before each dispatch claims a run, budgetGate consults the budget policy
(read via the new models.getBudgetPolicy hook) against ledger spend for
the day + month windows, global + per-agent. On defer the task stays in
todo (no claim, no send) and resumes when the window rolls over or the cap
is raised — no work lost. Warn/defer audits debounce per window so a cap
at 85%, or a task deferred every cycle, doesn't spam the log.

- dispatch: budgetGate + auditBudgetOnce; wired before all three claim
  sites (cycle, single, workflow). FAIL-CLOSED: a failed spend read defers
  (consistent with the ledger's posture). No policy → allow.
- models: models.getBudgetPolicy hook + ModelsPluginSettings.budget
- tests: over-cap defers (no send); no-policy regression (dispatch proceeds)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
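
The gate's fail-closed posture and per-window audit debounce might look roughly like this. It is a simplified single-cap sketch: `createBudgetGate`, the dependency interface, and the audit event strings are hypothetical, and the real gate checks day and month windows for both global and per-agent scopes.

```typescript
// Hypothetical dependencies, injected so the gate can be tested in isolation.
interface GateDeps {
  getCapUsd: () => number | undefined; // undefined = no policy configured
  readSpendUsd: () => number;          // may throw if the ledger is unreadable
  audit: (event: string) => void;
}

function createBudgetGate(deps: GateDeps): (windowKey: string) => "allow" | "defer" {
  const audited = new Set<string>();

  // Emit each audit at most once per window so a task deferred on every
  // cycle does not spam the log.
  function auditOnce(key: string, event: string): void {
    if (audited.has(key)) return;
    audited.add(key);
    deps.audit(event);
  }

  return function budgetGate(windowKey: string): "allow" | "defer" {
    const cap = deps.getCapUsd();
    if (cap === undefined) return "allow"; // no policy → unchanged behavior

    let spent: number | undefined;
    try {
      spent = deps.readSpendUsd();
    } catch {
      spent = undefined;
    }
    if (spent === undefined) {
      // FAIL-CLOSED: an unreadable ledger defers rather than spending blind.
      auditOnce(`${windowKey}:unreadable`, "budget.deferred: ledger unreadable");
      return "defer";
    }
    if (spent >= cap) {
      auditOnce(`${windowKey}:defer`, "budget.deferred: cap reached");
      return "defer";
    }
    return "allow";
  };
}
```
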
Closes the gating loop with config + visibility.

- models: GET/PUT /budget (zod-validated BudgetPolicy: positive caps,
  warnPct in (0,1]); global-caps editor + utilization on the Spend tab
- health: 'budget' check — ok under caps, warn approaching warnPct or when
  runs were deferred in 24h, error at/over a cap (dispatch blocked) or when
  the spend ledger is unreachable (gating fails closed)
- tests: budget GET/PUT + negative-cap rejection; health check across
  ok/warn/error + ledger-unreachable; component budget fetch mock

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dispatch.ts now imports spendTotal from the execution-ledger facade for
budget gating (P3.2); the usage-wiring test mocks that module, so it must
export spendTotal or the import throws at load.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- models-plugin: pricing, metering, routing, budget gating, Spend view
- execution-ledger: run_costs table + spend verbs
- usage-recording: tokensIn/tokensOut/costUsdMicros fields
- dispatch: model routing + budget gate around turn fire
- spec: mark implemented

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Export budgetGate as a test seam and unit-test the paths the pure
evaluator can't reach: no-policy allow, caps-with-zero-spend allow, and
the FAIL-CLOSED defer-with-audit when the spend ledger read throws. The
shared-db dispatch harness can't force a ledger throw without corrupting
later tests, so this isolates the gate with a mocked ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The session-death ladder's dispatchSingleTask(...,'recovery') path set
isRecovery, but the main dispatchTasks cycle also re-dispatches tasks
carrying a persisted sessionDeath record (when the ladder timer is lost to
a restart or the immediate re-dispatch was budget-deferred) — and pushed
the turn without isRecovery, so it routed by task shape instead of
'recovery'. Thread isRecovery=!!recovery into the pendingTurns push so
both paths route identically.

- test: a sessionDeath task re-dispatched via the main cycle routes to the
  configured 'recovery' model

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iew #2,#3)

The budget cap previously only saw dispatched task turns — watchdog
nudges, doctor notifications, agent-to-agent sends, and orchestrator
completion pings billed tokens but were invisible to spendTotal, so a
runaway non-dispatch loop went uncapped (the exact #464 failure mode).

- new src/core/agent-cost.ts: shared meterAgentTurn used by dispatch's
  settle path AND the four non-dispatch send sites. Synthetic runId + null
  task_id for non-dispatch turns (run_costs.task_id now nullable, v3).
- review #3: price against the model the runtime ACTUALLY ran (usage.model)
  before the requested override, so a rejected/fell-back per-turn model is
  billed correctly.
- agent-cost imports its ledger/usage/hook deps DYNAMICALLY so the modules that
  now meter don't drag the ledger into their static graph, which keeps existing
  partial-mock tests working; a missing export just no-ops in the
  existing try/catch.
- tests: meterAgentTurn (synthetic id, null task, actual-model pricing,
  never-throws); ledger null-task_id row counts in spend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
,#10)

- #6: per-cycle spend memoization — budgetGate accepts a cache so the two
  global spend aggregates aren't recomputed for every queued task in a
  cycle (was 4 SQL sums/task; globals are cycle-constant since costs only
  land on settle, after the loop).
- #7: dedup the three identical gate sites behind deferForBudget().
- #9: the budget health check now calls evaluateBudget for the global scope
  instead of re-implementing the cap-vs-spend math — the doctor can't drift
  from what dispatch enforces.
- #10: reword the budget window doc — it's local-time and an estimate, not
  invoice-exact (was overclaiming provider-invoice alignment).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
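
The per-cycle memoization in #6 can be illustrated with a cache that lives for one dispatch cycle. The helper name, the key format, and the use of a `Map` are assumptions; the PR only states that `budgetGate` accepts a cache so the global aggregates are computed once per cycle.

```typescript
// One cache instance per dispatch cycle; discarded when the cycle ends.
type SpendCache = Map<string, number>;

// Reads through the cache when one is supplied, otherwise reads directly.
function cachedSpend(cache: SpendCache | undefined, key: string, read: () => number): number {
  if (!cache) return read();
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = read();
  cache.set(key, value);
  return value;
}

// Usage: three queued tasks in one cycle trigger a single global-day read.
let ledgerReads = 0;
const cycleCache: SpendCache = new Map();
for (let i = 0; i < 3; i++) {
  cachedSpend(cycleCache, "global:day", () => {
    ledgerReads += 1;
    return 4.2; // stand-in for a SQL SUM over run_costs
  });
}
```

Caching is safe within a cycle because, as the commit notes, costs only land on settle, after the loop.
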
…eview #8)

The SDK's plugin-facing runtime message types were independent duplicates
of core's and hadn't gained the per-turn model/thinking args or the usage
result field — so a plugin author using ctx.runtime.messaging.send
couldn't pass routing overrides or read token usage the runtime supports.
Add model?/thinking? to RuntimeMessageArgs and a RuntimeMessageUsage on
RuntimeMessageResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#5)

- #4: document readTurnUsage's reliance on OpenClaw's model.completed →
  session.ended → gateway-frame write ordering, and that a broken ordering
  only yields a silently-unmetered turn (no crash/wrong cost).
- #5: note that computeCostUsdMicros returns null for total-only usage by
  design — the input/output split is required for accurate per-1M pricing,
  so we don't guess.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lable

Reflects the review fixes — meter-all (not dispatch-only) via agent-cost,
nullable task_id for non-dispatch turns, actual-model attribution, and the
per-cycle budget spend cache.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@markhayden

Owner Author

Self-review applied (high-effort, 7 finder angles + verify)

Ran a full review of the branch and fixed everything actionable before manual testing. Summary:

🔴 Correctness — fixed

🟠 Accuracy — addressed

🟡 Cleanups — fixed

Deliberately left (with rationale)

  • Dollar-formatter duplication — the client (models-page) vs server (health/budget) import boundary + differing semantics (null-for-unmetered vs always-show) make a clean shared helper scope-creep; both are small and local.
  • thinking as string at the adapter — already validated by the zod route + the gateway; tightening the type would couple the adapter contract to core routing.
  • budgetAuditedWindows Set growth (~8 keys/day, resets on restart) and the React pending ?? base copy-paste — cosmetic.

Full suite: 5060 pass / 0 fail. agent-cost imports its ledger/usage/hook deps dynamically so the wider metering surface didn't break existing partial-mock tests. Ready for manual testing.

markhayden and others added 4 commits June 13, 2026 18:46
Manual testing showed non-dispatch sends recorded $0 (no trajectory →
no usage) and that dispatched turns ignored cache reads, undercounting
cost badly (a turn with 34k cache reads priced ~5x low).

- adapter: extractOpenClawAgentUsage reads result.meta.agentMeta.usage
  (input/output/total + cacheRead/cacheWrite), preferred over the trajectory
  re-read. Works for UNTHREADED sends too → non-dispatch turns are now
  priced (fixes the gap) — and removes the happy-path trajectory read
  (review #4). Trajectory stays the fallback.
- MessageUsage / SDK RuntimeMessageUsage gain cacheRead/cacheWrite.
- computeCostUsdMicros prices cacheRead at cachedReadPer1M, defaulting to
  0.1x input (common cross-provider rate) when unspecified — estimate-grade
  but far better than pricing cache reads at $0. Threaded through the
  models.priceTurn hook + meterAgentTurn.
- tests: payload-sourced usage incl. cache (no trajectory); cache pricing
  (default 0.1x + explicit override)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
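
The payload-first, trajectory-fallback order could be sketched as below. The `result.meta.agentMeta.usage` path and the usage field names come from the commit message; the function name, the injected trajectory reader, and the exact types are assumptions. The sketch also applies the rule from a later fix in this PR that a payload block without an input/output split is not trusted.

```typescript
interface MessageUsage {
  input: number;
  output: number;
  total?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// Hypothetical slice of the gateway result; only the usage path is from the PR.
interface GatewayResult {
  meta?: { agentMeta?: { usage?: Partial<MessageUsage> } };
}

// Prefer usage carried on the gateway payload (available even for unthreaded
// sends); fall back to the trajectory when the payload lacks a priceable split.
function extractUsage(
  result: GatewayResult,
  readTrajectoryUsage: () => MessageUsage | undefined,
): MessageUsage | undefined {
  const u = result.meta?.agentMeta?.usage;
  if (u && typeof u.input === "number" && typeof u.output === "number") {
    return { ...u, input: u.input, output: u.output };
  }
  return readTrajectoryUsage();
}
```
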
Add a 'Bakin Metered Spend (24h, estimated)' card fed by the run_costs
ledger (via the models /spend route) — distinct from the existing
runtime-reported 'Runtime Cost Estimate' card. Puts our cost where the
operator naturally looks, not only in Models → Spend. Fetched best-effort
(optional when the models plugin is disabled); zero-cost rows show '$ n/a'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Image inference is a separate billed path from chat-turn tokens — pixel's
image task only recorded its text turn, not the (often larger) image cost.
Now persistImageResult meters each generate/edit via meterImageTurn → a
run_costs spend event attributed to the invoking agent, so image spend
shows in the Spend tab/Health and counts toward the budget cap.

- known-models: imagePerUsd (flat per-image rate; flux-pro=0.055) +
  computeImageCostUsdMicros. Provider-priced/ranged models stay unpriced
  (run recorded, '$ unavailable') — never guessed.
- models.priceImage hook mirrors priceTurn; meterImageTurn (agent-cost)
  records the image event (no tokens; synthetic image: runId).
- images/tools: meter on the shared persist chokepoint (generate + edit).
- tests: image cost math, priceImage hook, meterImageTurn (priced + null).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
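
The flat per-image pricing reduces to a single multiplication. In this sketch the `imagePerUsd` field name and the `flux-pro` rate of 0.055 are taken from the commit message; the catalog shape, the `count` parameter, and the second model id are assumptions.

```typescript
// Hypothetical catalog slice: flat USD rate per generated image, when known.
interface ImageModelEntry { imagePerUsd?: number }

const IMAGE_CATALOG: Record<string, ImageModelEntry> = {
  "flux-pro": { imagePerUsd: 0.055 },
  "provider-priced-model": {}, // ranged/provider-priced: intentionally unpriced
};

// Micro-dollar cost for `count` images, or null when the model has no flat
// rate. The run is still recorded as unmetered; the cost is never guessed.
function computeImageCostUsdMicros(modelId: string, count: number = 1): number | null {
  const rate = IMAGE_CATALOG[modelId]?.imagePerUsd;
  if (rate === undefined) return null;
  return Math.round(rate * 1_000_000 * count);
}
```
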
…, health card

Reflects the manual-testing fixes: usage now read from the gateway payload
(incl. cache tokens), cache-read pricing, image-generation spend events,
and the Bakin Metered Spend health card.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
markhayden and others added 5 commits June 13, 2026 19:31
…t Usage

End-user confusion: two cards both reading as 'cost'. Resolve by giving
each one job (review of the cost UX):
- Rename 'Runtime Cost Estimate' → 'Runtime Usage' (tokens only, dollars
  dropped; badge shows total tokens). It's runtime-reported token usage.
- 'Bakin Metered Spend' → 'Bakin Spend' — the single dollar/budget card
  (the figure the cap gates on), with a one-line clarifier.
- Layout: Context Usage now full-width; Runtime Usage + Bakin Spend sit
  half-width side by side.
- test: updated to assert the usage-only runtime card (no $ rendered).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per UX request:
- Order: summary tiles → Estimated Token Usage + Estimated Cost →
  Tool Usage → Context Usage → Search → Active Plugins → Diagnostics.
- Rename 'Runtime Usage' → 'Estimated Token Usage', 'Bakin Spend' →
  'Estimated Cost'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bustness

Last-review findings on the manual-test work:
- computeCostUsdMicros no longer drops a CACHE-ONLY turn (the input==0 &&
  output==0 guard returned null before adding cacheRead — the exact 34k
  cache-read case); and now prices cacheWrite (was shown but billed at $0).
  0.1×/1.25× defaults are named constants, documented as exact for Anthropic
  and approximate for other providers.
- adapter: a total-only gateway-payload usage block no longer masks the
  trajectory's priceable input/output split (extractOpenClawAgentUsage
  requires input/output, else falls back).
- health dashboard: the cross-plugin /spend fetch can't reject the core
  Promise.all (→ .catch null) and is trusted only on 2xx (the 500 error
  path returns {totalUsdMicros:0}, which must not render as a real $0);
  the two cards drop to full-width when only one is present.
- agent-cost: meterAgentTurn/meterImageTurn now share one recordSpend
  writer so the budget-cap spend contract is single-sourced.
- docs + stale /spend description corrected (cache IS modeled now).
- tests: cache-only + cacheWrite pricing; total-only payload fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
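
A sketch of the cache-aware pricing after these fixes, under stated assumptions. The 0.1× and 1.25× input-price defaults and the `cachedReadPer1M` field are from the commits; `cacheWritePer1M`, the constant names, the sample prices, and the "no tokens of any kind returns null" rule are guesses made for illustration.

```typescript
interface CachePricing {
  inputPer1M: number;
  outputPer1M: number;
  cachedReadPer1M?: number; // defaults to 0.1x input
  cacheWritePer1M?: number; // ASSUMED field name; defaults to 1.25x input
}

interface CacheUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

const DEFAULT_CACHE_READ_MULTIPLIER = 0.1;
const DEFAULT_CACHE_WRITE_MULTIPLIER = 1.25;

function computeCostWithCacheUsdMicros(
  pricing: CachePricing | undefined,
  usage: CacheUsage | undefined,
): number | null {
  if (!pricing || !usage) return null;
  const cacheRead = usage.cacheRead ?? 0;
  const cacheWrite = usage.cacheWrite ?? 0;
  // Check ALL token kinds before giving up, so a cache-only turn
  // (input 0, output 0, cacheRead > 0) is still priced.
  if (usage.input + usage.output + cacheRead + cacheWrite === 0) return null;
  const readRate = pricing.cachedReadPer1M ?? pricing.inputPer1M * DEFAULT_CACHE_READ_MULTIPLIER;
  const writeRate = pricing.cacheWritePer1M ?? pricing.inputPer1M * DEFAULT_CACHE_WRITE_MULTIPLIER;
  return Math.round(
    usage.input * pricing.inputPer1M +
      usage.output * pricing.outputPer1M +
      cacheRead * readRate +
      cacheWrite * writeRate,
  );
}

// Made-up prices: $3 in / $15 out per 1M tokens.
const cachePricing: CachePricing = { inputPer1M: 3, outputPer1M: 15 };
// Cache-only turn: 34000 * 0.3 = 10200 micro-dollars.
const cacheOnlyCost = computeCostWithCacheUsdMicros(cachePricing, { input: 0, output: 0, cacheRead: 34000 });
```

Checking every token kind before returning `null` is what keeps the 34k cache-read turn from being dropped.
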
…mization

# Conflicts:
#	.claude/knowledge/execution-ledger.md
#	packages/sdk/src/types/index.ts
#	plugins/models/types.ts
The models plugin manifest still listed the deleted /profiles routes and
lacked the new /spend, /routing (GET+PUT), and /budget (GET+PUT) routes,
so docs:check failed (openapi.json out of sync). Update contributes.apiRoutes
+ description and regenerate the docs artifacts (openapi.json, hooks,
exec-tools, cli, reference pages).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@markhayden markhayden merged commit e6582db into main Jun 14, 2026
1 check passed
@markhayden markhayden deleted the feat/models-cost-optimization branch June 14, 2026 02:28
markhayden added a commit that referenced this pull request Jun 15, 2026
main advanced past WS2's branch point (WS3 #501 usePluginEvent, #500 models
cost). Conflicts + semantic fixes resolved:

- tasks/plan.md + tasks/todo.md: textual conflicts (each workstream rewrites
  these). Archived WS2's as tasks/{plan,todo}-ws2-core-extractions.md (matches
  the plan-ws1-contract-types.md convention); kept main's active WS3 plan/todo.
- #500 added two NEW getHookRegistry consumers on the OLD import path that
  WS2's K1 moved to the leaf module: plugins/health/lib/system-checks/budget.ts
  and src/core/agent-cost.ts → repointed both to
  @bakin/core/hooks/hook-registry-singleton (relative for the plugin file).
- #500's tests (agent-cost, budget-gate, health/budget) mocked getHookRegistry
  only on the legacy facade; added the leaf mock (K1 partial-mock sweep) so all
  three exercise the real import site.

Verified: bun run typecheck clean; bun run test 5072 pass / 0 fail; madge shows
6 type-only cycles (the 4 WS2 documented + 2 docs/ cycles inherited from main,
all erased at compile) — none route through scripts/lib/registry, so WS2's
runtime cycle break holds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>