Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow by nfeuer · Pull Request #118 · nfeuer/donna

nfeuer · 2026-06-12T15:05:07Z

Summary

Model Layer Phase-2 hardening — the four router.py-heavy items deferred from the Fable Model Layer critique (Phase-1 landed in #114). Design doc: docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md.

Findings → fixes

#	Fix
#5	Per-provider circuit breaker. Replaced the single shared `CircuitBreaker` with a per-provider dict; `resilient_call` gets the breaker of the provider that actually runs the call (the fallback's provider after an overflow escalation). An Ollama outage opening Ollama's breaker no longer short-circuits Anthropic. Breaker-open now dispatches a fallback alert, not just a critical log.
#7	Token-truncation tripwire + self-calibrating divisor (design A). `tokens_in >= num_ctx − output_reserve` on an Ollama call ⇒ suspected silent truncation: log + alert + log the suspect-but-billed row (`interrupted=1`) + re-dispatch once to the fallback alias — or loud-fail with `ContextOverflowError` when none (KEEP path preserved). The `len//4` estimator is now calibrated by a per-task-type EMA (clamped, safety-factored) via `OllamaConfig.token_estimation`. No tokenizer dependency.
#2	Log billed calls on every error/retry path. Providers capture `usage` before parsing and raise `ResponseParseError` carrying metadata; `resilient_call` gained `is_retryable` + `on_attempt_failure` hooks + a shared `is_transient_error` classifier (retry transport/5xx/408/425/429/529; fail-fast 4xx/auth). One `invocation_log` row per billed attempt (`interrupted=1` on failure); parse failures retry at most once.
#3	Shadow through `complete()` (design B core). `_run_shadow` recurses into `complete(is_shadow=True)` via a synthetic `<task_type>::__shadow__` route, so shadow spend is budget-pre-checked, logged (`is_shadow=1`, real `cost_usd`), and breaker-guarded. Airtight recursion guard (a shadow never spawns its own shadow) + `shadow.enabled` kill-switch (default false). Resolves via `_lookup_routing_entry` (prefix-aware). Deleted the dead `TaskTypeEntry.shadow` field + the inert `shadow:` key in `task_types.yaml`.

KEEP list (verified, not "fixed")

The loud-fail ContextOverflowError when no fallback (the fix closes the under-estimate hole, doesn't soften the error), config-driven longest-prefix routing, the fire-and-forget-with-strong-refs shadow task pattern, and Ollama's loud NotImplementedError for tools/messages.

Deferred

The statistical weekly shadow stable-state auto-disable job — ML-FABLE-P2 in followups.md, trigger = shadow.enabled: true in prod. The config kill-switch + accounted spend landed now.

Review + testing

The router-heavy implementation was reviewed for the three highest-risk properties: the shadow recursion guard (runaway-spend risk), the tripwire's loud-fail preservation, and the breaker keying to the post-escalation provider. New test_router_phase2_hardening.py + updates; 90 Phase-2/resilience tests green, full model-layer suite green, ruff + mypy clean. (The 5 pre-existing failures are the environmental HuggingFace-403 sentence-transformers download.)

Spec §4.2/§4.4 + docs/domain/model-layer.md synced. Branched off main.

https://claude.ai/code/session_01ChAGsS8vJGrz44ojrASLYs

Generated by Claude Code

…ken tripwire, per-attempt logging, accounted shadow Implements the four router.py-heavy items deferred from the Model Layer Fable critique (Phase-1 = #114, merged). Design doc: docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md. #5 Per-provider circuit breaker. Replaced the single shared CircuitBreaker with a per-provider dict (lazy _breaker_for(provider)); resilient_call is handed the breaker of the provider that ACTUALLY runs the call (the fallback's provider after an overflow escalation). An Ollama outage opening Ollama's breaker no longer short-circuits Anthropic calls. Breaker-open now dispatches a fallback alert, not just a critical log. #7 Token-truncation tripwire + self-calibrating divisor (design A). After each Ollama call, tokens_in >= num_ctx - output_reserve is treated as suspected SILENT truncation (the failure the loud ContextOverflowError exists to prevent): logs truncation_suspected, alerts, logs the suspect-but-billed local row (interrupted=1), and re-dispatches once to the fallback alias — or loud-fails with ContextOverflowError when none (KEEP path preserved). The len//4 estimator is now calibrated by a per-task-type EMA of observed len(prompt)/tokens_in, clamped to divisor_bounds and scaled by safety_factor (new OllamaConfig.token_estimation). No tokenizer dependency. #2 Log billed calls on every error/retry path. Providers capture usage BEFORE parsing and raise ResponseParseError carrying metadata; resilient_call gained is_retryable + on_attempt_failure hooks + a shared is_transient_error classifier (retry transport/5xx/408/425/429/529; fail-fast 4xx/auth). The router logs one invocation_log row per billed attempt (interrupted=1 on failure) and caps parse-failure retries at one. No billed call goes unlogged. #3 Shadow through complete() + kill-switch (design B core). _run_shadow now recurses into complete(is_shadow=True) via a synthetic <task_type>::__shadow__ route, so shadow spend is budget-pre-checked, logged (is_shadow=1, real cost_usd), and breaker-guarded. Airtight recursion guard (a shadow never spawns its own shadow) + config kill-switch shadow.enabled (default false). Shadow resolves via _lookup_routing_entry (prefix-aware). Deleted the dead TaskTypeEntry.shadow field + the inert shadow key in task_types.yaml. Deferred (per design doc): the statistical weekly shadow stable-state auto-disable job — logged as ML-FABLE-P2 in followups.md, trigger = shadow.enabled: true in prod. Spec sync: spec_v3.md §4.2 + §4.4; docs/domain/model-layer.md. Reviewed the safety-critical diffs (shadow recursion guard, tripwire loud-fail, breaker keying). Tests: new test_router_phase2_hardening.py + updates; 90 phase-2/ resilience tests green, full model-layer suite green, ruff + mypy clean. The 5 pre-existing HuggingFace-403 embeddings test failures are environmental.

nfeuer merged commit c6b6b3c into main Jun 12, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow#118

Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow#118
nfeuer merged 1 commit into
mainfrom
claude/model-layer-phase-2

nfeuer commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nfeuer commented Jun 12, 2026

Summary

Findings → fixes

KEEP list (verified, not "fixed")

Deferred

Review + testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants