Skip to content

Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow#118

Merged
nfeuer merged 1 commit into
mainfrom
claude/model-layer-phase-2
Jun 12, 2026
Merged

Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow#118
nfeuer merged 1 commit into
mainfrom
claude/model-layer-phase-2

Conversation

@nfeuer

@nfeuer nfeuer commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Summary

Model Layer Phase-2 hardening — the four router.py-heavy items deferred from the Fable Model Layer critique (Phase-1 landed in #114). Design doc: docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md.

Findings → fixes

# Fix
#5 Per-provider circuit breaker. Replaced the single shared CircuitBreaker with a per-provider dict; resilient_call gets the breaker of the provider that actually runs the call (the fallback's provider after an overflow escalation). An Ollama outage opening Ollama's breaker no longer short-circuits Anthropic. Breaker-open now dispatches a fallback alert, not just a critical log.
#7 Token-truncation tripwire + self-calibrating divisor (design A). tokens_in >= num_ctx − output_reserve on an Ollama call ⇒ suspected silent truncation: log + alert + log the suspect-but-billed row (interrupted=1) + re-dispatch once to the fallback alias — or loud-fail with ContextOverflowError when none (KEEP path preserved). The len//4 estimator is now calibrated by a per-task-type EMA (clamped, safety-factored) via OllamaConfig.token_estimation. No tokenizer dependency.
#2 Log billed calls on every error/retry path. Providers capture usage before parsing and raise ResponseParseError carrying metadata; resilient_call gained is_retryable + on_attempt_failure hooks + a shared is_transient_error classifier (retry transport/5xx/408/425/429/529; fail-fast 4xx/auth). One invocation_log row per billed attempt (interrupted=1 on failure); parse failures retry at most once.
#3 Shadow through complete() (design B core). _run_shadow recurses into complete(is_shadow=True) via a synthetic <task_type>::__shadow__ route, so shadow spend is budget-pre-checked, logged (is_shadow=1, real cost_usd), and breaker-guarded. Airtight recursion guard (a shadow never spawns its own shadow) + shadow.enabled kill-switch (default false). Resolves via _lookup_routing_entry (prefix-aware). Deleted the dead TaskTypeEntry.shadow field + the inert shadow: key in task_types.yaml.

KEEP list (verified, not "fixed")

The loud-fail ContextOverflowError when no fallback (the fix closes the under-estimate hole, doesn't soften the error), config-driven longest-prefix routing, the fire-and-forget-with-strong-refs shadow task pattern, and Ollama's loud NotImplementedError for tools/messages.

Deferred

The statistical weekly shadow stable-state auto-disable job — ML-FABLE-P2 in followups.md, trigger = shadow.enabled: true in prod. The config kill-switch + accounted spend landed now.

Review + testing

The router-heavy implementation was reviewed for the three highest-risk properties: the shadow recursion guard (runaway-spend risk), the tripwire's loud-fail preservation, and the breaker keying to the post-escalation provider. New test_router_phase2_hardening.py + updates; 90 Phase-2/resilience tests green, full model-layer suite green, ruff + mypy clean. (The 5 pre-existing failures are the environmental HuggingFace-403 sentence-transformers download.)

Spec §4.2/§4.4 + docs/domain/model-layer.md synced. Branched off main.

https://claude.ai/code/session_01ChAGsS8vJGrz44ojrASLYs


Generated by Claude Code

…ken tripwire, per-attempt logging, accounted shadow

Implements the four router.py-heavy items deferred from the Model Layer Fable
critique (Phase-1 = #114, merged). Design doc:
docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md.

#5 Per-provider circuit breaker. Replaced the single shared CircuitBreaker with
   a per-provider dict (lazy _breaker_for(provider)); resilient_call is handed
   the breaker of the provider that ACTUALLY runs the call (the fallback's
   provider after an overflow escalation). An Ollama outage opening Ollama's
   breaker no longer short-circuits Anthropic calls. Breaker-open now dispatches
   a fallback alert, not just a critical log.

#7 Token-truncation tripwire + self-calibrating divisor (design A). After each
   Ollama call, tokens_in >= num_ctx - output_reserve is treated as suspected
   SILENT truncation (the failure the loud ContextOverflowError exists to
   prevent): logs truncation_suspected, alerts, logs the suspect-but-billed
   local row (interrupted=1), and re-dispatches once to the fallback alias —
   or loud-fails with ContextOverflowError when none (KEEP path preserved).
   The len//4 estimator is now calibrated by a per-task-type EMA of observed
   len(prompt)/tokens_in, clamped to divisor_bounds and scaled by safety_factor
   (new OllamaConfig.token_estimation). No tokenizer dependency.

#2 Log billed calls on every error/retry path. Providers capture usage BEFORE
   parsing and raise ResponseParseError carrying metadata; resilient_call gained
   is_retryable + on_attempt_failure hooks + a shared is_transient_error
   classifier (retry transport/5xx/408/425/429/529; fail-fast 4xx/auth). The
   router logs one invocation_log row per billed attempt (interrupted=1 on
   failure) and caps parse-failure retries at one. No billed call goes unlogged.

#3 Shadow through complete() + kill-switch (design B core). _run_shadow now
   recurses into complete(is_shadow=True) via a synthetic <task_type>::__shadow__
   route, so shadow spend is budget-pre-checked, logged (is_shadow=1, real
   cost_usd), and breaker-guarded. Airtight recursion guard (a shadow never
   spawns its own shadow) + config kill-switch shadow.enabled (default false).
   Shadow resolves via _lookup_routing_entry (prefix-aware). Deleted the dead
   TaskTypeEntry.shadow field + the inert shadow key in task_types.yaml.

Deferred (per design doc): the statistical weekly shadow stable-state
auto-disable job — logged as ML-FABLE-P2 in followups.md, trigger =
shadow.enabled: true in prod.

Spec sync: spec_v3.md §4.2 + §4.4; docs/domain/model-layer.md. Reviewed the
safety-critical diffs (shadow recursion guard, tripwire loud-fail, breaker
keying). Tests: new test_router_phase2_hardening.py + updates; 90 phase-2/
resilience tests green, full model-layer suite green, ruff + mypy clean.
The 5 pre-existing HuggingFace-403 embeddings test failures are environmental.
@nfeuer nfeuer merged commit c6b6b3c into main Jun 12, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants