Model Layer Phase-2: per-provider breaker, token tripwire, per-attempt logging, accounted shadow #118
Merged
…ken tripwire, per-attempt logging, accounted shadow

Implements the four router.py-heavy items deferred from the Model Layer Fable critique (Phase-1 = #114, merged). Design doc: docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md.

#5 Per-provider circuit breaker. Replaced the single shared CircuitBreaker with a per-provider dict (lazy _breaker_for(provider)). resilient_call is handed the breaker of the provider that ACTUALLY runs the call (the fallback's provider after an overflow escalation). An Ollama outage opening Ollama's breaker no longer short-circuits Anthropic calls. Breaker-open now dispatches a fallback alert, not just a critical log.

#7 Token-truncation tripwire + self-calibrating divisor (design A). After each Ollama call, tokens_in >= num_ctx - output_reserve is treated as suspected SILENT truncation (the failure the loud ContextOverflowError exists to prevent). The router logs truncation_suspected, alerts, logs the suspect-but-billed local row (interrupted=1), and re-dispatches once to the fallback alias. When there is no fallback it loud-fails with ContextOverflowError (KEEP path preserved). The len//4 estimator is now calibrated by a per-task-type EMA of observed len(prompt)/tokens_in, clamped to divisor_bounds and scaled by safety_factor (new OllamaConfig.token_estimation). No tokenizer dependency.

#2 Log billed calls on every error/retry path. Providers capture usage BEFORE parsing and raise ResponseParseError carrying metadata. resilient_call gained is_retryable + on_attempt_failure hooks + a shared is_transient_error classifier (retry transport/5xx/408/425/429/529; fail fast on other 4xx/auth). The router logs one invocation_log row per billed attempt (interrupted=1 on failure) and caps parse-failure retries at one. No billed call goes unlogged.

#3 Shadow through complete() + kill-switch (design B core). _run_shadow now recurses into complete(is_shadow=True) via a synthetic <task_type>::__shadow__ route. As a result, shadow spend is budget-pre-checked, logged (is_shadow=1, real cost_usd), and breaker-guarded. Airtight recursion guard (a shadow never spawns its own shadow) + config kill-switch shadow.enabled (default false). Shadow resolves via _lookup_routing_entry (prefix-aware). Deleted the dead TaskTypeEntry.shadow field + the inert shadow key in task_types.yaml.

Deferred (per design doc): the statistical weekly shadow stable-state auto-disable job, logged as ML-FABLE-P2 in followups.md, trigger = shadow.enabled: true in prod.

Spec sync: spec_v3.md §4.2 + §4.4; docs/domain/model-layer.md. Reviewed the safety-critical diffs (shadow recursion guard, tripwire loud-fail, breaker keying). Tests: new test_router_phase2_hardening.py + updates; 90 Phase-2/resilience tests green, full model-layer suite green, ruff + mypy clean. The 5 pre-existing HuggingFace-403 embeddings test failures are environmental.
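#5 and the #2 retry hooks can be illustrated with a minimal Python sketch. Only the names CircuitBreaker, _breaker_for, resilient_call, is_retryable, on_attempt_failure and is_transient_error come from the PR, and their signatures here are guesses. ProviderError, BreakerOpenError, Router, the consecutive-failure threshold and the retry loop without backoff are all hypothetical simplifications. A production breaker would normally also have a cool-down or half-open state, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class ProviderError(Exception):
    """Hypothetical provider error; status=None means no HTTP response (transport failure)."""
    def __init__(self, message: str, status: int | None = None) -> None:
        super().__init__(message)
        self.status = status


class BreakerOpenError(RuntimeError):
    """Hypothetical: raised instead of calling a provider whose breaker is open."""


def is_transient_error(exc: BaseException) -> bool:
    """Retry transport errors, 5xx (covers 529), 408, 425, 429; fail fast on other 4xx/auth."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    if isinstance(exc, ProviderError):
        return exc.status is None or exc.status >= 500 or exc.status in (408, 425, 429)
    return False


@dataclass
class CircuitBreaker:
    """Hypothetical consecutive-failure breaker (no cool-down/half-open state, for brevity)."""
    failure_threshold: int = 3
    consecutive_failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def record(self, ok: bool) -> None:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1


def resilient_call(
    fn: Callable[[], T],
    breaker: CircuitBreaker,
    *,
    max_attempts: int = 3,
    is_retryable: Callable[[BaseException], bool] = is_transient_error,
    on_attempt_failure: Callable[[int, BaseException], None] | None = None,
) -> T:
    """Call fn with retries (no backoff, for brevity), guarded by the given breaker."""
    if breaker.is_open:
        # Per the PR, breaker-open also dispatches a fallback alert (omitted here).
        raise BreakerOpenError("circuit open")
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except Exception as exc:
            breaker.record(ok=False)
            if on_attempt_failure is not None:
                on_attempt_failure(attempt, exc)  # e.g. log this billed attempt, interrupted=1
            if attempt == max_attempts or breaker.is_open or not is_retryable(exc):
                raise
            continue
        breaker.record(ok=True)
        return result
    raise ValueError("max_attempts must be >= 1")


class Router:
    def __init__(self) -> None:
        self._breakers: dict[str, CircuitBreaker] = {}

    def _breaker_for(self, provider: str) -> CircuitBreaker:
        # One lazily created breaker per provider. Callers pass the provider that actually
        # runs the call (the fallback's provider after an overflow escalation).
        if provider not in self._breakers:
            self._breakers[provider] = CircuitBreaker()
        return self._breakers[provider]


def ollama_down() -> str:
    raise ConnectionError("ollama unreachable")


router = Router()
failed_attempts: list[int] = []
for _ in range(2):  # call 1 exhausts its retries; call 2 is short-circuited by the open breaker
    try:
        resilient_call(ollama_down, router._breaker_for("ollama"),
                       on_attempt_failure=lambda n, exc: failed_attempts.append(n))
    except (ConnectionError, BreakerOpenError):
        pass

anthropic_result = resilient_call(lambda: "ok", router._breaker_for("anthropic"))
```

The last line carries the point of the demo. Three transient Ollama failures open Ollama's breaker, yet the Anthropic call still succeeds because it consults a different breaker.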
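The #7 tripwire and self-calibrating divisor from the commit message can be sketched roughly as follows. From the PR come only the condition tokens_in >= num_ctx - output_reserve, the names divisor_bounds and safety_factor, ContextOverflowError, and the idea of a per-task-type EMA of len(prompt)/tokens_in. TokenEstimator, after_ollama_call, the EMA weight alpha, the concrete bound and factor values, and the choice to apply safety_factor to the estimate rather than to the divisor are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


class ContextOverflowError(RuntimeError):
    """Loud failure when the prompt may not fit and there is no fallback."""


@dataclass
class TokenEstimator:
    """Hypothetical per-task-type calibrated estimator; all values below are assumptions."""
    default_divisor: float = 4.0                      # the uncalibrated len // 4 heuristic
    alpha: float = 0.2                                # EMA weight of the newest observation
    divisor_bounds: tuple[float, float] = (2.0, 6.0)  # clamp range for the learned divisor
    safety_factor: float = 1.1                        # inflate estimates toward overflow
    _divisors: dict[str, float] = field(default_factory=dict)

    def divisor_for(self, task_type: str) -> float:
        return self._divisors.get(task_type, self.default_divisor)

    def observe(self, task_type: str, prompt: str, tokens_in: int) -> None:
        """Fold one observed chars-per-token ratio into this task type's clamped EMA."""
        if tokens_in <= 0:
            return
        ratio = len(prompt) / tokens_in
        ema = (1 - self.alpha) * self.divisor_for(task_type) + self.alpha * ratio
        lo, hi = self.divisor_bounds
        self._divisors[task_type] = min(max(ema, lo), hi)

    def estimate(self, task_type: str, prompt: str) -> int:
        return math.ceil(len(prompt) / self.divisor_for(task_type) * self.safety_factor)


def after_ollama_call(
    tokens_in: int, num_ctx: int, output_reserve: int, fallback_alias: str | None
) -> str | None:
    """Return None to accept the result, or the (hypothetical) alias to re-dispatch to once."""
    if tokens_in < num_ctx - output_reserve:
        return None
    # Suspected silent truncation. Per the PR, the router also logs truncation_suspected,
    # alerts, and logs the billed row with interrupted=1 at this point (omitted here).
    if fallback_alias is None:
        raise ContextOverflowError(
            f"tokens_in={tokens_in} reached num_ctx - output_reserve={num_ctx - output_reserve}"
        )
    return fallback_alias


est = TokenEstimator()
est.observe("summarize", "x" * 3000, tokens_in=1000)  # observed 3.0 chars/token
for _ in range(50):
    est.observe("code", "x" * 100, tokens_in=100)     # 1.0 chars/token, below the lower bound
```

With these assumed numbers, a single observation at 3 chars/token moves the summarize divisor from 4.0 to about 3.8. A task type that keeps reporting 1 char/token is held at the 2.0 lower bound instead of drifting further. The clamp keeps one odd prompt from making the estimator wildly optimistic.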
Summary
Model Layer Phase-2 hardening: the four router.py-heavy items deferred from the Fable Model Layer critique (Phase-1 landed in #114). Design doc: docs/superpowers/specs/2026-06-11-model-layer-fable-critique-design.md.

Findings → fixes
- #5 Per-provider circuit breaker. Replaced the single shared CircuitBreaker with a per-provider dict. resilient_call gets the breaker of the provider that actually runs the call (the fallback's provider after an overflow escalation). An Ollama outage opening Ollama's breaker no longer short-circuits Anthropic. Breaker-open now dispatches a fallback alert, not just a critical log.
- #7 Token-truncation tripwire + self-calibrating divisor. tokens_in >= num_ctx - output_reserve on an Ollama call is treated as suspected silent truncation. The router logs, alerts, logs the suspect-but-billed row (interrupted=1), and re-dispatches once to the fallback alias. When there is no fallback it loud-fails with ContextOverflowError (KEEP path preserved). The len//4 estimator is now calibrated by a per-task-type EMA (clamped, safety-factored) via OllamaConfig.token_estimation. No tokenizer dependency.
- #2 Billed calls logged on every error/retry path. Providers capture usage before parsing and raise ResponseParseError carrying metadata. resilient_call gained is_retryable + on_attempt_failure hooks + a shared is_transient_error classifier (retry transport/5xx/408/425/429/529; fail fast on other 4xx/auth). One invocation_log row is written per billed attempt (interrupted=1 on failure), and parse failures retry at most once.
- #3 Shadow through complete() (design B core). _run_shadow recurses into complete(is_shadow=True) via a synthetic <task_type>::__shadow__ route. As a result, shadow spend is budget-pre-checked, logged (is_shadow=1, real cost_usd), and breaker-guarded. Airtight recursion guard (a shadow never spawns its own shadow) + shadow.enabled kill-switch (default false). Resolves via _lookup_routing_entry (prefix-aware). Deleted the dead TaskTypeEntry.shadow field + the inert shadow: key in task_types.yaml.

KEEP list (verified, not "fixed")
- The loud-fail ContextOverflowError when there is no fallback (the fix closes the under-estimate hole; it doesn't soften the error).
- Config-driven longest-prefix routing.
- The fire-and-forget-with-strong-refs shadow task pattern.
- Ollama's loud NotImplementedError for tools/messages.

Deferred
The statistical weekly shadow stable-state auto-disable job is tracked as ML-FABLE-P2 in followups.md (trigger = shadow.enabled: true in prod). The config kill-switch and accounted spend land in this PR.

Review + testing
The router-heavy implementation was reviewed for the three highest-risk properties:
- the shadow recursion guard (runaway-spend risk);
- the tripwire's loud-fail preservation;
- the breaker keying to the post-escalation provider.

Tests: new test_router_phase2_hardening.py + updates; 90 Phase-2/resilience tests green, full model-layer suite green, ruff + mypy clean. The 5 pre-existing failures are environmental: a HuggingFace 403 on the sentence-transformers model download.

Spec §4.2/§4.4 + docs/domain/model-layer.md synced. Branched off main.
https://claude.ai/code/session_01ChAGsS8vJGrz44ojrASLYs
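The shadow recursion guard, flagged in the review as the highest runaway-spend risk, can be illustrated with a toy async router. The sketch models only shadow scheduling, under several assumptions. The synthetic <task_type>::__shadow__ route, complete(is_shadow=True), _run_shadow, the shadow.enabled kill-switch and the strong-refs task pattern come from the PR. ShadowRouter, shadow_task_types and the calls log are hypothetical. A startswith check stands in for _lookup_routing_entry, and the extra suffix check is an assumed second guard.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field

SHADOW_SUFFIX = "::__shadow__"  # synthetic route suffix, as in <task_type>::__shadow__


@dataclass
class ShadowRouter:
    """Hypothetical toy router: only the shadow scheduling logic is modelled."""
    shadow_enabled: bool = False                                 # kill-switch, default off
    shadow_task_types: set[str] = field(default_factory=set)     # prefixes that have a shadow
    calls: list[tuple[str, bool]] = field(default_factory=list)  # (task_type, is_shadow) log
    _tasks: set[asyncio.Task[str]] = field(default_factory=set)  # strong refs to shadow tasks

    def _has_shadow(self, task_type: str) -> bool:
        # Prefix-aware stand-in: note "summarize::__shadow__" also matches "summarize".
        return any(task_type.startswith(prefix) for prefix in self.shadow_task_types)

    async def complete(self, task_type: str, prompt: str, *, is_shadow: bool = False) -> str:
        # Budget pre-check, invocation logging (with is_shadow) and the breaker guard
        # would run here for primary and shadow calls alike.
        self.calls.append((task_type, is_shadow))
        result = f"{task_type}:{len(prompt)}"  # stand-in for the provider response
        spawn = (
            self.shadow_enabled
            and not is_shadow                          # guard 1: the flag
            and not task_type.endswith(SHADOW_SUFFIX)  # guard 2: the synthetic route itself
            and self._has_shadow(task_type)
        )
        if spawn:
            task = asyncio.create_task(self._run_shadow(task_type, prompt))
            self._tasks.add(task)  # keep a strong reference so the task is not collected
            task.add_done_callback(self._tasks.discard)
        return result

    async def _run_shadow(self, task_type: str, prompt: str) -> str:
        return await self.complete(task_type + SHADOW_SUFFIX, prompt, is_shadow=True)


async def demo(enabled: bool, task_type: str = "summarize") -> list[tuple[str, bool]]:
    router = ShadowRouter(shadow_enabled=enabled, shadow_task_types={"summarize"})
    await router.complete(task_type, "hello")
    await asyncio.gather(*router._tasks)  # let fire-and-forget shadows finish for the demo
    return router.calls


calls_on = asyncio.run(demo(True))
calls_off = asyncio.run(demo(False))
```

demo(True) records the primary call and exactly one shadow call, and demo(False) records only the primary. A direct call on the synthetic route never schedules a further shadow, even though the prefix lookup matches it. That double guard is what keeps the recursion bounded.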
Generated by Claude Code