fix: surface upstream provider errors in failure path #81
Merged
`BaseOutput.default_failure` used full Pydantic validation, so a response schema with a required field that has no default (e.g. the test fixture's `SimpleOutput.text: str`) would raise a `ValidationError` from inside the failure path itself, masking the original Anthropic/OpenAI/Gemini error with a confusing Pydantic message. Switch to `model_construct` so the failure object cannot mask the upstream cause; production schemas all declare defaults, so behaviour is unchanged.

Add a generic `LLMProvider._fail(response_format, reason, *, exc=None)` helper and route every provider's error-boundary `default_failure` call through it. It logs via `logger.error(..., exc_info=True)` when an exception is passed and `logger.warning(...)` otherwise, putting consistent, provider-tagged diagnostics (with a traceback whenever an exception is available) into CI logs across OpenAI, Gemini, and Anthropic (OpenRouter inherits OpenAI's path).

Anthropic's "no tool use found" fallback now also captures `response.stop_reason` and `content_block_types` in the failure reason so we can tell apart "Claude refused", "Claude returned plain text", and "empty tool input": the three plausible causes behind the currently failing real-LLM Anthropic tests on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
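For reviewers who want to see what the extra Anthropic diagnostics could look like, here is a minimal sketch of building such a failure reason. The `Block` and `Response` classes and the `describe_missing_tool_use` function are hypothetical stand-ins written for this illustration (they are not the Anthropic SDK types or the project's code), and the exact wording of the reason string is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for a content block; only `type` is inspected."""
    type: str


@dataclass
class Response:
    """Hypothetical stand-in for a provider response."""
    stop_reason: Optional[str]
    content: List[Block] = field(default_factory=list)


def describe_missing_tool_use(response: Response) -> str:
    """Build a failure reason that records why no tool use was found."""
    # Collect the type of every returned block so the log shows whether the
    # model answered in plain text, returned nothing, or something else.
    content_block_types = [getattr(b, "type", "unknown") for b in response.content]
    return (
        "no tool use found "
        f"(stop_reason={response.stop_reason!r}, "
        f"content_block_types={content_block_types!r})"
    )


# A plain-text answer and an empty response now produce different reasons.
plain_text = Response(stop_reason="end_turn", content=[Block("text")])
empty = Response(stop_reason="max_tokens", content=[])
print(describe_missing_tool_use(plain_text))
print(describe_missing_tool_use(empty))
```

With both values in the reason string, a single CI log line is enough to separate the candidate causes without re-running the test.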
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #81 +/- ##
==========================================
- Coverage 80.84% 80.75% -0.09%
==========================================
Files 15 15
Lines 1237 1372 +135
==========================================
+ Hits 1000 1108 +108
- Misses 237 264 +27
Summary
PR #80 (a no-op pre-commit autoupdate) is currently red because 5 real-LLM Anthropic tests fail on `main`. The same 5 failures have existed on `main` since #71, so they are independent of #80. The visible failure (`pydantic ValidationError: SimpleOutput.text Field required`) is a masked error: the test's `SimpleOutput` declares `text: str` with no default, and when the Anthropic call raises an `AnthropicError`, `BaseOutput.default_failure` re-validates and produces a `ValidationError` about the failure object itself. That hides the real Anthropic API error from CI logs, so we can't tell why Anthropic fails.

This PR doesn't disable any tests. It makes the underlying error visible across all providers (per project preference for generic, modular fixes over per-provider patches), so a follow-up can fix the root cause.
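The masking can be reproduced in a few lines. This is a minimal sketch assuming Pydantic v2; the fields `success` and `failure_reason` and the `validated_failure` method are hypothetical stand-ins for the old behaviour, not the project's actual definitions.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class BaseOutput(BaseModel):
    # Hypothetical fields; the real BaseOutput may look different.
    success: bool = True
    failure_reason: Optional[str] = None

    @classmethod
    def validated_failure(cls, reason: str) -> "BaseOutput":
        # Old-style failure path: full validation, so any required field
        # on a subclass makes this raise instead of returning a failure.
        return cls(success=False, failure_reason=reason)

    @classmethod
    def default_failure(cls, reason: str) -> "BaseOutput":
        # model_construct skips validation, so building the failure object
        # cannot raise a ValidationError of its own.
        return cls.model_construct(success=False, failure_reason=reason)


class SimpleOutput(BaseOutput):
    text: str  # required, no default (mirrors the test fixture)


try:
    SimpleOutput.validated_failure("AnthropicError: overloaded")
except ValidationError as err:
    # The upstream reason is lost; only the schema complaint surfaces.
    print("masked by:", type(err).__name__)

failure = SimpleOutput.default_failure("AnthropicError: overloaded")
print(failure.failure_reason)
```

One trade-off worth noting: `model_construct` leaves required fields without a default unset, so callers should check the failure flag before reading fields such as `text`.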
- `BaseOutput.default_failure` now uses `cls.model_construct(...)` instead of full Pydantic validation. The failure object can never again mask the upstream cause. Production schemas all declare defaults, so behaviour is unchanged.
- The `LLMProvider._fail(response_format, reason, *, exc=None)` helper centralises logging across `OpenAIProvider`, `GeminiProvider`, and `AnthropicProvider` (`OpenRouterProvider` inherits OpenAI's path). It uses `logger.error(..., exc_info=True)` when an exception is passed and `logger.warning(...)` otherwise. This replaces the existing inconsistent mix (OpenAI logged `str(e)` with no traceback; Gemini and Anthropic logged nothing).
- `default_failure` call sites in the four providers now route through `self._fail`.
- Anthropic's "no tool use found" fallback now captures `response.stop_reason` and `content_block_types` so logs can distinguish "Claude refused" / "plain text returned" / "empty tool input".

Test plan
- `uv run pytest -m "not real_llm_query"`: 154 passed locally
- `ERROR [AnthropicProvider] ...` line with full traceback

🤖 Generated with Claude Code
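As a rough illustration of the `_fail` routing described in the summary, the sketch below uses only the standard library. The `_fail` signature comes from this PR; the logger name, the `FailureOutput` stand-in, and the message format are assumptions made for the example, not the merged implementation.

```python
import logging
from typing import Optional

logger = logging.getLogger("llm_providers")  # hypothetical logger name


class FailureOutput:
    """Hypothetical stand-in for a response schema with default_failure."""

    def __init__(self, reason: str) -> None:
        self.reason = reason

    @classmethod
    def default_failure(cls, reason: str) -> "FailureOutput":
        return cls(reason)


class LLMProvider:
    def _fail(self, response_format, reason: str, *,
              exc: Optional[BaseException] = None):
        """Log a provider-tagged diagnostic, then return the failure object."""
        tag = type(self).__name__  # e.g. "AnthropicProvider"
        if exc is not None:
            # exc_info=True attaches the traceback of the exception being
            # handled, so this branch is meant to run inside an except block.
            logger.error("[%s] %s", tag, reason, exc_info=True)
        else:
            logger.warning("[%s] %s", tag, reason)
        return response_format.default_failure(reason)


class AnthropicProvider(LLMProvider):
    pass


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    provider = AnthropicProvider()
    try:
        raise RuntimeError("simulated upstream error")
    except RuntimeError as e:
        provider._fail(FailureOutput, "API call failed", exc=e)
    provider._fail(FailureOutput, "no tool use found")
```

Because the tag is derived from the concrete class, subclasses get correctly labelled log lines without overriding anything.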