Skip to content

fix: surface upstream provider errors in failure path#81

Merged
Marius1311 merged 1 commit into
mainfrom
claude/objective-ramanujan-0a2de9
May 14, 2026
Merged

fix: surface upstream provider errors in failure path#81
Marius1311 merged 1 commit into
mainfrom
claude/objective-ramanujan-0a2de9

Conversation

@Marius1311
Copy link
Copy Markdown
Member

Summary

PR #80 (a no-op pre-commit autoupdate) is currently red because 5 real-LLM Anthropic tests fail on main. Same 5 failures exist on main since #71 — they're independent of #80. The visible failure (pydantic ValidationError: SimpleOutput.text Field required) is a masked error: the test's SimpleOutput declares text: str with no default, and when the Anthropic call raises an AnthropicError, BaseOutput.default_failure re-validates and produces a ValidationError about the failure object itself. That hides the real Anthropic API error from CI logs, so we can't tell why Anthropic fails.

This PR doesn't disable any tests. It makes the underlying error visible across all providers (per project preference for generic, modular fixes over per-provider patches), so a follow-up can fix the root cause.

  • BaseOutput.default_failure → uses cls.model_construct(...) instead of full Pydantic validation. The failure object can never again mask the upstream cause. Production schemas all declare defaults, so behaviour is unchanged.
  • New LLMProvider._fail(response_format, reason, *, exc=None) helper centralises logging across OpenAIProvider, GeminiProvider, AnthropicProvider (OpenRouterProvider inherits OpenAI's path). Uses logger.error(..., exc_info=True) when an exception is passed, logger.warning(...) otherwise — replaces the existing inconsistent mix (OpenAI logged str(e) with no traceback; Gemini/Anthropic logged nothing).
  • All 11 default_failure call sites in the four providers now route through self._fail.
  • Anthropic's "no tool use found" fallback also captures response.stop_reason and content_block_types so logs can distinguish "Claude refused" / "plain text returned" / "empty tool input".

Test plan

  • uv run pytest -m "not real_llm_query" — 154 passed locally
  • CI green on all hatch matrix environments
  • Anthropic real-LLM tests still fail (expected — this PR exposes the root cause, doesn't fix it), and the failure logs now contain a provider-tagged ERROR [AnthropicProvider] ... line with full traceback
  • Read those logs and open a follow-up to fix the underlying Anthropic issue (likely candidates: bump default model, dated model alias, tool-schema sanitisation, or retry/backoff)

🤖 Generated with Claude Code

`BaseOutput.default_failure` used full Pydantic validation, so a response
schema with a required-without-default field (e.g. test fixture
`SimpleOutput.text: str`) would raise a `ValidationError` from inside the
failure path itself — masking the original Anthropic/OpenAI/Gemini error
with a confusing pydantic message. Switch to `model_construct` so the
failure object cannot mask the upstream cause; production schemas all
declare defaults, so behaviour is unchanged.

Add a generic `LLMProvider._fail(response_format, reason, *, exc=None)`
helper and route every provider's error-boundary `default_failure` call
through it. Logs via `logger.error(..., exc_info=True)` when an exception
is passed, `logger.warning(...)` otherwise — putting consistent,
provider-tagged, traceback-bearing diagnostics into CI logs across
OpenAI, Gemini, and Anthropic (OpenRouter inherits OpenAI's path).

Anthropic's "no tool use found" fallback now also captures
`response.stop_reason` and `content_block_types` in the failure reason
so we can tell apart "Claude refused", "Claude returned plain text", and
"empty tool input" — the three plausible causes behind the currently
failing real-LLM Anthropic tests on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.75%. Comparing base (514e1ab) to head (0d613cb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/cell_annotator/model/_providers.py 65.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
- Coverage   80.84%   80.75%   -0.09%     
==========================================
  Files          15       15              
  Lines        1237     1372     +135     
==========================================
+ Hits         1000     1108     +108     
- Misses        237      264      +27     
Files with missing lines Coverage Δ
src/cell_annotator/_response_formats.py 100.00% <100.00%> (ø)
src/cell_annotator/model/_providers.py 75.95% <65.00%> (+2.32%) ⬆️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Marius1311 Marius1311 merged commit 6d81dbc into main May 14, 2026
10 of 14 checks passed
@Marius1311 Marius1311 deleted the claude/objective-ramanujan-0a2de9 branch May 14, 2026 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant