fix: surface upstream provider errors in failure path by Marius1311 · Pull Request #81 · quadbio/cell-annotator

Marius1311 · 2026-05-14T15:08:44Z

Summary

PR #80 (a no-op pre-commit autoupdate) is currently red because 5 real-LLM Anthropic tests fail on main. Same 5 failures exist on main since #71 — they're independent of #80. The visible failure (pydantic ValidationError: SimpleOutput.text Field required) is a masked error: the test's SimpleOutput declares text: str with no default, and when the Anthropic call raises an AnthropicError, BaseOutput.default_failure re-validates and produces a ValidationError about the failure object itself. That hides the real Anthropic API error from CI logs, so we can't tell why Anthropic fails.

This PR doesn't disable any tests. It makes the underlying error visible across all providers (per project preference for generic, modular fixes over per-provider patches), so a follow-up can fix the root cause.

BaseOutput.default_failure → uses cls.model_construct(...) instead of full Pydantic validation. The failure object can never again mask the upstream cause. Production schemas all declare defaults, so behaviour is unchanged.
New LLMProvider._fail(response_format, reason, *, exc=None) helper centralises logging across OpenAIProvider, GeminiProvider, AnthropicProvider (OpenRouterProvider inherits OpenAI's path). Uses logger.error(..., exc_info=True) when an exception is passed, logger.warning(...) otherwise — replaces the existing inconsistent mix (OpenAI logged str(e) with no traceback; Gemini/Anthropic logged nothing).
All 11 default_failure call sites in the four providers now route through self._fail.
Anthropic's "no tool use found" fallback also captures response.stop_reason and content_block_types so logs can distinguish "Claude refused" / "plain text returned" / "empty tool input".

Test plan

uv run pytest -m "not real_llm_query" — 154 passed locally
CI green on all hatch matrix environments
Anthropic real-LLM tests still fail (expected — this PR exposes the root cause, doesn't fix it), and the failure logs now contain a provider-tagged ERROR [AnthropicProvider] ... line with full traceback
Read those logs and open a follow-up to fix the underlying Anthropic issue (likely candidates: bump default model, dated model alias, tool-schema sanitisation, or retry/backoff)

🤖 Generated with Claude Code

`BaseOutput.default_failure` used full Pydantic validation, so a response schema with a required-without-default field (e.g. test fixture `SimpleOutput.text: str`) would raise a `ValidationError` from inside the failure path itself — masking the original Anthropic/OpenAI/Gemini error with a confusing pydantic message. Switch to `model_construct` so the failure object cannot mask the upstream cause; production schemas all declare defaults, so behaviour is unchanged. Add a generic `LLMProvider._fail(response_format, reason, *, exc=None)` helper and route every provider's error-boundary `default_failure` call through it. Logs via `logger.error(..., exc_info=True)` when an exception is passed, `logger.warning(...)` otherwise — putting consistent, provider-tagged, traceback-bearing diagnostics into CI logs across OpenAI, Gemini, and Anthropic (OpenRouter inherits OpenAI's path). Anthropic's "no tool use found" fallback now also captures `response.stop_reason` and `content_block_types` in the failure reason so we can tell apart "Claude refused", "Claude returned plain text", and "empty tool input" — the three plausible causes behind the currently failing real-LLM Anthropic tests on main. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov · 2026-05-14T15:47:59Z

Codecov Report

❌ Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.75%. Comparing base (514e1ab) to head (0d613cb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/cell_annotator/model/_providers.py	65.00%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
- Coverage   80.84%   80.75%   -0.09%     
==========================================
  Files          15       15              
  Lines        1237     1372     +135     
==========================================
+ Hits         1000     1108     +108     
- Misses        237      264      +27

Files with missing lines	Coverage Δ
src/cell_annotator/_response_formats.py	`100.00% <100.00%> (ø)`
src/cell_annotator/model/_providers.py	`75.95% <65.00%> (+2.32%)`	⬆️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Marius1311 merged commit 6d81dbc into main May 14, 2026
10 of 14 checks passed

Marius1311 deleted the claude/objective-ramanujan-0a2de9 branch May 14, 2026 15:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: surface upstream provider errors in failure path#81

fix: surface upstream provider errors in failure path#81
Marius1311 merged 1 commit into
mainfrom
claude/objective-ramanujan-0a2de9

Marius1311 commented May 14, 2026

Uh oh!

codecov Bot commented May 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Marius1311 commented May 14, 2026

Summary

Test plan

Uh oh!

codecov Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented May 14, 2026 •

edited

Loading