Provider-aware endpoint error handling in inference_provider model server

Follow-up from PR #1286 review (ananthsub).

The `inference_provider` model server currently issues chat-completion requests directly:

```python
async with self._semaphore:
    chat_completion_dict = await self._client.create_chat_completion(**body_dict)
```

As we add support for more hosted providers (Fireworks, Together.ai, Baseten, DeepInfra, Nebius, Friendli, OpenRouter, HF Inference, Gemini, ...), their OpenAI-compatible endpoints differ in how they surface errors: rate limits, auth failures, model-not-found, transient 5xx, and so on. We may want provider-aware, more granular error handling at this call site instead of letting every raw error propagate in the same way.
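As a starting point for discussion, the error classes named above could be normalized into one small taxonomy keyed on HTTP status. This is only a sketch: `ErrorKind` and `classify_status` are hypothetical names, and the status mapping (in particular 404 for model-not-found) is an assumption that the provider survey would need to confirm, since some providers may report these conditions with other codes or only in the response body.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical provider-agnostic error categories."""

    RATE_LIMIT = "rate_limit"
    AUTH = "auth"
    MODEL_NOT_FOUND = "model_not_found"
    TRANSIENT = "transient"
    UNKNOWN = "unknown"


def classify_status(status: int) -> ErrorKind:
    """Map an HTTP status code to a coarse error category.

    The mapping is an assumption, not a verified description of any
    specific provider's behavior.
    """
    if status == 429:
        return ErrorKind.RATE_LIMIT
    if status in (401, 403):
        return ErrorKind.AUTH
    if status == 404:
        return ErrorKind.MODEL_NOT_FOUND
    if 500 <= status <= 599:
        return ErrorKind.TRANSIENT
    return ErrorKind.UNKNOWN
```

A per-provider override table could sit in front of this default mapping where a provider deviates from it.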

Scope:
- Survey error response shapes/status codes across supported providers
- Decide on a normalization / retry / surfacing strategy
- Apply consistently in `responses_api_models/inference_provider/app.py`
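One possible shape for the retry part of the strategy is sketched below: retry only on statuses that are plausibly transient, with exponential backoff and jitter, and re-raise everything else immediately. `ProviderHTTPError`, `RETRYABLE_STATUSES`, and `call_with_retry` are hypothetical names introduced for illustration; the set of retryable statuses and the backoff parameters are assumptions, not decisions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ProviderHTTPError(Exception):
    """Hypothetical normalized error carrying the HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


# Assumption: rate limits and common 5xx responses are worth retrying.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying retryable errors with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await make_call()
        except ProviderHTTPError as exc:
            # Non-retryable errors and the final attempt propagate as-is.
            if exc.status not in RETRYABLE_STATUSES or attempt >= max_attempts:
                raise
        # Exponential backoff plus jitter to avoid synchronized retries.
        delay = base_delay * 2 ** (attempt - 1)
        await asyncio.sleep(delay + random.uniform(0, delay / 2))
        attempt += 1
```

In the model server this would wrap the existing call inside the semaphore, for example `await call_with_retry(lambda: self._client.create_chat_completion(**body_dict))`, assuming an adapter first translates whatever the client raises into the normalized error type.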

Ref: PR #1286 review comment.

Provider-aware endpoint error handling in inference_provider model server #1748

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Provider-aware endpoint error handling in inference_provider model server #1748

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions