When configuring LLMs in the admin app, instructors should be able to pick a provider, enter the API key, choose a primary model, and optionally specify a fallback model that takes over on rate-limit (429) or upstream errors. This surfaced in a real incident where google/gemma-4-31b-it:free hit upstream rate limits with no graceful degradation: the worker returned 200 but streamed zero events.
Requirements
Admin UI: provider dropdown, API key input (masked, write-only), primary model picker (live catalog), optional fallback model picker
Live model catalog fetch proxied via the worker for each provider (OpenRouter, OpenAI, Anthropic)
Test-connection button that validates the API key and model availability before saving
Schema delta: add fallback_llm_config_id self-FK on llm_configs (nullable, ON DELETE SET NULL)
Runtime: chat handler catches AI_RetryError / AI_APICallError with status 429 or 5xx, retries via the resolved fallback config
Cycle / chain-depth protection (e.g. cap at 3 hops)
Provider-agnostic — fallback should work for Anthropic, OpenAI, and any future provider, not just OpenRouter
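A minimal sketch of the chain walk with cycle and depth protection, in TypeScript to match the apps/web stack. The row shape, the `resolveFallbackChain` name, and the `lookup` callback are hypothetical, and "cap at 3 hops" is read here as the primary plus at most three fallbacks; a stricter reading would pass `maxHops = 2`. Nothing in it depends on the provider, so it fits the provider-agnostic requirement.

```typescript
// Hypothetical shape of an llm_configs row; only the fields needed here.
interface LLMConfigRow {
  id: string;
  provider: string; // e.g. "openrouter", "openai", "anthropic"
  model: string;
  fallbackLlmConfigId: string | null; // self-FK, nullable (ON DELETE SET NULL)
}

const MAX_FALLBACK_HOPS = 3;

/**
 * Hypothetical helper: returns the ordered list of configs to try, primary
 * first, then each fallback. Stops at a null pointer, a cycle, a dangling id,
 * or once maxHops fallbacks have been collected.
 */
function resolveFallbackChain(
  startId: string,
  lookup: (id: string) => LLMConfigRow | undefined,
  maxHops: number = MAX_FALLBACK_HOPS,
): LLMConfigRow[] {
  const chain: LLMConfigRow[] = [];
  const seen = new Set<string>();
  let nextId: string | null = startId;

  // The primary is hop 0, so the chain holds at most maxHops + 1 entries.
  while (nextId !== null && chain.length <= maxHops) {
    if (seen.has(nextId)) break; // cycle, e.g. A -> B -> A
    const row = lookup(nextId);
    if (row === undefined) break; // dangling id; the FK should prevent this
    seen.add(nextId);
    chain.push(row);
    nextId = row.fallbackLlmConfigId;
  }
  return chain;
}
```

With rows `a -> b -> a`, the walk returns `[a, b]` and stops instead of looping; the handler would then try each entry in order.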
Context
Surfaced 2026-06-03 during chat smoke-testing on cdcore/chore/research: google/gemma-4-31b-it:free hit an upstream 429 from Google AI Studio. The AI SDK retried 3 times (reason: 'maxRetriesExceeded') and then closed the stream silently.
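Because the SDK gave up after its own retries, the error reaching the handler may be a retry wrapper rather than the raw HTTP failure. A hedged sketch of a classifier for the 429/5xx rule follows. It assumes the wrapper's `name` is `AI_RetryError` and that it carries the final attempt in `lastError`, and that API call errors expose `statusCode`; the function name `isFallbackEligible` is hypothetical. If the installed SDK version provides `RetryError.isInstance` and `APICallError.isInstance`, those are likely a sturdier check than comparing `name`.

```typescript
// Structural view of the fields read below. Assumption: these mirror the
// AI SDK's RetryError (name, lastError) and APICallError (statusCode).
interface MaybeAIError {
  name?: unknown;
  statusCode?: unknown;
  lastError?: unknown;
}

/** Hypothetical helper: true for 429 or any 5xx, looking through retry wrappers. */
function isFallbackEligible(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as MaybeAIError;

  // A retry wrapper holds the last attempt's error; classify that instead.
  if (e.name === "AI_RetryError" && e.lastError !== undefined) {
    return isFallbackEligible(e.lastError);
  }

  const status = e.statusCode;
  if (typeof status !== "number") return false;
  return status === 429 || (status >= 500 && status <= 599);
}
```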
The existing schema in apps/web/src/db/schema/content.ts already covers provider, model, and credential pointer on LLMConfig. Fallback is the one new piece.
Implementation Notes
The self-FK pattern matches the existing PromptTemplate.previous_version_id, so the Drizzle migration adds a single column.
Runtime change in apps/web/src/server/routes/chat.ts is ~20 lines once the schema lands: wrap streamText in a try/catch keyed on retryable errors, look up the fallback config, retry once.
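The retry-once control flow could look roughly like the following. `runWithFallback` and its parameters are hypothetical: `attempt` stands in for the `streamText` call and `isRetryable` for whatever 429/5xx check the handler uses. One caveat: with streaming, provider errors can surface only while the stream is consumed (the 200-with-zero-events symptom), so this sketch assumes `attempt` does not resolve until the first chunk arrives or the call fails.

```typescript
/**
 * Hypothetical helper: run `attempt` with the primary config; if it fails with
 * an error that `isRetryable` accepts and a fallback exists, run it once more
 * with the fallback. Anything else is rethrown unchanged.
 */
async function runWithFallback<C, T>(
  primary: C,
  fallback: C | undefined,
  attempt: (config: C) => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  onFallback?: (error: unknown, fallback: C) => void,
): Promise<T> {
  try {
    return await attempt(primary);
  } catch (error) {
    if (fallback === undefined || !isRetryable(error)) throw error;
    // Hook for logging, or for a user-visible "switched to backup model" notice.
    onFallback?.(error, fallback);
    // Retry exactly once; a second failure propagates to the caller.
    return await attempt(fallback);
  }
}
```

Retrying only once keeps the change small; walking a longer chain would mean looping over a resolved list of configs instead of a single `fallback`.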
The admin UI is a larger change: it needs a /api/admin/llm-providers/<provider>/models endpoint per supported provider so the dropdown shows live availability, ideally with a short cache TTL.
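For the short cache TTL, a per-process in-memory cache may be enough. In this sketch `CatalogCache`, the fetcher signature, and the 5-minute default are assumptions. Entries are keyed by provider only (catalogs that vary per API key would need the key, or a hash of it, in the cache key), and concurrent misses are not deduplicated. The injectable clock exists so tests can advance time.

```typescript
type ModelCatalog = string[]; // model ids, e.g. "google/gemma-4-31b-it:free"

/** Hypothetical per-provider TTL cache in front of the worker's catalog proxy. */
class CatalogCache {
  private readonly entries = new Map<string, { models: ModelCatalog; expiresAt: number }>();
  private readonly fetcher: (provider: string) => Promise<ModelCatalog>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    fetcher: (provider: string) => Promise<ModelCatalog>,
    ttlMs: number = 5 * 60_000, // assumed default: 5 minutes
    now: () => number = () => Date.now(),
  ) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(provider: string): Promise<ModelCatalog> {
    const hit = this.entries.get(provider);
    // Serve from cache while the entry is still fresh.
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.models;

    // Miss or expired: fetch upstream and store with a new expiry.
    const models = await this.fetcher(provider);
    this.entries.set(provider, { models, expiresAt: this.now() + this.ttlMs });
    return models;
  }
}
```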
Open Questions
Does fallback resolution walk the existing inheritance chain (Homework → Course → Organization), or is it pinned to whichever LLMConfig actually fired?
When fallback fires, does the user see anything (muted toast: "switched to backup model") or is it silent?
Should the admin UI offer test-connection, or rely on the first real chat call to surface bad config?
Related: docs/architecture/multi-tenant-data-model.md §6.2 (LLMConfig + OrganizationCredential).