feat: add LiteLLM as AI gateway provider#193
Conversation
📝 WalkthroughWalkthroughAdds a new LiteLLM provider: registers enum entries, implements a LiteLLM adapter and config loader, updates default LLM selection to prefer LiteLLM when configured, adds the ChangesLiteLLM provider
Sequence DiagramsequenceDiagram
participant App as Application
participant Init as get_default_llm()
participant LLM as LiteLLM
participant Lib as litellm
App->>Init: request default LLM
Init->>Init: read DEFAULT_LLM env
alt DEFAULT_LLM == "litellm"
Init-->>App: return LiteLLM instance
end
App->>LLM: chat_completions(messages, tools, params)
LLM->>LLM: load LITELLM_* env config
LLM->>LLM: format messages and tools (OpenAI-style)
LLM->>Lib: litellm.completion(model, messages, tools, params...)
Lib-->>LLM: completion response (choices, usage, tool_calls)
LLM->>LLM: parse choices, tool_calls, token usage
LLM-->>App: LLMResponse (content, tool_calls, usage, status)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
backend/director/llm/litellm.py (2)
91-93: Avoid mutable default argument fortools.
tools: list = []is shared across calls and is a well-known Python footgun (Ruff B006). Even though this code never mutates it, the lint rule is correct and the idiom is easy to fix.♻️ Suggested fix
- def chat_completions( - self, messages: list, tools: list = [], stop=None, response_format=None - ): + def chat_completions( + self, messages: list, tools: list = None, stop=None, response_format=None + ): """Get chat completions via LiteLLM. Routes to 100+ providers (OpenAI, Anthropic, Azure, Bedrock, etc.) based on the model string in LITELLM_CHAT_MODEL. """ import litellm + tools = tools or [] + params = {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/director/llm/litellm.py` around lines 91 - 93, The function chat_completions currently uses a mutable default argument tools: list = [], which can be shared across calls; change the signature to tools: Optional[list] = None (or similar) and inside chat_completions set tools = [] if tools is None to ensure a fresh list per call; update any type hints/imports if needed and keep behavior identical otherwise (refer to the chat_completions parameter tools and its use within the method).
121-125: Use logging and preserve the traceback instead of
print(f"Error: {e}")discards the traceback and writes to stdout, which is hard to filter in production logs. The rest of the project likely uses the standardloggingmodule — please log withlogger.exception(...)so operators can diagnose upstream provider failures (rate limits, auth, network) without losing the stack trace.♻️ Suggested fix
+import logging import json ... +logger = logging.getLogger(__name__) ... try: response = litellm.completion(**params) except Exception as e: - print(f"Error: {e}") + logger.exception("LiteLLM completion failed") return LLMResponse(content=f"Error: {e}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/director/llm/litellm.py` around lines 121 - 125, Replace the print-based error handling around litellm.completion(**params) with proper logging that preserves the traceback: import/get the module logger (e.g., logging.getLogger(__name__) or reuse the existing logger) and call logger.exception(...) inside the except block so the stack trace is recorded, then return the LLMResponse as before (optionally with a sanitized error message) — update the except handling that currently catches Exception as e around the litellm.completion call and reference LLMResponse to keep the existing return behavior.backend/requirements.txt (1)
7-7: Consider pinninglitellmto match the rest of the file's style.All other dependencies in this file use exact
==pins (e.g.anthropic==0.45.2,openai==1.55.3), butlitellmuses a range>=1.60.0,<2.0.0. LiteLLM ships frequent releases with occasional behavioral changes within the 1.x line, so a range pin can produce non-reproducible builds and surprise behavior changes between deploys. Consider pinning to a known-good version for consistency.📦 Suggested change
-litellm>=1.60.0,<2.0.0 +litellm==1.60.0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/requirements.txt` at line 7, The litellm dependency in requirements.txt uses a version range which deviates from the file's exact-pinning style; change the entry for "litellm" to an exact pinned version (e.g. replace "litellm>=1.60.0,<2.0.0" with a specific known-good release like "litellm==1.60.0" or whatever tested version you validated) so it matches the other exact pins and ensures reproducible builds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/director/llm/litellm.py`:
- Around line 101-109: The params dict in the LiteLLM call is dropping
configurable sampling params (e.g., top_p) so settings from BaseLLM.__init__
(self.top_p) are silently ignored; update the params construction in the method
that builds the request (the params dict containing "model", "messages",
"temperature", etc.) to include self.top_p (and any other exposed tunables) when
not None, while keeping "drop_params": True only for providers that don't
support them — i.e., forward self.top_p into params (and similarly handle other
tunables) so providers that accept top_p receive it and unsupported providers
can still rely on drop_params.
- Around line 127-146: Parsing LLM responses can raise when response.usage is
None or when tool_call.function.arguments is empty/None; update the code around
the LLMResponse construction (the block that reads response.choices[0],
response.usage and builds tool_calls) to defensively handle these cases: treat
missing usage by defaulting send_tokens/recv_tokens/total_tokens to 0 (or None
if preferred) instead of accessing response.usage.prompt_tokens, and parse
tool_call.function.arguments only when it is a non-empty string (fallback to {}
or [] when it's "" or None); either move this parsing into the try/except that
wraps litellm.completion() or add a local try/except around the tool_calls/usage
extraction so JSONDecodeError/AttributeError are caught and converted to the LLM
error response path, referencing the LLMResponse construction,
response.choices[0].message.tool_calls, and tool_call.function.arguments
symbols.
---
Nitpick comments:
In `@backend/director/llm/litellm.py`:
- Around line 91-93: The function chat_completions currently uses a mutable
default argument tools: list = [], which can be shared across calls; change the
signature to tools: Optional[list] = None (or similar) and inside
chat_completions set tools = [] if tools is None to ensure a fresh list per
call; update any type hints/imports if needed and keep behavior identical
otherwise (refer to the chat_completions parameter tools and its use within the
method).
- Around line 121-125: Replace the print-based error handling around
litellm.completion(**params) with proper logging that preserves the traceback:
import/get the module logger (e.g., logging.getLogger(__name__) or reuse the
existing logger) and call logger.exception(...) inside the except block so the
stack trace is recorded, then return the LLMResponse as before (optionally with
a sanitized error message) — update the except handling that currently catches
Exception as e around the litellm.completion call and reference LLMResponse to
keep the existing return behavior.
In `@backend/requirements.txt`:
- Line 7: The litellm dependency in requirements.txt uses a version range which
deviates from the file's exact-pinning style; change the entry for "litellm" to
an exact pinned version (e.g. replace "litellm>=1.60.0,<2.0.0" with a specific
known-good release like "litellm==1.60.0" or whatever tested version you
validated) so it matches the other exact pins and ensures reproducible builds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4ffa4b2-65f7-470b-a6cd-ac9f53eff459
📒 Files selected for processing (6)
backend/director/constants.pybackend/director/llm/__init__.pybackend/director/llm/litellm.pybackend/requirements.txtbackend/tests/__init__.pybackend/tests/test_litellm.py
There was a problem hiding this comment.
🧹 Nitpick comments (3)
backend/tests/test_litellm.py (3)
376-388: Test may be fragile due toclear=Falseon env patching.
test_litellm_not_default_when_openai_key_setrelies onDEFAULT_LLM=""but keeps the rest of the environment intact. If the test runner hasDEFAULT_LLM=litellm(or any value) set outside,mock.patch.dictoverrides it — that's fine — but other keys likeANTHROPIC_API_KEY/GOOGLEAI_API_KEYfrom the host env could still influenceget_default_llm()in future refactors. The current assertion (not isinstance(llm, LiteLLM)) is tolerant, so this passes today, but considerclear=Truewith an explicit minimal env for determinism across CI machines.🔧 Suggested hardening
- with mock.patch.dict( - "os.environ", - {"OPENAI_API_KEY": "sk-test", "DEFAULT_LLM": ""}, - clear=False, - ): + with mock.patch.dict( + "os.environ", + {"OPENAI_API_KEY": "sk-test", "DEFAULT_LLM": ""}, + clear=True, + ):Same consideration applies to
test_get_default_llm_returns_litellm(Line 370) andtest_config_reads_from_env(Line 509), though those are less sensitive sinceDEFAULT_LLM=litellmshort-circuits selection and the config test asserts only the overridden values.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_litellm.py` around lines 376 - 388, The test test_litellm_not_default_when_openai_key_set is fragile because mock.patch.dict uses clear=False; update the test to call mock.patch.dict with clear=True and provide an explicit minimal environment dict (e.g., {"OPENAI_API_KEY":"sk-test","DEFAULT_LLM":""}) so external keys like ANTHROPIC_API_KEY/GOOGLEAI_API_KEY won’t affect director.llm.get_default_llm; apply the same change to test_get_default_llm_returns_litellm and test_config_reads_from_env where env isolation is required.
23-27: Minor:idshadows a Python builtin.Ruff flags
A002here. Consider renaming toid_orcall_idin this test helper to avoid shadowing. Low priority since it's isolated to a test fixture class.🔧 Proposed rename
class _ToolCall: - def __init__(self, id, name, arguments): - self.id = id + def __init__(self, call_id, name, arguments): + self.id = call_id self.function = _FnCall(name, json.dumps(arguments)) self.type = "function"Update call sites (
_ToolCall("tc1", ...),_ToolCall("tc2", ...)) accordingly — they use positional args so no changes needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_litellm.py` around lines 23 - 27, Rename the constructor parameter and attribute in class _ToolCall to avoid shadowing the builtin: change __init__(self, id, name, arguments) and self.id to use a non-shadowing name like id_ or call_id (e.g., __init__(self, id_, name, arguments) and self.id_ = id_); update any internal references and all test call sites that construct _ToolCall with positional args (e.g., _ToolCall("tc1", ...), _ToolCall("tc2", ...)) so they continue to work with the new parameter name.
322-340: Consider fixing the type inconsistency in LLMResponse.statusThe
LLMResponsedataclass defaults align correctly with the test expectations (status=LLMResponseStatus.ERRORand token counts=0), so the tests will pass. However,statusis typed asintbut defaults toLLMResponseStatus.ERROR, which evaluates toFalse(a boolean). Either change the type annotation tostatus: boolor redesignLLMResponseStatusas a properEnumfor type safety. Additionally, the error branch inchat_completionsshould explicitly setstatus=LLMResponseStatus.ERRORfor clarity rather than relying on defaults.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_litellm.py` around lines 322 - 340, LLMResponse.status is currently typed as int while defaulting to LLMResponseStatus.ERROR (a bool-like value) which causes a type mismatch; update the LLMResponse dataclass to use a proper enum type (e.g., status: LLMResponseStatus) or change LLMResponseStatus into a real Enum, and then in LiteLLM.chat_completions ensure the error branch explicitly sets status=LLMResponseStatus.ERROR (and set send_tokens/recv_tokens/total_tokens=0) instead of relying on defaults to guarantee type safety and clear intent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/tests/test_litellm.py`:
- Around line 376-388: The test test_litellm_not_default_when_openai_key_set is
fragile because mock.patch.dict uses clear=False; update the test to call
mock.patch.dict with clear=True and provide an explicit minimal environment dict
(e.g., {"OPENAI_API_KEY":"sk-test","DEFAULT_LLM":""}) so external keys like
ANTHROPIC_API_KEY/GOOGLEAI_API_KEY won’t affect director.llm.get_default_llm;
apply the same change to test_get_default_llm_returns_litellm and
test_config_reads_from_env where env isolation is required.
- Around line 23-27: Rename the constructor parameter and attribute in class
_ToolCall to avoid shadowing the builtin: change __init__(self, id, name,
arguments) and self.id to use a non-shadowing name like id_ or call_id (e.g.,
__init__(self, id_, name, arguments) and self.id_ = id_); update any internal
references and all test call sites that construct _ToolCall with positional args
(e.g., _ToolCall("tc1", ...), _ToolCall("tc2", ...)) so they continue to work
with the new parameter name.
- Around line 322-340: LLMResponse.status is currently typed as int while
defaulting to LLMResponseStatus.ERROR (a bool-like value) which causes a type
mismatch; update the LLMResponse dataclass to use a proper enum type (e.g.,
status: LLMResponseStatus) or change LLMResponseStatus into a real Enum, and
then in LiteLLM.chat_completions ensure the error branch explicitly sets
status=LLMResponseStatus.ERROR (and set send_tokens/recv_tokens/total_tokens=0)
instead of relying on defaults to guarantee type safety and clear intent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ed9157f-eb11-4399-bcc5-46e65757fcef
📒 Files selected for processing (1)
backend/tests/test_litellm.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/director/llm/litellm.py (2)
154-156: Use the module logger instead of
print(f"Error: {e}")bypasses the application's logging configuration (levels, handlers, structured output). Other providers indirector/llm/likely use a logger; align with that pattern so failures are observable in production. Optionally includeexc_info=Trueto capture the traceback for debugging multi-provider routing failures.♻️ Suggested fix
+import logging import json from pydantic import Field from pydantic_settings import SettingsConfigDict from director.llm.base import BaseLLM, BaseLLMConfig, LLMResponse, LLMResponseStatus from director.constants import LLMType, EnvPrefix + +logger = logging.getLogger(__name__) @@ except Exception as e: - print(f"Error: {e}") + logger.exception("LiteLLM completion failed: %s", e) return LLMResponse(content=f"Error: {e}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/director/llm/litellm.py` around lines 154 - 156, Replace the raw print in the exception handler with the module logger: use the existing module-level logger (e.g., logger) to log the error and include exc_info=True to capture the traceback, then return the same LLMResponse; update the except block that currently does "except Exception as e: print(f\"Error: {e}\") return LLMResponse(...)" to call logger.error("Error creating LiteLLM response" or similar, exc_info=True) and then return LLMResponse(content=f"Error: {e}").
91-93: Avoid mutable default argument fortools.
tools: list = []is shared across all calls. While the current code only readstools(so no actual bug today), this is a well-known Python pitfall that's a footgun against future refactors and is flagged by Ruff (B006).♻️ Suggested fix
- def chat_completions( - self, messages: list, tools: list = [], stop=None, response_format=None - ): + def chat_completions( + self, messages: list, tools: list = None, stop=None, response_format=None + ): """Get chat completions via LiteLLM. Routes to 100+ providers (OpenAI, Anthropic, Azure, Bedrock, etc.) based on the model string in LITELLM_CHAT_MODEL. """ import litellm + tools = tools or [] + params = {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/director/llm/litellm.py` around lines 91 - 93, The chat_completions method uses a mutable default for tools (tools: list = []), which can be shared across calls; change the signature to use None (e.g., tools: Optional[list] = None) and inside chat_completions initialize tools = [] if tools is None to avoid the shared mutable default; update any type hints/imports as needed and ensure callers are unaffected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/test_litellm.py`:
- Around line 537-570: The tests rely on environment variables and can be flaky;
update TestLiteLLMConfig tests (specifically test_default_config_values and
test_config_no_api_key_required) to isolate env state by clearing or patching
os.environ for LITELLM_* before constructing LiteLLMConfig/LiteLLM: use
mock.patch.dict("os.environ", {}, clear=True) scoped to each test or explicitly
pop/clear the LITELLM_CHAT_MODEL, LITELLM_API_KEY, LITELLM_MAX_TOKENS keys so
LiteLLMConfig returns deterministic defaults while leaving
test_config_reads_from_env unchanged.
---
Nitpick comments:
In `@backend/director/llm/litellm.py`:
- Around line 154-156: Replace the raw print in the exception handler with the
module logger: use the existing module-level logger (e.g., logger) to log the
error and include exc_info=True to capture the traceback, then return the same
LLMResponse; update the except block that currently does "except Exception as e:
print(f\"Error: {e}\") return LLMResponse(...)" to call logger.error("Error
creating LiteLLM response" or similar, exc_info=True) and then return
LLMResponse(content=f"Error: {e}").
- Around line 91-93: The chat_completions method uses a mutable default for
tools (tools: list = []), which can be shared across calls; change the signature
to use None (e.g., tools: Optional[list] = None) and inside chat_completions
initialize tools = [] if tools is None to avoid the shared mutable default;
update any type hints/imports as needed and ensure callers are unaffected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ff8ffdf2-8de6-434e-b53a-dad9c0c27205
📒 Files selected for processing (2)
backend/director/llm/litellm.pybackend/tests/test_litellm.py
|
cc @ashish-spext and @0xrohitgarg |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/director/llm/litellm.py`:
- Around line 157-159: The except block in litellm.py currently returns the raw
exception text to callers; change it to log the full error (use logger.exception
or include the exception object in logger.error) and return a generic failure
message instead of content=f"Error: {e}". Specifically, update the except
handling around the LiteLLM completion call so logger.exception("LiteLLM
completion failed") (or equivalent) records the full stack/exception, and return
LLMResponse(content="Error: failed to generate completion") (or a similar
generic message) so no upstream/internal error details are leaked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fb476039-f832-4390-9346-9cab07106204
📒 Files selected for processing (2)
backend/director/llm/litellm.pybackend/tests/test_litellm.py
| except Exception as e: | ||
| logger.error("LiteLLM completion failed: %s", e) | ||
| return LLMResponse(content=f"Error: {e}") |
There was a problem hiding this comment.
Do not return raw upstream exception text to callers.
content=f"Error: {e}" exposes provider/network error details directly in the response path. Those exceptions can include internal endpoints, request metadata, or other sensitive diagnostics. Return a generic failure message and keep the full error in logs instead.
Suggested fix
- except Exception as e:
- logger.error("LiteLLM completion failed: %s", e)
- return LLMResponse(content=f"Error: {e}")
+ except Exception:
+ logger.exception("LiteLLM completion failed")
+ return LLMResponse(
+ content="Error: LiteLLM completion failed",
+ status=LLMResponseStatus.ERROR,
+ )🧰 Tools
🪛 Ruff (0.15.12)
[warning] 157-157: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/director/llm/litellm.py` around lines 157 - 159, The except block in
litellm.py currently returns the raw exception text to callers; change it to log
the full error (use logger.exception or include the exception object in
logger.error) and return a generic failure message instead of content=f"Error:
{e}". Specifically, update the except handling around the LiteLLM completion
call so logger.exception("LiteLLM completion failed") (or equivalent) records
the full stack/exception, and return LLMResponse(content="Error: failed to
generate completion") (or a similar generic message) so no upstream/internal
error details are leaked.
Summary
LiteLLMas a new LLM provider extendingBaseLLMLITELLM_CHAT_MODELenv var, no code changes neededBaseLLM+ config pattern used by the existing OpenAI, Anthropic, and GoogleAI providersWhat kind of change does this PR introduce? Feature, adds LiteLLM as a new LLM provider
Why was this change needed? Director currently supports 4 LLM providers (OpenAI, Anthropic, GoogleAI, VideoDB Proxy), each with separate integration code. LiteLLM provides a unified interface to 100+
providers through a single
completion()call. Users switch providers by changing an env var.Other information: See sections below.
What Changed
backend/director/llm/litellm.pyLiteLLMprovider extendingBaseLLMwithLiteLLMConfigbackend/director/llm/__init__.pyget_default_llm()backend/director/constants.pyLLMType.LITELLMandEnvPrefix.LITELLM_backend/requirements.txtlitellm>=1.60.0,<2.0.0backend/tests/test_litellm.pyUsage
Quick Start
Any of the https://docs.litellm.ai/docs/providers works.
Programmatic Usage
Testing
Unit Tests (37/37 Passing)
Live E2E (Azure AI Foundry, Claude Sonnet via LiteLLM)
Model: azure_ai/claude-sonnet-4-6
chat_completions() -> Content: "4", Status: SUCCESS, Tokens: send=20, recv=5, total=25
get_default_llm() -> Type: LiteLLM, Response: "OK"
Integration Bug Found and Fixed
provider conflicts automatically. Also added defensive handling for None usage responses and empty/None tool-call arguments, which can occur with certain LiteLLM backends.
Risk / Compatibility
Summary by CodeRabbit
New Features
Tests