Avoid injecting unsupported input files into LLM messages#5799
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughA new utility filters input files by LLM multimodal capability and supported content types. Agent executors (CrewAgentExecutor and experimental agent_executor) call this helper before injecting files into the last user message and return early if no supported files remain. Tests cover multimodal and non-multimodal behaviors. ChangesMultimodal file injection control
Sequence DiagramsequenceDiagram
participant CrewAgentExecutor
participant ExperimentalAgentExecutor
participant get_auto_injected_files
participant BaseLLM
participant MessageHistory
CrewAgentExecutor->>get_auto_injected_files: merged files + llm
ExperimentalAgentExecutor->>get_auto_injected_files: input files + llm
get_auto_injected_files->>BaseLLM: supports_multimodal()
BaseLLM-->>get_auto_injected_files: True/False
alt multimodal
get_auto_injected_files->>get_auto_injected_files: get_supported_content_types(provider, api)
get_auto_injected_files->>get_auto_injected_files: filter files by content_type
get_auto_injected_files-->>Caller: filtered files
Caller->>MessageHistory: inject filtered files into last user message
else not multimodal or no supported files
get_auto_injected_files-->>Caller: {}
Caller-->>Caller: return early (no injection)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/crewai/tests/utilities/test_file_injection.py (1)
30-43: ⚡ Quick winAdd a regression test for unsupported file values (no
content_type).Current tests don’t cover the skip-not-crash path for unsupported file objects. Add one case (e.g.,
{"bad": object()}) asserting{}to protect this behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/utilities/test_file_injection.py` around lines 30 - 43, Add a regression test that ensures unsupported file values without a content_type are skipped (not crash) by creating a new test function (e.g., test_unsupported_file_values_are_skipped) that constructs files = {"bad": object()} and calls get_auto_injected_files(files, llm) with a multimodal DummyLLM (e.g., DummyLLM(model="openai/gpt-4o", multimodal=True)), then asserts the result is {} to protect the skip-not-crash behavior; place this with the other tests (near test_text_files_are_not_injected_for_non_multimodal_llm and test_only_supported_files_are_injected_for_multimodal_llm) and reuse existing helper types (DummyLLM, get_auto_injected_files) for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/utilities/file_injection.py`:
- Around line 33-39: The dict comprehension that filters by
file_input.content_type can raise AttributeError for unsupported objects in
files; update the filter in the comprehension (the expression building the
return dict) to guard access to content_type (e.g., use getattr(file_input,
"content_type", "") or check hasattr(file_input, "content_type") before calling
.startswith) so unsupported entries are skipped instead of raising; locate the
comprehension that iterates over files.items() and replace the direct
file_input.content_type.startswith(...) call with a safe guard using getattr or
an explicit attribute check.
---
Nitpick comments:
In `@lib/crewai/tests/utilities/test_file_injection.py`:
- Around line 30-43: Add a regression test that ensures unsupported file values
without a content_type are skipped (not crash) by creating a new test function
(e.g., test_unsupported_file_values_are_skipped) that constructs files = {"bad":
object()} and calls get_auto_injected_files(files, llm) with a multimodal
DummyLLM (e.g., DummyLLM(model="openai/gpt-4o", multimodal=True)), then asserts
the result is {} to protect the skip-not-crash behavior; place this with the
other tests (near test_text_files_are_not_injected_for_non_multimodal_llm and
test_only_supported_files_are_injected_for_multimodal_llm) and reuse existing
helper types (DummyLLM, get_auto_injected_files) for consistency.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 21fbc214-5ac3-418a-8879-329470a81a86
📒 Files selected for processing (4)
lib/crewai/src/crewai/agents/crew_agent_executor.pylib/crewai/src/crewai/experimental/agent_executor.pylib/crewai/src/crewai/utilities/file_injection.pylib/crewai/tests/utilities/test_file_injection.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
lib/crewai/src/crewai/utilities/file_injection.py (1)
33-39:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard non-string
content_typebefore calling.startswith().Line 37 can still fail when an unsupported value has
content_type=None(or any non-str), because.startswith()is called unconditionally on that value. Unsupported entries should be skipped, not crash filtering.Proposed fix
return { name: file_input for name, file_input in files.items() - if any( - getattr(file_input, "content_type", "").startswith(content_type) - for content_type in supported_types - ) + if isinstance(getattr(file_input, "content_type", None), str) + and any( + file_input.content_type.startswith(content_type) + for content_type in supported_types + ) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/utilities/file_injection.py` around lines 33 - 39, The dict-comprehension filter in file_injection.py currently calls getattr(file_input, "content_type", "").startswith(content_type) without ensuring content_type is a string; change the predicate to first check that the extracted value is a str (e.g., isinstance(content_type_value, str)) before calling .startswith, so entries with content_type=None or other non-str types are skipped; update the comprehension that iterates over files/file_input and supported_types to use this guarded check (reference the variables file_input, content_type, supported_types in the comprehension).
🧹 Nitpick comments (1)
lib/crewai/tests/utilities/test_file_injection.py (1)
37-42: ⚡ Quick winAdd a regression case for invalid
content_typetypes.Please extend this test to include an unsupported value with
content_type=None(or non-string) to verify it is skipped safely instead of raising.Proposed test extension
def test_unsupported_file_values_are_skipped() -> None: - files = {"bad": object()} + class BadFile: + content_type = None + + files = {"bad": object(), "also_bad": BadFile()} llm = DummyLLM(model="openai/gpt-4o", multimodal=True) assert get_auto_injected_files(files, llm) == {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/utilities/test_file_injection.py` around lines 37 - 42, Extend the test_unsupported_file_values_are_skipped case to include a file entry whose metadata uses an invalid content_type (e.g., content_type=None or another non-string) and assert that get_auto_injected_files(files, llm) still returns an empty dict; specifically, add a second files variant (or expand the existing files) that contains a mapping like {"bad": {"content": <bytes-or-str>, "content_type": None}} and keep using DummyLLM(model="openai/gpt-4o", multimodal=True) to verify get_auto_injected_files safely skips entries with non-string content_type rather than raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@lib/crewai/src/crewai/utilities/file_injection.py`:
- Around line 33-39: The dict-comprehension filter in file_injection.py
currently calls getattr(file_input, "content_type", "").startswith(content_type)
without ensuring content_type is a string; change the predicate to first check
that the extracted value is a str (e.g., isinstance(content_type_value, str))
before calling .startswith, so entries with content_type=None or other non-str
types are skipped; update the comprehension that iterates over files/file_input
and supported_types to use this guarded check (reference the variables
file_input, content_type, supported_types in the comprehension).
---
Nitpick comments:
In `@lib/crewai/tests/utilities/test_file_injection.py`:
- Around line 37-42: Extend the test_unsupported_file_values_are_skipped case to
include a file entry whose metadata uses an invalid content_type (e.g.,
content_type=None or another non-string) and assert that
get_auto_injected_files(files, llm) still returns an empty dict; specifically,
add a second files variant (or expand the existing files) that contains a
mapping like {"bad": {"content": <bytes-or-str>, "content_type": None}} and keep
using DummyLLM(model="openai/gpt-4o", multimodal=True) to verify
get_auto_injected_files safely skips entries with non-string content_type rather
than raising.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8b347ee5-0c8c-4e16-8687-3d983bfd7908
📒 Files selected for processing (2)
lib/crewai/src/crewai/utilities/file_injection.pylib/crewai/tests/utilities/test_file_injection.py
Summary
input_filesfrom being attached directly to LLM messagesread_filetool pathFixes #5137
Tests
uv run pytest lib\crewai\tests\utilities\test_file_injection.pyuv run ruff check lib\crewai\src\crewai\utilities\file_injection.py lib\crewai\src\crewai\agents\crew_agent_executor.py lib\crewai\src\crewai\experimental\agent_executor.py lib\crewai\tests\utilities\test_file_injection.pySummary by CodeRabbit
New Features
Tests