Avoid injecting unsupported input files into LLM messages by eecczz · Pull Request #5799 · crewAIInc/crewAI

eecczz · 2026-05-13T19:24:59Z

Summary

Prevent unsupported input_files from being attached directly to LLM messages
Keep unsupported files available through the existing read_file tool path
Fix text files causing non-multimodal models to raise a vision-capable model error

Fixes #5137

Tests

uv run pytest lib\crewai\tests\utilities\test_file_injection.py
uv run ruff check lib\crewai\src\crewai\utilities\file_injection.py lib\crewai\src\crewai\agents\crew_agent_executor.py lib\crewai\src\crewai\experimental\agent_executor.py lib\crewai\tests\utilities\test_file_injection.py

Summary by CodeRabbit

New Features
- Intelligent file filtering: the system consults the model's multimodal capabilities and injects only supported files into conversations; if no supported files remain, injection is skipped.
Tests
- Added unit tests verifying behavior for multimodal vs non-multimodal models and handling of unsupported or non-file inputs.

coderabbitai · 2026-05-13T19:25:21Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 83a12d1c-e2d5-46a3-a229-63276e0acd7d

📥 Commits

Reviewing files that changed from the base of the PR and between 7e5daff and 538dfe0.

📒 Files selected for processing (2)

lib/crewai/src/crewai/utilities/file_injection.py
lib/crewai/tests/utilities/test_file_injection.py

🚧 Files skipped from review as they are similar to previous changes (1)

lib/crewai/src/crewai/utilities/file_injection.py

📝 Walkthrough

Walkthrough

A new utility filters input files by LLM multimodal capability and supported content types. Agent executors (CrewAgentExecutor and experimental agent_executor) call this helper before injecting files into the last user message and return early if no supported files remain. Tests cover multimodal and non-multimodal behaviors.

Changes

Multimodal file injection control

Layer / File(s)	Summary
File injection utility and LLM capability detection `lib/crewai/src/crewai/utilities/file_injection.py`	`get_auto_injected_files()` checks if the LLM supports multimodal input, derives provider and API config, queries supported content types, and returns a filtered mapping of input files whose `content_type` matches a supported prefix. Includes fallback `get_supported_content_types()` when `crewai_files` is unavailable.
File injection test coverage `lib/crewai/tests/utilities/test_file_injection.py`	Adds `DummyLLM` test double and tests that non-multimodal LLMs receive no injected files, unsupported non-file values are skipped, and multimodal LLMs receive only supported files (e.g., PNG ImageFile).
CrewAgentExecutor multimodal file injection `lib/crewai/src/crewai/agents/crew_agent_executor.py`	Imports and integrates `get_auto_injected_files()` into both `_inject_multimodal_files()` and `_ainject_multimodal_files()`, filtering merged crew/task/input files and returning early if no supported files remain.
Experimental agent_executor file injection `lib/crewai/src/crewai/experimental/agent_executor.py`	Imports and applies `get_auto_injected_files()` in `_inject_files_from_inputs()` to compute filtered files using the LLM before updating the last user message and returns early if empty.
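
The filtering described in the table can be sketched as follows. This is a minimal illustration, not the code in `file_injection.py`: `SketchFile`, `SketchLLM`, `sketch_supported_prefixes`, and `sketch_auto_injected_files` are hypothetical stand-ins, and the hard-coded `image/` prefix is an assumption in place of the provider and API lookup the PR performs.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchFile:
    """Hypothetical stand-in for an input file object; not a crewAI class."""

    content_type: Any


class SketchLLM:
    """Hypothetical stand-in for an LLM that exposes supports_multimodal()."""

    def __init__(self, multimodal: bool) -> None:
        self._multimodal = multimodal

    def supports_multimodal(self) -> bool:
        return self._multimodal


def sketch_supported_prefixes(llm: SketchLLM) -> list[str]:
    # Assumption: the real helper derives these from provider and API config.
    return ["image/"]


def sketch_auto_injected_files(files: dict[str, Any], llm: SketchLLM) -> dict[str, Any]:
    # Non-multimodal models get nothing injected.
    if not llm.supports_multimodal():
        return {}
    prefixes = sketch_supported_prefixes(llm)
    # Keep only entries whose content_type is a string with a supported prefix.
    return {
        name: file_input
        for name, file_input in files.items()
        if isinstance(getattr(file_input, "content_type", None), str)
        and any(file_input.content_type.startswith(p) for p in prefixes)
    }
```

With these stand-ins, a mapping that mixes a PNG, a plain-text file, and a bare `object()` yields only the PNG for a multimodal model and an empty dict otherwise.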

Sequence Diagram

sequenceDiagram
  participant CrewAgentExecutor
  participant ExperimentalAgentExecutor
  participant get_auto_injected_files
  participant BaseLLM
  participant MessageHistory

  CrewAgentExecutor->>get_auto_injected_files: merged files + llm
  ExperimentalAgentExecutor->>get_auto_injected_files: input files + llm
  get_auto_injected_files->>BaseLLM: supports_multimodal()
  BaseLLM-->>get_auto_injected_files: True/False
  alt multimodal
    get_auto_injected_files->>get_auto_injected_files: get_supported_content_types(provider, api)
    get_auto_injected_files->>get_auto_injected_files: filter files by content_type
    get_auto_injected_files-->>Caller: filtered files
    Caller->>MessageHistory: inject filtered files into last user message
  else not multimodal or no supported files
    get_auto_injected_files-->>Caller: {}
    Caller-->>Caller: return early (no injection)
  end
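
The caller side of the diagram might look roughly like this. It is a sketch under assumptions: messages are modeled as plain dicts with a `role` key, the `files` key on the message is illustrative, and `sketch_inject_into_last_user_message` is a hypothetical name rather than one of the executor methods touched by the PR.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def sketch_inject_into_last_user_message(
    messages: List[Message],
    files: Dict[str, Any],
    filter_files: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> bool:
    """Attach filtered files to the last user message; return True if injected."""
    supported = filter_files(files)
    if not supported:
        # Return early: nothing supported, so the messages stay untouched.
        return False
    for message in reversed(messages):
        if message.get("role") == "user":
            message["files"] = supported  # assumed key, for illustration only
            return True
    return False
```

Passing the filter in as a callable keeps the sketch independent of any particular LLM object; in the PR the executors call `get_auto_injected_files()` with their own LLM.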

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰
Files hop in tidy rows,
The multimodal door now shows,
Plain text stays on page,
Images enter the stage,
Hooray — no vision-model woes! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: filtering unsupported input files before injection into LLM messages.
Linked Issues check	✅ Passed	The PR addresses all objectives from issue `#5137` by implementing file filtering based on LLM multimodal support to prevent TextFile injection errors.
Out of Scope Changes check	✅ Passed	All changes are scoped to implementing the file filtering mechanism for multimodal support detection and validation across relevant executor and utility modules.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/crewai/tests/utilities/test_file_injection.py (1)

30-43: ⚡ Quick win

Add a regression test for unsupported file values (no content_type).

Current tests don’t cover the skip-not-crash path for unsupported file objects. Add one case (e.g., {"bad": object()}) asserting {} to protect this behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/utilities/test_file_injection.py` around lines 30 - 43, Add
a regression test that ensures unsupported file values without a content_type
are skipped (not crash) by creating a new test function (e.g.,
test_unsupported_file_values_are_skipped) that constructs files = {"bad":
object()} and calls get_auto_injected_files(files, llm) with a multimodal
DummyLLM (e.g., DummyLLM(model="openai/gpt-4o", multimodal=True)), then asserts
the result is {} to protect the skip-not-crash behavior; place this with the
other tests (near test_text_files_are_not_injected_for_non_multimodal_llm and
test_only_supported_files_are_injected_for_multimodal_llm) and reuse existing
helper types (DummyLLM, get_auto_injected_files) for consistency.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/utilities/file_injection.py`:
- Around line 33-39: The dict comprehension that filters by
file_input.content_type can raise AttributeError for unsupported objects in
files; update the filter in the comprehension (the expression building the
return dict) to guard access to content_type (e.g., use getattr(file_input,
"content_type", "") or check hasattr(file_input, "content_type") before calling
.startswith) so unsupported entries are skipped instead of raising; locate the
comprehension that iterates over files.items() and replace the direct
file_input.content_type.startswith(...) call with a safe guard using getattr or
an explicit attribute check.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 21fbc214-5ac3-418a-8879-329470a81a86

📥 Commits

Reviewing files that changed from the base of the PR and between c36827b and d75baf4.

📒 Files selected for processing (4)

lib/crewai/src/crewai/agents/crew_agent_executor.py
lib/crewai/src/crewai/experimental/agent_executor.py
lib/crewai/src/crewai/utilities/file_injection.py
lib/crewai/tests/utilities/test_file_injection.py

coderabbitai

♻️ Duplicate comments (1)

lib/crewai/src/crewai/utilities/file_injection.py (1)

33-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-string content_type before calling .startswith().

Line 37 can still fail when an unsupported value has content_type=None (or any non-str), because .startswith() is called unconditionally on that value. Unsupported entries should be skipped, not crash filtering.

Proposed fix

     return {
         name: file_input
         for name, file_input in files.items()
-        if any(
-            getattr(file_input, "content_type", "").startswith(content_type)
-            for content_type in supported_types
-        )
+        if isinstance(getattr(file_input, "content_type", None), str)
+        and any(
+            file_input.content_type.startswith(content_type)
+            for content_type in supported_types
+        )
     }
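
To see why the default passed to `getattr` does not help here, consider the following sketch. The helper names `unguarded_match` and `guarded_match` and the `NoneTypedFile` class are hypothetical and exist only for this illustration. `getattr` falls back to its default only when the attribute is missing, so an attribute that exists with the value `None` is returned as-is.

```python
from typing import Any, List


class NoneTypedFile:
    """Hypothetical object whose content_type attribute exists but is None."""

    content_type = None


def unguarded_match(file_input: Any, supported_types: List[str]) -> bool:
    # The "" default applies only when the attribute is absent.
    return any(
        getattr(file_input, "content_type", "").startswith(prefix)
        for prefix in supported_types
    )


def guarded_match(file_input: Any, supported_types: List[str]) -> bool:
    # Check the type first so None and other non-str values are skipped.
    value = getattr(file_input, "content_type", None)
    return isinstance(value, str) and any(
        value.startswith(prefix) for prefix in supported_types
    )
```

In this sketch, `unguarded_match(object(), ["image/"])` returns `False` because the attribute is absent, while `unguarded_match(NoneTypedFile(), ["image/"])` raises `AttributeError`; `guarded_match` returns `False` in both cases.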

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/utilities/file_injection.py` around lines 33 - 39, The
dict-comprehension filter in file_injection.py currently calls
getattr(file_input, "content_type", "").startswith(content_type) without
ensuring content_type is a string; change the predicate to first check that the
extracted value is a str (e.g., isinstance(content_type_value, str)) before
calling .startswith, so entries with content_type=None or other non-str types
are skipped; update the comprehension that iterates over files/file_input and
supported_types to use this guarded check (reference the variables file_input,
content_type, supported_types in the comprehension).

🧹 Nitpick comments (1)

lib/crewai/tests/utilities/test_file_injection.py (1)

37-42: ⚡ Quick win

Add a regression case for invalid content_type types.

Please extend this test to include an unsupported value with content_type=None (or non-string) to verify it is skipped safely instead of raising.

Proposed test extension

 def test_unsupported_file_values_are_skipped() -> None:
-    files = {"bad": object()}
+    class BadFile:
+        content_type = None
+
+    files = {"bad": object(), "also_bad": BadFile()}
     llm = DummyLLM(model="openai/gpt-4o", multimodal=True)

     assert get_auto_injected_files(files, llm) == {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/utilities/test_file_injection.py` around lines 37 - 42,
Extend the test_unsupported_file_values_are_skipped case to include a file entry
whose metadata uses an invalid content_type (e.g., content_type=None or another
non-string) and assert that get_auto_injected_files(files, llm) still returns an
empty dict; specifically, add a second files variant (or expand the existing
files) that contains a mapping like {"bad": {"content": <bytes-or-str>,
"content_type": None}} and keep using DummyLLM(model="openai/gpt-4o",
multimodal=True) to verify get_auto_injected_files safely skips entries with
non-string content_type rather than raising.
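
One caveat when choosing between the two test shapes mentioned above (a class with `content_type = None` versus a mapping with a `"content_type"` key): assuming the filter reads the value with `getattr`, a dict key is not an attribute, so the mapping form would exercise the missing-attribute path rather than the non-string path. The helper `content_type_lookup` below is a hypothetical name used only to show the difference.

```python
from typing import Any

_MISSING = object()


class BadFile:
    """Hypothetical file-like object with a non-string content_type."""

    content_type = None


def content_type_lookup(value: Any) -> str:
    """Classify what an attribute lookup of content_type finds on a value."""
    found = getattr(value, "content_type", _MISSING)
    if found is _MISSING:
        return "missing"
    return "str" if isinstance(found, str) else "non-str"
```

Here `content_type_lookup({"content": b"data", "content_type": None})` reports `"missing"`, while `content_type_lookup(BadFile())` reports `"non-str"`, so covering both paths needs both kinds of value in the test.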

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8b347ee5-0c8c-4e16-8687-3d983bfd7908

📥 Commits

Reviewing files that changed from the base of the PR and between d75baf4 and 7e5daff.

📒 Files selected for processing (2)

lib/crewai/src/crewai/utilities/file_injection.py
lib/crewai/tests/utilities/test_file_injection.py

Avoid injecting unsupported input files into LLM messages

d75baf4

coderabbitai Bot requested changes May 13, 2026

Guard unsupported file values during injection

7e5daff

coderabbitai Bot reviewed May 14, 2026

coderabbitai Bot approved these changes May 14, 2026

eecczz added 2 commits May 14, 2026 14:06

Skip files with invalid content types

538dfe0

Merge branch 'main' into codex/filter-auto-injected-input-files

1ca647f

eecczz wants to merge 4 commits into crewAIInc:main from eecczz:codex/filter-auto-injected-input-files
