Submission checklist
Package (Required)
Related Issues / PRs
No response
Reproduction Steps / Example Code (Python)
from langchain_core.messages import HumanMessage

# OpenAI-style base64 file block carrying a CSV, not a PDF.
msg = HumanMessage(content=[
    {
        "type": "file",
        "file": {
            "filename": "sheet.csv",
            "file_data": "data:text/csv;base64,aGVsbG8=",
        },
    },
])

for block in msg.content_blocks:
    print(block)
Error Message and Stack Trace (if applicable)
Description
In langchain_core/messages/block_translators/openai.py, _convert_openai_format_to_data_block has two base64 branches that look symmetrical: one for image_url, one for file.
The image branch reads the MIME type from the parsed data URI (parsed["mime_type"]). The file branch hard-codes "application/pdf".
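To make the asymmetry concrete, here is a minimal, self-contained sketch of the two branches. It is a simplified stand-in and not the actual library source: `parse_data_uri`, `convert_image_block`, and `convert_file_block` are hypothetical names, and the output dict shape is only an approximation of a v1 content block.

```python
import re

# Hypothetical stand-in for the library's data-URI parser; the real
# _parse_data_uri may differ in details.
_DATA_URI = re.compile(r"^data:(?P<mime_type>[^;,]+);base64,(?P<data>.+)$")


def parse_data_uri(uri):
    """Return {'mime_type', 'data'} for a base64 data URI, else None."""
    match = _DATA_URI.match(uri)
    if match is None:
        return None
    return {"mime_type": match.group("mime_type"), "data": match.group("data")}


def convert_image_block(block):
    """Image branch: the MIME type comes from the parsed data URI."""
    parsed = parse_data_uri(block["image_url"]["url"])
    if parsed is None:
        return block
    return {
        "type": "image",
        "base64": parsed["data"],
        "mime_type": parsed["mime_type"],
    }


def convert_file_block(block):
    """File branch as described in this report: the MIME type is hard-coded."""
    parsed = parse_data_uri(block["file"]["file_data"])
    if parsed is None:
        return block
    return {
        "type": "file",
        "base64": parsed["data"],
        "mime_type": "application/pdf",  # ignores parsed["mime_type"]
    }


image = convert_image_block(
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,aGVsbG8="}}
)
csv_file = convert_file_block(
    {
        "type": "file",
        "file": {
            "filename": "sheet.csv",
            "file_data": "data:text/csv;base64,aGVsbG8=",
        },
    }
)
print(image["mime_type"])     # image/png
print(csv_file["mime_type"])  # application/pdf, although the URI says text/csv
```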
The repro passes a CSV via the OpenAI base64 file block shape that the OpenAI docs prescribe. The resulting v1 content block has mime_type="application/pdf" instead of "text/csv", even though the data URI explicitly says text/csv. Any non-PDF file attached this way (CSV, plain text, spreadsheets, office docs) gets silently relabeled the same way.
Since _normalize_messages calls this translator on every chat model's input path, the wrong MIME type propagates to downstream integrations that consume content_blocks.
Expected: mime_type matches the data URI (text/csv in the example).
Actual: mime_type is always application/pdf.
_parse_data_uri already returns None when the MIME type is missing, so the fix is to use parsed["mime_type"], as the image branch does. No extra None check is needed.
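A rough sketch of the proposed behavior, again using stand-ins rather than the real translator: `parse_data_uri`, `convert_file_block_fixed`, and `make_block` are hypothetical names, and passing an unparseable block through unchanged is an assumption of this sketch, not a statement about what the library does.

```python
import re

# Hypothetical stand-in for _parse_data_uri: returns None when the MIME type
# is missing, matching the behavior described in this report.
_DATA_URI = re.compile(r"^data:(?P<mime_type>[^;,]+);base64,(?P<data>.+)$")


def parse_data_uri(uri):
    match = _DATA_URI.match(uri)
    return match.groupdict() if match else None


def convert_file_block_fixed(block):
    """File branch with the proposed fix: reuse the parsed MIME type."""
    parsed = parse_data_uri(block["file"]["file_data"])
    if parsed is None:
        return block  # assumption: unparseable input is left untouched
    return {
        "type": "file",
        "base64": parsed["data"],
        "mime_type": parsed["mime_type"],
    }


def make_block(mime_type):
    """Build an OpenAI-style base64 file block for the given MIME type."""
    return {
        "type": "file",
        "file": {
            "filename": "attachment",
            "file_data": f"data:{mime_type};base64,aGVsbG8=",
        },
    }


# The MIME type now round-trips for PDF and non-PDF files alike.
for mime_type in ("text/csv", "text/plain", "application/pdf"):
    converted = convert_file_block_fixed(make_block(mime_type))
    assert converted["mime_type"] == mime_type

# A data URI without a MIME type does not parse, so parsed["mime_type"]
# is never read on that path.
no_mime = {"type": "file", "file": {"file_data": "data:;base64,aGVsbG8="}}
assert convert_file_block_fixed(no_mime) is no_mime
```

A parametrized check like the loop above could serve as a regression test once adapted to the real translator.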
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 25.4.0: Thu Mar 19 19:33:25 PDT 2026; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041
Python Version: 3.14.2 (main, Dec 5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.4.4.1)]
Package Information
langchain_core: 1.3.0
langchain_community: 0.4.1
langsmith: 0.7.31
langchain_classic: 1.0.3
langchain_text_splitters: 1.1.1
langgraph_sdk: 0.3.13
Optional packages not installed
deepagents
deepagents-cli
Other Dependencies
aiohttp: 3.13.5
dataclasses-json: 0.6.7
google-adk: 1.30.0
httpx: 0.28.1
httpx-sse: 0.4.3
jsonpatch: 1.33
numpy: 2.4.4
opentelemetry-api: 1.38.0
opentelemetry-exporter-otlp-proto-http: 1.38.0
opentelemetry-sdk: 1.38.0
orjson: 3.11.8
packaging: 26.1
pydantic: 2.12.5
pydantic-settings: 2.13.1
pytest: 9.0.3
PyYAML: 6.0.3
pyyaml: 6.0.3
requests: 2.33.1
requests-toolbelt: 1.0.0
rich: 15.0.0
SQLAlchemy: 2.0.49
sqlalchemy: 2.0.49
tenacity: 9.1.4
typing-extensions: 4.15.0
uuid-utils: 0.14.1
vcrpy: 8.1.1
websockets: 15.0.1
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.25.0