core: _convert_openai_format_to_data_block hard-codes mime_type on base64 file blocks #36939

@anmolg1997

Description

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Related Issues / PRs

No response

Reproduction Steps / Example Code (Python)

from langchain_core.messages import HumanMessage

msg = HumanMessage(content=[
    {
        "type": "file",
        "file": {
            "filename": "sheet.csv",
            "file_data": "data:text/csv;base64,aGVsbG8=",
        },
    },
])

for block in msg.content_blocks:
    print(block)

Error Message and Stack Trace (if applicable)

Description

In langchain_core/messages/block_translators/openai.py, _convert_openai_format_to_data_block has two base64 branches that look symmetrical: one for image_url, one for file.

The image branch reads the MIME type from the parsed data URI (parsed["mime_type"]). The file branch hard-codes "application/pdf".

The repro passes a CSV using the base64 file block shape that the OpenAI docs prescribe. The resulting v1 content block has mime_type="application/pdf" instead of "text/csv", even though the data URI explicitly says text/csv. Any non-PDF file attached this way (CSV, plain text, spreadsheets, office docs) is silently relabeled the same way.

Since _normalize_messages calls this translator on every chat model's input path, the wrong MIME type propagates to downstream integrations that consume content_blocks.

Expected: mime_type matches the data URI (text/csv in the example).
Actual: mime_type is always application/pdf.
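
To make the expected value concrete, here is a small stdlib-only sketch that pulls the declared MIME type and payload out of the data URI used in the repro. It does not call any LangChain code; the variable names are illustrative only.

```python
import base64

# The data URI from the repro above.
file_data = "data:text/csv;base64,aGVsbG8="

# Split "data:<mime>;base64,<payload>" into its header and payload parts.
header, _, payload = file_data.partition(",")
declared_mime = header.removeprefix("data:").removesuffix(";base64")

print(declared_mime)              # text/csv
print(base64.b64decode(payload))  # b'hello'
```

The URI itself declares text/csv, which is the value the resulting content block would be expected to carry.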

_parse_data_uri already returns None if the MIME type is missing, so the fix is to use parsed["mime_type"] as the image branch does. No extra None check is needed.
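
A rough, self-contained sketch of the proposed behavior follows. It is not the langchain-core source: parse_data_uri and convert_file_block are hypothetical stand-ins, the regex is an assumption about what a data-URI parser might accept, and the output dict shape (type / base64 / mime_type / filename) is assumed for illustration.

```python
import re
from typing import Optional

# Hypothetical stand-in for _parse_data_uri; the real helper may differ.
_DATA_URI = re.compile(r"^data:(?P<mime_type>[^;,]+);base64,(?P<data>.+)$")


def parse_data_uri(uri: str) -> Optional[dict]:
    """Return the MIME type and base64 payload, or None if the URI does not match."""
    match = _DATA_URI.match(uri)
    if match is None:
        return None  # covers a missing MIME type as well as malformed input
    return {"mime_type": match.group("mime_type"), "data": match.group("data")}


def convert_file_block(block: dict) -> Optional[dict]:
    """Convert an OpenAI-style base64 file block to a v1-style dict (shape assumed)."""
    file_info = block.get("file", {})
    parsed = parse_data_uri(file_info.get("file_data", ""))
    if parsed is None:
        return None  # not a usable data URI; the caller would leave the block as-is
    converted = {
        "type": "file",
        "base64": parsed["data"],
        # Take the MIME type from the URI instead of a fixed "application/pdf".
        "mime_type": parsed["mime_type"],
    }
    if "filename" in file_info:
        converted["filename"] = file_info["filename"]
    return converted


csv_block = {
    "type": "file",
    "file": {"filename": "sheet.csv", "file_data": "data:text/csv;base64,aGVsbG8="},
}
print(convert_file_block(csv_block)["mime_type"])  # text/csv
```

Because the parser in this sketch returns None when no MIME type is present, the converter only ever reads parsed["mime_type"] on a successful parse, which mirrors the reasoning above about not needing a separate None check.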

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 25.4.0: Thu Mar 19 19:33:25 PDT 2026; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041
Python Version: 3.14.2 (main, Dec 5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.4.4.1)]

Package Information

langchain_core: 1.3.0
langchain_community: 0.4.1
langsmith: 0.7.31
langchain_classic: 1.0.3
langchain_text_splitters: 1.1.1
langgraph_sdk: 0.3.13

Optional packages not installed

deepagents
deepagents-cli

Other Dependencies

aiohttp: 3.13.5
dataclasses-json: 0.6.7
google-adk: 1.30.0
httpx: 0.28.1
httpx-sse: 0.4.3
jsonpatch: 1.33
numpy: 2.4.4
opentelemetry-api: 1.38.0
opentelemetry-exporter-otlp-proto-http: 1.38.0
opentelemetry-sdk: 1.38.0
orjson: 3.11.8
packaging: 26.1
pydantic: 2.12.5
pydantic-settings: 2.13.1
pytest: 9.0.3
PyYAML: 6.0.3
pyyaml: 6.0.3
requests: 2.33.1
requests-toolbelt: 1.0.0
rich: 15.0.0
SQLAlchemy: 2.0.49
sqlalchemy: 2.0.49
tenacity: 9.1.4
typing-extensions: 4.15.0
uuid-utils: 0.14.1
vcrpy: 8.1.1
websockets: 15.0.1
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.25.0

Metadata

Labels

  • bug: Related to a bug, vulnerability, unexpected error with an existing feature
  • core: `langchain-core` package issues & PRs
  • external
