ChatMistralAI: citation metadata from Mistral API response is silently dropped #36427

@JulienRabault

Description

@JulienRabault

Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain-mistralai

Feature Description

ChatMistralAI._convert_mistral_chat_message_to_message treats response content as a plain string. When calling Mistral's API with citations=True, content is instead a list of typed chunks (text and reference), and the reference metadata (reference_ids, source mapping) is silently dropped.

The citation data should be extracted and stored in response_metadata["citations"] so users doing RAG with Mistral can map answer fragments back to source documents.

Relevant code: _convert_mistral_chat_message_to_message in langchain_mistralai/chat_models.py, specifically:

content = _message.get("content", "") or ""
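To illustrate the drop: with citations enabled, that same expression receives a list of chunks, and nothing downstream extracts the reference entries. A minimal reproduction, using the citation payload format shown under Additional Context below:

```python
# Simulated Mistral response message with citations=True; the content list
# mirrors the format documented by Mistral.
_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "According to the document, "},
        {"type": "reference", "reference_ids": [0], "text": "the temperature is 20°C"},
    ],
}

# The current conversion expression from chat_models.py:
content = _message.get("content", "") or ""

# The list passes through untouched; no code path pulls out the reference
# chunks, so reference_ids never reach response_metadata.
print(type(content).__name__)  # -> list
has_references = any(
    isinstance(c, dict) and c.get("type") == "reference" for c in content
)
print(has_references)  # -> True, but this information is discarded
```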

Use Case

RAG pipelines using Mistral models with native citation support. Mistral returns which parts of the answer come from which source documents, but there's currently no way to access that through ChatMistralAI. Users who need inline citations have to bypass langchain and call the Mistral SDK directly.

Proposed Solution

When content is a list, concatenate the text of each chunk for backward compatibility and extract the reference chunks into response_metadata:

citations = []
if isinstance(content, list):
    parts = []
    for chunk in content:
        # Reference chunks also carry display text, so every chunk
        # contributes to the concatenated string content.
        parts.append(chunk.get("text", ""))
        if chunk.get("type") == "reference":
            citations.append(chunk)
    content = "".join(parts)

response_metadata = {"model_provider": "mistralai"}
if citations:
    response_metadata["citations"] = citations

content stays a string; citations are available via response_metadata. No breaking change.
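To make the behavior concrete, here is the same logic wrapped in a standalone function (the function name is mine, not part of the proposed patch) and exercised against the sample citation payload from Additional Context:

```python
def convert_content(content):
    """Sketch of the proposed conversion: flatten list content to a string
    and collect reference chunks into response_metadata."""
    citations = []
    if isinstance(content, list):
        parts = []
        for chunk in content:
            parts.append(chunk.get("text", ""))
            if chunk.get("type") == "reference":
                citations.append(chunk)
        content = "".join(parts)

    response_metadata = {"model_provider": "mistralai"}
    if citations:
        response_metadata["citations"] = citations
    return content, response_metadata


# Citation-style list content:
text, meta = convert_content([
    {"type": "text", "text": "According to the document, "},
    {"type": "reference", "reference_ids": [0], "text": "the temperature is 20°C"},
    {"type": "text", "text": " on average."},
])
print(text)  # "According to the document, the temperature is 20°C on average."
print(meta["citations"][0]["reference_ids"])  # [0]

# Plain string responses are untouched:
plain_text, plain_meta = convert_content("hello")
```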

Alternatives Considered

  • Calling the mistralai SDK directly instead of going through langchain — works but loses all the langchain integration (chains, callbacks, tracing)
  • Wrapping ChatMistralAI with a post-processing step that re-parses the raw API response — fragile, duplicates work

Additional Context

Mistral citation response format:

content = [
    {"type": "text", "text": "According to the document, "},
    {"type": "reference", "reference_ids": [0], "text": "the temperature is 20°C"},
    {"type": "text", "text": " on average."}
]

Docs: https://docs.mistral.ai/capabilities/citations/
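For the RAG use case above, a caller could then map answer fragments back to sources like this (a sketch assuming citations land in response_metadata["citations"] as proposed; reference_ids index into the documents supplied with the request):

```python
# Hypothetical source documents passed to the Mistral request.
source_documents = [
    "Weather report: the temperature is 20°C on average in spring.",
]

# What response_metadata["citations"] would contain under the proposal.
citations = [
    {"type": "reference", "reference_ids": [0], "text": "the temperature is 20°C"},
]

# Map each cited answer fragment to its source document.
mapping = [
    (cite["text"], ref_id, source_documents[ref_id])
    for cite in citations
    for ref_id in cite["reference_ids"]
]
for fragment, ref_id, doc in mapping:
    print(f'"{fragment}" <- document {ref_id}: {doc}')
```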

Metadata

Labels: external, feature request (Request for an enhancement / additional functionality), mistralai (`langchain-mistralai` package issues & PRs)
