Skip to content

fix(tokenization): prevent IndexError in apply_chat_template on empty conversation list#46793

Open
Hasnaathussain wants to merge 2 commits into
huggingface:mainfrom
Hasnaathussain:fix/46752-apply-chat-template-empty-list
Open

fix(tokenization): prevent IndexError in apply_chat_template on empty conversation list#46793
Hasnaathussain wants to merge 2 commits into
huggingface:mainfrom
Hasnaathussain:fix/46752-apply-chat-template-empty-list

Conversation

@Hasnaathussain

Copy link
Copy Markdown

What does this PR do?

Resolves #46752.

When calling \�pply_chat_template\ with an empty list ([]), the tokenization pipeline raises an \IndexError\ because it attempts to check \isinstance(conversation[0], ...)\ without validating if the list is populated.

This PR adds a \len(conversation) > 0\ guard to prevent the crash in \ okenization_utils_base.py, \processing_utils.py, and \ okenization_mistral_common.py. Also adds a regression test in \ est_tokenization_utils.py.

Who can review?

@ArthurZucker @younesbelkada

Copilot AI review requested due to automatic review settings June 21, 2026 11:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses an IndexError in apply_chat_template when called with an empty conversation list ([]) by guarding access to conversation[0] across tokenizer, processor, and Mistral-common implementations, and adds a regression test.

Changes:

  • Add len(conversation) > 0 guards before checking conversation[0] in apply_chat_template implementations.
  • Add a regression test covering apply_chat_template([]).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/transformers/tokenization_utils_base.py Adds an empty-length guard before accessing conversation[0] in batched detection.
src/transformers/tokenization_mistral_common.py Adds the same guard in the Mistral-common override.
src/transformers/processing_utils.py Adds the same guard in ProcessorMixin.apply_chat_template.
tests/tokenization/test_tokenization_utils.py Adds a regression test for empty-conversation handling.

Comment on lines 3089 to 3091
if isinstance(conversation, (list, tuple)) and len(conversation) > 0 and (
isinstance(conversation[0], (list, tuple)) or hasattr(conversation[0], "messages")
):
Comment thread src/transformers/processing_utils.py Outdated
Comment on lines 2072 to 2074
if isinstance(conversation, (list, tuple)) and len(conversation) > 0 and (
isinstance(conversation[0], (list, tuple)) or hasattr(conversation[0], "content")
):
Comment on lines 1103 to 1105
if isinstance(conversation, (list, tuple)) and len(conversation) > 0 and (
isinstance(conversation[0], (list, tuple)) or hasattr(conversation[0], "messages")
):
Comment on lines +327 to +331
def test_apply_chat_template_empty_conversation(self):
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# Should not raise IndexError
tokens = tokenizer.apply_chat_template([], tokenize=True, return_dict=False)
self.assertEqual(tokens, [])
@github-actions

Copy link
Copy Markdown
Contributor

CI Dashboard: View test results in Grafana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

apply_chat_template crashes with IndexError when passed an empty list

2 participants