fix: handle vllm context length errors in nano v3 recipe by snowmanwwg · Pull Request #1752 · NVIDIA-NeMo/Gym

snowmanwwg · 2026-06-26T05:52:53Z

Summary

Detect vLLM context-length failures from chat completions and tokenization responses.
Return token/context length metadata in the error body when vLLM reports the limit.
Add focused tests for chat completions and tokenization context-length handling.
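The summary says the error body carries token/context length metadata when vLLM reports the limit, but the PR text does not show what that metadata looks like. Here is a minimal sketch. It assumes vLLM's message contains phrases like "maximum context length is N tokens" and "you requested N tokens", and that wording varies across vLLM versions. Under that assumption, a hypothetical helper `extract_context_length_metadata` could pull the numbers out with regular expressions. The key names `max_model_len` and `requested_tokens` are illustrative, not the PR's actual fields.

```python
import re

# Assumed wording: vLLM context-length errors commonly contain a phrase like
# "maximum context length is 4096 tokens"; exact text varies by vLLM version.
_MAX_CONTEXT_RE = re.compile(r"maximum context length is (\d+) tokens", re.IGNORECASE)
# Assumed wording for the requested size, e.g. "you requested 5000 tokens"
# or "your request has 5000 input tokens".
_REQUESTED_RE = re.compile(r"(?:you requested|your request has) (\d+)", re.IGNORECASE)


def extract_context_length_metadata(error_text: str) -> dict:
    """Hypothetical helper: pull token counts out of a vLLM error message.

    Returns only the keys it could find; key names are illustrative.
    """
    metadata: dict = {}
    max_match = _MAX_CONTEXT_RE.search(error_text)
    if max_match:
        metadata["max_model_len"] = int(max_match.group(1))
    requested_match = _REQUESTED_RE.search(error_text)
    if requested_match:
        metadata["requested_tokens"] = int(requested_match.group(1))
    return metadata
```

Returning only the keys that were found lets the caller add whatever the message exposes to the error body. Nothing has to be guessed when a vLLM version phrases the limit differently.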

Testing

uv run --extra dev python -m pytest responses_api_models/vllm_model/tests/test_app.py (72 passed, 1 warning)

Cherry-picked from 7ad66da onto current main.

Signed-off-by: Wenwen Gao <wenweng@cw-dfw-cs-001-vscode-01.cm.cluster>

copy-pr-bot · 2026-06-26T05:52:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

claude · 2026-06-26T05:54:34Z

+    "max_model_len",
+    "max model len",
+    "max_tokens",
+    "maximum context length",


NOTE — "max_tokens" is the broadest substring here. A vLLM 400 whose body mentions max_tokens for a non-context-length reason (e.g., invalid value, type mismatch) would be silently swallowed and surfaced as finish_reason="length" instead of raising. This was already true in the old code so it's not a regression, but worth keeping in mind — if mysterious "length" finishes appear in logs, this matcher could be the culprit.

Not blocking; the substring approach is the pragmatic choice given vLLM's unstructured error surface.
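This minimal sketch of the substring matcher makes the caveat concrete. It assumes `CONTEXT_LENGTH_ERROR_SUBSTRINGS` contains only the four entries visible in the excerpt, though the real tuple may hold more. `looks_like_context_length_error` is a hypothetical name, and both error strings are illustrative.

```python
# Partial list: only the entries visible in the diff excerpt; the real
# constant may contain more substrings.
CONTEXT_LENGTH_ERROR_SUBSTRINGS = (
    "max_model_len",
    "max model len",
    "max_tokens",
    "maximum context length",
)


def looks_like_context_length_error(error_text: str) -> bool:
    """Case-insensitive substring match, mirroring the approach in the diff."""
    lowered = error_text.lower()
    return any(substring in lowered for substring in CONTEXT_LENGTH_ERROR_SUBSTRINGS)


# A genuine context-length error matches, as intended.
print(looks_like_context_length_error("This model's maximum context length is 4096 tokens."))  # True

# A validation error that merely mentions max_tokens also matches, so it
# would be reported as finish_reason="length" instead of raising.
print(looks_like_context_length_error("max_tokens must be at least 1, got 0"))  # True
```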

claude · 2026-06-26T05:54:36Z

+        response_content = getattr(error, "response_content", b"")
+        if isinstance(response_content, bytes):
+            response_content_text = response_content.decode(errors="replace")
+        elif response_content is None:
+            response_content_text = ""
+        else:
+            response_content_text = str(response_content)
+
+        error_text = f"{error.message} {response_content_text}".lower()
+        return any(substring in error_text for substring in CONTEXT_LENGTH_ERROR_SUBSTRINGS)


Good defensive coding. The old code did e.response_content.decode() (no errors="replace", no guard for missing attribute). This handles bytes/None/str and uses errors="replace" per CLAUDE.md conventions. Nice fix.

claude · 2026-06-26T05:54:40Z

+            try:
+                tokenize_response = await client.create_tokenize(**tokenize_body_dict)
+            except ClientResponseError as e:
+                if self._is_context_length_error(e):
+                    res = self._create_empty_chat_completion()
+                    res.choices[0].finish_reason = "length"
+                    return res
+                raise


NOTE — When the tokenize call fails here, a successful chat completion (line 477) is silently discarded. This is the correct behavior: without prompt_token_ids the training pipeline can't build a loss mask, so "length" correctly tells the caller to skip this example. But it's worth a comment since a future reader will wonder why a successful generation is thrown away.

For pure evaluation (return_token_id_information=False), this path is never reached, so no eval score impact.
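The control flow this comment describes can be sketched end to end. Everything here is hypothetical scaffolding. `FakeClient` only mimics the `create_tokenize` call name from the diff, the dataclasses stand in for the real chat completion types, and the context-length check is simplified. What matters is the ordering. The chat completion succeeds first, and then the tokenize call fails with a context-length error. The caller receives an empty completion with `finish_reason="length"` instead of the generated text.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


class FakeClientResponseError(Exception):
    """Hypothetical stand-in for the HTTP client error raised on a vLLM 400."""

    def __init__(self, message: str, response_content: bytes = b"") -> None:
        super().__init__(message)
        self.message = message
        self.response_content = response_content


@dataclass
class Choice:
    finish_reason: Optional[str] = None
    content: str = ""


@dataclass
class ChatCompletion:
    choices: List[Choice] = field(default_factory=lambda: [Choice()])
    prompt_token_ids: Optional[List[int]] = None


class FakeClient:
    """Hypothetical client whose tokenize endpoint rejects over-long inputs."""

    async def create_chat_completion(self, **body) -> ChatCompletion:
        return ChatCompletion(choices=[Choice(finish_reason="stop", content="hi")])

    async def create_tokenize(self, **body) -> List[int]:
        raise FakeClientResponseError("Bad Request", b"maximum context length is 4096 tokens")


def is_context_length_error(error: FakeClientResponseError) -> bool:
    # Simplified check; see the substring matcher discussed earlier in the review.
    text = f"{error.message} {error.response_content.decode(errors='replace')}".lower()
    return "maximum context length" in text or "max_model_len" in text


async def generate_with_token_ids(client: FakeClient, body: dict) -> ChatCompletion:
    completion = await client.create_chat_completion(**body)
    try:
        token_ids = await client.create_tokenize(**body)
    except FakeClientResponseError as e:
        if is_context_length_error(e):
            # Discard the successful completion: without prompt token IDs the
            # trainer cannot build a loss mask, so report a "length" finish and
            # let the caller skip this example.
            empty = ChatCompletion()
            empty.choices[0].finish_reason = "length"
            return empty
        raise
    completion.prompt_token_ids = token_ids
    return completion


result = asyncio.run(generate_with_token_ids(FakeClient(), {"messages": []}))
print(result.choices[0].finish_reason)  # length
```

Any other tokenize error is re-raised unchanged, matching the bare `raise` in the diff.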

fix: handle vllm context length errors

9fdd6aa

Signed-off-by: Wenwen Gao <wenweng@cw-dfw-cs-001-vscode-01.cm.cluster>

snowmanwwg requested a review from yfw June 26, 2026 05:53

snowmanwwg changed the title from "fix: handle vllm context length errors" to "fix: handle vllm context length errors in nano v3 recipe" Jun 26, 2026

claude Bot reviewed Jun 26, 2026


snowmanwwg mentioned this pull request Jun 26, 2026

DRAFT fix: Nano-v3 recipe run fix. NVIDIA-NeMo/RL#2867 (Open, 4 tasks)

snowmanwwg wants to merge 1 commit into main from fix-vllm-context-length-errors

snowmanwwg commented Jun 26, 2026

Uh oh!

copy-pr-bot Bot commented Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

snowmanwwg commented Jun 26, 2026

Summary

Testing

Uh oh!

copy-pr-bot Bot commented Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant