Skip to content

fix: honor skip_special_tokens from extra_body on the vLLM completion path#1488

Open
DongjiGao wants to merge 1 commit into
mainfrom
fix/preserve-special-tokens
Open

fix: honor skip_special_tokens from extra_body on the vLLM completion path#1488
DongjiGao wants to merge 1 commit into
mainfrom
fix/preserve-special-tokens

Conversation

@DongjiGao

@DongjiGao DongjiGao commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Recreated from an in-repo branch so CI can run on it (fork PRs do not receive CI secrets). Replaces #1478.

What

The vLLM text-completion request builder hardcoded skip_special_tokens: False at the top level, while _build_request_body could simultaneously carry a user-supplied skip_special_tokens inside extra_body. The field was sourced twice, so any extra_body.skip_special_tokens was silently dropped on the completion path.

This pops skip_special_tokens out of extra_body into the single top-level field (default False), so inference.extra_body.skip_special_tokens is honored on the completion path — matching how the chat path already behaves.

Why

skip_special_tokens=false keeps special tokens (e.g. speaker tags) in the decoded output, which multispeaker ASR needs. The chat endpoint already respects this via extra_body; this makes the completion endpoint consistent, without adding any new config surface.

Usage (just extra_body, no new options):

++inference.extra_body.skip_special_tokens=false

Test

tests/test_vllm_completion.py:

  • default → top-level skip_special_tokens=False, absent from extra_body
  • extra_body.skip_special_tokens=True → lifted to the single top-level field, not duplicated

Summary by CodeRabbit

  • New Features

    • The skip_special_tokens parameter in inference requests is now configurable via request options, providing greater control over token processing behavior.
  • Tests

    • Added tests to validate skip_special_tokens parameter handling and configuration.

… path

The text-completion request hardcoded `skip_special_tokens: False` at the top
level while `_build_request_body` could also carry a user-supplied
`skip_special_tokens` inside `extra_body`, so the field was sourced twice and
the extra_body value was silently dropped. Pop it from extra_body into the single
top-level field (default False) so `inference.extra_body.skip_special_tokens` is
honored on the completion path, matching how the chat path already behaves
(e.g. `=false` keeps speaker tags for multispeaker ASR).

Signed-off-by: Dongji Gao <dongjig@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c04ba3bb-20e8-4bdb-affc-59256eb92628

📥 Commits

Reviewing files that changed from the base of the PR and between da85a88 and 35bf527.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/vllm.py
  • tests/test_vllm_completion.py

📝 Walkthrough

Walkthrough

VLLMModel._build_completion_request_params is refactored to build request_extra_body once via _build_request_body(...), then use pop("skip_special_tokens", False) to extract that field (defaulting to False) instead of hardcoding it. A new test module adds two unit tests verifying the default and override behaviors.

Changes

skip_special_tokens extraction from extra_body

Layer / File(s) Summary
Completion request assembly and unit tests
nemo_skills/inference/model/vllm.py, tests/test_vllm_completion.py
_build_completion_request_params now calls _build_request_body(...) once into request_extra_body, pops skip_special_tokens from it (defaulting to False), and sets extra_body to the same dict. Two unit tests assert that skip_special_tokens defaults to False with no duplication in extra_body, and that a caller-supplied skip_special_tokens=True is promoted to the top-level field and absent from extra_body.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: honor skip_special_tokens from extra_body on the vLLM completion path' directly and specifically describes the main change: fixing the vLLM completion endpoint to properly handle the skip_special_tokens parameter from extra_body.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/preserve-special-tokens

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant