fix: honor skip_special_tokens from extra_body on the vLLM completion path #1488
Open
DongjiGao wants to merge 1 commit into
… path

The text-completion request hardcoded `skip_special_tokens: False` at the top level while `_build_request_body` could also carry a user-supplied `skip_special_tokens` inside `extra_body`, so the field was sourced twice and the `extra_body` value was silently dropped. Pop it from `extra_body` into the single top-level field (default `False`) so `inference.extra_body.skip_special_tokens` is honored on the completion path, matching how the chat path already behaves (e.g. `=false` keeps speaker tags for multispeaker ASR).

Signed-off-by: Dongji Gao <dongjig@nvidia.com>
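A minimal sketch of the single-source rule described in this commit message, assuming a plain dict-based request body. `build_completion_request` and the `"extra_body"` key layout are hypothetical stand-ins for illustration, not the repository's actual `_build_request_body`. The point is that the user's value is *popped* out of `extra_body`, so after the call it lives only in the top-level field, which falls back to `False`.

```python
from typing import Any, Dict, Optional


def build_completion_request(
    prompt: str,
    extra_body: Optional[Dict[str, Any]] = None,
    **params: Any,
) -> Dict[str, Any]:
    """Hypothetical completion-request builder (not the project's real API).

    `skip_special_tokens` is sourced exactly once: taken from `extra_body`
    if the user supplied it, otherwise defaulting to False.
    """
    # Copy first so pop() never mutates the caller's dict.
    extra = dict(extra_body or {})
    # Lift the user's value (or the False default) to the top level;
    # pop() also removes it from extra, so it cannot appear twice.
    skip_special_tokens = extra.pop("skip_special_tokens", False)
    return {
        "prompt": prompt,
        **params,
        "skip_special_tokens": skip_special_tokens,
        # Any remaining extra_body keys are passed through untouched.
        "extra_body": extra,
    }


if __name__ == "__main__":
    default = build_completion_request("Transcribe the audio.")
    print(default["skip_special_tokens"], default["extra_body"])
    # prints: False {}

    override = build_completion_request(
        "Transcribe the audio.",
        extra_body={"skip_special_tokens": True, "top_k": 20},
    )
    print(override["skip_special_tokens"], override["extra_body"])
    # prints: True {'top_k': 20}
```

Using `pop` instead of `get` is the design choice that matters here: reading the key without removing it would reintroduce the duplicated field this fix is meant to eliminate.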
CodeRabbit review: No actionable comments were generated in the recent review.
Walkthrough: `skip_special_tokens` extraction from `extra_body` (2 files processed). Estimated code review effort: 2 (Simple), ~8 minutes.
Pre-merge checks: 4 passed, 1 failed (1 warning).
What
The vLLM text-completion request builder hardcoded `skip_special_tokens: False` at the top level, while `_build_request_body` could simultaneously carry a user-supplied `skip_special_tokens` inside `extra_body`. The field was sourced twice, so any `extra_body.skip_special_tokens` was silently dropped on the completion path.

This change pops `skip_special_tokens` out of `extra_body` into the single top-level field (default `False`), so `inference.extra_body.skip_special_tokens` is honored on the completion path, matching how the chat path already behaves.

Why
`skip_special_tokens=false` keeps special tokens (e.g. speaker tags) in the decoded output, which multispeaker ASR needs. The chat endpoint already respects this via `extra_body`; this makes the completion endpoint consistent without adding any new config surface.

Usage (just `extra_body`, no new options):

Test
`tests/test_vllm_completion.py`:

- Default: top-level `skip_special_tokens=False`, absent from `extra_body`.
- `extra_body.skip_special_tokens=True` → lifted to the single top-level field, not duplicated.

Summary by CodeRabbit
New Features
- The `skip_special_tokens` parameter in inference requests is now configurable via request options, providing greater control over token processing behavior.

Tests
- `skip_special_tokens` parameter handling and configuration.