fix(partial-rollout): cap max_new_tokens by prior response length #2122

Open
none0663 wants to merge 2 commits into THUDM:main from none0663:fix-partial-rollout-rollout-max-response

@none0663

Contributor

Summary

In partial rollout, an aborted sample is resubmitted with its previously
generated tokens already attached (sample.response_length > 0). However,
sampling_params["max_new_tokens"] was still set to the full
rollout_max_response_len, allowing the engine to generate another full
budget of tokens. Because nothing downstream clamps the total, the cumulative
response length could exceed rollout_max_response_len.
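
To make the overshoot concrete, here is a minimal sketch of the arithmetic. It is
not the repository's code: the function name worst_case_total_length and the
numbers are hypothetical, and it assumes the resubmitted request is granted the
full rollout_max_response_len regardless of the tokens already attached.

```python
def worst_case_total_length(response_length: int, rollout_max_response_len: int) -> int:
    """Upper bound on the total response length under the pre-fix behavior.

    Assumption of this sketch: the resubmitted request receives
    max_new_tokens equal to the full budget, ignoring prior tokens.
    """
    max_new_tokens = rollout_max_response_len  # pre-fix: prior tokens ignored
    return response_length + max_new_tokens


# Hypothetical numbers: a sample aborted after 600 tokens, with a 1024-token limit.
total = worst_case_total_length(response_length=600, rollout_max_response_len=1024)
print(total, total > 1024)  # 1624 True
```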

This PR subtracts the already-generated length from max_new_tokens so that the
cumulative response stays within rollout_max_response_len. When the budget is
already exhausted, max_new_tokens becomes 0 and the sample is marked TRUNCATED
by the existing guard.
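
The capping logic can be sketched roughly as follows. This is an illustration
under stated assumptions, not the actual diff: the Sample dataclass, the helper
names cap_max_new_tokens and prepare_request, and the string status value are
hypothetical stand-ins, and clamping the remainder at 0 is a choice made for
this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for the rollout sample; only the fields used here."""
    response_length: int = 0
    status: str = "PENDING"


def cap_max_new_tokens(sampling_params: dict, sample: Sample,
                       rollout_max_response_len: int) -> dict:
    """Return a copy of sampling_params whose max_new_tokens is reduced
    by the number of response tokens the sample already carries."""
    remaining = rollout_max_response_len - sample.response_length
    capped = dict(sampling_params)  # leave the caller's dict untouched
    # Clamping at 0 is an assumption of this sketch.
    capped["max_new_tokens"] = max(remaining, 0)
    return capped


def prepare_request(sampling_params: dict, sample: Sample,
                    rollout_max_response_len: int) -> Optional[dict]:
    """Hypothetical wrapper: return the params to send to the engine,
    or None when no budget is left and the sample is marked TRUNCATED."""
    params = cap_max_new_tokens(sampling_params, sample, rollout_max_response_len)
    if params["max_new_tokens"] == 0:
        sample.status = "TRUNCATED"  # stands in for the existing guard
        return None
    return params
```

With a limit of 1024, a fresh sample keeps the full 1024-token budget, a sample
resumed after 600 tokens is allowed 424 more, and a sample that already holds
1024 tokens is marked TRUNCATED without another engine call.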

Changes

  • rollout/sglang_rollout.py (generate)
  • rollout/sglang_streaming_rollout.py (generate_streaming)

The change only affects runs with --partial-rollout and samples that already
have response tokens; fresh samples are unchanged.
