fix(examples): preserve geo3k response budget by zhangdw156 · Pull Request #2140 · THUDM/slime

zhangdw156 · 2026-06-27T16:38:57Z

Summary

Fix geo3k VLM multi-turn rollout budgeting so max_new_tokens is treated as the response budget, not the total prompt+response budget.
When rollout_max_context_len is also set, use the tighter remaining context/response budget.
Add a CPU-only regression test for fresh prompts, existing response tokens, and combined context/response caps.

Why

sampling_params["max_new_tokens"] comes from rollout_max_response_len, so it is a response-only limit. The current multi-turn example subtracts len(sample.tokens), which includes the prompt, so a long prompt can shrink the model's response budget, or exhaust it entirely and truncate the sample, before generation starts.
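To make the difference concrete, here is a minimal sketch of the two budgeting rules. It is not the code in examples/geo3k_vlm_multi_turn/rollout.py: the helper names `old_budget` and `remaining_response_budget` and their parameters are hypothetical, and it assumes `response_len` counts every token appended after the prompt.

```python
from typing import Optional


def old_budget(prompt_len: int, response_len: int, max_response_len: int) -> int:
    # Behavior described above: the full token count (prompt + response)
    # is subtracted from a limit that is meant to cover the response only.
    return max(max_response_len - (prompt_len + response_len), 0)


def remaining_response_budget(
    prompt_len: int,
    response_len: int,
    max_response_len: int,
    max_context_len: Optional[int] = None,
) -> int:
    """Tokens that may still be generated (hypothetical helper, not slime's API)."""
    # Response budget: only tokens generated so far count against it.
    budget = max_response_len - response_len
    if max_context_len is not None:
        # Context budget: prompt and response both occupy the context window,
        # so take whichever of the two limits is tighter.
        budget = min(budget, max_context_len - prompt_len - response_len)
    return max(budget, 0)


# Fresh 900-token prompt with a 1024-token response limit.
print(old_budget(900, 0, 1024))                            # 124
print(remaining_response_budget(900, 0, 1024))             # 1024
# 200 response tokens already generated, context capped at 1500 tokens.
print(remaining_response_budget(900, 200, 1024, 1500))     # 400
```

With a 900-token prompt the old rule leaves only 124 tokens for the response, while the response-only rule keeps the full 1024. The context cap applies only when it is the tighter of the two limits.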

Tests

uvx --from pytest pytest tests/test_geo3k_vlm_multi_turn_budget.py -q
uvx ruff check examples/geo3k_vlm_multi_turn/rollout.py tests/test_geo3k_vlm_multi_turn_budget.py
git diff --check

fix(examples): preserve geo3k response budget

de81ee0

zhangdw156 wants to merge 1 commit into THUDM:main from zhangdw156:fix/geo3k-response-budget