[sglang] fix seed collisions in deterministic GRPO rollouts by tntnnlrw · Pull Request #6857 · verl-project/verl

tntnnlrw · 2026-06-26T08:27:08Z

What this PR does

This PR fixes SGLang agent rollouts that need request-local deterministic sampling seeds.

- Derives a stable sampling_seed from (global_step, sample_index, rollout_n, base_seed) when SGLang deterministic inference is enabled.
- Keeps different responses in the same GRPO/DAPO rollout group on distinct deterministic seeds, so deterministic mode remains reproducible without collapsing rollout diversity.
- Computes trajectory_info on the full batch before splitting across agent-loop workers, so rollout_n stays globally unique across chunks.
- Copies and normalizes SGLang engine_kwargs before mutating them for server launch.
- Adds a CPU-only regression test for seed stability, chunk splitting, and SGLang kwargs normalization.
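
The seed derivation in the first item can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name stable_sampling_seed, the use of SHA-256, and the 31-bit range are all assumptions. The point it shows is that the seed depends only on the four inputs, so it is identical across reruns and processes, while a different rollout_n yields a different seed.

```python
import hashlib


def stable_sampling_seed(global_step: int, sample_index: int, rollout_n: int, base_seed: int) -> int:
    """Hypothetical helper: map the tuple to a non-negative seed below 2**31 - 1."""
    # Python's built-in hash() is salted per process for str/bytes, so a
    # cryptographic digest is used here to keep the value stable across runs.
    key = f"{base_seed}:{global_step}:{sample_index}:{rollout_n}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % (2**31 - 1)


# One prompt (sample_index=0) with four responses at step 3.
group = [stable_sampling_seed(3, 0, n, 42) for n in range(4)]
rerun = [stable_sampling_seed(3, 0, n, 42) for n in range(4)]
```

Here group and rerun are equal element by element, and the four entries within group are expected to differ from each other because rollout_n differs.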

Why it is needed

For GRPO/DAPO-style training, deterministic rollout does not mean every response for the same prompt should be identical. It means rerunning the same training step should reproduce the same set of rollout samples, while each response in the rollout group still receives a different, stable sampling seed.

When trajectory_info is computed independently inside each worker chunk, repeated prompts can get duplicate rollout_n values across chunks. If SGLang uses request-local sampling_seed, those duplicate (sample_index, rollout_n, step) tuples can collide and produce duplicate samples within a rollout group. That breaks the intended deterministic SGLang training behavior: reproducible across reruns, but still diverse across rollout responses.
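
The collision described above can be reproduced with a small standalone sketch. The names assign_rollout_n and split are hypothetical and only mimic the numbering logic at a high level; they are not the functions in verl. The sketch assumes two prompts with four responses each, split into worker chunks of two.

```python
from collections import defaultdict


def assign_rollout_n(sample_indices):
    """Number repeated sample_index values 0, 1, 2, ... in order of appearance."""
    counters = defaultdict(int)
    numbers = []
    for idx in sample_indices:
        numbers.append(counters[idx])
        counters[idx] += 1
    return numbers


def split(seq, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Two prompts, four responses each, already repeated in the batch.
batch = [0, 0, 0, 0, 1, 1, 1, 1]

# Numbering inside each chunk: every chunk restarts at rollout_n = 0.
per_chunk = [
    (idx, n)
    for chunk in split(batch, 2)
    for idx, n in zip(chunk, assign_rollout_n(chunk))
]

# Numbering on the full batch first, then splitting the annotated batch.
annotated = list(zip(batch, assign_rollout_n(batch)))
global_first = [pair for chunk in split(annotated, 2) for pair in chunk]
```

With per-chunk numbering, only 4 of the 8 (sample_index, rollout_n) pairs are distinct, so any seed derived from them repeats within a group. Numbering the full batch before splitting keeps all 8 pairs distinct.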

This PR makes SGLang deterministic inference usable for real GRPO/DAPO deterministic training by assigning stable, globally unique per-response seeds before chunking work across agent-loop workers.

Tests

PYTHONPATH=. pytest tests/experimental/agent_loop/test_sglang_sampling_seed_on_cpu.py -q
python -m ruff check verl/experimental/agent_loop/agent_loop.py verl/workers/rollout/sglang_rollout/async_sglang_server.py tests/experimental/agent_loop/test_sglang_sampling_seed_on_cpu.py
python -m py_compile verl/experimental/agent_loop/agent_loop.py verl/workers/rollout/sglang_rollout/async_sglang_server.py tests/experimental/agent_loop/test_sglang_sampling_seed_on_cpu.py
git diff --check

gemini-code-assist

Code Review

This pull request introduces stable, deterministic sampling seed generation for SGLang rollouts within the agent loop. It adds helper functions to compute stable seeds based on step, sample index, rollout number, and base seed, and ensures these seeds are correctly distributed across parallel worker chunks to avoid duplicates. Additionally, it normalizes SGLang engine configuration arguments and enforces the PyTorch sampling backend when deterministic sampling is enabled. Unit tests are added to verify these behaviors. I have no further feedback to provide as there are no review comments.
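
The copy-before-mutate handling of engine_kwargs mentioned in this summary might look roughly like the sketch below. It is not the code in this PR: prepare_engine_kwargs is a hypothetical helper, and the sampling_backend key with the values "flashinfer" and "pytorch" is an assumption that should be checked against SGLang's server arguments.

```python
import copy


def prepare_engine_kwargs(engine_kwargs, deterministic: bool) -> dict:
    """Hypothetical helper: return a launch-ready copy without touching the caller's config."""
    # Treat a missing config as empty, then deep-copy so nested values are not shared.
    kwargs = copy.deepcopy(dict(engine_kwargs or {}))
    if deterministic:
        # Assumed key and value; the override is applied to the copy only.
        kwargs["sampling_backend"] = "pytorch"
    return kwargs


user_cfg = {"sampling_backend": "flashinfer"}
launch = prepare_engine_kwargs(user_cfg, deterministic=True)
```

After the call, launch carries the overridden backend while user_cfg is unchanged, so a later server launch built from the same config does not inherit the mutation.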

tntnnlrw · 2026-06-26T09:53:03Z

Hi @wuxibin89, could you take a look at this PR when you have a chance? This fixes seed collisions for SGLang deterministic GRPO rollouts, so reruns stay reproducible while different responses in the same rollout group keep distinct deterministic seeds.

tntnnlrw · 2026-06-26T09:59:43Z

Hi @wucong25, sorry to bother. If this is within your review scope, could you take a look when you have bandwidth? This is a small SGLang deterministic sampling fix with CPU-only regression coverage, preventing deterministic GRPO/DAPO rollout groups from collapsing to identical responses due to seed collisions.

fix(sglang): assign stable sampling seeds for agent rollouts

817e8f4

tntnnlrw requested review from ArronHZG, chenhaiq and wuxibin89 as code owners June 26, 2026 08:27

gemini-code-assist Bot reviewed Jun 26, 2026

tntnnlrw changed the title ~~[sglang] fix stable sampling seeds for agent rollouts~~ [sglang] fix seed collisions in deterministic GRPO rollouts Jun 26, 2026

tntnnlrw wants to merge 1 commit into verl-project:main from tntnnlrw:codex/sglang-sampling-seed
