fix(ppo): preserve raw KL metric tensor by EazyReal · Pull Request #1493 · radixark/miles

EazyReal · 2026-06-27T02:12:28Z

Port of THUDM/slime#2114 for miles' refactored advantage helper.

Summary:

- build PPO token-level rewards out-of-place instead of mutating the kl tensor list
- preserve the raw approximate KL tensor used by rollout/kl logging
- add a CPU test directly on compute_advantages
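
To illustrate the summary above, here is a minimal sketch of the aliasing problem and the out-of-place fix. It is not the actual body of compute_advantages: the function names `token_rewards_in_place` and `token_rewards_out_of_place`, the `kl_coef` parameter, and the reward shaping (scaled negative KL per token, scalar reward added on the last token) are assumptions made for illustration. Only the `token_level_rewards` variable name comes from the review summary.

```python
import torch


def token_rewards_in_place(kl, rewards, kl_coef):
    """Hypothetical buggy pattern: reuses each KL tensor as the reward buffer."""
    out = []
    for k, r in zip(kl, rewards):
        k *= -kl_coef  # overwrites the caller's KL tensor
        k[-1] += r     # adds the scalar reward on the last token
        out.append(k)
    return out


def token_rewards_out_of_place(kl, rewards, kl_coef):
    """Hypothetical fixed pattern: allocates a new tensor per sample."""
    out = []
    for k, r in zip(kl, rewards):
        token_level_rewards = -kl_coef * k  # new tensor; k is left untouched
        token_level_rewards[-1] += r        # mutates only the new tensor
        out.append(token_level_rewards)
    return out


# Raw per-token KL values for two samples of different length (made-up numbers).
raw = [torch.tensor([0.10, 0.20, 0.30]), torch.tensor([0.05, 0.15])]
rewards = [1.0, -1.0]

# Out-of-place: the KL list still matches the raw values afterwards.
kl = [k.clone() for k in raw]
fixed = token_rewards_out_of_place(kl, rewards, kl_coef=0.1)
kl_preserved = all(torch.equal(a, b) for a, b in zip(kl, raw))

# In-place: the KL list now holds rewards, so any later KL logging is wrong.
kl_buggy = [k.clone() for k in raw]
buggy = token_rewards_in_place(kl_buggy, rewards, kl_coef=0.1)
kl_corrupted = not all(torch.equal(a, b) for a, b in zip(kl_buggy, raw))

print(kl_preserved, kl_corrupted)
```

Both variants return the same reward values. The difference is only in what happens to the caller's KL tensors, which is why a regression test can snapshot the KL list before the call and compare it afterwards.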

Validation:

uv run --with pytest --with torch --with numpy pytest --confcutdir=tests/fast/backends/training_utils/loss tests/fast/backends/training_utils/loss/test_ppo_kl_metric.py -q -> 1 passed

gemini-code-assist

Code Review

This pull request prevents in-place mutation of the elements of the input kl list in compute_advantages by building the rewards in a new token_level_rewards variable. It also adds a unit test verifying that the raw KL metric is preserved. I have no feedback to provide.

fix(ppo): preserve raw KL metric tensor

e5dca54

EazyReal requested review from Shi-Dong, Zhichenzzz, fzyzcjy, maocheng23 and yueming-yuan as code owners June 27, 2026 02:12

gemini-code-assist Bot reviewed Jun 27, 2026

EazyReal wants to merge 1 commit into radixark:main from EazyReal:upstream-pr/ppo-kl-inplace-metric
