feat(loss): support pg_loss aggregation modes by EazyReal · Pull Request #1498 · radixark/miles

EazyReal · 2026-06-27T17:03:44Z

Description

Adds built-in pg_loss aggregation modes to Miles. Related implementations for comparison:

This adds --loss-aggregation {sample_mean,prompt_mean,token_mean,constant} while keeping sample_mean as the default. --calculate-per-token-loss remains the legacy spelling for token_mean, and custom pg-loss reducers still take precedence.
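A minimal sketch of the flag precedence described above, using the standard-library `argparse`. `build_parser` and `resolve_aggregation` are hypothetical names. The positive-divisor check for `constant` is an assumption, not confirmed Miles behavior.

```python
import argparse
from typing import Callable, Optional

# Mode names and flag spellings come from the PR description; the resolution
# order below is an illustrative assumption, not Miles' actual argument code.
MODES = ("sample_mean", "prompt_mean", "token_mean", "constant")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--loss-aggregation", choices=MODES, default="sample_mean")
    parser.add_argument("--loss-aggregation-divisor", type=float, default=None)
    # Legacy spelling that maps to token_mean.
    parser.add_argument("--calculate-per-token-loss", action="store_true")
    return parser


def resolve_aggregation(
    args: argparse.Namespace, custom_reducer: Optional[Callable] = None
) -> str:
    """Return the effective pg_loss aggregation mode for parsed args."""
    if custom_reducer is not None:
        # A custom pg-loss reducer takes precedence over every built-in mode.
        return "custom"
    if args.calculate_per_token_loss:
        return "token_mean"
    if args.loss_aggregation == "constant":
        divisor = args.loss_aggregation_divisor
        # Assumption: constant mode needs an explicit, positive divisor.
        if divisor is None or divisor <= 0:
            raise ValueError(
                "--loss-aggregation constant requires a positive --loss-aggregation-divisor"
            )
    return args.loss_aggregation
```

With no flags the sketch resolves to `sample_mean`, and `--calculate-per-token-loss` alone resolves to `token_mean`, matching the default and legacy behavior described above.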

| Mode | Denominator |
| --- | --- |
| `sample_mean` | per-sample active-token count |
| `prompt_mean` | per-prompt-group active-token count, then mean over prompt groups |
| `token_mean` | global active-token count |
| `constant` | fixed `--loss-aggregation-divisor` |
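This pure-Python sketch applies the denominators from the table to a toy batch. `aggregate_pg_loss` is a hypothetical helper, not Miles' reducer. It averages over the local batch rather than the global batch. It also assumes `constant` divides each sample's masked loss sum by the divisor and then averages over samples (Dr. GRPO-style), which may differ from the actual reduction.

```python
from collections import defaultdict
from typing import Optional, Sequence


def aggregate_pg_loss(
    token_losses: Sequence[Sequence[float]],
    loss_masks: Sequence[Sequence[int]],
    prompt_ids: Sequence[int],
    mode: str,
    divisor: Optional[float] = None,
) -> float:
    """Hypothetical reducer over a non-empty batch of per-token losses."""
    # Masked (active-token) loss sum and active-token count per sample.
    sums = [sum(l * m for l, m in zip(ls, ms)) for ls, ms in zip(token_losses, loss_masks)]
    counts = [sum(ms) for ms in loss_masks]
    n = len(sums)
    if mode == "sample_mean":
        # Mean over samples of each sample's per-token mean.
        return sum(s / max(c, 1) for s, c in zip(sums, counts)) / n
    if mode == "token_mean":
        # One global active-token denominator for the whole batch.
        return sum(sums) / max(sum(counts), 1)
    if mode == "prompt_mean":
        # Per-prompt-group token mean, then mean over prompt groups.
        group_sums: defaultdict = defaultdict(float)
        group_counts: defaultdict = defaultdict(int)
        for pid, s, c in zip(prompt_ids, sums, counts):
            group_sums[pid] += s
            group_counts[pid] += c
        group_means = [group_sums[g] / max(group_counts[g], 1) for g in group_sums]
        return sum(group_means) / len(group_means)
    if mode == "constant":
        if divisor is None or divisor <= 0:
            raise ValueError("constant mode needs a positive divisor")
        # Assumption: each sample's masked sum / divisor, averaged over samples.
        return sum(s / divisor for s in sums) / n
    raise ValueError(f"unknown mode: {mode}")
```

Consider per-token losses `[[1, 1], [2, 2, 2, 2], [3, 3]]`, with the last token of the third sample masked out and the first two samples sharing a prompt. For this batch, `sample_mean` gives 2.0, `prompt_mean` 7/3, `token_mean` 13/7, and `constant` with divisor 4 gives 3.25/3. The choice of mode alone changes the loss scale.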

The implementation keeps denominator ownership explicit:

- `token_mean` policy loss rejects nonzero entropy or KL-loss coefficients that would mix token-normalized `pg_loss` with sample-normalized auxiliary loss terms.
- Train logging carries per-key normalizers when needed, and aggregation rejects mixed legacy/normalized log dictionaries, key-order mismatches, and malformed value/normalizer lengths.
- SFT/value losses use the token reducer when the legacy per-token path is active.
- TIS/RS `pg_loss` uses the modified mask for loss reduction, while mismatch metrics stay on the original mask.
- `prompt_mean` DP splitting keeps each prompt group whole on a single DP shard and rejects train steps whose prompt-group count cannot be distributed evenly across shards.
- Custom-config overrides rederive and revalidate `global_batch_size` before aggregation constraints are checked.
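The logging and coefficient checks in the list above can be sketched as follows. The log-dictionary layout (`keys`/`values`/optional `normalizers`) and both helper names are hypothetical. The `entropy_coef`/`kl_loss_coef` parameter names are assumed and may not match Miles' argument names.

```python
from typing import Dict, Sequence

# Hypothetical log format: {"keys": [...], "values": [...]} for legacy logs,
# plus a parallel "normalizers" list when per-key normalization is carried.


def aggregate_log_dicts(log_dicts: Sequence[Dict[str, list]]) -> Dict[str, float]:
    if not log_dicts:
        return {}
    normalized = {"normalizers" in d for d in log_dicts}
    if len(normalized) != 1:
        raise ValueError("cannot mix legacy and normalized log dictionaries")
    is_normalized = normalized.pop()
    keys = list(log_dicts[0]["keys"])
    for d in log_dicts:
        if list(d["keys"]) != keys:
            raise ValueError("log keys differ in content or order")
        if len(d["values"]) != len(keys):
            raise ValueError("values length does not match keys")
        if is_normalized and len(d["normalizers"]) != len(keys):
            raise ValueError("normalizers length does not match keys")
    result: Dict[str, float] = {}
    for i, key in enumerate(keys):
        total = sum(d["values"][i] for d in log_dicts)
        if is_normalized:
            # Values are sums; divide by the summed per-key normalizers.
            denom = sum(d["normalizers"][i] for d in log_dicts)
        else:
            # Legacy values are already means; average them.
            denom = len(log_dicts)
        result[key] = total / denom if denom else 0.0
    return result


def check_token_mean_coefficients(mode: str, entropy_coef: float, kl_loss_coef: float) -> None:
    # token_mean normalizes pg_loss by the global active-token count, while the
    # entropy and KL auxiliary terms are sample-normalized, so mixing is rejected.
    if mode == "token_mean" and (entropy_coef != 0.0 or kl_loss_coef != 0.0):
        raise ValueError("token_mean requires zero entropy and KL-loss coefficients")
```

Carrying the raw sums and normalizers until the final step means a key like `pg_loss` is divided once by its true total count, instead of averaging per-micro-batch means with unequal token counts.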

Validation

uv run --with pytest --with torch --with numpy --with httpx --with pyyaml --with ray --with huggingface_hub --with transformers --with pydantic --with psutil pytest --confcutdir=tests/fast/backends/training_utils tests/fast/backends/training_utils/test_loss_aggregation.py tests/fast/backends/training_utils/loss/test_loss_snapshot.py -q
uv run --with ruff ruff check miles/backends/training_utils/cp_utils.py miles/backends/training_utils/data.py miles/backends/training_utils/log_utils.py miles/backends/training_utils/loss.py miles/backends/training_utils/loss_hub/losses.py miles/ray/rollout/train_data_conversion.py miles/utils/arguments.py miles/backends/megatron_utils/model.py miles/backends/experimental/fsdp_utils/actor.py tests/fast/backends/training_utils/test_loss_aggregation.py tests/fast/backends/training_utils/loss/test_loss_snapshot.py
uv run --with black black --check miles/backends/training_utils/cp_utils.py miles/backends/training_utils/data.py miles/backends/training_utils/log_utils.py miles/backends/training_utils/loss.py miles/backends/training_utils/loss_hub/losses.py miles/ray/rollout/train_data_conversion.py miles/utils/arguments.py miles/backends/megatron_utils/model.py miles/backends/experimental/fsdp_utils/actor.py tests/fast/backends/training_utils/test_loss_aggregation.py tests/fast/backends/training_utils/loss/test_loss_snapshot.py
git diff --check upstream/main

`python3 train.py --help` could not be run in my local environment because sglang is not installed; the parser path for the new flags is covered by the focused tests instead.

EazyReal · 2026-06-27T17:04:13Z

Hi @yueming-yuan, reopening this fresh PR for visibility after syncing the loss aggregation behavior with THUDM/slime#2090.

This adds pg_loss aggregation modes in Miles and keeps prompt_mean aligned with slime's current implementation: prompt_mask_sums, n_samples_per_prompt reducer scaling, and global_batch_size % n_samples_per_prompt == 0 validation.
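This sketch covers the `prompt_mean` constraints and one plausible reading of the `n_samples_per_prompt` reducer scaling. All function names are hypothetical. Sharding assumes each prompt's samples are stored contiguously. Assigning groups to DP ranks in contiguous blocks, rather than interleaving them, is an illustrative choice.

```python
from typing import List, Sequence


def prompt_groups_per_step(global_batch_size: int, n_samples_per_prompt: int) -> int:
    # Mirrors the global_batch_size % n_samples_per_prompt == 0 validation.
    if n_samples_per_prompt <= 0 or global_batch_size % n_samples_per_prompt != 0:
        raise ValueError("global_batch_size must be a multiple of n_samples_per_prompt")
    return global_batch_size // n_samples_per_prompt


def split_prompt_groups(
    sample_ids: Sequence[int], n_samples_per_prompt: int, dp_size: int
) -> List[List[int]]:
    """Shard samples across DP ranks without splitting any prompt group.

    Assumes samples from the same prompt are stored contiguously.
    """
    n = n_samples_per_prompt
    if len(sample_ids) % n != 0:
        raise ValueError("batch contains an incomplete prompt group")
    groups = [list(sample_ids[i : i + n]) for i in range(0, len(sample_ids), n)]
    if len(groups) % dp_size != 0:
        raise ValueError("prompt-group count is not divisible by the DP size")
    per_rank = len(groups) // dp_size
    # Each rank receives a contiguous block of whole prompt groups.
    return [
        [s for group in groups[r * per_rank : (r + 1) * per_rank] for s in group]
        for r in range(dp_size)
    ]


def prompt_mean_via_sample_reducer(
    group_means: Sequence[float], n_samples_per_prompt: int, global_batch_size: int
) -> float:
    # If the outer reducer divides by global_batch_size (a sample count),
    # scaling each group's mean by n_samples_per_prompt turns the result into
    # a mean over prompt groups: sum(m * n) / (G * n) == sum(m) / G.
    return sum(m * n_samples_per_prompt for m in group_means) / global_batch_size
```

For example, 8 samples with `n_samples_per_prompt=2` and a DP size of 2 shard as `[[0, 1, 2, 3], [4, 5, 6, 7]]`. Six samples form three prompt groups, which cannot be split evenly across two ranks, so they are rejected.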

Please review when you have a chance.

gemini-code-assist

Code Review

This pull request introduces the --loss-aggregation command-line option to support multiple aggregation modes for pg_loss (sample_mean, prompt_mean, token_mean, and constant), along with validation logic, documentation, and a comprehensive test suite. The review feedback suggests appending new parameters to the end of get_sum_of_sample_mean to preserve backward compatibility for positional arguments, avoiding redundant computation of sample_denoms when constant_divisor is active, and optimizing the GPU tensor conversion of prompt_mask_sums by avoiding a loop over individual scalar tensors.
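A simplified stand-in for `get_sum_of_sample_mean` illustrates the first two review suggestions. The signature is an assumption, since the real function's parameters are not shown in this thread. The helper uses plain lists rather than tensors. It shows two things: trailing defaulted parameters keep positional callers working, and `sample_denoms` can be skipped when a constant divisor is active.

```python
from typing import Callable, List, Optional, Sequence


def _split(flat: Sequence[float], lengths: Sequence[int]) -> List[Sequence[float]]:
    # Split a flat per-token loss list into per-sample chunks.
    chunks, start = [], 0
    for n in lengths:
        chunks.append(flat[start : start + n])
        start += n
    return chunks


def get_sum_of_sample_mean_sketch(
    response_lengths: Sequence[int],
    loss_masks: Sequence[Sequence[int]],
    calculate_per_token_loss: bool = False,
    # New parameters go last, with defaults, so existing positional calls
    # keep binding to the same parameters.
    constant_divisor: Optional[float] = None,
) -> Callable[[Sequence[float]], float]:
    def masked_sums(flat: Sequence[float]) -> List[float]:
        return [
            sum(x * m for x, m in zip(chunk, mask))
            for chunk, mask in zip(_split(flat, response_lengths), loss_masks)
        ]

    if constant_divisor is not None:
        # Constant mode never needs per-sample denominators, so skip them.
        return lambda flat: sum(masked_sums(flat)) / constant_divisor
    if calculate_per_token_loss:
        # Token sum; the caller divides by the global active-token count.
        return lambda flat: sum(masked_sums(flat))
    # Per-sample denominators are only computed on the sample_mean path.
    sample_denoms = [max(sum(mask), 1) for mask in loss_masks]
    return lambda flat: sum(s / d for s, d in zip(masked_sums(flat), sample_denoms))
```

An old-style call such as `get_sum_of_sample_mean_sketch(lengths, masks, True)` still selects the per-token path. Inserting the new parameter before `calculate_per_token_loss` would have silently rebound that positional `True`.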


EazyReal · 2026-06-30T09:00:15Z

Quick follow-up after rebasing onto current main: this makes pg_loss aggregation explicit and validated while keeping the legacy default. The main value is avoiding silent effective-LR changes when users need response- or prompt-level normalization.
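This hypothetical helper makes the effective-LR point concrete. It computes the coefficient each active token receives in the final pg_loss under each mode. It follows the table's denominators. It assumes `constant` divides each sample's masked sum by the divisor before averaging over samples, which may differ from Miles' reduction.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def per_token_weight(
    active_counts: Sequence[int], prompt_ids: Sequence[int], mode: str, divisor: int = 1
) -> List[Fraction]:
    """Coefficient applied to each active token of each sample in the final loss.

    Assumes every sample has at least one active token.
    """
    n = len(active_counts)
    if mode == "sample_mean":
        return [Fraction(1, n * c) for c in active_counts]
    if mode == "token_mean":
        total = sum(active_counts)
        return [Fraction(1, total) for _ in active_counts]
    if mode == "prompt_mean":
        group_tokens: Dict[int, int] = {}
        for pid, c in zip(prompt_ids, active_counts):
            group_tokens[pid] = group_tokens.get(pid, 0) + c
        n_groups = len(group_tokens)
        return [Fraction(1, n_groups * group_tokens[pid]) for pid in prompt_ids]
    if mode == "constant":
        return [Fraction(1, n * divisor) for _ in active_counts]
    raise ValueError(f"unknown mode: {mode}")
```

Take one 10-token and one 100-token response to the same prompt. Under `sample_mean`, each short-response token gets weight 1/20 and each long-response token 1/200. Under `token_mean` and `prompt_mean`, every token gets 1/110. Switching modes therefore rescales per-token gradients without any change to the configured learning rate.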

EazyReal requested review from Shi-Dong, Zhichenzzz, fzyzcjy, guapisolo, jybsuper, maocheng23, yueming-yuan and yushengsu-thu as code owners June 27, 2026 17:03

gemini-code-assist Bot reviewed Jun 27, 2026


Comment thread on miles/backends/training_utils/cp_utils.py (outdated)

Comment thread on miles/backends/training_utils/cp_utils.py (outdated)

Comment thread on miles/backends/training_utils/data.py (outdated)

EazyReal force-pushed the upstream-pr/loss-aggregation-modes-v2 branch from d0be4cc to 5bba0f9 (June 27, 2026 17:17)

EazyReal changed the title ~~feat(loss): add --loss-aggregation pg_loss modes~~ feat(loss): make pg_loss aggregation explicit Jun 27, 2026

EazyReal force-pushed the upstream-pr/loss-aggregation-modes-v2 branch 2 times, most recently from fd42cd9 to c64cbaa (June 27, 2026 17:29)

EazyReal changed the title ~~feat(loss): make pg_loss aggregation explicit~~ feat(loss): support pg_loss aggregation modes Jun 27, 2026

EazyReal force-pushed the upstream-pr/loss-aggregation-modes-v2 branch 3 times, most recently from 4f1a9af to f73fc1f (June 27, 2026 20:23)

EazyReal mentioned this pull request Jun 27, 2026 in feat(loss): support pg_loss aggregation modes EazyReal/miles#2 (Open)

EazyReal force-pushed the upstream-pr/loss-aggregation-modes-v2 branch from 67e0503 to fe7d2bf (June 27, 2026 22:48)

EazyReal added 2 commits June 30, 2026 01:38

feat(loss): support pg_loss aggregation modes (033c6c0)
fix(loss): guard loss aggregation normalizers (8d994ad)

EazyReal force-pushed the upstream-pr/loss-aggregation-modes-v2 branch from fe7d2bf to 8d994ad (June 30, 2026 08:38)

EazyReal wants to merge 2 commits into radixark:main from EazyReal:upstream-pr/loss-aggregation-modes-v2
