[trainer] Fix process_validation_metrics crash on None-filled sparse reward keys by abinggo · Pull Request #6845 · verl-project/verl

abinggo · 2026-06-25T07:32:18Z

Summary

When a reward_extra_info key is emitted for only some validation samples (a
sparse key), _validate null-fills it to a uniform schema, so a per-uid group
passed to process_validation_metrics can contain None. The aggregation
skip-filter only caught empty lists and strings:

if not var_vals or isinstance(var_vals[0], str):
    continue

so np.mean([..., None, ...]) raised
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
and crashed validation. Any reward function that conditionally adds an
extra-info key triggers the crash as soon as one sample omits that key.
Thanks @giladfrid009 for the precise root cause and repro.
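
The failure mode can be reproduced outside the trainer. The sketch below is only an illustration: passes_old_skip_filter and mean_or_error are hypothetical helpers written for this example, not functions from verl. It shows that a group containing None gets past the old check and that np.mean then raises a TypeError.

```python
import numpy as np


def passes_old_skip_filter(var_vals):
    """Hypothetical mirror of the old check: True means the group is aggregated."""
    return not (not var_vals or isinstance(var_vals[0], str))


def mean_or_error(var_vals):
    """Return np.mean(var_vals), or the TypeError it raises (illustration only)."""
    try:
        return np.mean(var_vals)
    except TypeError as exc:
        return exc


# A sparse key after null-filling: one sample emitted it, two did not.
sparse_group = [1.0, None, None]
# A uid whose only value was null-filled.
single_none = [None]

for group in (sparse_group, single_none):
    # Both groups are non-empty and do not start with a str,
    # so the old filter lets them through to the aggregation.
    print(group, passes_old_skip_filter(group), type(mean_or_error(group)).__name__)
```

Empty lists and string-valued groups are still skipped by the old check; only None slips through.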

Fix

Drop None entries before aggregating. When predictions are present, the
(value, prediction) pairs are filtered together so the majority-voting
branch still zips 1:1; a uid whose only value was None is skipped entirely.
Groups with no None are untouched (no behavior change on the common path).
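
A minimal sketch of the filtering described above, assuming a standalone helper. drop_none_pairs is a hypothetical name used for illustration; the actual change lives inline in metric_utils.py and may be structured differently.

```python
def drop_none_pairs(values, preds=None):
    """Drop None values; when preds are given, drop the matching preds too.

    Hypothetical helper for illustration, not the code from this PR.
    Returns (filtered_values, filtered_preds); filtered_preds is None
    when no predictions were passed in.
    """
    if preds is None:
        return [v for v in values if v is not None], None
    # Filter the pairs together so values and preds stay aligned 1:1.
    kept = [(v, p) for v, p in zip(values, preds) if v is not None]
    return [v for v, _ in kept], [p for _, p in kept]


vals, kept_preds = drop_none_pairs([1.0, None, 2.0], ["A", "B", "A"])
# vals == [1.0, 2.0] and kept_preds == ["A", "A"]

empty_vals, _ = drop_none_pairs([None, None], ["A", "B"])
# empty_vals == [], so an `if not var_vals: continue` style check skips this uid
```

A group without any None comes back unchanged, which matches the "no behavior change on the common path" claim.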

Tests

tests/trainer/ppo/test_metric_utils_on_cpu.py:

- A sparse key ([1.0, None, None]) aggregates over the present value; an
  all-None key is skipped; the dense key is unaffected.
- A sparse key with a None in the middle plus predictions
  (score=[1.0, None, 2.0], pred=[A, B, A]) keeps preds aligned so maj@2
  is still produced (regression guard against value/pred desync).
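
To see what the desync guard protects against, the sketch below compares filtering the scores alone with filtering the (score, pred) pairs together, using the inputs from the second test. The variable names are illustrative and do not come from the test file.

```python
scores = [1.0, None, 2.0]
preds = ["A", "B", "A"]

# Filtering only the scores shifts later values onto the wrong prediction:
# 2.0 ends up paired with "B", the prediction of the sample that had no score.
misaligned = list(zip([s for s in scores if s is not None], preds))

# Filtering the pairs together keeps each score with its own prediction.
aligned = [(s, p) for s, p in zip(scores, preds) if s is not None]

print(misaligned)  # [(1.0, 'A'), (2.0, 'B')]
print(aligned)     # [(1.0, 'A'), (2.0, 'A')]
```

With the aligned pairs, both surviving samples vote for A, so a majority-vote metric over them is well defined.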

All TestProcessValidationMetrics tests pass locally.

…eward keys

A reward_extra_info key emitted for only some validation samples is null-filled to a uniform schema in `_validate`, so a per-uid group can contain None. The aggregation skip-filter only caught empty lists and strings, so `np.mean([..., None, ...])` raised `TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'` and crashed validation; any reward function that conditionally adds an extra-info key triggers it.

Drop the None entries before aggregating. When predictions are present, filter the (value, prediction) pairs together so majority voting still zips 1:1; a uid whose only value was None is skipped entirely. No behavior change when a group has no None.

Fixes verl-project#6830

Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request addresses an issue where sparse reward_extra_info keys containing null-filled values (None) caused crashes during metric aggregation. It introduces logic in metric_utils.py to drop None entries before aggregation and ensures that prediction alignments are maintained for majority voting. Additionally, two new unit tests have been added to verify these changes. I have no feedback to provide as the implementation is correct and well-tested.


abinggo requested review from PeterSH6, eric-haibin-lin, tongyx361 and vermouth1992 as code owners June 25, 2026 07:32

gemini-code-assist Bot reviewed Jun 25, 2026

abinggo wants to merge 1 commit into verl-project:main from abinggo:fix/validation-metrics-none-sparse-key
