[trainer] Fix process_validation_metrics crash on None-filled sparse reward keys #6845

Open
abinggo wants to merge 1 commit into verl-project:main from abinggo:fix/validation-metrics-none-sparse-key
@abinggo abinggo commented Jun 25, 2026

Summary

Fixes #6830.

When a reward_extra_info key is emitted for only some validation samples (a
sparse key), _validate null-fills it to a uniform schema, so a per-uid group
passed to process_validation_metrics can contain None. The aggregation
skip-filter only caught empty lists and strings:

if not var_vals or isinstance(var_vals[0], str):
    continue

so np.mean([..., None, ...]) raised TypeError: unsupported operand type(s) for /: 'NoneType' and 'int' and crashed validation. Any reward function that
conditionally adds an extra-info key triggers it as soon as one sample omits it.
Thanks @giladfrid009 for the precise root cause and repro.
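
For reviewers who want to see the failure in isolation, here is a minimal sketch of the path described above. The sample dicts, the `format_bonus` key, and both helper functions are hypothetical illustrations, and the null-fill comprehension is only a stand-in for what `_validate` does, not the actual code. The exact `TypeError` message can differ with the contents of the group, so the sketch only checks the exception type.

```python
import numpy as np

# Hypothetical per-sample reward_extra_info dicts; "format_bonus" is an
# illustrative sparse key that only the first sample emits.
sample_infos = [{"score": 1.0, "format_bonus": 0.5}, {"score": 0.0}]

# Null-fill every key to a uniform schema (a stand-in for what _validate does).
all_keys = sorted({key for info in sample_infos for key in info})
filled = {key: [info.get(key) for info in sample_infos] for key in all_keys}


def passes_old_skip_filter(var_vals):
    """Mirror of the quoted check: True when the group is NOT skipped."""
    return not (not var_vals or isinstance(var_vals[0], str))


def mean_raises_type_error(var_vals):
    """Return True if np.mean rejects the group with a TypeError."""
    try:
        np.mean(var_vals)
    except TypeError:
        return True
    return False
```

The sparse group is non-empty and does not start with a string, so it gets past the old filter and reaches `np.mean` with a `None` still inside it.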

Fix

Drop None entries before aggregating. When predictions are present, the
(value, prediction) pairs are filtered together so the majority-voting
branch still zips 1:1; a uid whose only value was None is skipped entirely.
Groups with no None are untouched (no behavior change on the common path).
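
The filtering rule can be sketched roughly as follows. `drop_none_pairs` is a hypothetical helper name used only for illustration; the actual change lives inline in metric_utils.py and may be structured differently.

```python
from typing import Any, List, Optional, Sequence, Tuple


def drop_none_pairs(
    values: Sequence[Any],
    preds: Optional[Sequence[Any]] = None,
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Drop None values; when preds are given, drop the matching preds too."""
    if preds is None:
        return [v for v in values if v is not None], None
    # Filter (value, prediction) pairs together so they stay aligned 1:1.
    kept = [(v, p) for v, p in zip(values, preds) if v is not None]
    return [v for v, _ in kept], [p for _, p in kept]
```

A caller would skip the group when the returned value list is empty, which covers the case of a uid whose only value was None.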

Tests

tests/trainer/ppo/test_metric_utils_on_cpu.py:

  • sparse key ([1.0, None, None]) aggregates over the present value; an
    all-None key is skipped; the dense key is unaffected;
  • a sparse key with a None in the middle plus predictions
    (score=[1.0, None, 2.0], pred=[A, B, A]) keeps preds aligned so maj@2
    is still produced (regression guard against value/pred desync).

All TestProcessValidationMetrics tests pass locally.
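
To illustrate the desync that the second test guards against, the sketch below contrasts filtering only the values with filtering the pairs together, using the same inputs as the test. It counts votes with `collections.Counter` purely for illustration and does not reproduce the majority-voting code in verl.

```python
from collections import Counter

scores = [1.0, None, 2.0]
preds = ["A", "B", "A"]

# Filtering only the values shifts every later value onto the wrong
# prediction, because zip pairs by position and stops at the shorter input.
misaligned = list(zip([s for s in scores if s is not None], preds))

# Filtering the pairs together keeps each value with its own prediction.
aligned = [(s, p) for s, p in zip(scores, preds) if s is not None]

misaligned_votes = Counter(p for _, p in misaligned)
aligned_votes = Counter(p for _, p in aligned)
```

In the misaligned case the value 2.0 is attributed to prediction B and the vote for A is split, while the aligned case keeps both surviving values under A.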

…eward keys

A reward_extra_info key emitted for only some validation samples is null-filled
to a uniform schema in `_validate`, so a per-uid group can contain None. The
aggregation skip-filter only caught empty lists and strings, so
`np.mean([..., None, ...])` raised `TypeError: unsupported operand type(s) for
/: 'NoneType' and 'int'` and crashed validation — any reward function that
conditionally adds an extra-info key triggers it.

Drop the None entries before aggregating. When predictions are present, filter
the (value, prediction) pairs together so majority voting still zips 1:1; a uid
whose only value was None is skipped entirely. No behavior change when a group
has no None.

Fixes verl-project#6830

Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses an issue where sparse reward_extra_info keys containing null-filled values (None) caused crashes during metric aggregation. It introduces logic in metric_utils.py to drop None entries before aggregation and ensures that prediction alignments are maintained for majority voting. Additionally, two new unit tests have been added to verify these changes. I have no feedback to provide as the implementation is correct and well-tested.



Development

Successfully merging this pull request may close these issues.

process_validation_metrics() crashes on None-filled sparse reward_extra_info keys (metrics emitted for only some samples)
