[trainer] Fix process_validation_metrics crash on None-filled sparse reward keys #6845
abinggo wants to merge 1 commit
…eward keys
A `reward_extra_info` key emitted for only some validation samples is null-filled to a uniform schema in `_validate`, so a per-uid group can contain `None`. The aggregation skip-filter only caught empty lists and strings, so `np.mean([..., None, ...])` raised `TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'` and crashed validation. Any reward function that conditionally adds an extra-info key triggers it.
Drop the `None` entries before aggregating. When predictions are present, filter the (value, prediction) pairs together so majority voting still zips 1:1; a uid whose only value was `None` is skipped entirely. No behavior change when a group has no `None`.
Fixes verl-project#6830
Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com>
Code Review
This pull request addresses an issue where sparse reward_extra_info keys containing null-filled values (None) caused crashes during metric aggregation. It introduces logic in metric_utils.py to drop None entries before aggregation and ensures that prediction alignments are maintained for majority voting. Additionally, two new unit tests have been added to verify these changes. I have no feedback to provide as the implementation is correct and well-tested.
Summary
Fixes #6830.
When a `reward_extra_info` key is emitted for only some validation samples (a sparse key), `_validate` null-fills it to a uniform schema, so a per-uid group passed to `process_validation_metrics` can contain `None`. The aggregation skip-filter only caught empty lists and strings, so `np.mean([..., None, ...])` raised `TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'` and crashed validation. Any reward function that conditionally adds an extra-info key triggers it as soon as one sample omits it.
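To make the mechanism concrete, here is a minimal pure-Python sketch of how null-filling a sparse key can leave `None` inside a per-uid group. It is not the code from `_validate`; the helpers `null_fill` and `group_by_uid`, the key `format_bonus`, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def null_fill(per_sample_infos):
    """Give every sample the same keys, using None where a key is missing.

    Hypothetical helper: only mimics the idea of a uniform schema.
    """
    all_keys = []
    for info in per_sample_infos:
        for key in info:
            if key not in all_keys:
                all_keys.append(key)
    return {key: [info.get(key) for info in per_sample_infos] for key in all_keys}


def group_by_uid(uids, values):
    """Collect the values that belong to each uid, preserving sample order."""
    groups = defaultdict(list)
    for uid, value in zip(uids, values):
        groups[uid].append(value)
    return dict(groups)


# Hypothetical reward output: "format_bonus" is emitted for the first sample only.
infos = [
    {"score": 1.0, "format_bonus": 0.5},
    {"score": 0.0},
    {"score": 1.0},
]
uids = ["q1", "q1", "q2"]

columns = null_fill(infos)
bonus_groups = group_by_uid(uids, columns["format_bonus"])
# The dense key "score" has no None, but both "format_bonus" groups do.
```

A group such as `bonus_groups["q1"]` is the kind of input that a plain numeric mean cannot handle, which is the failure described above.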
Thanks @giladfrid009 for the precise root cause and repro.
Fix
Drop `None` entries before aggregating. When predictions are present, the `(value, prediction)` pairs are filtered together so the majority-voting branch still zips 1:1; a uid whose only value was `None` is skipped entirely. Groups with no `None` are untouched (no behavior change on the common path).
Tests
`tests/trainer/ppo/test_metric_utils_on_cpu.py`:
- a sparse key (`[1.0, None, None]`) aggregates over the present value; an all-`None` key is skipped; the dense key is unaffected;
- `None` in the middle plus predictions (`score=[1.0, None, 2.0]`, `pred=[A, B, A]`) keeps preds aligned so `maj@2` is still produced (regression guard against value/pred desync).
All `TestProcessValidationMetrics` tests pass locally.
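As a rough illustration of the pairwise filtering described under Fix, the sketch below drops `None` values together with their predictions so the two lists stay aligned. It is an assumption-laden stand-in, not the change made in `metric_utils.py`: `drop_none` and `majority_value` are hypothetical names, and `majority_value` is a simplified vote rather than the trainer's actual `maj@N` computation.

```python
from collections import Counter
from statistics import mean


def drop_none(values, preds=None):
    """Remove None values; when preds are given, drop the paired pred too."""
    if preds is None:
        return [v for v in values if v is not None], None
    kept = [(v, p) for v, p in zip(values, preds) if v is not None]
    return [v for v, _ in kept], [p for _, p in kept]


def majority_value(values, preds):
    """Simplified vote: value of the first sample with the most common pred."""
    winner, _ = Counter(preds).most_common(1)[0]
    return next(v for v, p in zip(values, preds) if p == winner)


# Same shape as the regression test: None in the middle, with predictions.
scores, preds = drop_none([1.0, None, 2.0], ["A", "B", "A"])
avg = mean(scores)                     # averages only the present values
voted = majority_value(scores, preds)  # preds still line up with scores

# A group whose only values were None ends up empty; a caller would skip it.
empty_scores, empty_preds = drop_none([None, None], ["A", "B"])
```

Filtering the pairs in one pass is what keeps the lists the same length; dropping `None` from the values alone would leave three predictions zipped against two scores.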