
feat(resources): add shared MultiRewardVerifyResponse contract #1793

Open: rodboev wants to merge 2 commits into NVIDIA-NeMo:main from rodboev:pr/shared-multireward-contract



rodboev commented Jun 27, 2026


Summary

Add a shared MultiRewardVerifyResponse contract in nemo_gym so multi-reward verifiers can expose reward_components through an importable shared type instead of an example-local subclass, and update the checked-in multi-reward docs to teach that shared contract.

Closes #1664

Follow-up to #1525, which introduced the example-local reward_components shape and the GDPO motivation that this shared contract now formalizes.

Background

The example tool-call multi-reward server already documents reward_components as the field GDPO-style consumers should read, but on current origin/main that field is defined only on the local response subclass in resources_servers/example_tool_call_multireward/app.py. As a result, new multi-reward environments have no shared nemo_gym type to inherit from.
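A shared, importable type matters mostly on the consumer side: a trainer can branch on one class instead of on each example's local subclass. The sketch below is illustrative only. It uses dataclass stand-ins for BaseVerifyResponse and MultiRewardVerifyResponse (the real nemo_gym classes may be defined differently, for example as pydantic models), assumes the base type carries a scalar `reward` field and that reward_components maps component names to floats, and the helper `components_or_scalar` is hypothetical, not part of nemo_gym.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical stand-ins for the nemo_gym types; field types are assumptions.
@dataclass
class BaseVerifyResponse:
    reward: float  # assumed scalar reward used by single-reward verifiers


@dataclass
class MultiRewardVerifyResponse(BaseVerifyResponse):
    # Optional per-component rewards, assumed to be name -> float.
    reward_components: Optional[Dict[str, float]] = None


def components_or_scalar(response: BaseVerifyResponse) -> Dict[str, float]:
    """Hypothetical consumer helper: prefer per-component rewards, else fall back to the scalar."""
    if isinstance(response, MultiRewardVerifyResponse) and response.reward_components:
        return dict(response.reward_components)
    # Single-reward verifiers (or multi-reward ones with no components set) expose only the scalar.
    return {"reward": response.reward}
```

With one shared class, the `isinstance` check works for any environment that inherits the contract, rather than requiring consumers to know each example's local subclass.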

Changes

  • Add MultiRewardVerifyResponse(BaseVerifyResponse) in nemo_gym.base_resources_server with an optional reward_components field.
  • Update the example tool-call multi-reward response to inherit from the shared class while keeping its server-specific correctness, schema_valid, format, and predicted_calls fields.
  • Add a focused regression test proving that the example verifier now returns an instance of the shared contract and preserves the existing component payload.
  • Update the example README, the latest multi-reward tutorial, and the latest API reference entry to name the shared contract where relevant.
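The class hierarchy these changes describe can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual code: plain dataclasses stand in for the real nemo_gym models, the base type is assumed to carry a scalar `reward`, reward_components is assumed to be a name-to-float mapping, and the types of the server-specific fields (correctness, schema_valid, format, predicted_calls) are guesses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BaseVerifyResponse:
    # Stand-in: assumed scalar reward, unchanged for single-reward verifiers.
    reward: float


@dataclass
class MultiRewardVerifyResponse(BaseVerifyResponse):
    # Shared, optional per-component rewards (value type assumed to be float).
    reward_components: Optional[Dict[str, float]] = None


@dataclass
class ToolCallMultiRewardVerifyResponse(MultiRewardVerifyResponse):
    # Server-specific fields stay on the example subclass; their types are assumptions.
    correctness: float = 0.0
    schema_valid: float = 0.0
    format: float = 0.0
    predicted_calls: List[Dict[str, Any]] = field(default_factory=list)
```

The example subclass no longer declares reward_components itself; it inherits the field from the shared class and only adds its own fields.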

Scope

  • BaseVerifyResponse remains scalar-only for existing single-reward verifiers.
  • Trainer-side integration, NeMo-RL or VeRL wiring, version-pinned docs backfills, and new dependencies stay out of scope.
  • Reward math and top-level per-component example fields remain unchanged.
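To illustrate that the base type stays scalar-only, the sketch below checks which fields each stand-in declares and shows an existing-style single-reward verifier still returning the base type. The dataclass stand-ins, the assumed `reward` field, and the `verify_exact_match` and `field_names` helpers are all hypothetical and only mirror the scope statement above.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class BaseVerifyResponse:
    reward: float  # assumed scalar-only base contract


@dataclass
class MultiRewardVerifyResponse(BaseVerifyResponse):
    reward_components: Optional[Dict[str, float]] = None  # opt-in, multi-reward only


def verify_exact_match(predicted: str, expected: str) -> BaseVerifyResponse:
    """Hypothetical single-reward verifier: unchanged, still returns the base type."""
    return BaseVerifyResponse(reward=1.0 if predicted == expected else 0.0)


def field_names(cls: type) -> set:
    # Names of the dataclass fields declared on cls (including inherited ones).
    return {f.name for f in fields(cls)}
```

Only verifiers that opt into the subclass gain reward_components; nothing about the base type changes for existing environments.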

Before / After

  • Before: reward_components exists only on ToolCallMultiRewardVerifyResponse inside resources_servers/example_tool_call_multireward/app.py.
  • After: reward_components lives on shared MultiRewardVerifyResponse, the example response inherits that contract instead of defining it locally, and the latest tutorial/API reference now point users at the shared type.
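The "After" state can be expressed as a small regression-style check, in the spirit of the test described under Changes. This is a hypothetical sketch, not the PR's test: it builds the response directly from placeholder component values instead of calling the example server's verify endpoint, uses dataclass stand-ins, and the reward value 0.5 is an arbitrary placeholder rather than the example's actual reward math.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BaseVerifyResponse:
    reward: float


@dataclass
class MultiRewardVerifyResponse(BaseVerifyResponse):
    reward_components: Optional[Dict[str, float]] = None


@dataclass
class ToolCallMultiRewardVerifyResponse(MultiRewardVerifyResponse):
    correctness: float = 0.0
    schema_valid: float = 0.0
    format: float = 0.0


def test_example_response_uses_shared_contract() -> None:
    # Placeholder component values; a real test would obtain them from the example verifier.
    components = {"correctness": 1.0, "schema_valid": 1.0, "format": 0.0}
    resp = ToolCallMultiRewardVerifyResponse(
        reward=0.5, reward_components=dict(components), **components
    )
    # After: the example response is an instance of the shared contract...
    assert isinstance(resp, MultiRewardVerifyResponse)
    # ...and the component payload is preserved alongside the top-level fields.
    assert resp.reward_components == components
    assert resp.correctness == components["correctness"]
```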

Validation

  • ng_test +entrypoint=resources_servers/example_tool_call_multireward (Not run locally.)
  • pre-commit run --files nemo_gym/base_resources_server.py resources_servers/example_tool_call_multireward/app.py resources_servers/example_tool_call_multireward/tests/test_app.py resources_servers/example_tool_call_multireward/README.md (Not run locally.)
  • ruff check --config pyproject.toml nemo_gym/base_resources_server.py resources_servers/example_tool_call_multireward/app.py resources_servers/example_tool_call_multireward/tests/test_app.py
  • ruff format --config pyproject.toml --check nemo_gym/base_resources_server.py resources_servers/example_tool_call_multireward/app.py resources_servers/example_tool_call_multireward/tests/test_app.py
  • npm run check in fern/
  • Broader CI matrix, covered by pull_request workflows and not run locally

Notes

The latest docs now teach the shared contract directly. The version-pinned fern/versions/v0.3.0 API reference was left unchanged.


copy-pr-bot commented Jun 27, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nemo-automation-bot added the community-request label (Issue reported or requested by someone from the community) on Jun 27, 2026.

Development

Successfully merging this pull request may close these issues.

feat: shared MultiRewardVerifyResponse (reward_components) as a trainer-agnostic multi-reward contract
