fix(config): seed no-model refs for prepare-data #1745

Open
rodboev wants to merge 1 commit into NVIDIA-NeMo:main from rodboev:pr/prepare-data-no-model-refs

rodboev commented Jun 26, 2026

Summary

ng_prepare_data still fails when the no-model parse path encounters additional response-model refs such as user_model or judge_model, because only policy_model gets a dummy server config.

This seeds missing referenced responses_api_models server blocks during the no-model parse path and leaves the existing dummy cleanup to handle populated real model configs.

Closes #997

Root cause

GlobalConfigDictParserConfig.NO_MODEL_GLOBAL_CONFIG_DICT only provides a dummy policy_model. After the parser merges configs, server-ref validation checks referenced model-server names against top-level server instances. When a config references responses_api_models/user_model and no top-level user_model exists, parsing raises ServerRefNotFoundError.

The existing dummy_model cleanup only rewrites model-server blocks that already exist. It cannot synthesize the missing top-level user_model or judge_model block.
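
The failure mode described above can be reproduced in isolation. This is a minimal sketch, not the nemo_gym parser: the config shape, the `{type, name}` ref layout, the helper names `iter_model_refs` and `validate_model_refs`, and the local `ServerRefNotFoundError` stand-in are all assumptions made for illustration.

```python
from typing import Any, Iterator


class ServerRefNotFoundError(KeyError):
    """Local stand-in for the parser's error; not the nemo_gym class."""


def iter_model_refs(node: Any) -> Iterator[str]:
    """Yield the name of every {type: responses_api_models, name: ...} mapping."""
    if isinstance(node, dict):
        if node.get("type") == "responses_api_models" and isinstance(node.get("name"), str):
            yield node["name"]
        for value in node.values():
            yield from iter_model_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_model_refs(item)


def validate_model_refs(config: dict) -> None:
    """Raise if a referenced model server has no top-level block."""
    for name in iter_model_refs(config):
        if name not in config:
            raise ServerRefNotFoundError(f"responses_api_models/{name!r}")


# Hypothetical merged config: only policy_model has a (dummy) top-level block.
merged = {
    "policy_model": {"responses_api_models": {"dummy_model": {}}},
    "my_agent": {
        "responses_api_agents": {
            "my_agent_impl": {
                "model_server": {"type": "responses_api_models", "name": "policy_model"},
                "user_model_server": {"type": "responses_api_models", "name": "user_model"},
            }
        }
    },
}

try:
    validate_model_refs(merged)
    missing = None
except ServerRefNotFoundError as exc:
    missing = exc.args[0]  # the unresolved ref, here the user_model one
```

A cleanup pass that only rewrites blocks already present in `merged` never touches `user_model`, so the ref stays unresolved no matter how the existing blocks are rewritten.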

Changes

  • Seed missing referenced responses_api_models server blocks only when the no-model parse path is active.
  • Reuse the existing dummy model shape and existing cleanup pass.
  • Add focused parser coverage for arbitrary non-policy model-server refs.
  • Add a direct prepare_data() entry-path regression test.
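
The seeding step could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `seed_missing_model_servers`, `iter_model_refs`, the `no_model` flag, and the `DUMMY_MODEL_BLOCK` placeholder are hypothetical names, and the real dummy shape comes from `NO_MODEL_GLOBAL_CONFIG_DICT`.

```python
import copy
from typing import Any, Iterator

# Hypothetical placeholder for the dummy model-server shape.
DUMMY_MODEL_BLOCK = {"responses_api_models": {"dummy_model": {}}}


def iter_model_refs(node: Any) -> Iterator[str]:
    """Yield the name of every {type: responses_api_models, name: ...} mapping."""
    if isinstance(node, dict):
        if node.get("type") == "responses_api_models" and isinstance(node.get("name"), str):
            yield node["name"]
        for value in node.values():
            yield from iter_model_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_model_refs(item)


def seed_missing_model_servers(config: dict, no_model: bool) -> list:
    """Add a dummy top-level block for each referenced but absent model server.

    Returns the seeded names in discovery order. Does nothing unless the
    no-model parse path is active.
    """
    if not no_model:
        return []
    seeded = []
    # Materialize the refs first so the dict is not mutated while being walked.
    for name in list(iter_model_refs(config)):
        if name not in config:
            config[name] = copy.deepcopy(DUMMY_MODEL_BLOCK)
            seeded.append(name)
    return seeded


def ref(name: str) -> dict:
    """Build a model-server ref in the assumed {type, name} layout."""
    return {"type": "responses_api_models", "name": name}


config = {
    "policy_model": {"responses_api_models": {"real_model": {"entrypoint": "app.py"}}},
    "my_agent": {
        "responses_api_agents": {
            "my_agent_impl": {
                "model_server": ref("policy_model"),
                "user_model_server": ref("user_model"),
                "judge_model_server": ref("judge_model"),
            }
        }
    },
}

seeded = seed_missing_model_servers(config, no_model=True)
```

Existing blocks such as `policy_model` are left untouched, and each seeded block is a deep copy, so a later cleanup pass can rewrite one without affecting the others.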

Scope

  • No schema redesign.
  • No benchmark-specific branches.
  • No changes to resources-server or agent-server refs.
  • No dependency, docs, or provider changes.

Validation

  • `uv run pytest tests/unit_tests/test_global_config.py tests/unit_tests/test_train_data_utils.py -k "dummy_model or no_model_config_seeds_referenced_model_server_refs or prepare_data_no_model_config_seeds_non_policy_model_refs" -x -v`: blocked before collection on Windows because uvloop==0.21.0 does not support Windows
  • `uv run pre-commit run --files nemo_gym/global_config.py tests/unit_tests/test_global_config.py tests/unit_tests/test_train_data_utils.py`: not run because uv run is blocked by the same Windows uvloop build limitation
  • `ruff check --config pyproject.toml nemo_gym/global_config.py tests/unit_tests/test_global_config.py tests/unit_tests/test_train_data_utils.py`
  • `ruff format --config pyproject.toml --check nemo_gym/global_config.py tests/unit_tests/test_global_config.py tests/unit_tests/test_train_data_utils.py`
  • Broader CI matrix: covered by pull_request workflows, not run locally

Notes

The new regression should fail on the current base with ServerRefNotFoundError for responses_api_models/'user_model', then pass after the parser seeds the missing dummy model-server block.

Signed-off-by: Rod Boev <rod.boev@gmail.com>

copy-pr-bot Bot commented Jun 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nemo-automation-bot Bot added the community-request label Jun 26, 2026

Labels

community-request Issue reported or requested by someone from the community

Development

Successfully merging this pull request may close these issues.

ng_prepare_data fails for configs with multiple model servers (e.g. user_model)

1 participant