
feat(conversion): add direct ModelConfig mapping#4543

Draft
yaoyu-33 wants to merge 6 commits into main from codex/model-config-bridge-foundation

yaoyu-33 (Contributor) commented Jun 28, 2026

What does this PR do?

Add the first direct ModelConfig / ModelBuilder foundation to the conversion API for RL and external-trainer integrations.

  • add AutoBridge.to_megatron_model_config()
  • add MegatronModelBridge.hf_config_to_model_config() as the canonical HF-config-to-ModelConfig path
  • reuse the existing CONFIG_MAPPING without routing the new path through provider_bridge()
  • preserve Bridge's established construction defaults where upstream ModelConfig defaults differ
  • cover stock GPT via Llama and Hybrid via Nemotron-H
  • keep the provider path compatible while its callers migrate

Related to #4294. This also incorporates the useful direction from the incomplete reference draft #2810, rebuilt against current main with focused parity tests.

Design

The dependency direction is intentionally:

HF config -> CONFIG_MAPPING -> hf_config_to_model_config() -> ModelConfig
                         \-> hf_config_to_provider_kwargs() -> provider (compatibility path)
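
The following is a minimal, self-contained sketch of this dependency direction. `ToyModelConfig`, the three mapping entries, and `_map_hf_config` are hypothetical stand-ins, not Megatron Bridge's real classes or mapping format. The sketch only shows that both public entry points share one private traversal over the mapping and that neither one calls the other.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Tuple


@dataclass
class ToyModelConfig:
    """Hypothetical stand-in for a Megatron ModelConfig (illustrative fields only)."""

    num_layers: int
    hidden_size: int
    num_attention_heads: int


# Hypothetical CONFIG_MAPPING entries: (HF attribute, Megatron field).
CONFIG_MAPPING: Tuple[Tuple[str, str], ...] = (
    ("num_hidden_layers", "num_layers"),
    ("hidden_size", "hidden_size"),
    ("num_attention_heads", "num_attention_heads"),
)


def _map_hf_config(hf_config: Any) -> Dict[str, Any]:
    """Private shared traversal: copy each mapped attribute the HF config defines."""
    kwargs: Dict[str, Any] = {}
    for hf_name, megatron_name in CONFIG_MAPPING:
        if hasattr(hf_config, hf_name):
            kwargs[megatron_name] = getattr(hf_config, hf_name)
    return kwargs


def hf_config_to_model_config(hf_config: Any) -> ToyModelConfig:
    """Direct path: mapped values go straight into the ModelConfig dataclass."""
    return ToyModelConfig(**_map_hf_config(hf_config))


def hf_config_to_provider_kwargs(hf_config: Any) -> Dict[str, Any]:
    """Compatibility path: the same traversal, returned as provider kwargs."""
    return _map_hf_config(hf_config)


hf_config = SimpleNamespace(num_hidden_layers=2, hidden_size=64, num_attention_heads=4)
model_config = hf_config_to_model_config(hf_config)
provider_kwargs = hf_config_to_provider_kwargs(hf_config)
```

Keeping the traversal private means the two paths cannot drift apart in how they read the HF config, while each public function stays free to shape its own output.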

to_megatron_model_config() is configuration-only: it does not load weights, initialize distributed state, finalize the config, or build a model.
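
To illustrate the configuration-only contract, here is a sketch of the AutoBridge delegation under stated assumptions. `ToyAutoBridge`, `ToyLlamaBridge`, `ToyGPTModelConfig`, and their constructors are hypothetical; the real AutoBridge is constructed differently, and its internals may not match this shape.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class ToyGPTModelConfig:
    """Hypothetical stand-in for GPTModelConfig."""

    num_layers: int
    hidden_size: int


class ToyLlamaBridge:
    """Hypothetical stand-in for a MegatronModelBridge subclass."""

    def hf_config_to_model_config(self, hf_config: Any) -> ToyGPTModelConfig:
        return ToyGPTModelConfig(
            num_layers=hf_config.num_hidden_layers,
            hidden_size=hf_config.hidden_size,
        )


class ToyAutoBridge:
    """Hypothetical stand-in for AutoBridge: holds an HF config and a model bridge."""

    def __init__(self, hf_config: Any, model_bridge: ToyLlamaBridge) -> None:
        self.hf_config = hf_config
        self._model_bridge = model_bridge

    def to_megatron_model_config(self) -> ToyGPTModelConfig:
        # Configuration-only: delegate the mapping and return the dataclass.
        # Nothing here loads weights, initializes distributed state,
        # finalizes the config, or builds a model.
        return self._model_bridge.hf_config_to_model_config(self.hf_config)


auto_bridge = ToyAutoBridge(
    SimpleNamespace(num_hidden_layers=32, hidden_size=4096), ToyLlamaBridge()
)
model_config = auto_bridge.to_megatron_model_config()
```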

Model families opt in explicitly with MODEL_CONFIG_CLASS and may specialize hf_config_to_model_config() for architecture defaults. Llama maps directly to GPTModelConfig; Nemotron-H maps directly to HybridModelConfig. Tests patch provider_bridge() and the provider-era mapping alias to fail if the new path accidentally depends on either one.
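
A sketch of the opt-in pattern and the independence check, assuming simplified stand-ins. The class names, the `hybrid_override_pattern` value `"M-M*"`, and the choice to raise `NotImplementedError` for families that have not opted in are illustrative assumptions; the PR does not state how unsupported families fail. The check patches `provider_bridge()` with `unittest.mock.patch.object` so that any call during the direct path raises.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, ClassVar, Dict, Optional
from unittest import mock


@dataclass
class ToyGPTModelConfig:
    num_layers: int


@dataclass
class ToyHybridModelConfig:
    num_layers: int
    hybrid_override_pattern: str = ""


class ToyModelBridge:
    # Families opt in by setting this; None means the direct path is unsupported.
    MODEL_CONFIG_CLASS: ClassVar[Optional[type]] = None

    def provider_bridge(self, hf_config: Any) -> Dict[str, Any]:
        """Legacy provider path (simplified)."""
        return {"num_layers": hf_config.num_hidden_layers}

    def hf_config_to_model_config(self, hf_config: Any) -> Any:
        if self.MODEL_CONFIG_CLASS is None:
            # Assumption: unsupported families fail loudly instead of falling back.
            raise NotImplementedError(f"{type(self).__name__} has not opted in")
        return self.MODEL_CONFIG_CLASS(num_layers=hf_config.num_hidden_layers)


class ToyLlamaBridge(ToyModelBridge):
    MODEL_CONFIG_CLASS = ToyGPTModelConfig


class ToyNemotronHBridge(ToyModelBridge):
    MODEL_CONFIG_CLASS = ToyHybridModelConfig

    def hf_config_to_model_config(self, hf_config: Any) -> Any:
        # Family-specific specialization layered on top of the shared mapping.
        config = super().hf_config_to_model_config(hf_config)
        config.hybrid_override_pattern = hf_config.hybrid_override_pattern
        return config


hf_config = SimpleNamespace(num_hidden_layers=4, hybrid_override_pattern="M-M*")

# Independence check: any call into provider_bridge() during the direct path raises.
with mock.patch.object(
    ToyModelBridge,
    "provider_bridge",
    side_effect=AssertionError("direct path used provider_bridge"),
):
    gpt_config = ToyLlamaBridge().hf_config_to_model_config(hf_config)
    hybrid_config = ToyNemotronHBridge().hf_config_to_model_config(hf_config)
```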

The shared CONFIG_MAPPING traversal is kept private. hf_config_to_provider_kwargs() remains as the provider compatibility hook, and provider_bridge() continues calling it so downstream subclasses that override the hook are not broken.
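
Why keeping `provider_bridge()` routed through the hook matters can be sketched with a hypothetical downstream subclass. `ToyProvider`, `DownstreamBridge`, and the use of `make_vocab_size_divisible_by` as the customized field are illustrative assumptions, not the real provider classes.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict


@dataclass
class ToyProvider:
    """Hypothetical stand-in for a model provider."""

    num_layers: int
    hidden_size: int
    make_vocab_size_divisible_by: int = 128


def _map_hf_config(hf_config: Any) -> Dict[str, Any]:
    """Private shared traversal (simplified to two fields)."""
    return {"num_layers": hf_config.num_hidden_layers, "hidden_size": hf_config.hidden_size}


class ToyModelBridge:
    def hf_config_to_provider_kwargs(self, hf_config: Any) -> Dict[str, Any]:
        # Public compatibility hook; downstream subclasses may override it.
        return _map_hf_config(hf_config)

    def provider_bridge(self, hf_config: Any) -> ToyProvider:
        # Still routed through the hook, so subclass overrides keep taking effect.
        return ToyProvider(**self.hf_config_to_provider_kwargs(hf_config))


class DownstreamBridge(ToyModelBridge):
    """Hypothetical downstream subclass that customizes provider kwargs."""

    def hf_config_to_provider_kwargs(self, hf_config: Any) -> Dict[str, Any]:
        kwargs = super().hf_config_to_provider_kwargs(hf_config)
        kwargs["make_vocab_size_divisible_by"] = 64
        return kwargs


hf_config = SimpleNamespace(num_hidden_layers=2, hidden_size=256)
provider = DownstreamBridge().provider_bridge(hf_config)
```

If `provider_bridge()` instead called the private traversal directly, the subclass override would be silently skipped.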

The direct path explicitly carries Bridge construction defaults such as loss/gradient fusions and pipeline output deallocation. Nemotron-H additionally preserves its Hybrid-specific RMSNorm, flash-attention, fusion, and MTP defaults. This avoids silently changing training behavior merely because upstream builder dataclasses use more conservative defaults.
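
A sketch of carrying construction defaults explicitly, under assumptions. The field names are chosen to resemble Megatron-Core's loss-fusion, gradient-fusion, and pipeline-deallocation flags, but `ToyUpstreamModelConfig`, the `True` values in `BRIDGE_CONSTRUCTION_DEFAULTS`, and the rule that HF-mapped values override defaults are all hypothetical, not Bridge's actual settings.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict


@dataclass
class ToyUpstreamModelConfig:
    """Stand-in for an upstream builder dataclass with conservative defaults."""

    num_layers: int
    cross_entropy_loss_fusion: bool = False
    gradient_accumulation_fusion: bool = False
    deallocate_pipeline_outputs: bool = False


# Hypothetical Bridge construction defaults, applied explicitly on the direct path.
BRIDGE_CONSTRUCTION_DEFAULTS: Dict[str, Any] = {
    "cross_entropy_loss_fusion": True,
    "gradient_accumulation_fusion": True,
    "deallocate_pipeline_outputs": True,
}


def hf_config_to_model_config(hf_config: Any) -> ToyUpstreamModelConfig:
    mapped: Dict[str, Any] = {"num_layers": hf_config.num_hidden_layers}
    # Bridge defaults first, then HF-mapped values, so mapped values win on any overlap.
    return ToyUpstreamModelConfig(**{**BRIDGE_CONSTRUCTION_DEFAULTS, **mapped})


upstream_default = ToyUpstreamModelConfig(num_layers=2)
bridged = hf_config_to_model_config(SimpleNamespace(num_hidden_layers=2))
```

Without the explicit defaults table, the direct path would inherit the upstream `False` values and quietly disable behavior the provider path enabled.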

This PR does not emit a runtime deprecation warning for provider_bridge() / to_megatron_provider() yet: current model construction and checkpoint import still depend on them. Formal deprecation should follow the builder-backed to_megatron_model() migration, when callers have a complete replacement.

Tests

Added focused coverage for:

  • Llama/GPT provider-to-ModelConfig parity across all overlapping transformer and model fields
  • a non-default Llama rotary base, Llama 2 RoPE defaults, and Llama 3 scaling preservation
  • Nemotron-H/Hybrid provider-to-ModelConfig parity across all overlapping fields
  • direct-path independence from provider_bridge() and hf_config_to_provider_kwargs()
  • AutoBridge.to_megatron_model_config() delegation
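
The parity checks listed above compare fields that the provider and the new ModelConfig share. A minimal sketch of such a comparison, assuming both objects are dataclasses; `ToyProvider`, `ToyGPTModelConfig`, and `overlapping_field_mismatches` are hypothetical, and the real tests may handle nested or non-comparable fields differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Tuple


@dataclass
class ToyProvider:
    num_layers: int
    hidden_size: int
    rotary_base: float
    seq_length: int = 4096  # provider-only field, ignored by the comparison


@dataclass
class ToyGPTModelConfig:
    num_layers: int
    hidden_size: int
    rotary_base: float


def overlapping_field_mismatches(provider: Any, model_config: Any) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (provider_value, model_config_value)} for shared fields that differ."""
    shared = {f.name for f in fields(provider)} & {f.name for f in fields(model_config)}
    mismatches: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(shared):
        provider_value, config_value = getattr(provider, name), getattr(model_config, name)
        if provider_value != config_value:
            mismatches[name] = (provider_value, config_value)
    return mismatches


# A non-default rotary base must come out identically from both paths.
provider = ToyProvider(num_layers=32, hidden_size=4096, rotary_base=500000.0)
matching = ToyGPTModelConfig(num_layers=32, hidden_size=4096, rotary_base=500000.0)
drifted = ToyGPTModelConfig(num_layers=32, hidden_size=4096, rotary_base=10000.0)
```

Comparing only the intersection of field names lets each side keep fields the other lacks, while any shared field that drifts is reported by name.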

Validation:

  • nightly NeMo container focused pytest: 4 passed
  • uvx pre-commit run --all-files: passed
  • git diff --check: passed
  • changed-source compile check: passed

The focused pytest command covered the two Llama config tests, the Nemotron-H parity test, and AutoBridge delegation. The local macOS ARM environment cannot resolve the full project because nvidia-resiliency-ext is distributed only as Linux wheels; no dependency or lockfile changes were made.

Intentional limitations / follow-up

  • only explicitly opted-in GPT and Hybrid families are supported in this first PR
  • no overrides argument, aliases, or generic TypeVar helper are added
  • no provider mutation or duplicate provider_bridge abstraction is introduced
  • no MLA, VLM, metadata serialization, weight-loading, distributed initialization, or builder-backed construction changes
  • the next construction PR can migrate to_megatron_model() and then formally deprecate the public provider path

Checklist

  • Public APIs include type hints and docstrings
  • Focused parity coverage added and run in a Linux nightly container
  • DCO sign-off included
  • No dependency, CI workflow, or 3rdparty/Megatron-LM changes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot (bot) commented Jun 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 (Contributor, Author) commented:

/ok to test b0775aa

yaoyu-33 (Contributor, Author) commented:

/ok to test 381d193

yaoyu-33 (Contributor, Author) commented:

/ok to test 381d193

yaoyu-33 (Contributor, Author) commented:

/ok to test 3c453cc

yaoyu-33 (Contributor, Author) commented:

/ok to test d7a27b1fa68c66f7b92d8983e3a9246e80cb10af

copy-pr-bot (bot) commented Jun 28, 2026

/ok to test d7a27b1fa68c66f7b92d8983e3a9246e80cb10af

@yaoyu-33, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

yaoyu-33 (Contributor, Author) commented:

/ok to test d7a27b1

yaoyu-33 (Contributor, Author) commented:

/ok to test c5154c1

yaoyu-33 changed the title from "feat(conversion): add ModelConfig bridge foundation" to "feat(conversion): add direct ModelConfig mapping" on Jun 28, 2026
yaoyu-33 (Contributor, Author) commented:

/ok to test 03c781b
