feat(conversion): add direct ModelConfig mapping by yaoyu-33 · Pull Request #4543 · NVIDIA-NeMo/Megatron-Bridge

yaoyu-33 · 2026-06-28T02:47:06Z

What does this PR do?

Add the first direct ModelConfig / ModelBuilder foundation to the conversion API for RL and external-trainer integrations.

- add AutoBridge.to_megatron_model_config()
- add MegatronModelBridge.hf_config_to_model_config() as the canonical HF-config-to-ModelConfig path
- reuse the existing CONFIG_MAPPING without routing the new path through provider_bridge()
- preserve Bridge's established construction defaults where upstream ModelConfig defaults differ
- cover stock GPT via Llama and Hybrid via Nemotron-H
- keep the provider path compatible while its callers migrate

Related to #4294. This also incorporates the useful direction from the incomplete reference draft #2810, rebuilt against current main with focused parity tests.

Design

The dependency direction is intentionally:

HF config -> CONFIG_MAPPING -> hf_config_to_model_config() -> ModelConfig
                         \-> hf_config_to_provider_kwargs() -> provider (compatibility path)

to_megatron_model_config() is configuration-only: it does not load weights, initialize distributed state, finalize the config, or build a model.
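The dependency direction above can be sketched with toy stand-ins. This is a minimal illustration, not the Megatron-Bridge implementation: `ToyModelConfig`, `_map_hf_config`, the dict-based HF config, and the `(hf_name, megatron_name)` tuple layout of `CONFIG_MAPPING` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping table: (HF field name, Megatron field name).
# The real CONFIG_MAPPING lives in Megatron-Bridge and is not reproduced here.
CONFIG_MAPPING = [
    ("hidden_size", "hidden_size"),
    ("num_hidden_layers", "num_layers"),
    ("num_attention_heads", "num_attention_heads"),
]


@dataclass
class ToyModelConfig:
    """Stand-in for a ModelConfig dataclass (assumed shape)."""

    hidden_size: int
    num_layers: int
    num_attention_heads: int


def _map_hf_config(hf_config: dict) -> dict:
    """Shared, private traversal of the mapping table."""
    return {
        megatron_name: hf_config[hf_name]
        for hf_name, megatron_name in CONFIG_MAPPING
        if hf_name in hf_config
    }


def hf_config_to_model_config(hf_config: dict) -> ToyModelConfig:
    """Direct path: builds a config object only; no weights, no distributed state."""
    return ToyModelConfig(**_map_hf_config(hf_config))


def hf_config_to_provider_kwargs(hf_config: dict) -> dict:
    """Compatibility path: the same traversal, returned as provider kwargs."""
    return _map_hf_config(hf_config)


hf = {"hidden_size": 4096, "num_hidden_layers": 32, "num_attention_heads": 32}
config = hf_config_to_model_config(hf)
```

Both entry points read from the same table, but neither calls the other, which is the property the diagram describes.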

Model families opt in explicitly with MODEL_CONFIG_CLASS and may specialize hf_config_to_model_config() for architecture defaults. Llama maps directly to GPTModelConfig; Nemotron-H maps directly to HybridModelConfig. Tests patch provider_bridge() and the provider-era mapping alias to fail if the new path accidentally depends on either one.
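A rough sketch of the opt-in and of the independence check, assuming toy classes: `ToyBridge`, `ToyLlamaBridge`, and `ToyGPTModelConfig` are hypothetical, and only the names `MODEL_CONFIG_CLASS`, `hf_config_to_model_config()`, and `provider_bridge()` come from the description. The test patches `provider_bridge()` so that any accidental call fails loudly.

```python
from dataclasses import dataclass
from unittest import mock


@dataclass
class ToyGPTModelConfig:
    """Stand-in for GPTModelConfig (assumed, simplified fields)."""

    num_layers: int
    hidden_size: int


class ToyBridge:
    # Families opt in by naming their config class; None means "not supported".
    MODEL_CONFIG_CLASS = None

    def provider_bridge(self, hf_config: dict):
        raise NotImplementedError("provider path is not modeled in this sketch")

    def hf_config_to_model_config(self, hf_config: dict):
        if self.MODEL_CONFIG_CLASS is None:
            raise NotImplementedError(
                f"{type(self).__name__} has not opted in to direct ModelConfig mapping"
            )
        return self.MODEL_CONFIG_CLASS(
            num_layers=hf_config["num_hidden_layers"],
            hidden_size=hf_config["hidden_size"],
        )


class ToyLlamaBridge(ToyBridge):
    MODEL_CONFIG_CLASS = ToyGPTModelConfig


def test_direct_path_does_not_call_provider_bridge():
    bridge = ToyLlamaBridge()
    # Any call to provider_bridge() inside the block raises AssertionError.
    with mock.patch.object(
        ToyLlamaBridge,
        "provider_bridge",
        side_effect=AssertionError("direct path must not use provider_bridge()"),
    ):
        config = bridge.hf_config_to_model_config(
            {"num_hidden_layers": 2, "hidden_size": 64}
        )
    assert config == ToyGPTModelConfig(num_layers=2, hidden_size=64)


test_direct_path_does_not_call_provider_bridge()
```

A bridge that leaves `MODEL_CONFIG_CLASS` unset raises instead of silently falling back to the provider path, which keeps the opt-in explicit.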

The shared CONFIG_MAPPING traversal is kept private. hf_config_to_provider_kwargs() remains as the provider compatibility hook, and provider_bridge() continues calling it so downstream subclasses that override the hook are not broken.
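To illustrate why the hook has to stay on the call path, here is a small sketch with hypothetical classes (`ToyBridge`, `DownstreamBridge`, and the `custom_flag` key are invented for the example). As long as `provider_bridge()` keeps calling `hf_config_to_provider_kwargs()`, a downstream override continues to take effect.

```python
class ToyBridge:
    def hf_config_to_provider_kwargs(self, hf_config: dict) -> dict:
        # Base mapping (simplified to one field for the sketch).
        return {"num_layers": hf_config["num_hidden_layers"]}

    def provider_bridge(self, hf_config: dict) -> dict:
        # Keep calling the hook so subclass overrides still take effect.
        return dict(self.hf_config_to_provider_kwargs(hf_config))


class DownstreamBridge(ToyBridge):
    """A hypothetical external subclass that customizes the hook."""

    def hf_config_to_provider_kwargs(self, hf_config: dict) -> dict:
        kwargs = super().hf_config_to_provider_kwargs(hf_config)
        kwargs["custom_flag"] = True
        return kwargs


legacy = DownstreamBridge().provider_bridge({"num_hidden_layers": 4})
```

If `provider_bridge()` were rewired to the private traversal directly, `custom_flag` would disappear from the result without any error, which is the breakage this PR avoids.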

The direct path explicitly carries Bridge construction defaults such as loss/gradient fusions and pipeline output deallocation. Nemotron-H additionally preserves its Hybrid-specific RMSNorm, flash-attention, fusion, and MTP defaults. This avoids silently changing training behavior merely because upstream builder dataclasses use more conservative defaults.
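One way to picture the layering of defaults, as a sketch under assumptions: `ToyUpstreamConfig`, `BRIDGE_DEFAULTS`, `build_config`, and the two boolean field names are illustrative and are not taken from the PR's diff. Bridge defaults are applied first, and values mapped from the HF config override them.

```python
from dataclasses import dataclass


@dataclass
class ToyUpstreamConfig:
    """Stand-in for an upstream builder dataclass with conservative defaults."""

    num_layers: int
    gradient_accumulation_fusion: bool = False  # hypothetical field name
    deallocate_pipeline_outputs: bool = False  # hypothetical field name


# Hypothetical defaults the bridge has historically applied at construction.
BRIDGE_DEFAULTS = {
    "gradient_accumulation_fusion": True,
    "deallocate_pipeline_outputs": True,
}


def build_config(mapped: dict) -> ToyUpstreamConfig:
    # Bridge defaults go first so explicitly mapped values win on conflict.
    return ToyUpstreamConfig(**{**BRIDGE_DEFAULTS, **mapped})


cfg = build_config({"num_layers": 8})
explicit = build_config({"num_layers": 8, "deallocate_pipeline_outputs": False})
```

With this ordering, a config built through the direct path keeps the bridge's fusion settings unless the mapped values say otherwise, rather than inheriting the upstream dataclass defaults.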

This PR does not emit a runtime deprecation warning for provider_bridge() / to_megatron_provider() yet: current model construction and checkpoint import still depend on them. Formal deprecation should follow the builder-backed to_megatron_model() migration, when callers have a complete replacement.

Tests

Added focused coverage for:

- Llama/GPT provider-to-ModelConfig parity across all overlapping transformer and model fields
- a non-default Llama rotary base, Llama 2 RoPE defaults, and Llama 3 scaling preservation
- Nemotron-H/Hybrid provider-to-ModelConfig parity across all overlapping fields
- direct-path independence from provider_bridge() and hf_config_to_provider_kwargs()
- AutoBridge.to_megatron_model_config() delegation
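A parity check over overlapping fields could look roughly like the following. It is a sketch, not the PR's test code: `overlapping_field_mismatches`, `ToyProvider`, and `ToyConfig` are hypothetical, and it assumes both objects are dataclass instances.

```python
from dataclasses import dataclass, fields


def overlapping_field_mismatches(provider, model_config) -> dict:
    """Return {field: (provider_value, config_value)} for shared fields that differ."""
    provider_names = {f.name for f in fields(provider)}
    config_names = {f.name for f in fields(model_config)}
    mismatches = {}
    for name in sorted(provider_names & config_names):
        left, right = getattr(provider, name), getattr(model_config, name)
        if left != right:
            mismatches[name] = (left, right)
    return mismatches


@dataclass
class ToyProvider:
    num_layers: int
    hidden_size: int
    provider_only: str = "ignored"  # not present on the config side


@dataclass
class ToyConfig:
    num_layers: int
    hidden_size: int
    config_only: str = "ignored"  # not present on the provider side


same = overlapping_field_mismatches(ToyProvider(2, 64), ToyConfig(2, 64))
drift = overlapping_field_mismatches(ToyProvider(2, 64), ToyConfig(2, 128))
```

Returning the mismatching fields, rather than a single boolean, makes a parity failure point at the field whose value drifted.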

Validation:

- nightly NeMo container focused pytest: 4 passed
- uvx pre-commit run --all-files: passed
- git diff --check: passed
- changed-source compile check: passed

The focused pytest command covered the two Llama config tests, the Nemotron-H parity test, and AutoBridge delegation. The local macOS ARM environment cannot resolve the full project because nvidia-resiliency-ext is distributed only as Linux wheels; no dependency or lockfile changes were made.

Intentional limitations / follow-up

- only explicitly opted-in GPT and Hybrid families are supported in this first PR
- no overrides argument, aliases, or generic TypeVar helper is added
- no provider mutation or duplicate provider_bridge abstraction is introduced
- no MLA, VLM, metadata serialization, weight-loading, distributed initialization, or builder-backed construction changes
- the next construction PR can migrate to_megatron_model() and then formally deprecate the public provider path

Checklist

- Public APIs include type hints and docstrings
- Focused parity coverage added and run in a Linux nightly container
- DCO sign-off included
- No dependency, CI workflow, or 3rdparty/Megatron-LM changes

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot · 2026-06-28T02:47:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 · 2026-06-28T02:47:32Z

/ok to test b0775aa

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-06-28T02:52:58Z

/ok to test 381d193

yaoyu-33 · 2026-06-28T03:07:53Z

/ok to test 381d193

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-06-28T03:27:39Z

/ok to test 3c453cc

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-06-28T03:28:29Z

/ok to test d7a27b1fa68c66f7b92d8983e3a9246e80cb10af

copy-pr-bot · 2026-06-28T03:28:32Z

/ok to test d7a27b1fa68c66f7b92d8983e3a9246e80cb10af

@yaoyu-33, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

yaoyu-33 · 2026-06-28T03:28:44Z

/ok to test d7a27b1

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-06-28T03:30:57Z

/ok to test c5154c1

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-06-28T07:07:29Z

/ok to test 03c781b

feat(conversion): add ModelConfig bridge foundation

b0775aa

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 28, 2026 02:48 Inactive

copy-pr-bot Bot had a problem deploying to test June 28, 2026 02:48 Error

fix(model): preserve explicit override precedence

381d193

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to test June 28, 2026 02:54 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 02:54 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:03 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:04 Inactive

refactor(conversion): map HF configs directly

3c453cc

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

refactor(conversion): keep shared config mapping private

d7a27b1

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:29 Inactive

copy-pr-bot Bot had a problem deploying to test June 28, 2026 03:29 Error

fix(conversion): validate direct config types

c5154c1

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 changed the title ~~feat(conversion): add ModelConfig bridge foundation~~ feat(conversion): add direct ModelConfig mapping Jun 28, 2026

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:31 Inactive

copy-pr-bot Bot temporarily deployed to test June 28, 2026 03:31 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:40 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 03:41 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 04:08 Inactive

fix(conversion): preserve builder config defaults

03c781b

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public June 28, 2026 07:07 Inactive

copy-pr-bot Bot temporarily deployed to test June 28, 2026 07:08 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 07:17 Inactive

copy-pr-bot Bot temporarily deployed to public June 28, 2026 07:43 Inactive

feat(conversion): add direct ModelConfig mapping #4543
yaoyu-33 wants to merge 6 commits into main from codex/model-config-bridge-foundation
