Skip to content

Migrate GPT-OSS to HybridModel#4476

Open
Phlip79 wants to merge 4 commits into
philip/hybrid-flop-accountingfrom
philip/gpt-oss-hybrid
Open

Migrate GPT-OSS to HybridModel#4476
Phlip79 wants to merge 4 commits into
philip/hybrid-flop-accountingfrom
philip/gpt-oss-hybrid

Conversation

@Phlip79

@Phlip79 Phlip79 commented Jun 24, 2026

Copy link
Copy Markdown
Member

What does this PR do ?

Migrates GPT-OSS from using GPTModel to HybridModel. To accomplish this, I needed to update HybridModelProvider to support yarn, which HybridModel already supports in MCore.

Perf comparison results

Needs #4508 for correct TFLOPS calculation

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Phlip79

Phlip79 commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

/ok to test c5e00e4

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@Phlip79

Phlip79 commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

/ok to test 8e4d0fa

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@Phlip79 Phlip79 force-pushed the philip/gpt-oss-hybrid branch from d3165d6 to 46fb300 Compare June 25, 2026 20:45
@Phlip79 Phlip79 changed the base branch from main to philip/hybrid-flop-accounting June 25, 2026 21:25
@Phlip79 Phlip79 marked this pull request as ready for review June 25, 2026 21:27
@Phlip79 Phlip79 requested a review from yaoyu-33 June 25, 2026 21:27
Comment thread src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
Comment thread src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
Comment thread tests/unit_tests/models/gpt_oss/test_gpt_oss_bridges.py
@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Light Code Review

Overall the migration from GPTModel/GPTModelProvider to HybridModel/HybridModelProvider looks correct and well-tested for the core paths. A few observations:

  1. Missing test for megatron_to_hf_config - The new classmethod that recovers num_hidden_layers from the hybrid layer pattern has no unit test. A small test with a known pattern would guard against regressions in the Symbols.PIPE/MTP_SEPARATOR stripping logic.

  2. Dual model.norm.weight mapping - Intentional per the test, but deserves a brief inline comment explaining why both decoder.final_layernorm.weight and decoder.final_norm.weight are registered.

  3. YARN settings not verified in test - test_provider_bridge_maps_config does not assert position_embedding_type == yarn or any YARN-specific field, even though extending HybridModelProvider with YARN fields is a key part of this PR.


Suggested test cases: No perf/recipe config files are changed in this PR, but this migration affects all GPT-OSS perf tests since the underlying provider type changed. All GPT-OSS perf configs (20B and 120B, all GPU targets and precisions) should be validated: gpt_oss_20b_8gpu_pretrain_perf (all GPU/precision combos), gpt_oss_120b_pretrain_perf (all GPU/precision combos), test_gpt_oss_120b_perf_config_instantiation, L1 functional L1_Launch_recipes_gpt_oss (pretrain + finetune), L1 functional L1_Launch_models_gpt_oss (model conversion tests).

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@yaoyu-33 yaoyu-33 added feature New capabilities, enhancements, or enablement work area:model Model implementations and HF bridge logic needs-review PR is ready for code review and waiting on a reviewer needs-more-tests Requires additional L0 and L1 test coverage before merge full-test-suite labels Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work full-test-suite needs-more-tests Requires additional L0 and L1 test coverage before merge needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants