Migrate GPT-OSS to HybridModel #4476
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
/ok to test c5e00e4
/ok to test 8e4d0fa
d3165d6 to 46fb300
Light Code Review

Overall, the migration from GPTModel/GPTModelProvider to HybridModel/HybridModelProvider looks correct and well-tested for the core paths. A few observations:

Suggested test cases: No perf/recipe config files are changed in this PR, but this migration affects all GPT-OSS perf tests since the underlying provider type changed. All GPT-OSS perf configs (20B and 120B, all GPU targets and precisions) should be validated:
- gpt_oss_20b_8gpu_pretrain_perf (all GPU/precision combos)
- gpt_oss_120b_pretrain_perf (all GPU/precision combos)
- test_gpt_oss_120b_perf_config_instantiation
- L1 functional L1_Launch_recipes_gpt_oss (pretrain + finetune)
- L1 functional L1_Launch_models_gpt_oss (model conversion tests)
What does this PR do?
Migrates GPT-OSS from using GPTModel to HybridModel. To accomplish this, I needed to update HybridModelProvider to support YaRN, which HybridModel already supports in MCore.

Perf comparison results
Needs #4508 for correct TFLOPS calculation
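
For reviewers less familiar with the YaRN support mentioned in the description, the following is a minimal sketch of how YaRN-style RoPE frequency scaling is commonly computed (the "NTK-by-parts" blend plus an attention scaling factor). It is not the MCore or HybridModelProvider implementation: the function name `yarn_inv_freq`, its parameters, and the default values are assumptions chosen for illustration only.

```python
import math


def yarn_inv_freq(dim, base=10000.0, scale=32.0, original_max_pos=4096,
                  beta_fast=32.0, beta_slow=1.0):
    """Illustrative YaRN inverse frequencies (hypothetical helper, not an MCore API).

    Returns (inv_freq, mscale): one inverse frequency per rotary pair and the
    attention scaling factor suggested by the YaRN formulation.
    """

    def correction_dim(num_rotations):
        # Rotary dimension index at which a pair completes `num_rotations`
        # full turns over the original context length.
        return (dim * math.log(original_max_pos / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if high == low:
        high += 0.001  # avoid division by zero in the ramp below

    inv_freq = []
    for i in range(dim // 2):
        extrapolated = base ** (-2.0 * i / dim)  # unscaled RoPE frequency
        interpolated = extrapolated / scale      # position-interpolated frequency
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        keep = 1.0 - ramp  # 1.0 keeps the original frequency, 0.0 interpolates fully
        inv_freq.append(interpolated * (1.0 - keep) + extrapolated * keep)

    # Attention scaling factor; only applied when the context is actually extended.
    mscale = 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
    return inv_freq, mscale
```

In this sketch, high-frequency pairs (small index) keep their original frequencies, low-frequency pairs are divided by `scale`, and the pairs in between are blended linearly. A provider that supports YaRN would typically need to carry the equivalent of these parameters through to the rotary embedding.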