Enable expert-parallel language modules in MegatronMIMO #4485
Signed-off-by: Li Ding <liding@nvidia.com>
/ok to test d3cfc95
/ok to test 6f1b8af
Review Finding: CLI help text missing ep=N

File: examples/conversion/convert_megatron_mimo.py, line 265

The ep=N key was added to _COMPONENT_KEY_TO_FIELD (line 66) and to the docstring/error message in _parse_component_spec (lines 75, 79), but the --component help text in _add_common_args still reads:

    name=tp=N[,pp=N,dp=N,cp=N,etp=N,rank_offset=N]

It should include ep=N:

    name=tp=N[,pp=N,dp=N,cp=N,ep=N,etp=N,rank_offset=N]

Everything else looks solid. The rank algebra (dense_model_parallel_size excluding EP/ETP), the expert factorization validation, the tiling validation upgrade (gaps + world_size coverage), the Phase 1 guards at both the config and builder layers, and the provider sync are all consistent and well-tested. The test coverage is thorough across all changed code paths.

Suggested test cases: No perf tests impacted.
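For readers unfamiliar with the spec string discussed in this finding, here is a minimal sketch of how a value such as name=tp=N[,pp=N,dp=N,cp=N,ep=N,etp=N,rank_offset=N] could be parsed. It is not the code in convert_megatron_mimo.py: the function name parse_component_spec, the ALLOWED_KEYS tuple, and the rule that tp is mandatory are assumptions read off the help text, and the real _parse_component_spec and _COMPONENT_KEY_TO_FIELD may behave differently.

```python
from typing import Dict, Tuple

# Hypothetical key set, taken from the corrected help text in the finding.
ALLOWED_KEYS = ("tp", "pp", "dp", "cp", "ep", "etp", "rank_offset")


def parse_component_spec(spec: str) -> Tuple[str, Dict[str, int]]:
    """Parse 'name=tp=N[,pp=N,...]' into (name, {key: int}).

    Illustrative only; not the project's _parse_component_spec.
    """
    # Split the component name from the comma-separated key=value body.
    name, sep, body = spec.partition("=")
    if not sep or not name or not body:
        raise ValueError(f"expected name=tp=N[,...], got {spec!r}")

    values: Dict[str, int] = {}
    for item in body.split(","):
        key, sep, raw = item.partition("=")
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"unknown or malformed entry {item!r} in {spec!r}")
        if key in values:
            raise ValueError(f"duplicate key {key!r} in {spec!r}")
        values[key] = int(raw)  # int() raises ValueError on non-integers

    # Assumption: tp is required because it sits outside the [...] in the help text.
    if "tp" not in values:
        raise ValueError(f"missing required tp=N in {spec!r}")
    return name, values
```

With this sketch, a spec that uses the new key, such as language=tp=2,ep=4,etp=1,rank_offset=8, parses into the name language and a dictionary of the four integer fields, while an unrecognized key is rejected with an error.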
6f1b8af to b27324c
Summary
This PR enables MegatronMIMO to describe and build non-colocated language MoE modules with expert parallelism while keeping encoder modules dense.
Key changes:
expert_model_parallel_size/expert_tensor_parallel_size handling to MegatronMIMO parallelism config.
tp, cp, dp, pp expt_tp, ep, expt_dp, pp ProcessGroupCollection and MCore compatibility globals.
MimoModel construction for sequence-parallel partitioning paths.
ep through the MegatronMIMO conversion CLI component parser.
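The two axis orderings named above (tp, cp, dp, pp for dense layers and expt_tp, ep, expt_dp, pp for expert layers) have to cover the same set of ranks. The following is a minimal sketch of that arithmetic, assuming the usual Megatron Core convention that a module's rank count is tp * cp * dp * pp and that the expert data-parallel size is whatever remains after dividing by etp * ep * pp. The function name and error messages are hypothetical and are not taken from this PR.

```python
def expert_data_parallel_size(tp: int, cp: int, dp: int, pp: int,
                              ep: int, etp: int) -> int:
    """Return expt_dp such that etp * ep * expt_dp * pp == tp * cp * dp * pp.

    Hypothetical helper for illustration; not part of MegatronMIMO.
    """
    sizes = {"tp": tp, "cp": cp, "dp": dp, "pp": pp, "ep": ep, "etp": etp}
    for key, value in sizes.items():
        if value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value}")

    # Ranks owned by the module, as seen by the dense layers (EP/ETP excluded).
    num_ranks = tp * cp * dp * pp

    # Ranks consumed by one expert-parallel replica across all pipeline stages.
    expert_block = etp * ep * pp
    if num_ranks % expert_block != 0:
        raise ValueError(
            f"etp * ep * pp = {expert_block} does not divide "
            f"tp * cp * dp * pp = {num_ranks}"
        )
    return num_ranks // expert_block
```

For example, tp=2, cp=1, dp=4, pp=1 gives 8 ranks; with ep=4 and etp=1 the expert block is 4 ranks, so the expert data-parallel size is 2. With ep=3 the factorization fails, which is the kind of configuration an expert factorization check would reject.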