
Add Megatron-FSDP (mfsdp) optimizer backend #12

Open
ISEEKYAN wants to merge 1 commit into devlite from mlite-mfsdp-backend


ISEEKYAN (Owner) commented Jun 8, 2026

Ports the validated Megatron-FSDP delivery into megatron.lite as an optimizer backend, registered as BACKENDS['mfsdp']. The port covers the backend itself, grad_norm, config, process groups, checkpoint keys, dtensor grad, the optimizer, and patches. The Qwen3-MoE and Qwen3.5 protocols gain a post-load path for optimizer='mfsdp'. Unit tests are included; a GPU smoke test will follow.
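Registering a backend under a string key and selecting it with optimizer='mfsdp' is a common name-keyed registry pattern. The sketch below illustrates that pattern under stated assumptions: only the BACKENDS name and the 'mfsdp' key come from this PR, while register_backend, OptimizerBackend, post_load, and Qwen3MoEProtocol are hypothetical names, not megatron.lite's actual API. It performs no FSDP sharding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OptimizerBackend:
    """Hypothetical backend object; a real mfsdp backend would shard state via FSDP."""
    name: str
    setup_steps: List[str] = field(default_factory=list)

    def post_load(self, model: object) -> None:
        # Record that the backend's post-load hook ran for this model.
        self.setup_steps.append(f"post_load:{type(model).__name__}")


# Name-keyed registry; only the BACKENDS name and the 'mfsdp' key come from the PR.
BACKENDS: Dict[str, Callable[[], OptimizerBackend]] = {}


def register_backend(name: str):
    """Hypothetical decorator that stores a backend factory under `name`."""
    def decorator(factory: Callable[[], OptimizerBackend]):
        if name in BACKENDS:
            raise KeyError(f"backend {name!r} is already registered")
        BACKENDS[name] = factory
        return factory
    return decorator


@register_backend("mfsdp")
def make_mfsdp_backend() -> OptimizerBackend:
    return OptimizerBackend(name="mfsdp")


class Qwen3MoEProtocol:
    """Hypothetical stand-in for a model protocol that takes an optimizer= argument."""

    def __init__(self, optimizer: str = "default"):
        self.optimizer = optimizer
        self.backend = None

    def load(self, model: object) -> object:
        # Weight loading would happen here; omitted in this sketch.
        if self.optimizer in BACKENDS:
            # Post-load path: build the registered backend and run its post-load hook.
            self.backend = BACKENDS[self.optimizer]()
            self.backend.post_load(model)
        return model


if __name__ == "__main__":
    class DummyModel:
        pass

    proto = Qwen3MoEProtocol(optimizer="mfsdp")
    proto.load(DummyModel())
    print(proto.backend.name)         # mfsdp
    print(proto.backend.setup_steps)  # ['post_load:DummyModel']
```

Keying the dispatch on a string lets a protocol opt into the new backend without importing it directly. A real mfsdp post-load path would also handle the pieces this PR lists (process groups, checkpoint keys, dtensor grad), which this sketch omits.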
