feat(gemma4): add Gemma4 dense and MoE support by EazyReal · Pull Request #2135 · THUDM/slime

EazyReal · 2026-06-26T18:24:14Z

Summary

Adds Gemma4 model support for both dense and MoE variants in Slime:

Megatron model/provider support for Gemma4 dense and 26B-A4B MoE checkpoints
MBridge and Megatron-to-HF conversion support
Gemma4 loss-mask option and model config scripts
Slime-native GSM8K proof recipes for google/gemma-4-31B-it and google/gemma-4-26B-A4B-it
English and Chinese example docs with the validated topologies

The implementation covers Gemma4-specific attention and conversion behavior, including dual RoPE, global/sliding attention, K=V global attention handling, layer scalar broadcast, router/expert parameters, QKV packing, and CP indexing.

Relation to #1855

#1855 is an earlier Gemma4 integration branch. This PR keeps the same upstream goal but uses a refreshed, focused branch for review:

keeps the contribution Slime-native with GSM8K validation recipes rather than a task-specific generation flow
separates the colocated raw weight sync edge-case fix into fix: handle empty colocated weight buckets #2134
adds runnable dense and MoE proof scripts plus public W&B evidence
removes unrelated integration surface so this PR is just Gemma4 model support, conversion, loss masking, docs, scripts, and tests

Dependency

The large-model MoE proof also exercised the colocated raw weight sync edge case fixed in #2134. This PR intentionally keeps the Gemma4 model support separate from that bugfix; if maintainers run the 26B-A4B colocated recipe before #2134 lands, they should apply that fix as well.

Validation

Local validation on this Gemma4-only branch:

uv run --with pytest --with torch --with transformers --with safetensors --with numpy python -m pytest tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py -q
- 56 passed, 14 skipped
uv run --with ruff ruff check slime_plugins/models/gemma4.py slime_plugins/models/gemma4_provider.py slime_plugins/mbridge/gemma4.py slime/backends/megatron_utils/megatron_to_hf/gemma4.py slime/utils/mask_utils.py tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py
uv run --with black black --check slime_plugins/models/gemma4.py slime_plugins/models/gemma4_provider.py slime_plugins/mbridge/gemma4.py slime/backends/megatron_utils/megatron_to_hf/gemma4.py slime/utils/mask_utils.py tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py
bash -n scripts/run-gemma4-31B-gsm8k.sh scripts/run-gemma4-26B-A4B-gsm8k.sh
git diff --check
Add Gemma4 dense and MoE model support EazyReal/slime#4: CodeRabbit review completed, pre-commit passed

Stacked validation with #2134 included:

uv run --with pytest --with torch --with transformers --with safetensors --with numpy python -m pytest tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py tests/test_empty_colocated_weight_bucket.py -q
- 58 passed, 14 skipped

Slime-native GSM8K proof runs:

Dense google/gemma-4-31B-it, Megatron TP2 PP4 CP1, SGLang TP8: https://wandb.ai/augustinevmax-vmax/slime-gemma4-official-proof/runs/51l66o7o
MoE google/gemma-4-26B-A4B-it, Megatron TP2 PP2 EP2 CP1, SGLang TP8: https://wandb.ai/augustinevmax-vmax/slime-gemma4-official-proof/runs/eeere4c3
Report: https://wandb.ai/augustinevmax-vmax/slime-gemma4-official-proof/reports/Gemma4-Slime-GSM8K-Proof-Runs--VmlldzoxNzM1MTY2Mg==?accessToken=s43ay8st8n7w19ep73y531l898faxpdtjfvgsqd8etcrr4kszvjw6zlze6ehbmio

EazyReal added 7 commits June 26, 2026 18:22

feat(gemma4): add Gemma4 dense and MoE support

91f8526

fix: match Megatron MoE forward contract

68f737d

docs(gemma4): add GSM8K validation recipes

84c3dc1

docs(gemma4): clarify bridge scope

86425cd

docs(gemma4): name proof checkpoints by topology

ca97548

style(gemma4): keep code comments ascii

ae972e3

docs(gemma4): keep validation recipe generic

7fa85ad

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(gemma4): add Gemma4 dense and MoE support#2135

feat(gemma4): add Gemma4 dense and MoE support#2135
EazyReal wants to merge 7 commits into
THUDM:mainfrom
EazyReal:codex/gemma4-official-upstream-20260626

EazyReal commented Jun 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

EazyReal commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relation to #1855

Dependency

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

EazyReal commented Jun 26, 2026 •

edited

Loading