feat(gemma4): add Gemma4 dense and MoE support#2135

Open
EazyReal wants to merge 7 commits into THUDM:main from EazyReal:codex/gemma4-official-upstream-20260626

EazyReal (Contributor) commented Jun 26, 2026

Summary

Adds Gemma4 model support for both dense and MoE variants in Slime:

  • Megatron model/provider support for Gemma4 dense and 26B-A4B MoE checkpoints
  • MBridge and Megatron-to-HF conversion support
  • Gemma4 loss-mask option and model config scripts
  • Slime-native GSM8K proof recipes for google/gemma-4-31B-it and google/gemma-4-26B-A4B-it
  • English and Chinese example docs with the validated topologies
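As a rough illustration of what a loss-mask option typically controls in multi-turn training, the sketch below marks only the tokens of the trained role as contributing to the loss. It is not the code in slime/utils/mask_utils.py: the function name `build_loss_mask`, the `(role, token_ids)` segment format, and the role labels are hypothetical, and the Gemma4 chat template itself is not modeled here.

```python
from typing import List, Tuple


def build_loss_mask(
    segments: List[Tuple[str, List[int]]],
    trained_role: str = "assistant",  # hypothetical role label, not taken from the PR
) -> Tuple[List[int], List[int]]:
    """Flatten role-tagged token segments and mark which tokens are trained on.

    Returns (token_ids, loss_mask) where loss_mask[i] is 1 if token i belongs
    to a segment spoken by `trained_role`, and 0 otherwise.
    """
    token_ids: List[int] = []
    loss_mask: List[int] = []
    for role, ids in segments:
        token_ids.extend(ids)
        flag = 1 if role == trained_role else 0
        loss_mask.extend([flag] * len(ids))
    return token_ids, loss_mask


# Example conversation with made-up token ids.
conversation = [
    ("user", [1, 2]),
    ("assistant", [3, 4, 5]),
    ("user", [6]),
    ("assistant", [7]),
]
ids, mask = build_loss_mask(conversation)
```

In a real template the turn delimiters also have to be assigned to one side or the other, which is the part a model-specific loss-mask type usually has to decide.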

The implementation covers Gemma4-specific attention and conversion behavior, including dual RoPE, global/sliding attention, K=V global attention handling, layer scalar broadcast, router/expert parameters, QKV packing, and CP indexing.
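To make the global/sliding distinction concrete, here is a minimal pure-Python sketch of the two causal mask shapes. It is an illustration only, not the code in slime_plugins/models/gemma4.py: the window convention (a query sees itself plus the `window - 1` preceding positions) and the `layer_kinds` interleaving pattern are assumptions, and the real window size and layer schedule come from the model config.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[q][k] == True where query position q may attend to key k.

    window=None gives a global causal mask. An integer gives a sliding-window
    causal mask in which each query sees at most `window` most recent
    positions, itself included (assumed convention).
    """
    mask: List[List[bool]] = []
    for q in range(seq_len):
        row: List[bool] = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look ahead
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def layer_kinds(num_layers: int, global_every: int) -> List[str]:
    """Hypothetical schedule: every `global_every`-th layer uses global attention."""
    return [
        "global" if (i + 1) % global_every == 0 else "sliding"
        for i in range(num_layers)
    ]


sliding = causal_mask(4, window=2)
full = causal_mask(4)
schedule = layer_kinds(6, 3)
```

With a sliding window the last query in a 4-token sequence sees only positions 2 and 3, while the global mask lets it see all four. Mixing the two layer types is also why separate RoPE settings per layer type can be needed.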

Relation to #1855

#1855 is an earlier Gemma4 integration branch. This PR keeps the same upstream goal but uses a refreshed, focused branch for review:

  • keeps the contribution Slime-native with GSM8K validation recipes rather than a task-specific generation flow
  • separates the colocated raw weight sync edge-case fix into #2134 ("fix: handle empty colocated weight buckets")
  • adds runnable dense and MoE proof scripts plus public W&B evidence
  • removes unrelated integration surface so this PR is just Gemma4 model support, conversion, loss masking, docs, scripts, and tests

Dependency

The large-model MoE proof also exercised the colocated raw weight sync edge case fixed in #2134. This PR intentionally keeps the Gemma4 model support separate from that bugfix; if maintainers run the 26B-A4B colocated recipe before #2134 lands, they should apply that fix as well.

Validation

Local validation on this Gemma4-only branch:

  • uv run --with pytest --with torch --with transformers --with safetensors --with numpy python -m pytest tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py -q
    • 56 passed, 14 skipped
  • uv run --with ruff ruff check slime_plugins/models/gemma4.py slime_plugins/models/gemma4_provider.py slime_plugins/mbridge/gemma4.py slime/backends/megatron_utils/megatron_to_hf/gemma4.py slime/utils/mask_utils.py tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py
  • uv run --with black black --check slime_plugins/models/gemma4.py slime_plugins/models/gemma4_provider.py slime_plugins/mbridge/gemma4.py slime/backends/megatron_utils/megatron_to_hf/gemma4.py slime/utils/mask_utils.py tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py
  • bash -n scripts/run-gemma4-31B-gsm8k.sh scripts/run-gemma4-26B-A4B-gsm8k.sh
  • git diff --check
  • EazyReal/slime#4 ("Add Gemma4 dense and MoE model support"): CodeRabbit review completed, pre-commit passed

Stacked validation with #2134 included:

  • uv run --with pytest --with torch --with transformers --with safetensors --with numpy python -m pytest tests/gemma4 tests/utils/test_loss_mask_type_gemma4.py tests/test_empty_colocated_weight_bucket.py -q
    • 58 passed, 14 skipped

Slime-native GSM8K proof runs:
