
[Bug] Converting the MiniMax M2.7 HF checkpoint to torch_dist format fails with a TypeError #2129

Description

@Lynnzake

Bug Description

When I try to convert the MiniMax M2.7 HF checkpoint to torch_dist format with the following command,

source /nfs-152/disk6/tujie/train_env_slime/slime/scripts/models/minimax-m2.7.sh

echo "Starting conversion from HuggingFace format to torch-dist format..."
echo "Input path: ${HF_MODEL_PATH}"
echo "Output path: ${TORCH_DIST_OUTPUT_PATH}"

# Run the conversion (single node, 8 GPUs)
PYTHONPATH=/root/Megatron-LM/:$(pwd) torchrun \
   --nproc-per-node 8 \
   --master-addr localhost \
   --master-port 12345 \
   --nnodes=1 \
   --node-rank 0 \
   tools/convert_hf_to_torch_dist.py \
   ${MODEL_ARGS[@]} \
   --hf-checkpoint ${HF_MODEL_PATH} \
   --save ${TORCH_DIST_OUTPUT_PATH}

I ran into the following error:

[rank3]: Traceback (most recent call last):
[rank3]:   File "/nfs-152/disk6/tujie/train_env_slime/slime/tools/convert_hf_to_torch_dist.py", line 146, in <module>
[rank3]:     main()
[rank3]:   File "/nfs-152/disk6/tujie/train_env_slime/slime/tools/convert_hf_to_torch_dist.py", line 119, in main
[rank3]:     bridge = AutoBridge.from_pretrained(hf_model_path, trust_remote_code=True)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/auto_bridge.py", line 30, in from_pretrained
[rank3]:     return cls.from_config(config, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/auto_bridge.py", line 48, in from_config
[rank3]:     return _MODEL_REGISTRY[model_type](hf_config, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/bridge.py", line 51, in __init__
[rank3]:     self.config = self._build_config()
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/nfs-152/disk6/tujie/train_env_slime/slime/slime_plugins/mbridge/minimax_m2.py", line 42, in _build_config
[rank3]:     return self._build_base_config(
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/llm_bridge.py", line 108, in _build_base_config
[rank3]:     return self.TransformerConfigClass(**base_config)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'rotary_percent'

I pulled the latest image (slimerl/slime:latest) with docker pull and cloned the main branch of the repo. What is the likely cause of this error, and how can I fix it? I appreciate the help!
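
The traceback shows a keyword (`rotary_percent`) reaching `TransformerConfig.__init__()` that the class does not accept. One way to narrow this down is to compare the keywords being passed against the constructor's signature. The following is only a minimal sketch: `unexpected_kwargs`, `StandInConfig`, and the `base_config` values are hypothetical names and data made up for illustration, not part of slime, mbridge, or Megatron-LM.

```python
import inspect
from dataclasses import dataclass


def unexpected_kwargs(cls, kwargs):
    """Return the keys in kwargs that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in kwargs if k not in params)


# Hypothetical stand-in for the real config class, for illustration only.
@dataclass
class StandInConfig:
    num_layers: int
    hidden_size: int


# Hypothetical kwargs resembling what a bridge might pass to the config.
base_config = {"num_layers": 2, "hidden_size": 8, "rotary_percent": 0.5}

print(unexpected_kwargs(StandInConfig, base_config))  # ['rotary_percent']
```

Inside the container, passing `megatron.core.transformer.transformer_config.TransformerConfig` instead of the stand-in should show whether the installed Megatron-LM accepts `rotary_percent`. If it does not, a version mismatch between the cloned slime main branch, the installed mbridge, and the Megatron-LM under /root/Megatron-LM would be one plausible explanation, though that is an unconfirmed assumption.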

Steps to Reproduce

  1. docker pull slimerl/slime:latest and git clone the main branch of the repo

  2. Run the conversion command above

Expected Behavior

The conversion succeeds.

Actual Behavior

The conversion fails with the TypeError shown above.

Environment

  • slime version:
  • Python version:
  • PyTorch version:
  • CUDA/ROCm version:
  • GPU type and count:
  • OS:
  • SGLang version (if relevant):
  • Megatron-LM version (if relevant):

Logs

Additional Context

No response

Pre-submission Checklist

  • I have read the CONTRIBUTING.md and understand the collaboration scope.
  • I have read the documentation and my issue is not addressed there.
  • I have searched for existing issues and this is not a duplicate.
  • I have provided a minimal, reproducible example.
