[Bug] Multi-head MTP (`--mtp-num-layers > 1`) crashes at training-step logging

### Bug Description

slime's per-step MTP-loss logging hard-codes a **single-MTP-layer** assumption. When the model has more than one MTP head **and** MTP training is enabled, training crashes with the traceback shown under Logs.


**Trigger conditions (both required):**

1. `--mtp-num-layers > 1` — a multi-head MTP model (e.g. MiMo-7B with 3 MTP heads).
2. `--enable-mtp-training` — this turns on the MTP-loss logging branch unconditionally (the crash is in the logging code, not the forward/backward pass).

With `--mtp-num-layers 1` (single head) the bug does **not** reproduce.

### Steps to Reproduce

Trigger: `--mtp-num-layers > 1` together with `--enable-mtp-training`.

Minimal command (only the relevant args are shown; the rest is standard GRPO config):

```bash
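# --mtp-num-layers > 1 triggers the crash;
# --enable-mtp-training enables the crashing log path.
# (Comments cannot follow a trailing backslash, so they are kept up here.)
python train.py \
  --mtp-num-layers 3 \
  --enable-mtp-training \
  --mtp-loss-scaling-factor 0.35 \
  ...  # standard rollout/optimizer/perf args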
```

Reproduced with: **MiMo-7B-Base** converted to Megatron `torch_dist` with 3 MTP heads (MTP3), GRPO + MTP training, TP=4 / PP=1 / CP=1, single node.

Switching to `--mtp-num-layers 1` makes the crash disappear.

### Expected Behavior

Training runs and logs the per-step MTP loss normally, as it does with `--mtp-num-layers 1`.

### Actual Behavior

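Training crashes in the MTP-loss logging code with `RuntimeError: a Tensor with 3 elements cannot be converted to Scalar` (full traceback under Logs).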

### Environment

- slime version: 0.2.4 (pip) — source commit `4bd75ad1`
- Python version: 3.12.0
- PyTorch version: 2.9.1+cu128
- CUDA/ROCm version: CUDA 12.8 (driver 565.57.01)
- GPU type and count: NVIDIA RTX A6000 (single node, TP=4)
- OS: Linux (kernel 5.15.0-60-generic)
- SGLang version: 0.5.10.post1
- Megatron-LM version: local clone on PYTHONPATH (`core_v0.15.0rc7-548-g3714d81d4`)

### Logs

```shell
Traceback (most recent call last):
  File "train.py", line 103, in <module>
    train(args)
  File "train.py", line 81, in train
    ray.get(actor_model.async_train(rollout_id, rollout_data_ref))
  File ".../site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File ".../site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File ".../site-packages/ray/_private/worker.py", line 2981, in get
    values, debugger_breakpoint = worker.get_objects(
  File ".../site-packages/ray/_private/worker.py", line 1012, in get_objects
    raise value.as_instance_of_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::MegatronTrainRayActor.train() (pid=..., ip=..., actor_id=..., repr=<MegatronTrainRayActor object at 0x...>)
  File "slime/backends/megatron_utils/actor.py", line 416, in train
    self.train_actor(rollout_id, rollout_data, external_data=external_data)
  File "slime/backends/megatron_utils/actor.py", line 547, in train_actor
    train(
  File "slime/backends/megatron_utils/model.py", line 783, in train
    mtp_losses = (tracker["values"] * mtp_loss_scale).item()
RuntimeError: a Tensor with 3 elements cannot be converted to Scalar
```
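The failing line can be reproduced in plain PyTorch, independent of slime. The sketch below assumes (inferred from the error message and `--mtp-num-layers 3`, not confirmed against slime's source) that `tracker["values"]` holds one accumulated loss per MTP layer. The tensor contents, the `0.5` scale, and the helper name `reduce_like_model_py` are hypothetical and only for illustration.

```python
import torch

# Hypothetical stand-in for tracker["values"]: assumed to hold one accumulated
# MTP loss per MTP layer (shape [mtp_num_layers]); numbers are illustrative.
tracker = {"values": torch.tensor([0.9, 1.1, 1.3])}
mtp_loss_scale = 0.5  # illustrative value, not slime's actual scale


def reduce_like_model_py(tracker, scale):
    # Mirrors the failing expression in slime/backends/megatron_utils/model.py:
    # .item() only works on tensors with exactly one element.
    return (tracker["values"] * scale).item()


try:
    reduce_like_model_py(tracker, mtp_loss_scale)
except RuntimeError as err:
    print("multi-head:", err)  # the same RuntimeError as in the traceback

# With a single MTP layer the same expression succeeds, consistent with the
# report that --mtp-num-layers 1 does not crash.
single = {"values": torch.tensor([0.9])}
print("single-head:", reduce_like_model_py(single, mtp_loss_scale))
```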

### Additional Context

I plan to open a PR to fix this; the fix is still being tested.
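One possible direction, sketched under the same assumption that `tracker["values"]` is a 1-D per-layer tensor: log one scalar per MTP head instead of calling `.item()` on the whole tensor. The helper name `mtp_losses_per_layer` and the `mtp_{i}_loss` key format are hypothetical; this is not the implementation in the planned PR.

```python
import torch


def mtp_losses_per_layer(values: torch.Tensor, scale: float) -> dict[str, float]:
    """Hypothetical helper: return one scaled loss per MTP layer.

    Assumes `values` holds one accumulated loss per MTP layer, so the same
    code path handles --mtp-num-layers 1 and --mtp-num-layers > 1.
    """
    scaled = (values.detach() * scale).flatten()
    # .tolist() converts every element to a Python float in one call,
    # avoiding .item(), which requires exactly one element.
    return {f"mtp_{i + 1}_loss": v for i, v in enumerate(scaled.tolist())}


multi = mtp_losses_per_layer(torch.tensor([0.9, 1.1, 1.3]), 0.5)
single = mtp_losses_per_layer(torch.tensor([0.9]), 0.5)
print(multi)
print(single)
```

If a single aggregate value is still wanted (e.g. to keep the existing log key), `scaled.sum().item()` or `scaled.mean().item()` would also avoid the crash; which reduction matches slime's intended metric is a design decision for the PR.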

### Pre-submission Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/THUDM/slime/blob/main/CONTRIBUTING.md) and understand the collaboration scope.
- [x] I have read the [documentation](https://thudm.github.io/slime/) and my issue is not addressed there.
- [x] I have searched for [existing issues](https://github.com/THUDM/slime/issues) and this is not a duplicate.
- [x] I have provided a minimal, reproducible example.
