Bug Description
Logged 2026-03-18 05:25:02 to 05:25:03 (2855e530…vdumckda):
Traceback (most recent call last):
  File "/root/slime_siqi/train.py", line 100, in <module>
    train(args)
  File "/root/slime_siqi/train.py", line 69, in train
    rollout_data_ref = ray.get(rollout_manager.generate.remote(rollout_id))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2981, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1014, in get_objects
    raise value
ray.exceptions.ActorUnavailableError: The actor ed2ff6eb7e211cc420c657b002000000 is unavailable: The actor is temporarily unavailable: RpcError: RPC error: Socket closed rpc_code: 14. The task may or may not have been executed on the actor.
Steps to Reproduce
I ran Qwen3-1.7B training on a math dataset. The error above occurs at step 9.
Expected Behavior
The program should not crash when the rollout actor is temporarily unavailable.
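One way this kind of tolerance could look is a bounded retry with backoff around the failing `ray.get` call. The sketch below is only an illustration under stated assumptions, not slime's actual behavior: `call_with_retry` and its parameters are hypothetical names, and it uses only the standard library so it runs without Ray installed.

```python
import time


def call_with_retry(fn, retriable, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call fn(); if it raises one of the `retriable` exceptions, back off and retry.

    Hypothetical helper for illustration. Re-raises the last exception
    once `max_attempts` calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))


# Assumed usage at train.py line 69 (not verified against the slime code base):
#
#   rollout_data_ref = call_with_retry(
#       lambda: ray.get(rollout_manager.generate.remote(rollout_id)),
#       retriable=(ray.exceptions.ActorUnavailableError,),
#   )
```

Since the error message says the task "may or may not have been executed on the actor", resubmitting `generate` like this would only be safe if running the same rollout twice is acceptable.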
Actual Behavior
The program exits with the `ActorUnavailableError` shown above.
Environment
- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:
- SGLang version (if relevant):
- Megatron-LM version (if relevant):
Logs
Additional Context
No response
Pre-submission Checklist