[Bug] Actor unavailable error #1739

Description

@zhusq20

Bug Description

(log timestamps 2026-03-18 05:25:02 to 05:25:03; log stream 2855e530…vdumckda)

Traceback (most recent call last):
  File "/root/slime_siqi/train.py", line 100, in <module>
    train(args)
  File "/root/slime_siqi/train.py", line 69, in train
    rollout_data_ref = ray.get(rollout_manager.generate.remote(rollout_id))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2981, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1014, in get_objects
    raise value
ray.exceptions.ActorUnavailableError: The actor ed2ff6eb7e211cc420c657b002000000 is unavailable: The actor is temporarily unavailable: RpcError: RPC error: Socket closed rpc_code: 14. The task may or may not have been executed on the actor.

Steps to Reproduce

I ran Qwen3-1.7B training on a math dataset. At step 9, this error occurs.

Expected Behavior

The program shouldn't crash when an actor is only temporarily unavailable.
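
To illustrate the kind of tolerance meant here, below is a minimal sketch of a retry wrapper around a blocking call. It is not slime's or Ray's implementation: `call_with_retry` and `TransientActorError` are hypothetical names, and `TransientActorError` is only a stand-in so the sketch runs without Ray installed. In the real training loop one would presumably pass `ray.exceptions.ActorUnavailableError` as `retry_on`.

```python
import time


class TransientActorError(Exception):
    """Hypothetical stand-in for ray.exceptions.ActorUnavailableError."""


def call_with_retry(fn, retry_on=(TransientActorError,), attempts=3,
                    base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error, back off and try again.

    `attempts` is the total number of tries. The last failure is re-raised
    unchanged so the caller still sees the original exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Assumed usage in train.py (not run here, requires Ray and a live actor):
#   rollout_data_ref = call_with_retry(
#       lambda: ray.get(rollout_manager.generate.remote(rollout_id)),
#       retry_on=(ray.exceptions.ActorUnavailableError,),
#   )
```

Each retry submits a new remote task. Since the error message says the task "may or may not have been executed on the actor", this is only safe if `generate` can be repeated for the same `rollout_id` without side effects, which is an assumption that would need checking.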

Actual Behavior

The program exits with the traceback above.

Environment

  • slime version:
  • Python version:
  • PyTorch version:
  • CUDA/ROCm version:
  • GPU type and count:
  • OS:
  • SGLang version (if relevant):
  • Megatron-LM version (if relevant):

Logs

Additional Context

No response

Pre-submission Checklist

  • I have read the CONTRIBUTING.md and understand the collaboration scope.
  • I have read the documentation and my issue is not addressed there.
  • I have searched for existing issues and this is not a duplicate.
  • I have provided a minimal, reproducible example.
