Skip to content

[Bug] solution_str = sample.prompt + sample.response #1829

Description

@yangninghua

Bug Description

/root/slime/examples/retool

bash examples/retool/retool_qwen3_4b_rl.sh


Traceback (most recent call last):                                                                                                                                                     
  File "/root/slime/train.py", line 106, in <module>                                                                                                                                   
    train(args)                                                                                                                                                                        
  File "/root/slime/train.py", line 66, in train                                                                                                                                       
    ray.get(rollout_manager.eval.remote(rollout_id))                                                                                                                                   
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper                                                                         
    return fn(*args, **kwargs)                                                                                                                                                         
           ^^^^^^^^^^^^^^^^^^^                                                                                                                                                         
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper                                                                                
    return func(*args, **kwargs)                                                                                                                                                       
           ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                       
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2967, in get                                                                                             
    values, debugger_breakpoint = worker.get_objects(                                                                                                                                  
                                  ^^^^^^^^^^^^^^^^^^^                                                                                                                                  
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1015, in get_objects                                                                                     
    raise value.as_instanceof_cause()                                                                                                                                                  
ray.exceptions.RayTaskError(TypeError): ray::RolloutManager.eval() (pid=533367, ip=wings-f80b67aa53c5-master-0, actor_id=e81a9cf9a8b8dfa836c19b9f02000000, repr=<slime.ray.rollout.Roll
outManager object at 0x7f42adba5f10>)                                                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                        
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                             
  File "/root/slime/slime/ray/rollout.py", line 115, in eval                                                                                                                           
    result = call_rollout_fn(self.eval_generate_rollout, self.args, rollout_id, self.data_source, evaluation=True)                                                                     
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                     
  File "/root/slime/slime/rollout/base_types.py", line 20, in call_rollout_fn                                                                                                          
    output = fn(*args, **kwargs, evaluation=evaluation)                                                                                                                                
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                
  File "/root/slime/slime/rollout/sglang_rollout.py", line 581, in generate_rollout                                                                                                    
    output, aborted_samples = generate_abortable_samples(                                                                                                                              
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                              
  File "/root/slime/slime/rollout/sglang_rollout.py", line 596, in generate_abortable_samples


  File "/root/slime/slime/rollout/sglang_rollout.py", line 596, in generate_abortable_samples                                                                                          
    return run(eval_rollout(args, rollout_id))                                                                                                                                         
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                         
  File "/root/slime/slime/utils/async_utils.py", line 36, in run                                                                                                                       
    return get_async_loop().run(coro)                                                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                  
  File "/root/slime/slime/utils/async_utils.py", line 20, in run                                                                                                                       
    return asyncio.run_coroutine_threadsafe(coro, self.loop).result()                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                  
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result                                                                                                          
    return self.__get_result()                                                                                                                                                         
           ^^^^^^^^^^^^^^^^^^^                                                                                                                                                         
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result                                                                                                    
    raise self._exception                                                                                                                                                              
  File "/root/slime/slime/rollout/sglang_rollout.py", line 460, in eval_rollout                                                                                                        
    results_list = await asyncio.gather(*coros)                                                                                                                                        
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                        
  File "/root/slime/slime/rollout/sglang_rollout.py", line 539, in eval_rollout_single_dataset                                                                                         
    sample = await coro                                                                                                                                                                
             ^^^^^^^^^^                                                                                                                                                                
  File "/usr/lib/python3.12/asyncio/tasks.py", line 631, in _wait_for_one                                                                                                              
    return f.result()  # May raise f.exception().                                                                                                                                      
           ^^^^^^^^^^                                                                                                                                                                  
  File "/root/slime/slime/rollout/sglang_rollout.py", line 258, in generate_and_rm                                                                                                     
    sample.reward = await async_rm(args, sample)                                                                                                                                       
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                       
  File "/root/slime/slime/rollout/rm_hub/__init__.py", line 33, in async_rm                                                                                                            
    return await rm_function(args, sample, **kwargs)                                                                                                                                   
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                   
  File "/root/slime/examples/retool/generate_with_retool.py", line 360, in reward_func                                                                                                 
    solution_str = sample.prompt + sample.response                                                                                                                                     
                   ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~                                                                                                                                     
TypeError: can only concatenate list (not "str") to list 


Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
  File "/root/slime/slime/rollout/rm_hub/__init__.py", line 33, in async_rm
    return await rm_function(args, sample, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/slime/examples/retool/generate_with_retool.py", line 360, in reward_func
    solution_str = sample.prompt + sample.response
                   ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: can only concatenate list (not "str") to list

Steps to Reproduce

#!/bin/bash

# for rerun the task
pkill -9 sglang
sleep 3
ray stop --force
pkill -9 ray
pkill -9 python
sleep 3
pkill -9 ray
pkill -9 python

set -ex

# will prevent ray from buffering stdout/stderr
export PYTHONBUFFERED=16

NVLINK_COUNT=$(nvidia-smi topo -m 2>/dev/null | grep -o 'NV[0-9][0-9]*' | wc -l)
if [ "$NVLINK_COUNT" -gt 0 ]; then
    HAS_NVLINK=1
else
    HAS_NVLINK=0
fi
echo "HAS_NVLINK: $HAS_NVLINK (detected $NVLINK_COUNT NVLink references)"

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
source "/root/slime/scripts/models/qwen3-4B.sh"

CKPT_ARGS=(
   --hf-checkpoint ***/huggingface_modelscope/qwen3-4b-sft
   --ref-load ***/huggingface_modelscope/qwen3-4b-sft_torch_dist/release
   --save ***/huggingface_modelscope/qwen3-4b-sft-multi-turn/
   --save-interval 20
   --rotary-base 5000000
   --megatron-to-hf-mode bridge
)

ROLLOUT_ARGS=(
   --prompt-data ***/huggingface_modelscope/zhuzilin/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --reward-key score
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-response-len 8192
   --rollout-temperature 0.8

   --global-batch-size 256
   --balance-data
)

EVAL_ARGS=(
   --eval-interval 20
   --eval-prompt-data aime  ***/huggingface_modelscope/zhuzilin/aime-2024/aime-2024.jsonl
   --n-samples-per-eval-prompt 16
   --eval-max-response-len 16384
   --eval-top-p 0.7
)

PERF_ARGS=(
   --tensor-model-parallel-size 2
   --sequence-parallel
   --pipeline-model-parallel-size 1
   --context-parallel-size 1
   --expert-model-parallel-size 1
   --expert-tensor-parallel-size 1

   --recompute-granularity full
   --recompute-method uniform
   --recompute-num-layers 1

   --micro-batch-size 1
   # --use-dynamic-batch-size
   --max-tokens-per-gpu 9216
)

GRPO_ARGS=(
   --advantage-estimator grpo
   --use-kl-loss
   --kl-loss-coef 0.00
   --kl-loss-type low_var_kl
   --entropy-coef 0.00
   --eps-clip 0.2
   --eps-clip-high 0.28
)

OPTIMIZER_ARGS=(
   --optimizer adam
   --lr 1e-6
   --lr-decay-style constant
   --weight-decay 0.1
   --adam-beta1 0.9
   --adam-beta2 0.98
)

WANDB_ARGS=(
   --use-tensorboard
   # --use-wandb
   # --wandb-project slime-dapo
   # --wandb-group qwen3-4B-test-multi-turn
   # --wandb-key ${WANDB_KEY}
)

SGLANG_ARGS=(
   --rollout-num-gpus-per-engine 2
   --sglang-mem-fraction-static 0.7
)

MISC_ARGS=(
   # default dropout in megatron is 0.1
   --attention-dropout 0.0
   --hidden-dropout 0.0
   # should be good for model performance
   --accumulate-allreduce-grads-in-fp32
   --attention-softmax-in-fp32
   # need to comment this when using model with MLA
   --attention-backend flash
)

CUSTOM_ARGS=(
   --custom-generate-function-path generate_with_retool.generate
   --custom-rm-path generate_with_retool.reward_func
)

# launch the master node of ray in container
export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 8 --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265

# Build the runtime environment JSON with proper variable substitution
RUNTIME_ENV_JSON="{
  \"env_vars\": {
    \"PYTHONPATH\": \"/root/Megatron-LM/:${SCRIPT_DIR}:/root/slime\",
    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
    \"TENSORBOARD_DIR\": \"***/huggingface_modelscope/tensorboard_local\",
    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
  }
}"

ray job submit --address="http://127.0.0.1:8265" \
   --runtime-env-json="${RUNTIME_ENV_JSON}" \
   -- python3 train.py \
   --actor-num-nodes 1 \
   --actor-num-gpus-per-node 8 \
   --colocate \
   ${MODEL_ARGS[@]} \
   ${CKPT_ARGS[@]} \
   ${ROLLOUT_ARGS[@]} \
   ${OPTIMIZER_ARGS[@]} \
   ${GRPO_ARGS[@]} \
   ${WANDB_ARGS[@]} \
   ${PERF_ARGS[@]} \
   ${EVAL_ARGS[@]} \
   ${SGLANG_ARGS[@]} \
   ${MISC_ARGS[@]} \
   ${CUSTOM_ARGS[@]}

Expected Behavior

docker
slimerl/slime:latest

Actual Behavior

cd /root/slime

bash examples/retool/retool_qwen3_4b_rl.sh

Environment

  • slime version:
  • Python version:
  • PyTorch version:
  • CUDA/ROCm version:
  • GPU type and count:
  • OS:
  • SGLang version (if relevant):
  • Megatron-LM version (if relevant):

Logs

Additional Context

No response

Pre-submission Checklist

  • I have read the CONTRIBUTING.md and understand the collaboration scope.
  • I have read the documentation and my issue is not addressed there.
  • I have searched for existing issues and this is not a duplicate.
  • I have provided a minimal, reproducible example.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions