[NPU][Example] Add Qwen3-32B GRPO training script for Ascend by CalvinXKY · Pull Request #164 · vllm-project/vime

CalvinXKY · 2026-06-06T02:43:48Z

Summary

Add scripts/run-qwen3-32B-npu.sh, an end-to-end GRPO training example for Qwen3-32B on Ascend NPU (Atlas 800I A3).

This PR is scoped to the run script only. The Docker build and NPU dependency patches are already on the ascend branch (#163).

The script reuses the existing model config at scripts/models/qwen3-32B.sh and follows the same layout as other scripts under scripts/ (e.g. run-qwen3-4B.sh).

What the script covers

- NPU / Ray environment: ASCEND_RT_VISIBLE_DEVICES, HCCL port ranges, RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES, CANN/Ascend runtime env for ray job submit
- Training: GRPO on GSM8K (deepscaler reward), TP=8 + sequence parallel, CPU optimizer offload
- Rollout: vLLM with --rollout-num-gpus-per-engine 2, --vllm-enforce-eager
- Resource layout: 16 NPUs total, with 8 for actor training and 8 for rollout (1 node × 8 NPUs each)
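The resource layout implies some arithmetic that is easy to break when adapting the script to a different NPU count. A minimal sketch, assuming the rollout NPUs are divided evenly among vLLM engines; the variable names and the `check_layout` helper are illustrative and not taken from the script:

```shell
#!/usr/bin/env bash
# Sanity-check the NPU split described in the PR summary.
# All names below are hypothetical; only the numbers come from the PR text.
TOTAL_NPUS=16        # NPUs exposed through ASCEND_RT_VISIBLE_DEVICES
ACTOR_NPUS=8         # actor training (TP=8)
ROLLOUT_NPUS=8       # vLLM rollout
NPUS_PER_ENGINE=2    # value passed to --rollout-num-gpus-per-engine

check_layout() {
  local total=$1 actor=$2 rollout=$3 per_engine=$4
  if (( actor + rollout != total )); then
    echo "actor + rollout must equal total" >&2
    return 1
  fi
  if (( rollout % per_engine != 0 )); then
    echo "rollout NPUs must be a multiple of NPUs per engine" >&2
    return 1
  fi
  # Number of rollout engines that fit on the rollout NPUs.
  echo $(( rollout / per_engine ))
}

echo "rollout engines: $(check_layout "$TOTAL_NPUS" "$ACTOR_NPUS" "$ROLLOUT_NPUS" "$NPUS_PER_ENGINE")"
```

With the numbers from the summary this reports 4 rollout engines.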

Prerequisites

- Ascend NPU environment built from docker/Dockerfile.npu
- Converted checkpoints:
  - --hf-checkpoint: HuggingFace Qwen3-32B weights
  - --ref-load: Megatron torch_dist checkpoint
- Training data: GSM8K parquet (default path in script is environment-specific)
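A preflight check can surface a missing prerequisite before Ray starts rather than partway through job startup. The sketch below is one possible shape; `require_path` and `preflight` are hypothetical helpers that the script does not define:

```shell
#!/usr/bin/env bash
# Report every missing input up front instead of failing later inside the job.
# require_path and preflight are hypothetical helpers, not part of the script.
require_path() {
  local label=$1 path=$2
  if [[ ! -e "$path" ]]; then
    echo "missing ${label}: ${path}" >&2
    return 1
  fi
}

preflight() {
  # $1: HF checkpoint dir, $2: Megatron torch_dist checkpoint, $3: prompt data
  local rc=0
  require_path "HF checkpoint"  "$1" || rc=1
  require_path "ref checkpoint" "$2" || rc=1
  require_path "prompt data"    "$3" || rc=1
  return "$rc"
}
```

A caller would pass the three paths used for --hf-checkpoint, --ref-load and --prompt-data and exit if `preflight` returns non-zero.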

Paths to customize before running

| Variable / arg | Default in script | Notes |
| --- | --- | --- |
| `--hf-checkpoint` | `/data/local_models/Qwen3-32B` | HF model dir |
| `--ref-load` | `/data/local_models/Qwen3-32B_torch_dist` | Megatron dist ckpt |
| `--prompt-data` | `/data/nfs_87/xky/datasets/gsm8k/train.parquet` | GSM8K train set |
| `ASCEND_RT_VISIBLE_DEVICES` | `0–15` | Match available NPUs |
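One way to avoid editing the script for each environment is to let environment variables override these defaults. A sketch under assumptions: `HF_CHECKPOINT` and `REF_LOAD` as override names, and the grouping of flags into `CKPT_ARGS`, are guesses rather than something the diff confirms; `PROMPT_SET` and `ROLLOUT_ARGS` do appear in the script.

```shell
#!/usr/bin/env bash
# Let callers override the environment-specific defaults without editing the file.
# HF_CHECKPOINT and REF_LOAD are hypothetical variable names; the default values
# are the ones listed in the table.
HF_CHECKPOINT="${HF_CHECKPOINT:-/data/local_models/Qwen3-32B}"
REF_LOAD="${REF_LOAD:-/data/local_models/Qwen3-32B_torch_dist}"
PROMPT_SET="${PROMPT_SET:-/data/nfs_87/xky/datasets/gsm8k/train.parquet}"

# Which array each flag belongs to is an assumption made for this sketch.
CKPT_ARGS=(
  --hf-checkpoint "${HF_CHECKPOINT}"
  --ref-load "${REF_LOAD}"
)
ROLLOUT_ARGS=(
  --prompt-data "${PROMPT_SET}"
)

printf '%s\n' "${CKPT_ARGS[@]}" "${ROLLOUT_ARGS[@]}"
```

Usage would then look like `PROMPT_SET=/my/gsm8k/train.parquet bash scripts/run-qwen3-32B-npu.sh`, where the path is a placeholder.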

Test plan

- Smoke run: script starts Ray head, submits job, and training begins without import / HCCL / vLLM startup errors on A3 (16 NPUs)
- End-to-end: complete at least 1 GRPO step (rollout → train → weight sync) on Qwen3-32B

Code Review

This pull request introduces a new bash script, scripts/run-qwen3-32B-npu.sh, designed to configure and launch Qwen3-32B training on NPU clusters using Ray. The code review feedback focuses on correcting a typo in the PYTHONBUFFERED environment variable (which should be PYTHONUNBUFFERED), improving script portability by replacing hardcoded /home/ma-user paths with the ${HOME} environment variable, and enhancing shell script robustness by properly double-quoting variables and array expansions to prevent word splitting.

gemini-code-assist · 2026-06-06T02:45:12Z

+
+SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
+
+export PYTHONBUFFERED=16


The environment variable to disable Python's stdout/stderr buffering is PYTHONUNBUFFERED, not PYTHONBUFFERED. Because of this typo, Python output buffering will not be disabled, which can lead to delayed or missing logs during training. It should be set to PYTHONUNBUFFERED=1.

Suggested change

-export PYTHONBUFFERED=16
+export PYTHONUNBUFFERED=1

gemini-code-assist · 2026-06-06T02:45:12Z

+export RAY_DISABLE_SIGINT_OVERRIDE=1
+export HCCL_CONNECT_TIMEOUT=7200
+
+export PYTHONPATH="/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:${PYTHONPATH}"


Hardcoding /home/ma-user limits the script's portability to other environments or users. Consider using the ${HOME} environment variable instead.

Suggested change

-export PYTHONPATH="/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:${PYTHONPATH}"
+export PYTHONPATH="${HOME}/Megatron-LM:${HOME}/vllm:${HOME}/vime:${PYTHONPATH}"
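Both the original and the suggested line append `${PYTHONPATH}` unconditionally, which leaves a trailing colon when the variable is empty and would abort if the script were ever run under `set -u` with the variable unset. A possible refinement, sketched with a hypothetical `join_paths` helper:

```shell
#!/usr/bin/env bash
# Join path entries with ':' and append the existing value only if it is non-empty.
# join_paths is a hypothetical helper, not part of the script.
join_paths() {
  local existing=$1
  shift
  local joined
  # "$*" joins the remaining arguments with the first character of IFS.
  joined=$(IFS=:; echo "$*")
  echo "${joined}${existing:+:${existing}}"
}

export PYTHONPATH="$(join_paths "${PYTHONPATH:-}" \
  "${HOME}/Megatron-LM" "${HOME}/vllm" "${HOME}/vime")"
echo "${PYTHONPATH}"
```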

gemini-code-assist · 2026-06-06T02:45:12Z

+PROMPT_SET=/data/nfs_87/xky/datasets/gsm8k/train.parquet
+
+ROLLOUT_ARGS=(
+   --prompt-data ${PROMPT_SET}


Double-quote ${PROMPT_SET} to prevent word splitting if the path contains spaces or special characters.

Suggested change

-   --prompt-data ${PROMPT_SET}
+   --prompt-data "${PROMPT_SET}"
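The effect of the missing quotes is easy to reproduce in isolation. In this sketch the dataset path is a made-up one containing a space, and `count_args` is a hypothetical helper that reports how many arguments it received:

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

PROMPT_SET="/data/my datasets/gsm8k/train.parquet"   # hypothetical path with a space

unquoted=$(count_args --prompt-data ${PROMPT_SET})    # the path splits into two words
quoted=$(count_args --prompt-data "${PROMPT_SET}")    # the path stays one argument

echo "unquoted: ${unquoted} args, quoted: ${quoted} args"
```

The unquoted form passes three arguments, so the training script would see a truncated path plus a stray positional argument.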

gemini-code-assist · 2026-06-06T02:45:12Z

+)
+
+export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
+ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265


Double-quote ${MASTER_ADDR} to prevent word splitting.

Suggested change

-ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265
+ray start --head --node-ip-address "${MASTER_ADDR}" --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265
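A related portability point on the same line: the `{"NPU": 16}` resource count is hardcoded even though the PR description asks users to match ASCEND_RT_VISIBLE_DEVICES to the available NPUs. A sketch that derives the count instead, assuming the variable holds a plain comma-separated list of device IDs; `count_devices`, `NUM_NPUS` and `RAY_RESOURCES` are hypothetical names. The command is only printed, not executed:

```shell
#!/usr/bin/env bash
# Count the devices in a comma-separated list such as "0,1,2,3".
# Assumes ASCEND_RT_VISIBLE_DEVICES is a plain comma-separated list of IDs.
count_devices() {
  local list=$1
  local -a ids
  IFS=',' read -r -a ids <<< "${list}"
  echo "${#ids[@]}"
}

NUM_NPUS=$(count_devices "${ASCEND_RT_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}")
RAY_RESOURCES=$(printf '{"NPU": %d}' "${NUM_NPUS}")

# Show the resulting command rather than starting Ray.
echo "ray start --head --node-ip-address \"${MASTER_ADDR:-127.0.0.1}\" --num-gpus 0 --resources '${RAY_RESOURCES}'"
```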

gemini-code-assist · 2026-06-06T02:45:12Z

+
+RUNTIME_ENV_JSON='{
+  "env_vars": {
+    "PYTHONPATH": "/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",


Hardcoding /home/ma-user in the PYTHONPATH inside RUNTIME_ENV_JSON limits portability. Consider using ${HOME} instead.

Suggested change

-    "PYTHONPATH": "/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",
+    "PYTHONPATH": "'"${HOME}"'/Megatron-LM:'"${HOME}"'/vllm:'"${HOME}"'/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",

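The `'"${HOME}"'` splicing in the suggestion works because each `'` closes the single-quoted JSON string, the double-quoted expansion is concatenated, and the next `'` reopens it, but it is hard to read. An unquoted heredoc is one alternative, sketched here with a hypothetical `build_runtime_env_json` helper and only the PYTHONPATH entry; it assumes the expanded values contain no double quotes or backslashes, since nothing is JSON-escaped:

```shell
#!/usr/bin/env bash
# Build the runtime-env JSON with an unquoted heredoc so ${...} expands in place.
# build_runtime_env_json is a hypothetical helper; only the PYTHONPATH entry is shown.
build_runtime_env_json() {
  local ascend=/usr/local/Ascend/ascend-toolkit/latest
  cat <<EOF
{
  "env_vars": {
    "PYTHONPATH": "${HOME}/Megatron-LM:${HOME}/vllm:${HOME}/vime:${ascend}/tools/ms_fmk_transplt/torch_npu_bridge:${ascend}/python/site-packages:${PYTHONPATH:-}"
  }
}
EOF
}

RUNTIME_ENV_JSON=$(build_runtime_env_json)
echo "${RUNTIME_ENV_JSON}"
```

The result can be passed to `--runtime-env-json="${RUNTIME_ENV_JSON}"` exactly as the script already does.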
gemini-code-assist · 2026-06-06T02:45:12Z

+    "ASCEND_HOME_PATH": "/usr/local/Ascend/ascend-toolkit/latest/",
+    "HYDRA_FULL_ERROR": "1",
+    "RAY_DEBUG_POST_MORTEM_DISABLED": "1",
+    "LD_LIBRARY_PATH": "/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/cann-8.5.2/lib64:'"$LD_LIBRARY_PATH"'"


Hardcoding the specific CANN version path /usr/local/Ascend/cann-8.5.2/lib64 makes the script fragile and less portable across different Ascend environments or upgrades. Consider using the latest symlink or dynamically resolving the path if possible.
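Dynamic resolution could be as simple as picking the first candidate directory that exists. A sketch with a hypothetical `first_existing_dir` helper; whether the toolkit `lib64` directory is an adequate substitute for the versioned `cann-8.5.2` one is an assumption that would need checking on the target image:

```shell
#!/usr/bin/env bash
# Print the first candidate directory that exists; return 1 if none does.
# first_existing_dir is a hypothetical helper, not part of the script.
first_existing_dir() {
  local dir
  for dir in "$@"; do
    if [[ -d "${dir}" ]]; then
      echo "${dir}"
      return 0
    fi
  done
  return 1
}

# Prefer the version-independent toolkit path; fall back to the pinned one from the diff.
if CANN_LIB=$(first_existing_dir \
    "${ASCEND_HOME_PATH:-/usr/local/Ascend/ascend-toolkit/latest}/lib64" \
    "/usr/local/Ascend/cann-8.5.2/lib64"); then
  echo "using CANN libraries from ${CANN_LIB}"
else
  echo "no CANN lib64 directory found" >&2
fi
```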

gemini-code-assist · 2026-06-06T02:45:12Z

+  }
+}'
+
+cd /home/ma-user/vime


Use "${HOME}/vime" instead of the hardcoded /home/ma-user/vime path to improve portability.

Suggested change

-cd /home/ma-user/vime
+cd "${HOME}/vime"

gemini-code-assist · 2026-06-06T02:45:12Z

+cd /home/ma-user/vime
+ray job submit --address="http://127.0.0.1:8265" \
+   --runtime-env-json="${RUNTIME_ENV_JSON}" \
+   -- python3 /home/ma-user/vime/train.py \


Use "${HOME}/vime/train.py" instead of the hardcoded /home/ma-user/vime/train.py path to improve portability.

Suggested change

-   -- python3 /home/ma-user/vime/train.py \
+   -- python3 "${HOME}/vime/train.py" \
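Since the script already computes SCRIPT_DIR, another option is to derive the repository root from the script's own location instead of from `${HOME}`. A sketch, assuming the script stays under `scripts/` and `train.py` sits at the repository root; `VIME_ROOT` and `repo_root_from` are hypothetical names not defined by the PR:

```shell
#!/usr/bin/env bash
# Resolve the repository root as the parent of the directory holding this script.
# VIME_ROOT is a hypothetical override variable, not defined by the PR.
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"

repo_root_from() {
  # $1: directory that contains the run script (expected to be <repo>/scripts)
  (cd -- "$1/.." &>/dev/null && pwd)
}

VIME_ROOT="${VIME_ROOT:-$(repo_root_from "${SCRIPT_DIR}")}"

# Show the resulting command rather than submitting a job.
echo "would run: python3 \"${VIME_ROOT}/train.py\""
```

This keeps the script working for checkouts outside the home directory, at the cost of breaking if the script is copied elsewhere without setting `VIME_ROOT`.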

gemini-code-assist · 2026-06-06T02:45:12Z

+   ${MODEL_ARGS[@]} \
+   ${CKPT_ARGS[@]} \
+   ${ROLLOUT_ARGS[@]} \
+   ${OPTIMIZER_ARGS[@]} \
+   ${GRPO_ARGS[@]} \
+   ${PERF_ARGS[@]} \
+   ${VLLM_ARGS[@]} \
+   ${MISC_ARGS[@]}


Array expansions should be double-quoted (e.g., "${MODEL_ARGS[@]}") to prevent word splitting and glob expansion of individual elements. This ensures that arguments containing spaces or special characters are passed correctly to the python script.

Suggested change

-   ${MODEL_ARGS[@]} \
-   ${CKPT_ARGS[@]} \
-   ${ROLLOUT_ARGS[@]} \
-   ${OPTIMIZER_ARGS[@]} \
-   ${GRPO_ARGS[@]} \
-   ${PERF_ARGS[@]} \
-   ${VLLM_ARGS[@]} \
-   ${MISC_ARGS[@]}
+   "${MODEL_ARGS[@]}" \
+   "${CKPT_ARGS[@]}" \
+   "${ROLLOUT_ARGS[@]}" \
+   "${OPTIMIZER_ARGS[@]}" \
+   "${GRPO_ARGS[@]}" \
+   "${PERF_ARGS[@]}" \
+   "${VLLM_ARGS[@]}" \
+   "${MISC_ARGS[@]}"

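For arrays the same rule applies per element: only the quoted `"${ARR[@]}"` form expands to exactly one word per element. A short demonstration, where `--label` is a hypothetical flag chosen only to get an element that contains spaces:

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

# --label is a hypothetical flag used only to get an element containing spaces.
MISC_ARGS=(--label "qwen3 32B npu")

unquoted=$(count_args ${MISC_ARGS[@]})     # elements are re-split into 4 words
quoted=$(count_args "${MISC_ARGS[@]}")     # one word per element, 2 in total

echo "unquoted: ${unquoted}, quoted: ${quoted}"
```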
CalvinXKY added 3 commits June 5, 2026 14:28

ascend: adapt vime for NPU

ae52797

add Qwen3-32B NPU GRPO training example script

64990bb

Add scripts/run-qwen3-32B-npu.sh for end-to-end GRPO on Atlas 800I A3 (16 NPUs).

gemini-code-assist Bot reviewed Jun 6, 2026


CalvinXKY mentioned this pull request Jun 6, 2026

[NPU][Spike] Steps and Test Results for Running Qwen3-32B on NPU（A3） #165

Open

CalvinXKY requested a review from Meihan-chen June 6, 2026 03:32

CalvinXKY force-pushed the ascend branch from f795f2f to 41f9ed3 Compare June 11, 2026 08:55

aoshen02 mentioned this pull request Jun 21, 2026

[RFC] VIME Roadmap #11

Open

14 tasks

CalvinXKY wants to merge 3 commits into ascend from xky/ascend-qwen32b-example
