[NPU][Example] Add Qwen3-32B GRPO training script for Ascend#164

Open
CalvinXKY wants to merge 3 commits into ascend from xky/ascend-qwen32b-example

Conversation

@CalvinXKY

Collaborator

Summary

Add scripts/run-qwen3-32B-npu.sh, an end-to-end GRPO training example for Qwen3-32B on Ascend NPU (Atlas 800I A3).

This PR is scoped to the run script only. The Docker build and NPU dependency patches are already on the ascend branch (#163).

The script reuses the existing model config at scripts/models/qwen3-32B.sh and follows the same layout as other scripts under scripts/ (e.g. run-qwen3-4B.sh).

What the script covers

  • NPU / Ray environment: ASCEND_RT_VISIBLE_DEVICES, HCCL port ranges, RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES, CANN/Ascend runtime env for ray job submit
  • Training: GRPO on GSM8K (deepscaler reward), TP=8 + sequence parallel, CPU optimizer offload
  • Rollout: vLLM with --rollout-num-gpus-per-engine 2, --vllm-enforce-eager
  • Resource layout: 16 NPUs total — 8 for actor training, 8 for rollout (1 node × 8 NPUs each)
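The resource layout above can be sanity-checked with a little shell arithmetic before launching. This is a minimal sketch: the variable and function names are hypothetical (not taken from the script), and the engine count assumes the rollout NPUs are divided evenly among vLLM engines.

```shell
#!/usr/bin/env bash
# Hypothetical names; the values come from the PR description.
TOTAL_NPUS=16
ACTOR_NPUS=8
ROLLOUT_NPUS=8
NPUS_PER_ENGINE=2   # mirrors --rollout-num-gpus-per-engine 2

# Print the implied number of rollout engines, or fail if the split is inconsistent.
check_layout() {
  if (( ACTOR_NPUS + ROLLOUT_NPUS != TOTAL_NPUS )); then
    echo "actor + rollout NPUs must equal the total" >&2
    return 1
  fi
  if (( ROLLOUT_NPUS % NPUS_PER_ENGINE != 0 )); then
    echo "rollout NPUs must be a multiple of NPUs per engine" >&2
    return 1
  fi
  echo $(( ROLLOUT_NPUS / NPUS_PER_ENGINE ))
}

echo "rollout engines: $(check_layout)"   # prints "rollout engines: 4"
```

If ASCEND_RT_VISIBLE_DEVICES is narrowed to fewer devices, the same check flags a layout that no longer adds up.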

Prerequisites

  • Ascend NPU environment built from docker/Dockerfile.npu
  • Converted checkpoints:
    • --hf-checkpoint: HuggingFace Qwen3-32B weights
    • --ref-load: Megatron torch_dist checkpoint
  • Training data: GSM8K parquet (default path in script is environment-specific)

Paths to customize before running

| Variable / arg | Default in script | Notes |
| --- | --- | --- |
| --hf-checkpoint | /data/local_models/Qwen3-32B | HF model dir |
| --ref-load | /data/local_models/Qwen3-32B_torch_dist | Megatron dist ckpt |
| --prompt-data | /data/nfs_87/xky/datasets/gsm8k/train.parquet | GSM8K train set |
| ASCEND_RT_VISIBLE_DEVICES | 0–15 | Match available NPUs |
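
Since these defaults are environment-specific, a preflight check can catch a missing path before Ray starts. The sketch below is illustrative only: require_paths and the HF_CHECKPOINT, REF_LOAD, and PROMPT_SET override variables are assumed names, not something the script is known to define in this form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required path that does not exist.
require_paths() {
  local missing=0 p
  for p in "$@"; do
    if [[ ! -e "$p" ]]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Defaults copied from the table above; the override variable names are assumptions.
HF_CHECKPOINT="${HF_CHECKPOINT:-/data/local_models/Qwen3-32B}"
REF_LOAD="${REF_LOAD:-/data/local_models/Qwen3-32B_torch_dist}"
PROMPT_SET="${PROMPT_SET:-/data/nfs_87/xky/datasets/gsm8k/train.parquet}"

if ! require_paths "$HF_CHECKPOINT" "$REF_LOAD" "$PROMPT_SET"; then
  echo "Customize the paths above before running." >&2
fi
```

In a real launcher the warning would likely become an exit 1, so the job never gets submitted with a bad path.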

Test plan

  • Smoke run: script starts Ray head, submits job, and training begins without import / HCCL / vLLM startup errors on A3 (16 NPUs)
  • End-to-end: complete at least 1 GRPO step (rollout → train → weight sync) on Qwen3-32B

Related

CalvinXKY added 3 commits June 5, 2026 14:28
* update docker patch.

* fix mindspeed.patch try-except formatting per review

Replace malformed features_manager hunks with proper try/except/pass blocks.

* add torch_npu.patch for NPU Docker build

Wrap eager_connect_single_device in try/except to avoid RuntimeError on A3.
Add scripts/run-qwen3-32B-npu.sh for end-to-end GRPO on Atlas 800I A3 (16 NPUs).

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new bash script, scripts/run-qwen3-32B-npu.sh, designed to configure and launch Qwen3-32B training on NPU clusters using Ray. The code review feedback focuses on correcting a typo in the PYTHONBUFFERED environment variable (which should be PYTHONUNBUFFERED), improving script portability by replacing hardcoded /home/ma-user paths with the ${HOME} environment variable, and enhancing shell script robustness by properly double-quoting variables and array expansions to prevent word splitting.


SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"

export PYTHONBUFFERED=16

high

The environment variable to disable Python's stdout/stderr buffering is PYTHONUNBUFFERED, not PYTHONBUFFERED. Because of this typo, Python output buffering will not be disabled, which can lead to delayed or missing logs during training. It should be set to PYTHONUNBUFFERED=1.

Suggested change
- export PYTHONBUFFERED=16
+ export PYTHONUNBUFFERED=1

export RAY_DISABLE_SIGINT_OVERRIDE=1
export HCCL_CONNECT_TIMEOUT=7200

export PYTHONPATH="/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:${PYTHONPATH}"

medium

Hardcoding /home/ma-user limits the script's portability to other environments or users. Consider using the ${HOME} environment variable instead.

Suggested change
- export PYTHONPATH="/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:${PYTHONPATH}"
+ export PYTHONPATH="${HOME}/Megatron-LM:${HOME}/vllm:${HOME}/vime:${PYTHONPATH}"

PROMPT_SET=/data/nfs_87/xky/datasets/gsm8k/train.parquet

ROLLOUT_ARGS=(
--prompt-data ${PROMPT_SET}

medium

Double-quote ${PROMPT_SET} to prevent word splitting if the path contains spaces or special characters.

Suggested change
- --prompt-data ${PROMPT_SET}
+ --prompt-data "${PROMPT_SET}"

)

export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265

medium

Double-quote ${MASTER_ADDR} to prevent word splitting.

Suggested change
- ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265
+ ray start --head --node-ip-address "${MASTER_ADDR}" --num-gpus 0 --resources '{"NPU": 16}' --disable-usage-stats --dashboard-host=0.0.0.0 --dashboard-port=8265


RUNTIME_ENV_JSON='{
"env_vars": {
"PYTHONPATH": "/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",

medium

Hardcoding /home/ma-user in the PYTHONPATH inside RUNTIME_ENV_JSON limits portability. Consider using ${HOME} instead.

Suggested change
- "PYTHONPATH": "/home/ma-user/Megatron-LM:/home/ma-user/vllm:/home/ma-user/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",
+ "PYTHONPATH": "'"${HOME}"'/Megatron-LM:'"${HOME}"'/vllm:'"${HOME}"'/vime:/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge:/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:'"$PYTHONPATH"'",
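
Splicing variables into a single-quoted JSON string with '"..."' segments works but is easy to get wrong. One possible alternative, sketched here under assumptions, is to assemble the path in a shell variable first and fill the JSON with printf. REPO_ROOT, ASCEND_TOOLKIT, and RUNTIME_PYTHONPATH are hypothetical names, only two of the script's env_vars are shown, and the sketch assumes the paths contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch only: REPO_ROOT and ASCEND_TOOLKIT are hypothetical names, not taken from the script.
REPO_ROOT="${REPO_ROOT:-${HOME}}"
ASCEND_TOOLKIT="${ASCEND_TOOLKIT:-/usr/local/Ascend/ascend-toolkit/latest}"

# Build the colon-separated PYTHONPATH once, outside the JSON.
RUNTIME_PYTHONPATH="${REPO_ROOT}/Megatron-LM:${REPO_ROOT}/vllm:${REPO_ROOT}/vime"
RUNTIME_PYTHONPATH+=":${ASCEND_TOOLKIT}/tools/ms_fmk_transplt/torch_npu_bridge"
RUNTIME_PYTHONPATH+=":${ASCEND_TOOLKIT}/python/site-packages:${PYTHONPATH:-}"

# printf fills the %s slots; values with quotes or backslashes would need real JSON escaping.
RUNTIME_ENV_JSON="$(printf '{"env_vars": {"PYTHONPATH": "%s", "ASCEND_HOME_PATH": "%s/"}}' \
  "${RUNTIME_PYTHONPATH}" "${ASCEND_TOOLKIT}")"

echo "${RUNTIME_ENV_JSON}"
```

The resulting string can be passed to --runtime-env-json exactly as the script already does.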

"ASCEND_HOME_PATH": "/usr/local/Ascend/ascend-toolkit/latest/",
"HYDRA_FULL_ERROR": "1",
"RAY_DEBUG_POST_MORTEM_DISABLED": "1",
"LD_LIBRARY_PATH": "/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/cann-8.5.2/lib64:'"$LD_LIBRARY_PATH"'"

medium

Hardcoding the specific CANN version path /usr/local/Ascend/cann-8.5.2/lib64 makes the script fragile and less portable across different Ascend environments or upgrades. Consider using the latest symlink or dynamically resolving the path if possible.
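
One way to resolve the path dynamically is to pick the highest-versioned cann-* directory at launch time. This is a sketch under assumptions: latest_cann_lib is a hypothetical helper, it assumes CANN installs sit side by side as cann-<version> directories under /usr/local/Ascend, and it relies on GNU sort -V for version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print <base>/cann-<highest version>/lib64, or fail if none exists.
latest_cann_lib() {
  local base="${1:-/usr/local/Ascend}" dir best=""
  for dir in "${base}"/cann-*/; do
    [[ -d "$dir" ]] || continue   # skips the literal pattern when nothing matches
    # Keep whichever of the two candidates sorts last in version order.
    best="$(printf '%s\n%s\n' "$best" "${dir%/}" | sort -V | tail -n 1)"
  done
  [[ -n "$best" ]] || return 1
  echo "${best}/lib64"
}

if CANN_LIB="$(latest_cann_lib)"; then
  echo "using CANN libraries from ${CANN_LIB}"
else
  echo "no cann-* directory found; set the library path manually" >&2
fi
```

Version-aware sorting matters here: a plain lexical sort would rank 8.5.2 above 8.10.0.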

}
}'

cd /home/ma-user/vime

medium

Use "${HOME}/vime" instead of the hardcoded /home/ma-user/vime path to improve portability.

Suggested change
- cd /home/ma-user/vime
+ cd "${HOME}/vime"

cd /home/ma-user/vime
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json="${RUNTIME_ENV_JSON}" \
-- python3 /home/ma-user/vime/train.py \

medium

Use "${HOME}/vime/train.py" instead of the hardcoded /home/ma-user/vime/train.py path to improve portability.

Suggested change
- -- python3 /home/ma-user/vime/train.py \
+ -- python3 "${HOME}/vime/train.py" \

Comment on lines +132 to +139
${MODEL_ARGS[@]} \
${CKPT_ARGS[@]} \
${ROLLOUT_ARGS[@]} \
${OPTIMIZER_ARGS[@]} \
${GRPO_ARGS[@]} \
${PERF_ARGS[@]} \
${VLLM_ARGS[@]} \
${MISC_ARGS[@]}

medium

Array expansions should be double-quoted (e.g., "${MODEL_ARGS[@]}") to prevent word splitting and glob expansion of individual elements. This ensures that arguments containing spaces or special characters are passed correctly to the python script.

Suggested change
- ${MODEL_ARGS[@]} \
- ${CKPT_ARGS[@]} \
- ${ROLLOUT_ARGS[@]} \
- ${OPTIMIZER_ARGS[@]} \
- ${GRPO_ARGS[@]} \
- ${PERF_ARGS[@]} \
- ${VLLM_ARGS[@]} \
- ${MISC_ARGS[@]}
+ "${MODEL_ARGS[@]}" \
+ "${CKPT_ARGS[@]}" \
+ "${ROLLOUT_ARGS[@]}" \
+ "${OPTIMIZER_ARGS[@]}" \
+ "${GRPO_ARGS[@]}" \
+ "${PERF_ARGS[@]}" \
+ "${VLLM_ARGS[@]}" \
+ "${MISC_ARGS[@]}"
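
The difference is easy to reproduce in isolation. The snippet below is a small demonstration rather than part of the script: count_args and DEMO_ARGS are made-up names, and the dataset path containing a space is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Two elements; the second contains a space (path invented for the demo).
DEMO_ARGS=(--prompt-data "/data/my datasets/train.parquet")

count_args ${DEMO_ARGS[@]}     # prints 3: the path is split at the space
count_args "${DEMO_ARGS[@]}"   # prints 2: each element stays one argument
```

With the default paths in this PR both forms behave the same, so the quoting only shows its value once someone points the script at a path with spaces or glob characters.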
