Add NPU-compatible select_device() and new test cases for data preprocessing #9512
Ginray wants to merge 10 commits into
…processing to improve CI coverage
Code Review
This pull request centralizes device environment setup across tests using a new setup_device_env utility, and introduces lightweight tests for data preprocessing, Megatron arguments, and the TransformersEngine. Key feedback includes casting device_ids to a string to prevent type errors, using a try...finally block to avoid test pollution when modifying template properties, adding defensive checks for empty choices in streaming responses, gracefully handling missing trl imports during test collection, and lazy-loading the TransformersEngine to prevent resource consumption during test discovery.
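The feedback items in the review summary can be illustrated with a small, self-contained sketch. None of this is the PR's code: `export_device_ids`, `encode_with_temporary_max_length`, `collect_stream_text`, `get_engine`, and `TrlDependentTest` are hypothetical names, the template and stream chunks are plain stand-in objects, and `get_engine` returns a dummy object where a real engine constructor would go.

```python
import importlib.util
import os
import unittest
from functools import lru_cache
from types import SimpleNamespace


def export_device_ids(device_ids, env=os.environ):
    """os.environ only accepts str values, so cast before assigning."""
    env['CUDA_VISIBLE_DEVICES'] = str(device_ids)


def encode_with_temporary_max_length(template, max_length):
    """Override a template attribute for one call and always restore it."""
    original = template.max_length
    template.max_length = max_length
    try:
        return template.encode()
    finally:
        # Runs even if encode() raises, so later tests see the old value.
        template.max_length = original


def collect_stream_text(chunks):
    """Join streamed deltas, skipping chunks whose choices list is empty."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return ''.join(parts)


# Check for the optional dependency without importing it at collection time.
HAS_TRL = importlib.util.find_spec('trl') is not None


@lru_cache(maxsize=1)
def get_engine():
    """Build the expensive engine on first use, not at import time."""
    return object()  # stand-in for a real engine constructor


@unittest.skipUnless(HAS_TRL, 'trl is not installed')
class TrlDependentTest(unittest.TestCase):

    def test_engine_is_cached(self):
        self.assertIs(get_engine(), get_engine())


# Usage with stand-in stream chunks: the empty chunk is skipped.
demo_chunks = [
    SimpleNamespace(choices=[]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content='hi'))]),
]
demo_text = collect_stream_text(demo_chunks)  # 'hi'
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.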
…cessing, TransformersEngine inference, and Megatron args to improve CI coverage
# Conflicts:
#   tests/deploy/test_dataset.py
#   tests/deploy/test_logprobs.py
#   tests/eval/test_eval.py
#   tests/export/test_quant.py
#   tests/general/test_data_preprocess.py
#   tests/infer/test_agent.py
#   tests/infer/test_infer.py
#   tests/infer/test_logprobs.py
#   tests/infer/test_main.py
#   tests/infer/test_mllm.py
#   tests/infer/test_sglang.py
#   tests/infer/test_transformers_engine.py
#   tests/llm/test_ollama_export.py
#   tests/llm/test_run.py
#   tests/llm/test_template.py
#   tests/megatron/test_embedding.py
#   tests/megatron/test_export.py
#   tests/megatron/test_gkd.py
#   tests/megatron/test_grpo.py
#   tests/megatron/test_kto.py
#   tests/megatron/test_lora.py
#   tests/megatron/test_rlhf.py
#   tests/megatron/test_train.py
#   tests/models/test_llm.py
#   tests/models/test_mllm.py
#   tests/test_align/test_cls.py
#   tests/test_align/test_lmdeploy_vlm.py
#   tests/test_align/test_padding_side.py
#   tests/test_align/test_template/test_agent.py
#   tests/test_align/test_template/test_audio.py
#   tests/test_align/test_template/test_gene.py
#   tests/test_align/test_template/test_llm.py
#   tests/test_align/test_template/test_tool.py
#   tests/test_align/test_template/test_video.py
#   tests/test_align/test_template/test_vision.py
#   tests/test_align/test_vllm_vlm.py
#   tests/train/test_channel.py
#   tests/train/test_cls.py
#   tests/train/test_embedding.py
#   tests/train/test_freeze.py
#   tests/train/test_gkd.py
#   tests/train/test_grpo.py
#   tests/train/test_kto.py
#   tests/train/test_liger.py
#   tests/train/test_multilabel.py
#   tests/train/test_packing.py
#   tests/train/test_ppo.py
#   tests/train/test_pt.py
#   tests/train/test_resume_from_checkpoint.py
#   tests/train/test_rlhf.py
#   tests/train/test_sft.py
#   tests/train/test_train_eval.py
#   tests/train/test_vit_lr.py
…ers_engine to unittest.TestCase
We confirmed that the UT failure comes from "No module named 'swift.ray'" and is not related to this PR. PR pull/9471 and follow-up PRs are expected to address the UT problem.

No need to worry about these CI failures.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.utils import select_device
This will initialize CUDA/NPU here, which causes the environment variable to have no effect.
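The ordering problem raised in this comment can be shown with a toy model. The class below is not a real runtime: `FakeDeviceRuntime` is a hypothetical stand-in that, like a device runtime that reads its visibility variable once at initialization, snapshots the value on first use and ignores later changes.

```python
import os


class FakeDeviceRuntime:
    """Toy stand-in for a CUDA/NPU runtime that reads the env var once."""

    def __init__(self, env_name):
        self.env_name = env_name
        self._visible = None  # runtime not initialized yet

    def visible_devices(self, env=os.environ):
        # The first call snapshots the variable; later changes are ignored.
        if self._visible is None:
            self._visible = env.get(self.env_name, 'all')
        return self._visible


# Case 1: the runtime is initialized before the variable is set.
late_env = {}
late = FakeDeviceRuntime('CUDA_VISIBLE_DEVICES')
late.visible_devices(late_env)            # initialization happens here
late_env['CUDA_VISIBLE_DEVICES'] = '0'    # too late to take effect
late_result = late.visible_devices(late_env)    # 'all'

# Case 2: the variable is set first, then the runtime is initialized.
early_env = {'CUDA_VISIBLE_DEVICES': '0'}
early = FakeDeviceRuntime('CUDA_VISIBLE_DEVICES')
early_result = early.visible_devices(early_env)  # '0'
```

In a real test file this suggests setting the variable before any import that may initialize the device.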
A new round of testing has been performed on Swift test cases running on NPU, including manually triggered cases. Adjustments have been made in response to the test outcomes. Please review.
PR type
PR information
Swift's test suite faces two potential limitations: (1) CUDA_VISIBLE_DEVICES may not take effect on NPUs; (2) top-level test functions for core training, inference, and preprocessing may not be discovered by unittest, which could leave gaps in CI coverage.
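The second limitation can be reproduced with the standard library alone. The sketch below builds a throwaway module (the names `demo_tests`, `test_top_level`, and `DemoCase` are made up for illustration) and asks the default `unittest` loader what it collects: only methods of `unittest.TestCase` subclasses are picked up, not plain module-level functions.

```python
import types
import unittest


def build_demo_module():
    """Create an in-memory module with one top-level function and one TestCase."""
    module = types.ModuleType('demo_tests')

    def test_top_level():
        # Looks like a test, but the unittest loader does not collect it.
        return 'never collected'

    class DemoCase(unittest.TestCase):

        def test_method(self):
            self.assertTrue(True)

    module.test_top_level = test_top_level
    module.DemoCase = DemoCase
    return module


suite = unittest.TestLoader().loadTestsFromModule(build_demo_module())
collected = suite.countTestCases()  # 1: only DemoCase.test_method
```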
Changes:
New tests/_test_utils.py: Provides select_device(), auto-setting ASCEND_RT_VISIBLE_DEVICES on NPU and CUDA_VISIBLE_DEVICES on GPU.
51 existing files: Only replaced os.environ['CUDA_VISIBLE_DEVICES'] with setup_device_env(), no other code changes.
2 new unittest.TestCase files (auto-discovered by CI):
1 new top-level function file (manual call, not in CI):
6 existing files with appended top-level functions (commented out by default):
Verification: All new cases executed and passed on NPU.
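For readers unfamiliar with the pattern, a device-aware helper of the kind described in the change list could look roughly like the following. This is a minimal sketch and not the code in tests/_test_utils.py: the function name `set_visible_devices_env`, its parameters, and the use of an installed `torch_npu` package as the NPU signal are all assumptions made for illustration.

```python
import importlib.util
import os


def set_visible_devices_env(device_ids='0', env=None, has_npu=None):
    """Set the visibility variable that matches the detected accelerator.

    Hypothetical helper. Assumption: an installed `torch_npu` package
    means the tests are running on an Ascend NPU.
    """
    env = os.environ if env is None else env
    if has_npu is None:
        has_npu = importlib.util.find_spec('torch_npu') is not None

    # Environment values must be strings; join lists such as [0, 1] as '0,1'.
    if isinstance(device_ids, (list, tuple)):
        value = ','.join(str(i) for i in device_ids)
    else:
        value = str(device_ids)

    name = 'ASCEND_RT_VISIBLE_DEVICES' if has_npu else 'CUDA_VISIBLE_DEVICES'
    env[name] = value
    return name


# Usage with an explicit dict so the real environment is left untouched.
demo_env = {}
chosen = set_visible_devices_env([0, 1], env=demo_env, has_npu=True)
```

The helper only writes an environment variable and imports no device framework itself, so it can be called before anything initializes CUDA or the NPU.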
No impact on existing UTs:
Experiment results
All new test cases passed, including manual ones: