Draft
58 commits
0140eaf
[Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (…
wzhao18 Mar 20, 2026
bd8c4c0
[CI] Removing deprecated rlhf examples reference (#37585)
AndreasKaratzas Mar 20, 2026
dcee9be
[Model Runner V2] Fix draft logits not populated during cudagraph rep…
TheEpicDolphin Mar 20, 2026
ed359c4
[Model] Deprecate the score task (this will not affect users). (#37537)
noooop Mar 20, 2026
9cfd4eb
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list …
AndreasKaratzas Mar 20, 2026
37cd9fc
[ROCm][CI] Remove deepep DBO tests on gfx90a (#37614)
AndreasKaratzas Mar 20, 2026
5a4a179
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible…
AndreasKaratzas Mar 20, 2026
6050b93
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/…
sfeng33 Mar 20, 2026
b4c1aef
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypo…
sfeng33 Mar 20, 2026
0523449
[Misc] Use logger.info_once for auto tool choice log message (#37661)
chaunceyjiang Mar 20, 2026
dd20ee4
[UX] Enable torch_profiler_with_stack (#37571)
jeejeelee Mar 20, 2026
9f6d9dd
Fix attribute error in `isaac_patch_hf_runner` (#37685)
hmellor Mar 20, 2026
8b6c6b9
[Model] Add LFM2-ColBERT-350M support (#37528)
ieBoytsov Mar 20, 2026
44eea10
[ROCm][Quantization] make quark ocp mx dtype parser robust for weight…
xuebwang-amd Mar 20, 2026
1779c09
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34…
laudney Mar 20, 2026
56a62c3
[Bugfix] Reject channelwise quantization (group_size <= 0) in Exllama…
mgehre-amd Mar 20, 2026
5e806bc
[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-…
mgehre-amd Mar 20, 2026
aa84e43
[Pixtral] Enable Pixtral language model support Eagle3 (#37182)
Flechman Mar 20, 2026
c0f5fae
[compile] Fix aot test failures with torch 2.12. (#37604)
zhxchen17 Mar 20, 2026
880be2b
[Metrics] Some small refactoring for better maintainability (#33898)
hickeyma Mar 20, 2026
2e089b9
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFAC…
zhxchen17 Mar 20, 2026
6ade4bc
Fix various config related issues for Transformers v5 (#37681)
hmellor Mar 20, 2026
fb4e8bf
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
AndreasKaratzas Mar 20, 2026
d0532bf
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_ke…
xyang16 Mar 20, 2026
e80cfe5
[MRV2] Avoid recompilation of _gather_block_tables_kernel (#37645)
WoosukKwon Mar 20, 2026
79eb936
fix CUDAGraph memory being counted twice (#37426)
panpan0000 Mar 20, 2026
e1d85e5
[Attention] Support distinguishing between short extends and decodes …
LucasWilkinson Mar 20, 2026
6ec5e9f
refactor: abstract deepgemm support into platform (#37519)
SherryC41 Mar 20, 2026
d7d2b5e
[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention…
Young-Leo Mar 20, 2026
37aadf6
[Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
DarkLight1337 Mar 20, 2026
12fd17e
[compile] Initialize passes at VllmBackend init (#35216)
angelayi Mar 20, 2026
4f16ebb
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#3759…
vadiklyutiy Mar 20, 2026
8bc6b5c
[ROCm][CI] Setting some mi325_4 tests back to optional (in parity wit…
AndreasKaratzas Mar 20, 2026
85f671b
[Model Runner V2] Support Streaming Inputs (#37028)
santiramos27 Mar 20, 2026
b3d0b37
[Refactor] Remove unused dead code (#36171)
yewentao256 Mar 20, 2026
e5ed6c6
[BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks …
kjiang249 Mar 20, 2026
c57d38d
elastic_ep: Fix issues with repeated scale up/down cycles (#37131)
itayalroy Mar 20, 2026
1c472f8
Add get_device_uuid for rocm (#37694)
tmm77 Mar 21, 2026
c7f98b4
[Frontend] Remove librosa from audio dependency (#37058)
Isotr0py Mar 21, 2026
87bd918
[MoE Refactor] Mxfp4 oracle rebased (#37128)
zyongye Mar 21, 2026
3ffa520
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collec…
AndreasKaratzas Mar 21, 2026
1fa1e53
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
simon-mo Mar 21, 2026
0d50fa1
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
AndreasKaratzas Mar 21, 2026
17ee641
[Responses API] Add kv_transfer_params for PD disaggregation (#37424)
bongwoobak Mar 21, 2026
02eec7e
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list …
AndreasKaratzas Mar 21, 2026
3982bc2
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692)
lcskrishna Mar 21, 2026
298e510
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create()…
fuscof-ibm Mar 21, 2026
88f1b37
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#…
mmangkad Mar 21, 2026
61e381d
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756)
mmangkad Mar 21, 2026
80b7088
Add tensor IPC transfer mechanism for multimodal data (#32104)
brandonpelfrey Mar 21, 2026
8cc700d
Consolidate AWQ quantization into single awq_marlin.py file
robertgshaw2-redhat Mar 21, 2026
5ad0446
Revert "Consolidate AWQ quantization into single awq_marlin.py file" …
robertgshaw2-redhat Mar 21, 2026
eeee5b2
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
robertgshaw2-redhat Mar 21, 2026
6b2fa3a
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
robertgshaw2-redhat Mar 21, 2026
e78bc74
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.s…
AndreasKaratzas Mar 22, 2026
66f927f
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing…
AndreasKaratzas Mar 22, 2026
c86b17c
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
AndreasKaratzas Mar 22, 2026
c862481
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weig…
AndreasKaratzas Mar 22, 2026
3 changes: 1 addition & 2 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -326,8 +326,7 @@ apply_rocm_test_overrides() {
  if [[ $cmds == *" kernels/moe"* ]]; then
  cmds="${cmds} \
  --ignore=kernels/moe/test_moe.py \
- --ignore=kernels/moe/test_cutlass_moe.py \
- --ignore=kernels/moe/test_triton_moe_ptpc_fp8.py"
+ --ignore=kernels/moe/test_cutlass_moe.py"
  fi

  # --- Entrypoint ignores ---
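Note on the run-amd-test.sh change: the override substring-matches the test command and appends pytest `--ignore` flags to it. A minimal sketch of that pattern follows. The helper name `apply_moe_ignores` is hypothetical (it is not in the script), and a POSIX `case` stands in for the script's `[[ ... == *pattern* ]]` test so the sketch runs under plain `sh`.

```shell
# Hypothetical helper, not part of run-amd-test.sh: append --ignore flags
# only when the command string targets the kernels/moe directory.
apply_moe_ignores() (
  cmds=$1
  case "$cmds" in
    *" kernels/moe"*)
      # Same substring condition as the script's [[ ... ]] test.
      cmds="$cmds --ignore=kernels/moe/test_moe.py"
      cmds="$cmds --ignore=kernels/moe/test_cutlass_moe.py"
      ;;
  esac
  printf '%s\n' "$cmds"
)

apply_moe_ignores "pytest -v -s kernels/moe"
apply_moe_ignores "pytest -v -s kernels/attention"
```

Only the first call gains the two `--ignore` flags; the second command is printed unchanged because it does not contain ` kernels/moe`.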
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-tpu-v1-test-part2.sh
@@ -127,7 +127,7 @@ run_and_track_test() {

  # --- Actual Test Execution ---
  run_and_track_test 1 "test_struct_output_generate.py" \
- "python3 -m pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py -k \"not test_structured_output_with_reasoning_matrices\""
+ "python3 -m pytest -s -v /workspace/vllm/tests/entrypoints/llm/test_struct_output_generate.py -k \"not test_structured_output_with_reasoning_matrices\""
  run_and_track_test 2 "test_moe_pallas.py" \
  "python3 -m pytest -s -v /workspace/vllm/tests/tpu/test_moe_pallas.py"
  run_and_track_test 3 "test_lora.py" \
99 changes: 36 additions & 63 deletions .buildkite/test-amd.yaml

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions .buildkite/test_areas/distributed.yaml
@@ -27,14 +27,14 @@ steps:
  - vllm/v1/engine/
  - vllm/v1/worker/
  - tests/v1/distributed
- - tests/v1/entrypoints/openai/test_multi_api_servers.py
+ - tests/entrypoints/openai/test_multi_api_servers.py
  commands:
  # https://github.com/NVIDIA/nccl/issues/1838
  - export NCCL_CUMEM_HOST_ENABLE=0
  - TP_SIZE=1 DP_SIZE=2 pytest -v -s v1/distributed/test_async_llm_dp.py
  - TP_SIZE=1 DP_SIZE=2 pytest -v -s v1/distributed/test_eagle_dp.py
  - TP_SIZE=1 DP_SIZE=2 pytest -v -s v1/distributed/test_external_lb_dp.py
- - DP_SIZE=2 pytest -v -s v1/entrypoints/openai/test_multi_api_servers.py
+ - DP_SIZE=2 pytest -v -s entrypoints/openai/test_multi_api_servers.py

  - label: Distributed Compile + RPC Tests (2 GPUs)
  timeout_in_minutes: 20
@@ -88,7 +88,6 @@ steps:
  - vllm/distributed/
  - tests/distributed/test_torchrun_example.py
  - tests/distributed/test_torchrun_example_moe.py
- - examples/offline_inference/rlhf.py
  - examples/offline_inference/rlhf_colocate.py
  - examples/rl/
  - tests/examples/offline_inference/data_parallel.py
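Note on the distributed.yaml commands: entries such as `TP_SIZE=1 DP_SIZE=2 pytest ...` rely on the shell rule that assignments placed before a command are exported to that command only. A minimal sketch of that behaviour, assuming a hypothetical probe command in place of pytest:

```shell
# Hypothetical probe standing in for pytest: prints the sizes it sees.
probe='printf "TP_SIZE=%s DP_SIZE=%s\n" "${TP_SIZE:-unset}" "${DP_SIZE:-unset}"'

# Start from a clean slate so the comparison is meaningful.
unset TP_SIZE DP_SIZE

# Assignments prefixed to a command apply to that one child process only.
with_env=$(TP_SIZE=1 DP_SIZE=2 sh -c "$probe")
# A later command does not inherit them.
without_env=$(sh -c "$probe")

printf '%s\n%s\n' "$with_env" "$without_env"
```

This is why each pipeline command repeats its own `TP_SIZE`/`DP_SIZE` prefix rather than depending on an earlier line, whereas `export NCCL_CUMEM_HOST_ENABLE=0` persists for the commands that follow it in the same shell.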
12 changes: 12 additions & 0 deletions .buildkite/test_areas/engine.yaml
@@ -70,3 +70,15 @@ steps:
  device: mi325_4
  depends_on:
  - image-build-amd
+
+ - label: V1 e2e (4xH100)
+ timeout_in_minutes: 60
+ device: h100
+ num_devices: 4
+ optional: true
+ source_file_dependencies:
+ - vllm/v1/attention/backends/utils.py
+ - vllm/v1/worker/gpu_model_runner.py
+ - tests/v1/e2e/test_hybrid_chunked_prefill.py
+ commands:
+ - pytest -v -s v1/e2e/test_hybrid_chunked_prefill.py
21 changes: 4 additions & 17 deletions .buildkite/test_areas/entrypoints.yaml
@@ -10,7 +10,7 @@ steps:
  - tests/entrypoints/
  commands:
  - pytest -v -s entrypoints/openai/tool_parsers
- - pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling
+ - pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/serve/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling

  - label: Entrypoints Integration (LLM)
  timeout_in_minutes: 40
@@ -34,7 +34,7 @@ steps:
  - tests/entrypoints/test_chat_utils
  commands:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
- - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/chat_completion/test_oot_registration.py --ignore=entrypoints/openai/completion/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses
+ - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/chat_completion/test_oot_registration.py --ignore=entrypoints/openai/completion/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py
  - pytest -v -s entrypoints/test_chat_utils.py
  mirror:
  amd:
@@ -48,11 +48,11 @@ steps:
  source_file_dependencies:
  - vllm/
  - tests/entrypoints/rpc
- - tests/entrypoints/instrumentator
+ - tests/entrypoints/serve/instrumentator
  - tests/tool_use
  commands:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
- - pytest -v -s entrypoints/instrumentator
+ - pytest -v -s entrypoints/serve/instrumentator
  - PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/rpc
  - pytest -v -s tool_use

@@ -75,19 +75,6 @@ steps:
  commands:
  - pytest -v -s entrypoints/openai/responses

- - label: Entrypoints V1
- timeout_in_minutes: 50
- source_file_dependencies:
- - vllm/
- - tests/v1
- commands:
- - pytest -v -s v1/entrypoints
- mirror:
- amd:
- device: mi325_1
- depends_on:
- - image-build-amd
-
  - label: OpenAI API Correctness
  timeout_in_minutes: 30
  source_file_dependencies:
16 changes: 16 additions & 0 deletions .buildkite/test_areas/lm_eval.yaml
@@ -45,6 +45,22 @@ steps:
  commands:
  - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt

+ - label: LM Eval Qwen3.5 Models (B200)
+ timeout_in_minutes: 120
+ device: b200
+ optional: true
+ num_devices: 2
+ source_file_dependencies:
+ - vllm/model_executor/models/qwen3_5.py
+ - vllm/model_executor/models/qwen3_5_mtp.py
+ - vllm/transformers_utils/configs/qwen3_5.py
+ - vllm/transformers_utils/configs/qwen3_5_moe.py
+ - vllm/model_executor/models/qwen3_next.py
+ - vllm/model_executor/models/qwen3_next_mtp.py
+ - vllm/model_executor/layers/fla/ops/
+ commands:
+ - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-qwen35-blackwell.txt
+
  - label: LM Eval Large Models (H200)
  timeout_in_minutes: 60
  device: h200
4 changes: 2 additions & 2 deletions .buildkite/test_areas/model_runner_v2.yaml
@@ -11,7 +11,7 @@ steps:
  - vllm/v1/attention/
  - tests/v1/engine/test_llm_engine.py
  - tests/v1/e2e/
- - tests/v1/entrypoints/llm/test_struct_output_generate.py
+ - tests/entrypoints/llm/test_struct_output_generate.py
  commands:
  - set -x
  - export VLLM_USE_V2_MODEL_RUNNER=1
@@ -22,7 +22,7 @@ steps:
  - pytest -v -s v1/e2e/general/test_context_length.py
  - pytest -v -s v1/e2e/general/test_min_tokens.py
  # Temporary hack filter to exclude ngram spec decoding based tests.
- - pytest -v -s v1/entrypoints/llm/test_struct_output_generate.py -k "xgrammar and not speculative_config6 and not speculative_config7 and not speculative_config8 and not speculative_config0"
+ - pytest -v -s entrypoints/llm/test_struct_output_generate.py -k "xgrammar and not speculative_config6 and not speculative_config7 and not speculative_config8 and not speculative_config0"

  - label: Model Runner V2 Examples
  timeout_in_minutes: 45
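Note on the `-k` filter in model_runner_v2.yaml: pytest's `-k` option takes a boolean expression over test names, here one required keyword followed by a chain of `and not <id>` exclusions. A minimal sketch of assembling such an expression, assuming a hypothetical helper `build_k_expr` that is not part of the pipeline:

```shell
# Hypothetical helper: first argument is the required keyword,
# remaining arguments are parametrization ids to exclude.
build_k_expr() (
  expr=$1
  shift
  for id in "$@"; do
    expr="$expr and not $id"
  done
  printf '%s\n' "$expr"
)

# Prints the expression that would be handed to pytest's -k option.
build_k_expr xgrammar speculative_config6 speculative_config7
```

The result could then be passed as `pytest -k "$(build_k_expr ...)"`; the pipeline itself hard-codes the string, which keeps the temporary exclusion list visible in the diff.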
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -75,7 +75,7 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
  /tests/multimodal @DarkLight1337 @ywang96 @NickLucche
  /tests/quantization @mgoin @robertgshaw2-redhat @yewentao256 @pavanimajety
  /tests/test_inputs.py @DarkLight1337 @ywang96
- /tests/v1/entrypoints/llm/test_struct_output_generate.py @mgoin @russellb @aarnphm
+ /tests/entrypoints/llm/test_struct_output_generate.py @mgoin @russellb @aarnphm
  /tests/v1/structured_output @mgoin @russellb @aarnphm
  /tests/v1/core @WoosukKwon @robertgshaw2-redhat @njhill @ywang96 @alexm-redhat @heheda12345 @ApostaC @orozery
  /tests/weight_loading @mgoin @youkaichao @yewentao256
2 changes: 1 addition & 1 deletion .github/mergify.yml
@@ -260,7 +260,7 @@ pull_request_rules:
  - files=examples/offline_inference/structured_outputs.py
  - files=examples/online_serving/structured_outputs/structured_outputs.py
  - files~=^tests/v1/structured_output/
- - files=tests/v1/entrypoints/llm/test_struct_output_generate.py
+ - files=tests/entrypoints/llm/test_struct_output_generate.py
  - files~=^vllm/v1/structured_output/
  actions:
  label: