[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe#1499
functionstackx wants to merge 2 commits into
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26048204089
#!/usr/bin/env bash

# DeepSeek-R1-0528 FP8 on MI300X with EAGLE/MTP speculative decoding.
# Mirrors dsr1_fp8_mi300x.sh and adds the speculative-* flags.
🔴 The newly added benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh is unreachable: runners/launch_mi300x-amds.sh:41 hardcodes the bench-script path as benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh with no SPEC_SUFFIX dispatch, so the dsr1-fp8-mi300x-sglang-mtp recipe will always invoke the existing non-MTP dsr1_fp8_mi300x.sh regardless of spec-decoding: mtp. The sweep will silently run the autoregressive recipe (no --speculative-* flags, SGLANG_ENABLE_SPEC_V2 unset) and publish vanilla numbers under the -mtp recipe name. Fix: add SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") to launch_mi300x-amds.sh and append it to the script path, mirroring launch_b200-cw.sh:6-16 / launch_mi355x-amds.sh:183,226.
Extended reasoning...
What's broken
The PR adds a new recipe dsr1-fp8-mi300x-sglang-mtp (.github/configs/amd-master.yaml:1823) with spec-decoding: mtp, plus a sibling bench script benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh that wires up the --speculative-algorithm EAGLE … flags and SGLANG_ENABLE_SPEC_V2=1. But the launcher that routes mi300x runners has no path to ever execute that file.
Code path
The dispatcher .github/workflows/benchmark-tmpl.yml invokes runners/launch_mi300x-amds.sh for any recipe whose runner: mi300x. That launcher ends with a single hardcoded line at runners/launch_mi300x-amds.sh:41:
bash benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh
EXP_NAME is built by utils/matrix_logic/generate_sweep_configs.py:290 as f"{model_code}_{seq_len_str}" where model_code is the recipe's model-prefix. For this PR's recipe (model-prefix: dsr1), EXP_NAME will be e.g. dsr1_1k1k or dsr1_8k1k, so ${EXP_NAME%%_*}=dsr1. With PRECISION=fp8 and no SCENARIO_SUBDIR (fixed-seq-len is the default, empty subdir), the launcher always resolves to:
benchmarks/single_node/dsr1_fp8_mi300x.sh
— which exists (the non-MTP script from origin/main) and runs cleanly, so there's no error to alert anyone.
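The resolution described above can be reproduced in isolation. This is a minimal sketch that assumes the variable values quoted in this comment; it copies the path expression from line 41 without the leading bash and prints the result instead of executing it.

```bash
#!/usr/bin/env bash
# Values taken from the analysis above for this PR's recipe.
SCENARIO_SUBDIR=""     # fixed-seq-len default: empty subdir
EXP_NAME="dsr1_1k1k"   # model-prefix plus sequence-length string
PRECISION="fp8"
SPEC_DECODING="mtp"    # set by the recipe, but not referenced in the path below

# ${EXP_NAME%%_*} strips the longest suffix starting at the first underscore,
# leaving only the model prefix.
resolved="benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh"
echo "$resolved"   # prints benchmarks/single_node/dsr1_fp8_mi300x.sh
```

Changing SPEC_DECODING to any other value leaves the printed path unchanged, which is the core of the problem.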
Why every other launcher handles this
Every launcher that owns an MTP recipe computes a SPEC_SUFFIX and appends it to the script name. Examples from the tree:
- runners/launch_mi355x-amds.sh:183,226-227: SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf '_mtp' || printf '') and SCRIPT_BASE="${EXP_NAME%%_*}_${PRECISION}_mi355x" followed by ${SCRIPT_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh.
- runners/launch_b200-cw.sh:8,13, launch_b200-nb.sh:6, launch_b200-dgxc.sh:335, launch_b300-nv.sh:294, launch_h200-cw.sh:8,47, launch_h200-nb.sh:8, launch_h200-dgxc-slurm.sh:300: all carry the same SPEC_SUFFIX pattern.
launch_mi300x-amds.sh and (separately) launch_mi325x-amds.sh are the only two launchers without it; until now there was no MTP recipe targeting either runner, so the omission was harmless. This PR is the first MTP recipe on mi300x, so the omission becomes load-bearing.
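One way to re-check which launchers lack the dispatch is to list the files that never mention SPEC_SUFFIX. The helper name launchers_without_spec_suffix is hypothetical and only wraps grep -L, which prints the names of files with no match; the launch_*.sh glob is assumed from the file names quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every launch_*.sh in the given directory
# that contains no reference to SPEC_SUFFIX.
launchers_without_spec_suffix() {
    local dir="$1"
    grep -L 'SPEC_SUFFIX' "$dir"/launch_*.sh
}
```

Run from the repository root as launchers_without_spec_suffix runners. If the survey above is accurate, the output should be limited to launch_mi300x-amds.sh and launch_mi325x-amds.sh.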
Impact
The label says this PR is full-sweep-enabled, so when the sweep runs:
- The server gets launched without --speculative-algorithm EAGLE, --speculative-num-steps, --speculative-eagle-topk, --speculative-num-draft-tokens, and without SGLANG_ENABLE_SPEC_V2=1.
- The benchmark client measures vanilla autoregressive decode latency/throughput on DeepSeek-R1-0528 fp8.
- Results are tagged dsr1-fp8-mi300x-sglang-mtp and written to the perf history.
That publishes non-MTP numbers under the MTP recipe name — which is exactly the opposite of what the PR is trying to demonstrate (EAGLE speedup vs. baseline). The bench script that was added in this PR contributes zero observable behavior.
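Because the failure is silent, a cheap guard in the launcher could turn it into a hard error. The sketch below is not part of the PR: check_spec_script is a hypothetical function, and it assumes that any MTP bench script passes --speculative-algorithm somewhere in its body, as the new script does.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse to run an MTP recipe against a bench script
# that never passes a speculative-decoding flag.
check_spec_script() {
    local spec_decoding="$1" script="$2"
    # Non-MTP recipes need no check.
    [[ "$spec_decoding" == "mtp" ]] || return 0
    # The "--" stops grep from parsing the pattern as an option.
    if ! grep -q -- '--speculative-algorithm' "$script"; then
        echo "SPEC_DECODING=mtp but $script has no --speculative-algorithm flag" >&2
        return 1
    fi
}
```

Calling check_spec_script "$SPEC_DECODING" "$script" || exit 1 just before the final bash invocation would have made this sweep fail instead of publishing autoregressive numbers under the -mtp name.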
Step-by-step proof
- CI selects the recipe dsr1-fp8-mi300x-sglang-mtp from amd-master.yaml → exports MODEL=deepseek-ai/DeepSeek-R1-0528, PRECISION=fp8, MODEL_PREFIX=dsr1, SPEC_DECODING=mtp.
- generate_sweep_configs.py:290 constructs EXP_NAME=dsr1_1k1k (or dsr1_8k1k).
- .github/workflows/benchmark-tmpl.yml routes runner: mi300x → invokes runners/launch_mi300x-amds.sh.
- Line 41 expands: SCENARIO_SUBDIR="", ${EXP_NAME%%_*}=dsr1, PRECISION=fp8 → bash benchmarks/single_node/dsr1_fp8_mi300x.sh.
- That file (origin/main, the non-MTP recipe) runs end-to-end and succeeds; SPEC_DECODING=mtp was never consulted by the launcher.
- The new dsr1_fp8_mi300x_mtp.sh is never read.
Fix
Mirror launch_b200-cw.sh:6-16 (or launch_mi355x-amds.sh:183,226) in launch_mi300x-amds.sh: compute SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") early, then change line 41's path to ${EXP_NAME%%_*}_${PRECISION}_mi300x${SPEC_SUFFIX}.sh. Worth landing the same patch on launch_mi325x-amds.sh preemptively, since the same problem will resurface the next time someone adds an mi325x MTP recipe.
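A minimal sketch of the proposed dispatch follows. It wraps the logic in a hypothetical function, resolve_bench_script, so the path can be inspected without executing anything; the real launcher would inline the two lines and keep its leading bash. The empty-string value for non-MTP recipes is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the fix. Arguments stand in for the
# launcher's EXP_NAME, PRECISION, SPEC_DECODING and SCENARIO_SUBDIR variables.
resolve_bench_script() {
    local exp_name="$1" precision="$2" spec_decoding="${3:-}" scenario_subdir="${4:-}"
    local spec_suffix
    # Same SPEC_SUFFIX pattern quoted from the other launchers.
    spec_suffix=$([[ "$spec_decoding" == "mtp" ]] && printf '_mtp' || printf '')
    printf '%s\n' "benchmarks/single_node/${scenario_subdir}${exp_name%%_*}_${precision}_mi300x${spec_suffix}.sh"
}

resolve_bench_script dsr1_1k1k fp8 mtp   # prints benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh
resolve_bench_script dsr1_8k1k fp8 ""    # prints benchmarks/single_node/dsr1_fp8_mi300x.sh
```

With the suffix in place, the existing non-MTP recipe keeps resolving to the same file, so the change should be behavior-preserving for everything already on mi300x.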
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26048211958
MTP/EAGLE sibling of dsr1-fp8-mi300x-sglang. Same image (lmsysorg/sglang:v0.5.12-rocm700-mi30x), same model (deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space. Launch script mirrors dsr1_fp8_mi300x.sh (aiter attention, mem-fraction 0.8, HSA_NO_SCRATCH_RECLAIM gate) and adds SGLANG_ENABLE_SPEC_V2=1 plus the standard EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4) plus --use-chat-template on the bench client per AGENTS.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d9f101a to e97f84f
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26185764549
Summary
MTP/EAGLE sibling of the existing dsr1-fp8-mi300x-sglang recipe (DeepSeek-R1-0528). Same image (lmsysorg/sglang:v0.5.12-rocm700-mi30x, just merged via #1425), same model (deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space (conc 4..64, 1k1k + 8k1k).
Launch script benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh mirrors dsr1_fp8_mi300x.sh (aiter attention, MLA-persist, MEC-FW gate for HSA_NO_SCRATCH_RECLAIM) and adds:
- export SGLANG_ENABLE_SPEC_V2=1
- --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
- --use-chat-template on the bench client per AGENTS.md
Test plan
- bash -n syntax passes on the launch script.
🤖 Generated with Claude Code