[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe #1499
Open: functionstackx wants to merge 2 commits into main from add-dsr1-fp8-mi300x-sglang-mtp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # DeepSeek-R1-0528 FP8 on MI300X with EAGLE/MTP speculative decoding. | ||
| # Mirrors dsr1_fp8_mi300x.sh and adds the speculative-* flags. | ||
|
|
||
| source "$(dirname "$0")/../benchmark_lib.sh" | ||
|
|
||
| check_env_vars \ | ||
| MODEL \ | ||
| TP \ | ||
| CONC \ | ||
| ISL \ | ||
| OSL \ | ||
| RANDOM_RANGE_RATIO \ | ||
| RESULT_FILENAME \ | ||
| EP_SIZE | ||
|
|
||
| if [[ -n "$SLURM_JOB_ID" ]]; then | ||
| echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME" | ||
| fi | ||
|
|
||
| if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi | ||
|
|
||
| # If the machine runs a MEC FW older than 177, RCCL | ||
| # cannot reclaim some memory. Disable to avoid crashes. | ||
| version=`rocm-smi --showfw | grep MEC | head -n 1 | awk '{print $NF}'` | ||
| if [[ "$version" == "" || $version -lt 177 ]]; then | ||
| export HSA_NO_SCRATCH_RECLAIM=1 | ||
| fi | ||
|
|
||
| export SGLANG_USE_AITER=1 | ||
| export SGLANG_AITER_MLA_PERSIST=1 | ||
| export SGLANG_ENABLE_SPEC_V2=1 | ||
|
|
||
| SERVER_LOG=/workspace/server.log | ||
| PORT=${PORT:-8888} | ||
|
|
||
| EVAL_CONTEXT_ARGS="" | ||
| if [ "${EVAL_ONLY}" = "true" ]; then | ||
| setup_eval_context | ||
| EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN" | ||
| fi | ||
|
|
||
| start_gpu_monitor | ||
|
|
||
| set -x | ||
| python3 -m sglang.launch_server \ | ||
| --model-path=$MODEL --host=0.0.0.0 --port=$PORT --trust-remote-code \ | ||
| --tensor-parallel-size=$TP \ | ||
| --ep-size $EP_SIZE \ | ||
| --mem-fraction-static=0.8 \ | ||
| --cuda-graph-max-bs=128 \ | ||
| --chunked-prefill-size=131072 \ | ||
| --num-continuous-decode-steps=4 \ | ||
| --max-prefill-tokens=131072 \ | ||
| --kv-cache-dtype fp8_e4m3 \ | ||
| --attention-backend aiter \ | ||
| --speculative-algorithm EAGLE \ | ||
| --speculative-num-steps 3 \ | ||
| --speculative-eagle-topk 1 \ | ||
| --speculative-num-draft-tokens 4 \ | ||
| --disable-radix-cache $EVAL_CONTEXT_ARGS > $SERVER_LOG 2>&1 & | ||
|
|
||
| SERVER_PID=$! | ||
|
|
||
| wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID" | ||
|
|
||
| run_benchmark_serving \ | ||
| --model "$MODEL" \ | ||
| --port "$PORT" \ | ||
| --backend vllm \ | ||
| --input-len "$ISL" \ | ||
| --output-len "$OSL" \ | ||
| --random-range-ratio "$RANDOM_RANGE_RATIO" \ | ||
| --num-prompts $(( $CONC * 10 )) \ | ||
| --max-concurrency "$CONC" \ | ||
| --result-filename "$RESULT_FILENAME" \ | ||
| --result-dir /workspace/ \ | ||
| --use-chat-template | ||
|
|
||
| if [ "${RUN_EVAL}" = "true" ]; then | ||
| run_eval --framework lm-eval --port "$PORT" | ||
| append_lm_eval_summary | ||
| fi | ||
|
|
||
| stop_gpu_monitor | ||
| set +x | ||
🔴 The newly added `benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh` is unreachable: `runners/launch_mi300x-amds.sh:41` hardcodes the bench-script path as `benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh` with no `SPEC_SUFFIX` dispatch, so the `dsr1-fp8-mi300x-sglang-mtp` recipe will always invoke the existing non-MTP `dsr1_fp8_mi300x.sh` regardless of `spec-decoding: mtp`. The sweep will silently run the autoregressive recipe (no `--speculative-*` flags, `SGLANG_ENABLE_SPEC_V2` unset) and publish vanilla numbers under the `-mtp` recipe name. Fix: add `SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "")` to `launch_mi300x-amds.sh` and append it to the script path, mirroring `launch_b200-cw.sh:6-16` / `launch_mi355x-amds.sh:183,226`.

Extended reasoning...
What's broken
The PR adds a new recipe `dsr1-fp8-mi300x-sglang-mtp` (`.github/configs/amd-master.yaml:1823`) with `spec-decoding: mtp`, plus a sibling bench script `benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh` that wires up the `--speculative-algorithm EAGLE …` flags and `SGLANG_ENABLE_SPEC_V2=1`. But the launcher that routes mi300x runners has no path to ever execute that file.

Code path
The dispatcher `.github/workflows/benchmark-tmpl.yml` invokes `runners/launch_mi300x-amds.sh` for any recipe with `runner: mi300x`. That launcher ends with a single hardcoded line at `runners/launch_mi300x-amds.sh:41`, which builds the script path as `benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh`.

`EXP_NAME` is built by `utils/matrix_logic/generate_sweep_configs.py:290` as `f"{model_code}_{seq_len_str}"`, where `model_code` is the recipe's `model-prefix`. For this PR's recipe (`model-prefix: dsr1`), `EXP_NAME` will be e.g. `dsr1_1k1k` or `dsr1_8k1k`, so `${EXP_NAME%%_*}` = `dsr1`. With `PRECISION=fp8` and no `SCENARIO_SUBDIR` (fixed-seq-len is the default, empty subdir), the launcher always resolves to `benchmarks/single_node/dsr1_fp8_mi300x.sh`. That file exists (it is the non-MTP script from origin/main) and runs cleanly, so there's no error to alert anyone.
Why every other launcher handles this
Every launcher that owns an MTP recipe computes a `SPEC_SUFFIX` and appends it to the script name. Examples from the tree:

- `runners/launch_mi355x-amds.sh:183,226-227`: `SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf '_mtp' || printf '')` and `SCRIPT_BASE="${EXP_NAME%%_*}_${PRECISION}_mi355x"`, followed by `${SCRIPT_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh`.
- `runners/launch_b200-cw.sh:8,13`, `launch_b200-nb.sh:6`, `launch_b200-dgxc.sh:335`, `launch_b300-nv.sh:294`, `launch_h200-cw.sh:8,47`, `launch_h200-nb.sh:8`, `launch_h200-dgxc-slurm.sh:300`: all carry the same `SPEC_SUFFIX` pattern.

`launch_mi300x-amds.sh` and (separately) `launch_mi325x-amds.sh` are the only two launchers without it; until now there was no MTP recipe targeting either runner, so the omission was harmless. This PR is the first MTP recipe on mi300x, so the omission becomes load-bearing.

Impact
The label says this PR is `full-sweep-enabled`, so when the sweep runs:

- The server launches without `--speculative-algorithm EAGLE`, `--speculative-num-steps`, `--speculative-eagle-topk`, `--speculative-num-draft-tokens`, and without `SGLANG_ENABLE_SPEC_V2=1`.
- The results are published under `dsr1-fp8-mi300x-sglang-mtp` and written to the perf history.

That publishes non-MTP numbers under the MTP recipe name, which is exactly the opposite of what the PR is trying to demonstrate (EAGLE speedup vs. baseline). The bench script that was added in this PR contributes zero observable behavior.
Step-by-step proof
1. The recipe `dsr1-fp8-mi300x-sglang-mtp` from `amd-master.yaml` → exports `MODEL=deepseek-ai/DeepSeek-R1-0528`, `PRECISION=fp8`, `MODEL_PREFIX=dsr1`, `SPEC_DECODING=mtp`.
2. `generate_sweep_configs.py:290` constructs `EXP_NAME=dsr1_1k1k` (or `dsr1_8k1k`).
3. `.github/workflows/benchmark-tmpl.yml` routes `runner: mi300x` → invokes `runners/launch_mi300x-amds.sh`.
4. `SCENARIO_SUBDIR=""`, `${EXP_NAME%%_*}=dsr1`, `PRECISION=fp8` → `bash benchmarks/single_node/dsr1_fp8_mi300x.sh`.
5. `SPEC_DECODING=mtp` was never consulted by the launcher.
6. `dsr1_fp8_mi300x_mtp.sh` is never read.

Fix
Mirror `launch_b200-cw.sh:6-16` (or `launch_mi355x-amds.sh:183,226`) in `launch_mi300x-amds.sh`: compute `SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "")` early, then change line 41's path to `${EXP_NAME%%_*}_${PRECISION}_mi300x${SPEC_SUFFIX}.sh`. Worth landing the same patch on `launch_mi325x-amds.sh` preemptively, since the same problem will resurface the next time someone adds an mi325x MTP recipe.
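As a rough sketch of the proposed fix, the suffix computation and the amended path can be exercised on their own. The helper names `spec_suffix` and `resolve_bench_script` are hypothetical and exist only for this illustration; the actual patch would presumably inline the `SPEC_SUFFIX` assignment in the launcher as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit _mtp only when speculative decoding is "mtp".
spec_suffix() {
    [[ "${1:-}" == "mtp" ]] && printf '_mtp' || printf ''
}

# Hypothetical helper: the mi300x path template with the suffix appended.
resolve_bench_script() {
    local exp_name="$1" precision="$2" spec_decoding="${3:-}" scenario_subdir="${4:-}"
    printf 'benchmarks/single_node/%s%s_%s_mi300x%s.sh\n' \
        "$scenario_subdir" "${exp_name%%_*}" "$precision" \
        "$(spec_suffix "$spec_decoding")"
}

resolve_bench_script dsr1_1k1k fp8 mtp   # MTP recipe
resolve_bench_script dsr1_1k1k fp8 ""    # recipe without spec decoding
```

With the suffix in place, the MTP recipe resolves to `benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh`, while recipes without `SPEC_DECODING=mtp` keep resolving to `benchmarks/single_node/dsr1_fp8_mi300x.sh`.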