[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe#1499

Open
functionstackx wants to merge 2 commits into main from add-dsr1-fp8-mi300x-sglang-mtp

@functionstackx
Collaborator

Summary

MTP/EAGLE sibling of the existing dsr1-fp8-mi300x-sglang recipe (DeepSeek-R1-0528). Same image (lmsysorg/sglang:v0.5.12-rocm700-mi30x, just merged via #1425), same model (deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space (conc 4..64, 1k1k + 8k1k).

Launch script

benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh mirrors dsr1_fp8_mi300x.sh (aiter attention, MLA-persist, MEC-FW gate for HSA_NO_SCRATCH_RECLAIM) and adds:

  • export SGLANG_ENABLE_SPEC_V2=1
  • --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
  • --use-chat-template on the bench client per AGENTS.md
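
As a rough illustration of how these additions sit on top of the base script, the sketch below collects the server-side flags in a bash array. The `SPEC_ARGS` name and the final `echo` are illustrative assumptions; the actual script passes the flags to its SGLang server launch command, which is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: SPEC_ARGS is a hypothetical variable name, not taken from the script.
export SGLANG_ENABLE_SPEC_V2=1

SPEC_ARGS=(
  --speculative-algorithm EAGLE
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4
)

# In the real script these would be appended to the server launch command.
echo "${SPEC_ARGS[*]}"
```

Keeping the speculative flags in one array makes the difference from dsr1_fp8_mi300x.sh easy to see in a diff.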

Test plan

  • YAML loads; bash -n syntax passes on the launch script.
  • full-sweep-enabled sweep finishes green on mi300x.

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@github-actions
Contributor

Comment on lines +1 to +5
#!/usr/bin/env bash

# DeepSeek-R1-0528 FP8 on MI300X with EAGLE/MTP speculative decoding.
# Mirrors dsr1_fp8_mi300x.sh and adds the speculative-* flags.

🔴 The newly added benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh is unreachable: runners/launch_mi300x-amds.sh:41 hardcodes the bench-script path as benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh with no SPEC_SUFFIX dispatch, so the dsr1-fp8-mi300x-sglang-mtp recipe will always invoke the existing non-MTP dsr1_fp8_mi300x.sh regardless of spec-decoding: mtp. The sweep will silently run the autoregressive recipe (no --speculative-* flags, SGLANG_ENABLE_SPEC_V2 unset) and publish vanilla numbers under the -mtp recipe name. Fix: add SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") to launch_mi300x-amds.sh and append it to the script path, mirroring launch_b200-cw.sh:6-16 / launch_mi355x-amds.sh:183,226.

Extended reasoning...

What's broken

The PR adds a new recipe dsr1-fp8-mi300x-sglang-mtp (.github/configs/amd-master.yaml:1823) with spec-decoding: mtp, plus a sibling bench script benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh that wires up the --speculative-algorithm EAGLE … flags and SGLANG_ENABLE_SPEC_V2=1. But the launcher that routes mi300x runners has no path to ever execute that file.

Code path

The dispatcher .github/workflows/benchmark-tmpl.yml invokes runners/launch_mi300x-amds.sh for any recipe whose runner: mi300x. That launcher ends with a single hardcoded line at runners/launch_mi300x-amds.sh:41:

bash benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh

EXP_NAME is built by utils/matrix_logic/generate_sweep_configs.py:290 as f"{model_code}_{seq_len_str}" where model_code is the recipe's model-prefix. For this PR's recipe (model-prefix: dsr1), EXP_NAME will be e.g. dsr1_1k1k or dsr1_8k1k, so ${EXP_NAME%%_*}=dsr1. With PRECISION=fp8 and no SCENARIO_SUBDIR (fixed-seq-len is the default, empty subdir), the launcher always resolves to:

benchmarks/single_node/dsr1_fp8_mi300x.sh

— which exists (the non-MTP script from origin/main) and runs cleanly, so there's no error to alert anyone.
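
The resolution above can be reproduced in isolation. This is a minimal sketch that hardcodes the variable values described in this comment (in CI they come from the workflow environment) and evaluates the same path expression as line 41 of the launcher.

```shell
#!/usr/bin/env bash
# Values as described in the review; SPEC_DECODING is set but never read below,
# which is exactly the problem.
SCENARIO_SUBDIR=""
EXP_NAME="dsr1_1k1k"
PRECISION="fp8"
SPEC_DECODING="mtp"

# ${EXP_NAME%%_*} removes the longest suffix matching "_*",
# i.e. everything from the first underscore onward.
SCRIPT_PATH="benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh"
echo "$SCRIPT_PATH"   # prints benchmarks/single_node/dsr1_fp8_mi300x.sh
```

Changing `SPEC_DECODING` to any other value leaves the printed path unchanged, since the expression never references it.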

Why every other launcher handles this

Every launcher that owns an MTP recipe computes a SPEC_SUFFIX and appends it to the script name. Examples from the tree:

  • runners/launch_mi355x-amds.sh:183,226-227: SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf '_mtp' || printf '') and SCRIPT_BASE="${EXP_NAME%%_*}_${PRECISION}_mi355x" followed by ${SCRIPT_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh.
  • runners/launch_b200-cw.sh:8,13, launch_b200-nb.sh:6, launch_b200-dgxc.sh:335, launch_b300-nv.sh:294, launch_h200-cw.sh:8,47, launch_h200-nb.sh:8, launch_h200-dgxc-slurm.sh:300 — all carry the same SPEC_SUFFIX pattern.

launch_mi300x-amds.sh and (separately) launch_mi325x-amds.sh are the only two launchers without it; until now there was no MTP recipe targeting either runner, so the omission was harmless. This PR is the first MTP recipe on mi300x, so the omission becomes load-bearing.

Impact

The label says this PR is full-sweep-enabled, so when the sweep runs:

  1. The server gets launched without --speculative-algorithm EAGLE, --speculative-num-steps, --speculative-eagle-topk, --speculative-num-draft-tokens, and without SGLANG_ENABLE_SPEC_V2=1.
  2. The benchmark client measures vanilla autoregressive decode latency/throughput on DeepSeek-R1-0528 fp8.
  3. Results are tagged dsr1-fp8-mi300x-sglang-mtp and written to the perf history.

That publishes non-MTP numbers under the MTP recipe name — which is exactly the opposite of what the PR is trying to demonstrate (EAGLE speedup vs. baseline). The bench script that was added in this PR contributes zero observable behavior.

Step-by-step proof

  1. CI selects the recipe dsr1-fp8-mi300x-sglang-mtp from amd-master.yaml → exports MODEL=deepseek-ai/DeepSeek-R1-0528, PRECISION=fp8, MODEL_PREFIX=dsr1, SPEC_DECODING=mtp.
  2. generate_sweep_configs.py:290 constructs EXP_NAME=dsr1_1k1k (or dsr1_8k1k).
  3. .github/workflows/benchmark-tmpl.yml routes runner: mi300x → invokes runners/launch_mi300x-amds.sh.
  4. Line 41 expands: SCENARIO_SUBDIR="", ${EXP_NAME%%_*}=dsr1, PRECISION=fp8 → bash benchmarks/single_node/dsr1_fp8_mi300x.sh.
  5. That file (origin/main, the non-MTP recipe) runs end-to-end and succeeds; SPEC_DECODING=mtp was never consulted by the launcher.
  6. The new dsr1_fp8_mi300x_mtp.sh is never read.

Fix

Mirror launch_b200-cw.sh:6-16 (or launch_mi355x-amds.sh:183,226) in launch_mi300x-amds.sh: compute SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") early, then change line 41's path to ${EXP_NAME%%_*}_${PRECISION}_mi300x${SPEC_SUFFIX}.sh. Worth landing the same patch on launch_mi325x-amds.sh preemptively, since the same problem will resurface the next time someone adds an mi325x MTP recipe.
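
A minimal sketch of the suggested change, written as a standalone function so it can be checked without a runner. `resolve_bench_script` and its argument order are hypothetical; the real launcher would presumably inline the suffix computation and read the values from its environment.

```shell
#!/usr/bin/env bash
# resolve_bench_script is a hypothetical helper used only for illustration.
# Args: EXP_NAME, PRECISION, SPEC_DECODING (optional), SCENARIO_SUBDIR (optional).
resolve_bench_script() {
  local exp_name="$1" precision="$2" spec_decoding="${3:-}" scenario_subdir="${4:-}"
  local spec_suffix=""
  # Only the "mtp" value selects the sibling script; anything else keeps the base name.
  if [[ "$spec_decoding" == "mtp" ]]; then
    spec_suffix="_mtp"
  fi
  printf 'benchmarks/single_node/%s%s_%s_mi300x%s.sh\n' \
    "$scenario_subdir" "${exp_name%%_*}" "$precision" "$spec_suffix"
}

resolve_bench_script dsr1_1k1k fp8 mtp   # benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh
resolve_bench_script dsr1_8k1k fp8 ""    # benchmarks/single_node/dsr1_fp8_mi300x.sh
```

With the suffix in place, the MTP recipe resolves to the new script while the existing non-MTP recipe keeps its current path.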

@functionstackx changed the title from "[Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe" to "[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe" on May 20, 2026
functionstackx and others added 2 commits May 20, 2026 15:42
MTP/EAGLE sibling of dsr1-fp8-mi300x-sglang. Same image
(lmsysorg/sglang:v0.5.12-rocm700-mi30x), same model
(deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space. Launch script
mirrors dsr1_fp8_mi300x.sh (aiter attention, mem-fraction 0.8,
HSA_NO_SCRATCH_RECLAIM gate) and adds SGLANG_ENABLE_SPEC_V2=1 plus the
standard EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4)
plus --use-chat-template on the bench client per AGENTS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@functionstackx force-pushed the add-dsr1-fp8-mi300x-sglang-mtp branch from d9f101a to e97f84f on May 20, 2026 19:42