[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe by functionstackx · Pull Request #1499 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-18T17:04:00Z

Summary

MTP/EAGLE sibling of the existing dsr1-fp8-mi300x-sglang recipe (DeepSeek-R1-0528). Same image (lmsysorg/sglang:v0.5.12-rocm700-mi30x, just merged via #1425), same model (deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space (conc 4..64, 1k1k + 8k1k).

Launch script

benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh mirrors dsr1_fp8_mi300x.sh (aiter attention, MLA-persist, MEC-FW gate for HSA_NO_SCRATCH_RECLAIM) and adds:

export SGLANG_ENABLE_SPEC_V2=1
--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
--use-chat-template on the bench client per AGENTS.md
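As a rough illustration of how these additions might sit on top of the base launch, the sketch below assembles the server arguments in a bash array and prints the resulting command instead of running it. The `python3 -m sglang.launch_server` entry point and the `--model-path` / `--tp` flags are assumptions based on SGLang's usual CLI. The base script's other flags (aiter attention and so on) are omitted, and the actual script in this PR may be laid out differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values taken from the PR summary.
MODEL="deepseek-ai/DeepSeek-R1-0528"
TP=8

# Environment toggle added by the MTP script.
export SGLANG_ENABLE_SPEC_V2=1

# Speculative-decoding flags exactly as listed in the PR description.
SPEC_ARGS=(
  --speculative-algorithm EAGLE
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4
)

# Hypothetical assembly of the server command (assumed entry point and base
# flags); it is printed here rather than executed.
SERVER_CMD=(python3 -m sglang.launch_server --model-path "$MODEL" --tp "$TP" "${SPEC_ARGS[@]}")
printf '%s\n' "${SERVER_CMD[*]}"
```

Keeping the speculative flags in their own array makes the difference from the non-MTP script easy to see in a diff.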

Test plan

YAML loads; bash -n syntax passes on the launch script.
full-sweep-enabled sweep finishes green on mi300x.

🤖 Generated with Claude Code

github-actions · 2026-05-18T17:04:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-18T17:04:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26048204089
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26048204089

claude · 2026-05-18T17:15:22Z

+#!/usr/bin/env bash
+
+# DeepSeek-R1-0528 FP8 on MI300X with EAGLE/MTP speculative decoding.
+# Mirrors dsr1_fp8_mi300x.sh and adds the speculative-* flags.
+


🔴 The newly added benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh is unreachable: runners/launch_mi300x-amds.sh:41 hardcodes the bench-script path as benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh with no SPEC_SUFFIX dispatch, so the dsr1-fp8-mi300x-sglang-mtp recipe will always invoke the existing non-MTP dsr1_fp8_mi300x.sh regardless of spec-decoding: mtp. The sweep will silently run the autoregressive recipe (no --speculative-* flags, SGLANG_ENABLE_SPEC_V2 unset) and publish vanilla numbers under the -mtp recipe name. Fix: add SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") to launch_mi300x-amds.sh and append it to the script path, mirroring launch_b200-cw.sh:6-16 / launch_mi355x-amds.sh:183,226.

Extended reasoning...

What's broken

The PR adds a new recipe dsr1-fp8-mi300x-sglang-mtp (.github/configs/amd-master.yaml:1823) with spec-decoding: mtp, plus a sibling bench script benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh that wires up the --speculative-algorithm EAGLE … flags and SGLANG_ENABLE_SPEC_V2=1. But the launcher that routes mi300x runners has no path to ever execute that file.

Code path

The dispatcher .github/workflows/benchmark-tmpl.yml invokes runners/launch_mi300x-amds.sh for any recipe whose runner: mi300x. That launcher ends with a single hardcoded line at runners/launch_mi300x-amds.sh:41:

bash benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh

EXP_NAME is built by utils/matrix_logic/generate_sweep_configs.py:290 as f"{model_code}_{seq_len_str}" where model_code is the recipe's model-prefix. For this PR's recipe (model-prefix: dsr1), EXP_NAME will be e.g. dsr1_1k1k or dsr1_8k1k, so ${EXP_NAME%%_*}=dsr1. With PRECISION=fp8 and no SCENARIO_SUBDIR (fixed-seq-len is the default, empty subdir), the launcher always resolves to:

benchmarks/single_node/dsr1_fp8_mi300x.sh

— which exists (the non-MTP script from origin/main) and runs cleanly, so there's no error to alert anyone.
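The resolution described above can be reproduced in isolation. This is a minimal sketch that hard-codes the variable values the review attributes to this recipe. Only the path expression is taken from the quoted launcher line; the surrounding scaffolding is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values the launcher would see for this recipe, per the review comment.
SCENARIO_SUBDIR=""
EXP_NAME="dsr1_1k1k"
PRECISION="fp8"
SPEC_DECODING="mtp"   # set by the recipe, but never read by the path below

# ${EXP_NAME%%_*} removes the longest suffix matching "_*", leaving "dsr1".
RESOLVED="benchmarks/single_node/${SCENARIO_SUBDIR}${EXP_NAME%%_*}_${PRECISION}_mi300x.sh"
echo "$RESOLVED"
# prints: benchmarks/single_node/dsr1_fp8_mi300x.sh
```

Changing `SPEC_DECODING` to any other value leaves the printed path unchanged, which is the core of the problem.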

Why every other launcher handles this

Every launcher that owns an MTP recipe computes a SPEC_SUFFIX and appends it to the script name. Examples from the tree:

runners/launch_mi355x-amds.sh:183,226-227 — SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf '_mtp' || printf '') and SCRIPT_BASE="${EXP_NAME%%_*}_${PRECISION}_mi355x" followed by ${SCRIPT_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh.

runners/launch_b200-cw.sh:8,13, launch_b200-nb.sh:6, launch_b200-dgxc.sh:335, launch_b300-nv.sh:294, launch_h200-cw.sh:8,47, launch_h200-nb.sh:8, launch_h200-dgxc-slurm.sh:300 — all carry the same SPEC_SUFFIX pattern.

launch_mi300x-amds.sh and (separately) launch_mi325x-amds.sh are the only two launchers without it; until now there was no MTP recipe targeting either runner, so the omission was harmless. This PR is the first MTP recipe on mi300x, so the omission becomes load-bearing.

Impact

The label says this PR is full-sweep-enabled, so when the sweep runs:

1. The server gets launched without --speculative-algorithm EAGLE, --speculative-num-steps, --speculative-eagle-topk, --speculative-num-draft-tokens, and without SGLANG_ENABLE_SPEC_V2=1.

2. The benchmark client measures vanilla autoregressive decode latency/throughput on DeepSeek-R1-0528 fp8.

3. Results are tagged dsr1-fp8-mi300x-sglang-mtp and written to the perf history.

That publishes non-MTP numbers under the MTP recipe name — which is exactly the opposite of what the PR is trying to demonstrate (EAGLE speedup vs. baseline). The bench script that was added in this PR contributes zero observable behavior.
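Because the mismatch produces no error, one possible safeguard is a check that fails loudly when an MTP recipe resolves to a script without the `_mtp` suffix. The helper below is hypothetical: `check_spec_script` is not part of the repository, and the `_mtp.sh` naming rule is inferred from the launcher patterns quoted in this review.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns non-zero when SPEC_DECODING is "mtp" but the
# resolved bench script name does not end in "_mtp.sh".
check_spec_script() {
  local spec_decoding="$1" script_path="$2"
  if [[ "$spec_decoding" == "mtp" && "$script_path" != *_mtp.sh ]]; then
    echo "error: SPEC_DECODING=mtp but resolved script is ${script_path}" >&2
    return 1
  fi
}

# Passes: the MTP recipe resolves to the MTP script.
check_spec_script mtp benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh && echo "ok"
```

A check of this kind would turn the silent fallback into a failed job instead of mislabeled results.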

Step-by-step proof

1. CI selects the recipe dsr1-fp8-mi300x-sglang-mtp from amd-master.yaml → exports MODEL=deepseek-ai/DeepSeek-R1-0528, PRECISION=fp8, MODEL_PREFIX=dsr1, SPEC_DECODING=mtp.

2. generate_sweep_configs.py:290 constructs EXP_NAME=dsr1_1k1k (or dsr1_8k1k).

3. .github/workflows/benchmark-tmpl.yml routes runner: mi300x → invokes runners/launch_mi300x-amds.sh.

4. Line 41 expands: SCENARIO_SUBDIR="", ${EXP_NAME%%_*}=dsr1, PRECISION=fp8 → bash benchmarks/single_node/dsr1_fp8_mi300x.sh.

5. That file (origin/main, the non-MTP recipe) runs end-to-end and succeeds; SPEC_DECODING=mtp was never consulted by the launcher.

6. The new dsr1_fp8_mi300x_mtp.sh is never read.

Fix

Mirror launch_b200-cw.sh:6-16 (or launch_mi355x-amds.sh:183,226) in launch_mi300x-amds.sh: compute SPEC_SUFFIX=$([[ "$SPEC_DECODING" == "mtp" ]] && printf _mtp || printf "") early, then change line 41's path to ${EXP_NAME%%_*}_${PRECISION}_mi300x${SPEC_SUFFIX}.sh. Worth landing the same patch on launch_mi325x-amds.sh preemptively, since the same problem will resurface the next time someone adds an mi325x MTP recipe.
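The proposed change can be sketched as follows. The `SPEC_SUFFIX` expression and the path template come from the review text. The `resolve_bench_script` wrapper is hypothetical and exists only to make the logic easy to exercise, since the real launcher would inline it. The value `none` stands in for any non-MTP setting and is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the path logic; arguments are
# EXP_NAME, PRECISION, SPEC_DECODING (optional), SCENARIO_SUBDIR (optional).
resolve_bench_script() {
  local exp_name="$1" precision="$2" spec_decoding="${3:-}" scenario_subdir="${4:-}"
  local spec_suffix
  # Same pattern the review quotes from the other launchers.
  spec_suffix=$([[ "$spec_decoding" == "mtp" ]] && printf '_mtp' || printf '')
  printf '%s\n' "benchmarks/single_node/${scenario_subdir}${exp_name%%_*}_${precision}_mi300x${spec_suffix}.sh"
}

resolve_bench_script dsr1_1k1k fp8 mtp
# prints: benchmarks/single_node/dsr1_fp8_mi300x_mtp.sh
resolve_bench_script dsr1_8k1k fp8 none
# prints: benchmarks/single_node/dsr1_fp8_mi300x.sh
```

Existing non-MTP recipes keep resolving to the same script, so the change should be backward compatible for them.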

github-actions · 2026-05-19T13:40:25Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26048211958
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26048211958

MTP/EAGLE sibling of dsr1-fp8-mi300x-sglang. Same image (lmsysorg/sglang:v0.5.12-rocm700-mi30x), same model (deepseek-ai/DeepSeek-R1-0528), same TP=8 search-space. Launch script mirrors dsr1_fp8_mi300x.sh (aiter attention, mem-fraction 0.8, HSA_NO_SCRATCH_RECLAIM gate) and adds SGLANG_ENABLE_SPEC_V2=1 plus the standard EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4) plus --use-chat-template on the bench client per AGENTS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-20T22:46:03Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26185764549
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26185764549

functionstackx requested a review from a team May 18, 2026 17:04

functionstackx added the full-sweep-enabled label May 18, 2026

functionstackx requested review from billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 18, 2026 17:04

github-project-automation Bot added this to InferenceMAX Board May 18, 2026

functionstackx requested a review from 1am9trash as a code owner May 18, 2026 17:04

functionstackx added a commit that referenced this pull request May 18, 2026

chore: fill pr-link for #1499

d9f101a

claude Bot reviewed May 18, 2026


functionstackx changed the title ~~[Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe~~ [Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add dsr1-fp8-mi300x-sglang-mtp recipe May 20, 2026

functionstackx mentioned this pull request May 20, 2026

[AI Generated] [Handoff] out of 70+ image updates, 13 stuck Klaud Cold PRs need upstream coordination / scope decisions #1511

Open

functionstackx and others added 2 commits May 20, 2026 15:42

chore: fill pr-link for #1499

e97f84f

functionstackx force-pushed the add-dsr1-fp8-mi300x-sglang-mtp branch from d9f101a to e97f84f Compare May 20, 2026 19:42

functionstackx wants to merge 2 commits into main from add-dsr1-fp8-mi300x-sglang-mtp
