[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe#1494
functionstackx wants to merge 3 commits into
MTP/EAGLE sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Same model (amd/GLM-5.1-MXFP4), same image (currently v0.5.10rc0; it will be bumped together with the off variant when #1441's v0.5.12 bump lands), and same TP=2/TP=4 search-space. The launch script mirrors glm5.1_fp4_mi355x.sh and adds SGLANG_ENABLE_SPEC_V2=1 plus the standard EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4), and passes --use-chat-template to the bench client per AGENTS.md.

Earlier attempts at this were the closed #1091 (sglang was broken at the time) and the stale #1254 (targeting an older v0.5.10.post1-rocm700 tag). Both are superseded by this fresh recipe on the current canonical image used by all live mi355x sglang-rocm recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
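The launch-script additions described above can be sketched roughly as follows. This is a hypothetical illustration, not the contents of `glm5.1_fp4_mi355x_mtp.sh`: the model, TP values, environment variable, and EAGLE flag values come from this PR, `python3 -m sglang.launch_server` with `--model-path`/`--tp` is standard SGLang usage, and the NSA/tilelang and ROCm tuning settings are omitted. It also assumes, based on our understanding of SGLang's EAGLE implementation, that with `--speculative-eagle-topk 1` the draft is a single chain, so `--speculative-num-draft-tokens` is expected to equal `--speculative-num-steps` + 1 (3 + 1 = 4). The bench-client side (`--use-chat-template`) is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the MTP additions; NOT the real glm5.1_fp4_mi355x_mtp.sh.
# Model and flag values are taken from the PR description.
set -euo pipefail

MODEL="amd/GLM-5.1-MXFP4"
TP="${TP:-2}"                 # the recipe sweeps TP=2 and TP=4

# EAGLE knobs used by the recipe
SPEC_NUM_STEPS=3
SPEC_EAGLE_TOPK=1
SPEC_NUM_DRAFT_TOKENS=4

# Assumption: with topk=1 the draft is a single chain, so the number of draft
# tokens is expected to be num-steps + 1. Warn if the values drift apart.
if [ "$SPEC_EAGLE_TOPK" -eq 1 ] && [ "$SPEC_NUM_DRAFT_TOKENS" -ne $((SPEC_NUM_STEPS + 1)) ]; then
  echo "warning: num-draft-tokens != num-steps + 1 for topk=1" >&2
fi

export SGLANG_ENABLE_SPEC_V2=1

SERVER_ARGS=(
  --model-path "$MODEL"
  --tp "$TP"
  --speculative-algorithm EAGLE
  --speculative-num-steps "$SPEC_NUM_STEPS"
  --speculative-eagle-topk "$SPEC_EAGLE_TOPK"
  --speculative-num-draft-tokens "$SPEC_NUM_DRAFT_TOKENS"
)

# Print instead of exec so the sketch runs without SGLang installed; a real
# launcher would run: python3 -m sglang.launch_server "${SERVER_ARGS[@]}"
echo "SGLANG_ENABLE_SPEC_V2=$SGLANG_ENABLE_SPEC_V2 python3 -m sglang.launch_server ${SERVER_ARGS[*]}"
```

Keeping the flags in a bash array means each value stays a separate argument when expanded with `"${SERVER_ARGS[@]}"`, which avoids word-splitting surprises if a value ever contains spaces.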
lmsysorg/sglang (no -rocm suffix) is the active repo that ships clean un-suffixed mi35x tags. The -rocm sibling repo only has dated nightlies. Use the stable tag here so the new MTP recipe lands on the same v0.5.12 release everyone else is on.
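The distinction between clean stable tags and dated nightlies can be illustrated with a simple naming check. This is a hypothetical helper (`is_dated_nightly` is not part of the repo), and it assumes dated nightlies are recognizable by a trailing `-YYYYMMDD` suffix, as in the older `lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428` tag mentioned in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns success if the image tag ends in -YYYYMMDD,
# i.e. looks like a dated nightly rather than a clean stable tag.
is_dated_nightly() {
  local image="$1"
  local tag="${image##*:}"      # drop the "repo:" prefix, keep only the tag
  [[ "$tag" =~ -[0-9]{8}$ ]]
}

for image in \
  "lmsysorg/sglang:v0.5.12-rocm720-mi35x" \
  "lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428"; do
  if is_dated_nightly "$image"; then
    echo "dated nightly: $image"
  else
    echo "stable tag:    $image"
  fi
done
```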
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018160535
LGTM — straightforward MTP sibling recipe mirroring the existing glm5.1-fp4-mi355x-sglang pattern.
Extended reasoning...
Overview
This PR adds a new MTP/EAGLE speculative-decoding sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Three files touched: a new YAML entry in .github/configs/amd-master.yaml, a new launch script benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh that mirrors the non-MTP launcher, and a perf-changelog.yaml entry. The YAML entry uses the canonical lmsysorg/sglang:v0.5.12-rocm720-mi35x stable tag with the same TP=2/TP=4 sweep as the non-MTP sibling, plus spec-decoding: mtp markers.
Security risks
None. This is benchmark recipe configuration and a shell launcher that runs inside a sandboxed benchmarking container. No auth, crypto, permissions, or external-input-handling code is touched.
Level of scrutiny
Low. This is a mechanical recipe addition following an established pattern — several other -mtp siblings already exist in the same YAML (glm5-fp8-mi355x-sglang-mtp, qwen3.5-bf16-mi355x-sglang-mtp, qwen3.5-fp8-mi355x-sglang-mtp, dsr1-fp8-mi355x-atom-mtp). The launch script is a clean mirror of glm5.1_fp4_mi355x.sh with the documented additions (SGLANG_ENABLE_SPEC_V2=1, --speculative-* flags, --use-chat-template).
Other factors
The bug hunting system found no issues. The PR description clearly explains why this supersedes prior attempts (#1091 closed, #1254 stale). The full-sweep CI check (full-sweep-enabled label) will validate runtime behavior independently. No CODEOWNER-protected paths are touched.
@claude /sweep test-config --config-files .github/configs/amd-master.yaml --config-keys glm5.1-fp4-mi355x-sglang-mtp
Claude finished @chunfangamd's task in 0s. I'll analyze this and get back to you.
Handing off to @Oseltamivir; added as §10 in #1511. The GSM8K accuracy regression (0.18 vs. the 0.85 threshold) on the MTP+EAGLE variant needs a recipe-tuning judgment call rather than another automated retry. AI-generated via Claude Code /loop.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018162860
Filed an upstream sglang bug report for this GSM8K accuracy regression: sgl-project/sglang#25742. It covers both the OFF variant (#1441, gsm8k=0.32) and the EAGLE-MTP variant (#1494, gsm8k=0.18), since they share the same root cause (GLM-5.1-MXFP4 on MI355X under v0.5.12-rocm720-mi35x is producing weak chain-of-thought output). AI-generated via Claude Code /loop.
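The reported scores can be read against the gate with a small check. This sketch assumes the CI accuracy gate is a plain `score >= threshold` comparison at 0.85; `gsm8k_gate` is a hypothetical helper, not part of the repo's tooling. Under that assumption both variants fail by a wide margin.

```shell
#!/usr/bin/env bash
# Hypothetical accuracy gate: succeeds when score >= threshold.
# Assumption: the real CI check is a simple ">=" comparison against 0.85.
gsm8k_gate() {
  local score="$1" threshold="$2"
  # Bash arithmetic is integer-only, so do the float comparison in awk.
  # "+ 0" forces numeric (not string) comparison.
  awk -v s="$score" -v t="$threshold" 'BEGIN { exit !((s + 0) >= (t + 0)) }'
}

THRESHOLD=0.85
# Scores reported in this thread: OFF variant (#1441) and EAGLE-MTP (#1494).
for entry in "OFF/#1441:0.32" "EAGLE-MTP/#1494:0.18"; do
  name="${entry%%:*}"
  score="${entry##*:}"
  if gsm8k_gate "$score" "$THRESHOLD"; then
    echo "$name gsm8k=$score PASS"
  else
    echo "$name gsm8k=$score FAIL (threshold $THRESHOLD)"
  fi
done
```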
Summary
MTP/EAGLE sibling of the existing `glm5.1-fp4-mi355x-sglang` recipe. Model `amd/GLM-5.1-MXFP4`, same TP=2 / TP=4 search-space, on the clean `lmsysorg/sglang:v0.5.12-rocm720-mi35x` stable tag (`lmsysorg/sglang`, not `lmsysorg/sglang-rocm`; the latter only ships dated nightlies).

Why this PR vs reviving older attempts
- #1091 (closed, [SGLang broken]): sglang couldn't run MTP on this model when it was filed.
- #1254 (stale): targets `lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428`, a stale rocm700 dated nightly on the deprecated `-rocm` repo.

This PR lands a fresh recipe on the current canonical v0.5.12 stable tag. #1254 can be closed.
Launch script
`benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh` mirrors `glm5.1_fp4_mi355x.sh` (NSA/tilelang backends, ROCm tuning, `pip install -U transformers`) and adds:
- `export SGLANG_ENABLE_SPEC_V2=1`
- `--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4`
- `--use-chat-template` on the bench client per AGENTS.md

Test plan
- `bash -n` syntax passes on the launch script.

🤖 Generated with Claude Code
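The syntax-only test plan can be reproduced locally with a small wrapper. `check_syntax` is a hypothetical name; `bash -n` only parses the script without executing it, so it will not catch wrong flags or runtime problems such as accuracy regressions.

```shell
#!/usr/bin/env bash
# Sketch: syntax-check a launch script without running it.
check_syntax() {
  local script="$1"
  if bash -n "$script"; then
    echo "syntax OK: $script"
  else
    echo "syntax error: $script" >&2
    return 1
  fi
}

# Usage (path from this PR):
#   check_syntax benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh
```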