[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe by functionstackx · Pull Request #1494 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-18T06:51:14Z

Summary

MTP/EAGLE sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Model amd/GLM-5.1-MXFP4, same TP=2 / TP=4 search-space, on the clean lmsysorg/sglang:v0.5.12-rocm720-mi35x stable tag (lmsysorg/sglang — not lmsysorg/sglang-rocm; the latter only ships dated nightlies).

Why this PR vs reviving older attempts

#1091 — closed ([SGLang broken]); sglang couldn't run MTP on this model when filed.
#1254 — open but targets lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428 (stale rocm700 dated nightly on the deprecated -rocm repo).

This PR lands a fresh recipe on the current canonical v0.5.12 stable tag. #1254 can be closed.

Launch script

benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh mirrors glm5.1_fp4_mi355x.sh (NSA/tilelang backends, ROCm tuning, pip install -U transformers) and adds:

export SGLANG_ENABLE_SPEC_V2=1
--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
--use-chat-template on the bench client per AGENTS.md
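
A minimal sketch of how these additions could be composed on the server side. It is not the contents of glm5.1_fp4_mi355x_mtp.sh: the `TP`, `SPEC_ARGS`, and `LAUNCH_CMD` variable names are hypothetical, and the command is only printed, not executed. The environment variable and the four `--speculative-*` flags are taken verbatim from the list above.

```shell
# Illustrative only: not the actual launch script from this PR.
# The env var and the four --speculative-* flags are quoted from the PR description.
export SGLANG_ENABLE_SPEC_V2=1

SPEC_ARGS="--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4"

# TP is a hypothetical variable standing in for the sweep's tensor-parallel size.
TP="${TP:-2}"

# Print the command instead of running it, so the sketch is safe to execute anywhere.
LAUNCH_CMD="python3 -m sglang.launch_server --model-path amd/GLM-5.1-MXFP4 --tp-size ${TP} ${SPEC_ARGS}"
echo "${LAUNCH_CMD}"
```

Keeping the speculative flags in one variable makes the difference from the non-MTP launcher easy to diff; the real script may structure this differently.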

Test plan

YAML loads; bash -n syntax passes on the launch script.
full-sweep-enabled sweep finishes green on mi355x (TP=2 conc 4..256 + TP=4 conc 4..16, 1k1k + 8k1k).
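
To make the size of that sweep concrete, the sketch below enumerates the points named in the test plan. It assumes concurrency doubles at each step (4, 8, 16, ...); the PR only states the endpoints, so the real grid in amd-master.yaml may differ. The function names `emit_range` and `sweep_points` are hypothetical.

```shell
# Enumerate the sweep points described in the test plan.
# ASSUMPTION: concurrency doubles each step; the PR only gives the range endpoints.
emit_range() {  # args: tp max_conc seq
  conc=4
  while [ "$conc" -le "$2" ]; do
    echo "tp=$1 conc=${conc} seq=$3"
    conc=$((conc * 2))
  done
}

sweep_points() {
  for seq in 1k1k 8k1k; do
    emit_range 2 256 "$seq"   # TP=2, conc 4..256
    emit_range 4 16 "$seq"    # TP=4, conc 4..16
  done
}

sweep_points
```

Under the doubling assumption this yields 7 points for TP=2 and 3 for TP=4 per sequence-length setting, or 20 runs in total.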

🤖 Generated with Claude Code

MTP/EAGLE sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Same model (amd/GLM-5.1-MXFP4), same image (currently v0.5.10rc0; it will be bumped together with the off variant when #1441's v0.5.12 bump lands), same TP=2/TP=4 search-space. Launch script mirrors glm5.1_fp4_mi355x.sh and adds SGLANG_ENABLE_SPEC_V2=1 + the standard EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4) plus --use-chat-template on the bench client per AGENTS.md.

Earlier attempts at this were the closed #1091 (sglang broken at the time) and the stale #1254 (targeting an older v0.5.10.post1-rocm700 tag). Both are superseded by this fresh recipe on the current canonical image used by all live mi355x sglang-rocm recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lmsysorg/sglang (no -rocm suffix) is the active repo that ships clean un-suffixed mi35x tags. The -rocm sibling repo only has dated nightlies. Use the stable tag here so the new MTP recipe lands on the same v0.5.12 release everyone else is on.

github-actions · 2026-05-18T06:51:23Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-18T06:51:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018160535
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26018160535

claude

LGTM — straightforward MTP sibling recipe mirroring the existing glm5.1-fp4-mi355x-sglang pattern.

Extended reasoning...

Overview

This PR adds a new MTP/EAGLE speculative-decoding sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Three files touched: a new YAML entry in .github/configs/amd-master.yaml, a new launch script benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh that mirrors the non-MTP launcher, and a perf-changelog.yaml entry. The YAML entry uses the canonical lmsysorg/sglang:v0.5.12-rocm720-mi35x stable tag with the same TP=2/TP=4 sweep as the non-MTP sibling, plus spec-decoding: mtp markers.

Security risks

None. This is benchmark recipe configuration and a shell launcher that runs inside a sandboxed benchmarking container. No auth, crypto, permissions, or external-input-handling code is touched.

Level of scrutiny

Low. This is a mechanical recipe addition following an established pattern — several other -mtp siblings already exist in the same YAML (glm5-fp8-mi355x-sglang-mtp, qwen3.5-bf16-mi355x-sglang-mtp, qwen3.5-fp8-mi355x-sglang-mtp, dsr1-fp8-mi355x-atom-mtp). The launch script is a clean mirror of glm5.1_fp4_mi355x.sh with the documented additions (SGLANG_ENABLE_SPEC_V2=1, --speculative-* flags, --use-chat-template).

Other factors

The bug hunting system found no issues. The PR description clearly explains why this supersedes prior attempts (#1091 closed, #1254 stale). The full-sweep CI check (full-sweep-enabled label) will validate runtime behavior independently. No CODEOWNER-protected paths are touched.

chunfangamd · 2026-05-18T08:53:27Z

@claude /sweep test-config --config-files .github/configs/amd-master.yaml --config-keys glm5.1-fp4-mi355x-sglang-mtp

Klaud-Cold · 2026-05-18T08:53:57Z

Claude finished @chunfangamd's task in 0s.

I'll analyze this and get back to you.

functionstackx · 2026-05-18T21:15:29Z

Handing off to @Oseltamivir — added as §10 in #1511. GSM8K accuracy regression (0.18 vs 0.85 threshold) on the MTP+EAGLE variant needs a recipe-tuning judgment call rather than another automated retry.
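
For reference, the gate that this run fails amounts to a simple comparison. The sketch below is an assumption about its shape, not the repository's actual evaluation code; `passes_threshold` is a hypothetical helper, and only the two numbers (0.18 and 0.85) come from the comment above.

```shell
# passes_threshold SCORE THRESHOLD: exit status 0 if SCORE >= THRESHOLD, else 1.
# Hypothetical helper; awk is used because POSIX shell has no float arithmetic.
passes_threshold() {
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

# Values reported for the MTP+EAGLE variant in this PR.
if passes_threshold 0.18 0.85; then
  echo "gsm8k: pass"
else
  echo "gsm8k: FAIL"
fi
```

A gap this large (0.18 against 0.85) is far outside run-to-run noise, which is consistent with the decision to stop automated retries and hand the PR to a human.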

AI-generated via Claude Code /loop.

github-actions · 2026-05-18T22:09:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018162860
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26018162860

functionstackx · 2026-05-19T04:57:43Z

Filed an upstream sglang bug report for this GSM8K accuracy regression: sgl-project/sglang#25742

Covers both the OFF variant (#1441, gsm8k=0.32) and the EAGLE-MTP variant (#1494, gsm8k=0.18) since they share the same root cause (GLM-5.1-MXFP4 on MI355X under v0.5.12-rocm720-mi35x is producing weak chain-of-thought output).

AI-generated via Claude Code /loop.

functionstackx and others added 2 commits May 18, 2026 02:50

functionstackx requested a review from a team May 18, 2026 06:51

functionstackx added the full-sweep-enabled label May 18, 2026

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 18, 2026 06:51

github-project-automation Bot added this to InferenceMAX Board May 18, 2026

chore: fill pr-link for #1494

734ca0d

claude Bot reviewed May 18, 2026

functionstackx mentioned this pull request May 18, 2026

[AI Generated] [Handoff] out of 70+ image updates, 13 stuck Klaud Cold PRs need upstream coordination / scope decisions #1511

Open

functionstackx changed the title ~~[Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe~~ [Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe May 18, 2026

functionstackx wants to merge 3 commits into main from add-glm5.1-fp4-mi355x-sglang-mtp

3 participants
