[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe #1494

Open
functionstackx wants to merge 3 commits into main from add-glm5.1-fp4-mi355x-sglang-mtp
Conversation

@functionstackx
Collaborator

Summary

MTP/EAGLE sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Model amd/GLM-5.1-MXFP4, same TP=2 / TP=4 search-space, on the clean lmsysorg/sglang:v0.5.12-rocm720-mi35x stable tag (lmsysorg/sglang — not lmsysorg/sglang-rocm; the latter only ships dated nightlies).
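The repo distinction above is easy to get wrong when copying an image reference between recipes. A minimal sketch of a guard, assuming a hypothetical helper named `is_canonical_image` (it is not part of this PR) and an image reference without a registry port:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): succeed only when the image
# comes from lmsysorg/sglang, not the lmsysorg/sglang-rocm sibling repo.
is_canonical_image() {
  local image="$1"
  local repo="${image%%:*}"   # drop everything from the first ':' (the tag)
  [ "$repo" = "lmsysorg/sglang" ]
}

IMAGE="lmsysorg/sglang:v0.5.12-rocm720-mi35x"
if is_canonical_image "$IMAGE"; then
  echo "ok: $IMAGE"
else
  echo "unexpected image repo: $IMAGE" >&2
fi
```

The check compares only the repository part, so it says nothing about whether the tag itself exists.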

Why this PR vs reviving older attempts

  • #1091 — closed ([SGLang broken]); sglang couldn't run MTP on this model when filed.
  • #1254 — open but targets lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428 (stale rocm700 dated nightly on the deprecated -rocm repo).

This PR lands a fresh recipe on the current canonical v0.5.12 stable tag. #1254 can be closed.

Launch script

benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh mirrors glm5.1_fp4_mi355x.sh (NSA/tilelang backends, ROCm tuning, pip install -U transformers) and adds:

  • export SGLANG_ENABLE_SPEC_V2=1
  • --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
  • --use-chat-template on the bench client per AGENTS.md
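For orientation, the server-side additions listed above could be assembled roughly as follows. This is a sketch, not the contents of glm5.1_fp4_mi355x_mtp.sh: the variable names `MODEL`, `TP`, `SPEC_ARGS`, and `SERVER_CMD` are illustrative, and the NSA/tilelang backend flags and ROCm tuning from the real script are omitted because they are not spelled out in this description. The command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Environment toggle named in the PR description.
export SGLANG_ENABLE_SPEC_V2=1

MODEL="amd/GLM-5.1-MXFP4"
TP="${TP:-2}"   # the recipe sweeps TP=2 and TP=4; 2 is only a default here

# EAGLE speculative-decoding knobs, exactly as listed in the PR description.
SPEC_ARGS=(
  --speculative-algorithm EAGLE
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4
)

# Illustrative server command; backend and tuning flags are left out.
SERVER_CMD=(python3 -m sglang.launch_server
  --model-path "$MODEL" --tp "$TP" "${SPEC_ARGS[@]}")

# Print the command instead of running it, so the sketch has no side effects.
printf '%s ' "${SERVER_CMD[@]}"
echo
```

Keeping the speculative flags in their own array makes the difference from the non-MTP launcher visible in one place.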

Test plan

  • YAML loads; bash -n syntax passes on the launch script.
  • full-sweep-enabled sweep finishes green on mi355x (TP=2 conc 4..256 + TP=4 conc 4..16, 1k1k + 8k1k).
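To make the size of that sweep concrete, the sketch below enumerates the points. It assumes concurrency doubles at each step (4, 8, 16, ...), which the range notation suggests but this description does not state. The function name `sweep_points` is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: concurrency doubles between the stated bounds.
sweep_points() {
  local tp="$1" lo="$2" hi="$3" conc seq
  for seq in 1k1k 8k1k; do
    for ((conc = lo; conc <= hi; conc *= 2)); do
      echo "tp=${tp} conc=${conc} seq=${seq}"
    done
  done
}

# TP=2 covers conc 4..256, TP=4 covers conc 4..16, each at 1k1k and 8k1k.
SWEEP="$(sweep_points 2 4 256; sweep_points 4 4 16)"
echo "$SWEEP"
echo "total points: $(echo "$SWEEP" | wc -l | tr -d ' ')"
```

Under the doubling assumption this gives 7 concurrency levels at TP=2 and 3 at TP=4, so 20 points across the two sequence-length settings.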

🤖 Generated with Claude Code

functionstackx and others added 2 commits May 18, 2026 02:50
MTP/EAGLE sibling of the existing glm5.1-fp4-mi355x-sglang recipe.
Same model (amd/GLM-5.1-MXFP4), same image (currently v0.5.10rc0 — will
get bumped together with the off variant when #1441's v0.5.12 bump
lands), same TP=2/TP=4 search-space. Launch script mirrors
glm5.1_fp4_mi355x.sh and adds SGLANG_ENABLE_SPEC_V2=1 + the standard
EAGLE knobs (num-steps 3, eagle-topk 1, num-draft-tokens 4) plus
--use-chat-template on the bench client per AGENTS.md.

Earlier attempt at this was the closed #1091 (sglang broken at the
time) and the stale #1254 (targeting an older v0.5.10.post1-rocm700
tag) — both superseded by this fresh recipe on the current canonical
image used by all live mi355x sglang-rocm recipes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lmsysorg/sglang (no -rocm suffix) is the active repo that ships clean
un-suffixed mi35x tags. The -rocm sibling repo only has dated nightlies.
Use the stable tag here so the new MTP recipe lands on the same v0.5.12
release everyone else is on.
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@claude claude Bot left a comment

LGTM — straightforward MTP sibling recipe mirroring the existing glm5.1-fp4-mi355x-sglang pattern.

Extended reasoning...

Overview

This PR adds a new MTP/EAGLE speculative-decoding sibling of the existing glm5.1-fp4-mi355x-sglang recipe. Three files touched: a new YAML entry in .github/configs/amd-master.yaml, a new launch script benchmarks/single_node/glm5.1_fp4_mi355x_mtp.sh that mirrors the non-MTP launcher, and a perf-changelog.yaml entry. The YAML entry uses the canonical lmsysorg/sglang:v0.5.12-rocm720-mi35x stable tag with the same TP=2/TP=4 sweep as the non-MTP sibling, plus spec-decoding: mtp markers.

Security risks

None. This is benchmark recipe configuration and a shell launcher that runs inside a sandboxed benchmarking container. No auth, crypto, permissions, or external-input-handling code is touched.

Level of scrutiny

Low. This is a mechanical recipe addition following an established pattern — several other -mtp siblings already exist in the same YAML (glm5-fp8-mi355x-sglang-mtp, qwen3.5-bf16-mi355x-sglang-mtp, qwen3.5-fp8-mi355x-sglang-mtp, dsr1-fp8-mi355x-atom-mtp). The launch script is a clean mirror of glm5.1_fp4_mi355x.sh with the documented additions (SGLANG_ENABLE_SPEC_V2=1, --speculative-* flags, --use-chat-template).

Other factors

The bug hunting system found no issues. The PR description clearly explains why this supersedes prior attempts (#1091 closed, #1254 stale). The full-sweep CI check (full-sweep-enabled label) will validate runtime behavior independently. No CODEOWNER-protected paths are touched.

@chunfangamd
Collaborator

@claude /sweep test-config --config-files .github/configs/amd-master.yaml --config-keys glm5.1-fp4-mi355x-sglang-mtp

@Klaud-Cold
Collaborator

Klaud-Cold commented May 18, 2026

Claude finished @chunfangamd's task in 0s.

I'll analyze this and get back to you.

functionstackx changed the title from "[Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe" to "[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Add glm5.1-fp4-mi355x-sglang-mtp recipe" on May 18, 2026

@functionstackx
Collaborator Author

Handing off to @Oseltamivir — added as §10 in #1511. GSM8K accuracy regression (0.18 vs 0.85 threshold) on the MTP+EAGLE variant needs a recipe-tuning judgment call rather than another automated retry.

AI-generated via Claude Code /loop.

@functionstackx
Collaborator Author

Filed an upstream sglang bug report for this GSM8K accuracy regression: sgl-project/sglang#25742

Covers both the OFF variant (#1441, gsm8k=0.32) and the EAGLE-MTP variant (#1494, gsm8k=0.18) since they share the same root cause (GLM-5.1-MXFP4 on MI355X under v0.5.12-rocm720-mi35x is producing weak chain-of-thought output).

AI-generated via Claude Code /loop.
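The accuracy gate behind these numbers can be sketched as a simple threshold comparison. This is an illustration only: `meets_threshold` is a hypothetical helper, not the project's evaluation code, and applying the 0.85 threshold quoted for the MTP variant to the OFF variant as well is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical gate: exit status 0 only when score >= threshold.
# awk does the comparison because bash has no floating-point arithmetic.
meets_threshold() {
  local score="$1" threshold="$2"
  awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s + 0 >= t + 0) }'
}

THRESHOLD=0.85   # quoted for the MTP variant; assumed here for OFF too

# GSM8K scores reported in this thread: OFF (#1441) and EAGLE-MTP (#1494).
for entry in "off:0.32" "mtp:0.18"; do
  variant="${entry%%:*}"
  score="${entry##*:}"
  if meets_threshold "$score" "$THRESHOLD"; then
    echo "$variant: pass ($score >= $THRESHOLD)"
  else
    echo "$variant: FAIL ($score < $THRESHOLD)"
  fi
done
```

Both reported scores fall well below the threshold, which is consistent with treating this as a model or runtime issue rather than a flake that a retry would clear.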
