[Klaud Cold] minimaxm2.5-fp8-mi325x: add SHUFFLE_KV_CACHE_LAYOUT=1 + ROCM_AITER_FA backend#1549
Conversation
…AYOUT=1 + --attention-backend ROCM_AITER_FA Adopts the AITER recipe pattern (shuffle KV-cache layout + AITER FA attention backend) used by the mi355x sibling. Matches AMD's recommended recipe for MiniMax-M2.x on ROCm vLLM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - minimaxm2.5-fp8-mi325x-vllm | ||
| description: | ||
| - "Add VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 + --attention-backend ROCM_AITER_FA (match AMD-recommended AITER recipe pattern used on mi355x)" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1549 |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry at line 3076 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — the literal placeholder XXX was never replaced with 1549. This 404s and breaks any tooling that parses pr-link → merged PR id. Trivial fix: replace XXX with 1549.
Extended reasoning...
What the bug is
The newly added entry in perf-changelog.yaml (lines 3071-3076) for this PR uses the literal placeholder string XXX in its pr-link field rather than the actual PR number 1549:
- config-keys:
- minimaxm2.5-fp8-mi325x-vllm
description:
- "Add VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 + --attention-backend ROCM_AITER_FA (match AMD-recommended AITER recipe pattern used on mi355x)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXNote on apparent contradiction with the PR diff: The PR diff rendered above shows pull/1549 in the added line, but that is a substitution performed by the PR-rendering tooling — the actual on-disk / committed content is pull/XXX. This was verified two ways: (1) Read perf-changelog.yaml returns pull/XXX on line 3076, and (2) git show c813cb4 -- perf-changelog.yaml shows the committed hunk contains pull/XXX. The PR description itself correctly references #1549, so the author clearly intended 1549 and simply forgot to substitute the placeholder before committing.
Why the convention is clear
Every other entry in perf-changelog.yaml uses a real numeric PR id — pull/1548, pull/1546, pull/1521, pull/1411, etc. There is no precedent for XXX as an intentional sentinel anywhere in the file. This is unambiguously an unsubstituted template, not project convention.
Impact
- The changelog navigation link
https://github.com/SemiAnalysisAI/InferenceX/pull/XXX404s — anyone clicking the link in the changelog gets a broken page. - Any tooling that parses
pr-linkto associateconfig-keyswith merged PR numbers (e.g., for generating release notes, dashboards, or change-attribution reports) will either fail or produce garbage on this entry.
Step-by-step proof
- Open
perf-changelog.yamland jump to line 3076. The line reads exactly:pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. - Run
git show c813cb4 -- perf-changelog.yaml | tail. The added hunk includes+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX— confirming the placeholder is in the committed blob, not a working-tree edit. - Try the URL
https://github.com/SemiAnalysisAI/InferenceX/pull/XXX— it 404s (XXXis not a valid PR number). - Compare with the prior entry at line 3070:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1548— numeric, valid, matches the file-wide convention.
Fix
One-character (well, three-character) edit on perf-changelog.yaml:3076:
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1549Severity is nit — the recipe change itself (the .sh file) is fine; this is purely changelog hygiene. Worth flagging because the broken link is permanent in git history if not fixed before merge.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26242924801 |
Summary
Changes
benchmarks/single_node/minimaxm2.5_fp8_mi325x.sh+ export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1(next to existingVLLM_ROCM_USE_AITER=1)+ --attention-backend "ROCM_AITER_FA"on thevllm servelineperf-changelog.yamlentryTest plan
🤖 Generated with Claude Code