Skip to content

[Klaud Cold] minimaxm2.5-fp8-mi325x: add SHUFFLE_KV_CACHE_LAYOUT=1 + ROCM_AITER_FA backend#1549

Open
functionstackx wants to merge 2 commits into
mainfrom
klaud-cold/minimaxm2.5-fp8-mi325x-aiter-fa-shuffle-kv
Open

[Klaud Cold] minimaxm2.5-fp8-mi325x: add SHUFFLE_KV_CACHE_LAYOUT=1 + ROCM_AITER_FA backend#1549
functionstackx wants to merge 2 commits into
mainfrom
klaud-cold/minimaxm2.5-fp8-mi325x-aiter-fa-shuffle-kv

Conversation

@functionstackx
Copy link
Copy Markdown
Collaborator

Summary

  • Adopts the AITER recipe pattern (shuffle KV-cache layout + AITER FA attention backend) used by the mi355x sibling.
  • Matches the AMD-recommended recipe for MiniMax-M2.x on ROCm vLLM.

Changes

  • benchmarks/single_node/minimaxm2.5_fp8_mi325x.sh
    • + export VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 (next to existing VLLM_ROCM_USE_AITER=1)
    • + --attention-backend "ROCM_AITER_FA" on the vllm serve line
  • perf-changelog.yaml entry

Test plan

  • full-sweep-enabled sweep finishes green on mi325x.

🤖 Generated with Claude Code

…AYOUT=1 + --attention-backend ROCM_AITER_FA

Adopts the AITER recipe pattern (shuffle KV-cache layout + AITER FA
attention backend) used by the mi355x sibling. Matches AMD's recommended
recipe for MiniMax-M2.x on ROCm vLLM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread perf-changelog.yaml
- minimaxm2.5-fp8-mi325x-vllm
description:
- "Add VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 + --attention-backend ROCM_AITER_FA (match AMD-recommended AITER recipe pattern used on mi355x)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1549
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog.yaml entry at line 3076 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — the literal placeholder XXX was never replaced with 1549. This 404s and breaks any tooling that parses pr-link → merged PR id. Trivial fix: replace XXX with 1549.

Extended reasoning...

What the bug is

The newly added entry in perf-changelog.yaml (lines 3071-3076) for this PR uses the literal placeholder string XXX in its pr-link field rather than the actual PR number 1549:

- config-keys:
    - minimaxm2.5-fp8-mi325x-vllm
  description:
    - "Add VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT=1 + --attention-backend ROCM_AITER_FA (match AMD-recommended AITER recipe pattern used on mi355x)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

Note on apparent contradiction with the PR diff: The PR diff rendered above shows pull/1549 in the added line, but that is a substitution performed by the PR-rendering tooling — the actual on-disk / committed content is pull/XXX. This was verified two ways: (1) Read perf-changelog.yaml returns pull/XXX on line 3076, and (2) git show c813cb4 -- perf-changelog.yaml shows the committed hunk contains pull/XXX. The PR description itself correctly references #1549, so the author clearly intended 1549 and simply forgot to substitute the placeholder before committing.

Why the convention is clear

Every other entry in perf-changelog.yaml uses a real numeric PR id — pull/1548, pull/1546, pull/1521, pull/1411, etc. There is no precedent for XXX as an intentional sentinel anywhere in the file. This is unambiguously an unsubstituted template, not project convention.

Impact

  1. The changelog navigation link https://github.com/SemiAnalysisAI/InferenceX/pull/XXX 404s — anyone clicking the link in the changelog gets a broken page.
  2. Any tooling that parses pr-link to associate config-keys with merged PR numbers (e.g., for generating release notes, dashboards, or change-attribution reports) will either fail or produce garbage on this entry.

Step-by-step proof

  1. Open perf-changelog.yaml and jump to line 3076. The line reads exactly: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX.
  2. Run git show c813cb4 -- perf-changelog.yaml | tail. The added hunk includes + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — confirming the placeholder is in the committed blob, not a working-tree edit.
  3. Try the URL https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — it 404s (XXX is not a valid PR number).
  4. Compare with the prior entry at line 3070: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1548 — numeric, valid, matches the file-wide convention.

Fix

One-character (well, three-character) edit on perf-changelog.yaml:3076:

-  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1549

Severity is nit — the recipe change itself (the .sh file) is fine; this is purely changelog hygiene. Worth flagging because the broken link is permanent in git history if not fixed before merge.

@github-actions
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant