Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg by yeswanthk-26 · Pull Request #1560 · SemiAnalysisAI/InferenceX

yeswanthk-26 · 2026-05-23T01:15:40Z

Summary

Add config entries in .github/configs/nvidia-master.yaml for:
- minimaxm2.5-fp4-gb300-dynamo-vllm
- minimaxm2.5-fp4-b300-dynamo-vllm
Add recipe sets under:
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/
Wire runner logic in:
- runners/launch_gb300-nv.sh
- runners/launch_b300-nv.sh
Update perf-changelog.yaml with entries for both new configs.

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.

Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.

claude · 2026-05-23T01:23:07Z

+# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).
+# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node
+# GPU count makes available. Only the smallest concurrencies (1,4,8) —
+# this topology is decode-latency focused, not throughput.


🟡 The tp8-1p1d.yaml header comments claim 'Only the smallest concurrencies (1,4,8)' (1k1k) / 'Smallest concurrencies only (1,4,8)' (8k1k), but both files set concurrencies: "4" — only 4, not 1/4/8. The nvidia-master.yaml entries for tp8-1p1d list conc-list [4] for both ISLs, confirming the recipes are authoritative and the comments are stale. Either update the comments to say 'only conc 4' or expand the concurrencies field to actually include 1 and 8.

Extended reasoning...

What the bug is\n\nTwo new B300-only recipe files document a sweep over three concurrencies in their header comments but only run a single concurrency in the actual benchmark block.\n\nIn benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml (lines 3–6):\n\nyaml\n# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).\n# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node\n# GPU count makes available. Only the smallest concurrencies (1,4,8) —\n# this topology is decode-latency focused, not throughput.\n\n\nBut the benchmark block at the bottom is:\n\nyaml\nbenchmark:\n type: "sa-bench"\n isl: 1024\n osl: 1024\n concurrencies: "4"\n random_range_ratio: 0.8\n\n\nThe 8k1k counterpart (benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml) has the same drift — its header says "Smallest concurrencies only (1,4,8)" while concurrencies: "4".\n\nWhich side is authoritative\n\nCross-referencing .github/configs/nvidia-master.yaml, the tp8-1p1d search-space entries for both ISLs only list conc-list: [4]:\n\nyaml\n# 1k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/1k1k/tp8-1p1d.yaml"\n# 8k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/8k1k/tp8-1p1d.yaml"\n\n\nSo the recipe concurrencies: "4" and the master conc-list: [4] agree, and the header comments are the outlier. The driver only iterates over what's in those two fields; the comment has no runtime effect.\n\nStep-by-step proof\n\n1. Open 1k1k/tp8-1p1d.yaml. Comment block (lines 3–6) says "Only the smallest concurrencies (1,4,8)".\n2. Scroll to line 75 — concurrencies: "4". The driver parses this "x"-delimited string, yielding the single value [4].\n3. .github/configs/nvidia-master.yaml has conc-list: [4] for this recipe — matches the recipe, not the comment.\n4. Same pattern repeats in 8k1k/tp8-1p1d.yaml.\n\nWhy existing code doesn't prevent it\n\nThere is no validator that checks free-form comments against the actual concurrencies field, so the drift slipped through.\n\nImpact\n\nNone at runtime — benchmarks run exactly the conc set defined in concurrencies and conc-list. The only consequence is that a future reader (or copy-paste donor) of these recipes will be misled into believing tp8-1p1d covers conc 1 and 8 in addition to 4. That's a minor doc-hygiene issue, not a correctness bug.\n\nHow to fix\n\nPick one of:\n\n- Update the header comments to # Only conc 4 — this topology is decode-latency focused, not throughput. (and the equivalent on the 8k1k file), which matches the current concurrencies field and the master conc-list, or\n- Expand the recipes to concurrencies: "1x4x8" and the master conc-list: [1, 4, 8] if covering 1 and 8 was the actual intent.

claude · 2026-05-23T01:23:07Z

+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
+
+- config-keys:
+    - minimaxm2.5-fp4-gb300-dynamo-vllm
+  description:
+    - "Reactivate GB300 minimax disagg vLLM sweep with the pruned pareto-only search space from internal PR57"
+    - "Add minimax model routing and minimax srt-slurm fork path in runners/launch_gb300-nv.sh while keeping SA upstream defaults (SLURM_PARTITION=batch_1, SLURM_ACCOUNT=benchmark, SQUASH_FILE under /home/sa-shared/gharunners/squash/)"
+    - "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🟡 Both new perf-changelog.yaml entries (lines 3125 and 3133) ship with placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX URLs that will 404 once merged. The glm5 entry directly above (line 3115) was correctly populated with pull/1514 — these two should likewise be updated to reference this PR (#1560) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml gains two new changelog entries in this PR, one for minimaxm2.5-fp4-b300-dynamo-vllm and one for minimaxm2.5-fp4-gb300-dynamo-vllm. Both entries terminate with a literal placeholder URL:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The XXXX is meant to be replaced with the actual PR number prior to merging. Per the PR metadata, this is PR #1560, so both placeholders should be replaced with pull/1560.

How it manifests

Once merged, anyone consuming the changelog (UI surfaces that render pr-link, scripts that crawl the file to attribute regressions, humans clicking through history) will follow these URLs to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is a non-numeric pull number and GitHub will return 404. The two new entries effectively lose their provenance back to the PR that introduced them.

Why existing code doesn't prevent it

The changelog is a hand-edited YAML file with no schema validation in this PR or CI that I can see — placeholders like XXXX parse as valid strings, so YAML lint passes and the entry merges cleanly. The convention is enforced by reviewer attention only, and the glm5 entry immediately above (line 3115, pull/1514) demonstrates the expected post-fix state.

Proof / step-by-step

Open perf-changelog.yaml at line 3115 — the prior glm5-fp4-gb300-dynamo-sglang entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514, the real PR for that change.

Continue to line 3125 — the new minimaxm2.5-fp4-b300-dynamo-vllm entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

Continue to line 3133 — the new minimaxm2.5-fp4-gb300-dynamo-vllm entry also has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

The PR header reports this is PR Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg #1560.

Once merged, fetching https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will not resolve to a valid pull request (404), so the changelog ships with two dead links.

Fix

Replace both XXXX occurrences with 1560 before merging:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1560

This is a documentation/metadata issue only — no runtime impact — but worth fixing so the changelog stays a reliable index back to the introducing PRs (the same way the glm5 entry just above already does).

…disagg

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.

github-actions · 2026-05-23T04:19:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

github-actions · 2026-05-23T05:11:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

github-actions · 2026-05-23T06:46:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

yeswanthk-26 added 2 commits May 22, 2026 13:33

[GB300][B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs

5aa4520

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.

Use SA default account for b300 minimax run

b887433

Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.

yeswanthk-26 requested a review from a team May 23, 2026 01:15

yeswanthk-26 requested review from jgangani and kedarpotdar-nv as code owners May 23, 2026 01:15

github-project-automation Bot added this to InferenceMAX Board May 23, 2026

claude Bot reviewed May 23, 2026

View reviewed changes

Ankur-singh added 2 commits May 22, 2026 19:57

Merge branch 'main' into yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-…

6d14c33

…disagg

Simplify descriptions for MiniMax benchmarks

2f1c47c

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.

Ankur-singh added the full-sweep-enabled label May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560
yeswanthk-26 wants to merge 4 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg

yeswanthk-26 commented May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yeswanthk-26 commented May 23, 2026

Summary

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

What the bug is

How it manifests

Why existing code doesn't prevent it

Proof / step-by-step

Fix

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants