Skip to content

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560

Open
yeswanthk-26 wants to merge 4 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg
Open

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560
yeswanthk-26 wants to merge 4 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg

Conversation

@yeswanthk-26
Copy link
Copy Markdown
Collaborator

Summary

  • Add config entries in .github/configs/nvidia-master.yaml for:
    • minimaxm2.5-fp4-gb300-dynamo-vllm
    • minimaxm2.5-fp4-b300-dynamo-vllm
  • Add recipe sets under:
    • benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/
    • benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/
  • Wire runner logic in:
    • runners/launch_gb300-nv.sh
    • runners/launch_b300-nv.sh
  • Update perf-changelog.yaml with entries for both new configs.

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.
Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.
Comment on lines +3 to +6
# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).
# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node
# GPU count makes available. Only the smallest concurrencies (1,4,8) —
# this topology is decode-latency focused, not throughput.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The tp8-1p1d.yaml header comments claim 'Only the smallest concurrencies (1,4,8)' (1k1k) / 'Smallest concurrencies only (1,4,8)' (8k1k), but both files set concurrencies: "4" — only 4, not 1/4/8. The nvidia-master.yaml entries for tp8-1p1d list conc-list [4] for both ISLs, confirming the recipes are authoritative and the comments are stale. Either update the comments to say 'only conc 4' or expand the concurrencies field to actually include 1 and 8.

Extended reasoning...

What the bug is\n\nTwo new B300-only recipe files document a sweep over three concurrencies in their header comments but only run a single concurrency in the actual benchmark block.\n\nIn benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml (lines 3–6):\n\nyaml\n# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).\n# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node\n# GPU count makes available. Only the smallest concurrencies (1,4,8) —\n# this topology is decode-latency focused, not throughput.\n\n\nBut the benchmark block at the bottom is:\n\nyaml\nbenchmark:\n type: "sa-bench"\n isl: 1024\n osl: 1024\n concurrencies: "4"\n random_range_ratio: 0.8\n\n\nThe 8k1k counterpart (benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml) has the same drift — its header says "Smallest concurrencies only (1,4,8)" while concurrencies: "4".\n\nWhich side is authoritative\n\nCross-referencing .github/configs/nvidia-master.yaml, the tp8-1p1d search-space entries for both ISLs only list conc-list: [4]:\n\nyaml\n# 1k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/1k1k/tp8-1p1d.yaml"\n# 8k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/8k1k/tp8-1p1d.yaml"\n\n\nSo the recipe concurrencies: "4" and the master conc-list: [4] agree, and the header comments are the outlier. The driver only iterates over what's in those two fields; the comment has no runtime effect.\n\nStep-by-step proof\n\n1. Open 1k1k/tp8-1p1d.yaml. Comment block (lines 3–6) says "Only the smallest concurrencies (1,4,8)".\n2. Scroll to line 75 — concurrencies: "4". The driver parses this "x"-delimited string, yielding the single value [4].\n3. .github/configs/nvidia-master.yaml has conc-list: [4] for this recipe — matches the recipe, not the comment.\n4. Same pattern repeats in 8k1k/tp8-1p1d.yaml.\n\nWhy existing code doesn't prevent it\n\nThere is no validator that checks free-form comments against the actual concurrencies field, so the drift slipped through.\n\nImpact\n\nNone at runtime — benchmarks run exactly the conc set defined in concurrencies and conc-list. The only consequence is that a future reader (or copy-paste donor) of these recipes will be misled into believing tp8-1p1d covers conc 1 and 8 in addition to 4. That's a minor doc-hygiene issue, not a correctness bug.\n\nHow to fix\n\nPick one of:\n\n- Update the header comments to # Only conc 4 — this topology is decode-latency focused, not throughput. (and the equivalent on the 8k1k file), which matches the current concurrencies field and the master conc-list, or\n- Expand the recipes to concurrencies: "1x4x8" and the master conc-list: [1, 4, 8] if covering 1 and 8 was the actual intent.

Comment thread perf-changelog.yaml Outdated
Comment on lines +3125 to +3133
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

- config-keys:
- minimaxm2.5-fp4-gb300-dynamo-vllm
description:
- "Reactivate GB300 minimax disagg vLLM sweep with the pruned pareto-only search space from internal PR57"
- "Add minimax model routing and minimax srt-slurm fork path in runners/launch_gb300-nv.sh while keeping SA upstream defaults (SLURM_PARTITION=batch_1, SLURM_ACCOUNT=benchmark, SQUASH_FILE under /home/sa-shared/gharunners/squash/)"
- "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Both new perf-changelog.yaml entries (lines 3125 and 3133) ship with placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX URLs that will 404 once merged. The glm5 entry directly above (line 3115) was correctly populated with pull/1514 — these two should likewise be updated to reference this PR (#1560) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml gains two new changelog entries in this PR, one for minimaxm2.5-fp4-b300-dynamo-vllm and one for minimaxm2.5-fp4-gb300-dynamo-vllm. Both entries terminate with a literal placeholder URL:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The XXXX is meant to be replaced with the actual PR number prior to merging. Per the PR metadata, this is PR #1560, so both placeholders should be replaced with pull/1560.

How it manifests

Once merged, anyone consuming the changelog (UI surfaces that render pr-link, scripts that crawl the file to attribute regressions, humans clicking through history) will follow these URLs to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is a non-numeric pull number and GitHub will return 404. The two new entries effectively lose their provenance back to the PR that introduced them.

Why existing code doesn't prevent it

The changelog is a hand-edited YAML file with no schema validation in this PR or CI that I can see — placeholders like XXXX parse as valid strings, so YAML lint passes and the entry merges cleanly. The convention is enforced by reviewer attention only, and the glm5 entry immediately above (line 3115, pull/1514) demonstrates the expected post-fix state.

Proof / step-by-step

  1. Open perf-changelog.yaml at line 3115 — the prior glm5-fp4-gb300-dynamo-sglang entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514, the real PR for that change.
  2. Continue to line 3125 — the new minimaxm2.5-fp4-b300-dynamo-vllm entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. Continue to line 3133 — the new minimaxm2.5-fp4-gb300-dynamo-vllm entry also has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  4. The PR header reports this is PR Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg #1560.
  5. Once merged, fetching https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will not resolve to a valid pull request (404), so the changelog ships with two dead links.

Fix

Replace both XXXX occurrences with 1560 before merging:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1560

This is a documentation/metadata issue only — no runtime impact — but worth fixing so the changelog stays a reliable index back to the introducing PRs (the same way the glm5 entry just above already does).

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.
@github-actions
Copy link
Copy Markdown
Contributor

2 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

@github-actions
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants