[NV] H100 (Agg): migrate model path by Ankur-singh · Pull Request #1537 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-05-20T22:29:36Z

Summary

Launcher (runners/launch_h100-dgxc-slurm.sh agg block): switch HF_HUB_CACHE_MOUNT to /mnt/numa1/shared/models/ (audit-recommended root), export HF_HUB_CACHE to container's ~/.cache/huggingface for eval dataset downloads, and rewrite MODEL from HF id (org/name) to absolute local path under the mount. Bind-mount target aligned to host path so the rewritten MODEL resolves inside the container.
Bench scripts (4 H100 fixed-seq-len scripts): drop hf download "$MODEL" line. Contract is now staged-or-fail.
Configs (4 H100 keys): reduce search-space to a single (isl=1024, osl=1024, conc=4) point to verify model-path wiring end-to-end. To revert to full sweep after wiring is confirmed.

In scope

gptoss-fp4-h100-vllm, minimaxm2.5-fp8-h100-vllm, qwen3.5-fp8-h100-sglang, qwen3.5-fp8-h100-sglang-mtp

Out of scope

Disagg/multi-node block in the launcher
Agentic configs and agentic/*.sh bench scripts
kimik2.5-int4-h100-vllm (agentic-only)
Other clusters (B200/B300/H200/GB200/GB300/MI3xx)

…ownload - launcher: switch HF_HUB_CACHE_MOUNT to /mnt/numa1/shared/models/, export HF_HUB_CACHE to container's ~/.cache/huggingface for eval datasets, and rewrite MODEL from HF id (org/name) to absolute local path under the mount; align the bind-mount target with the host path so the rewritten MODEL resolves inside the container. - bench scripts: drop "hf download $MODEL" from the four in-scope H100 scripts (gptoss, minimaxm2.5, qwen3.5, qwen3.5-mtp); contract is now staged-or-fail. - configs: reduce the same four H100 configs to a single (isl=1024, osl=1024, conc=4) test point to verify model-path wiring end-to-end without burning a full sweep. Out of scope: disagg/multi-node block, agentic configs/bench scripts, and kimik2.5-int4-h100-vllm (agentic-only).

github-actions · 2026-05-20T22:29:43Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-20T22:29:44Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-20T22:36:45Z

+    - "Migrate H100 agg model path to /mnt/numa1/shared/models/ (audit-recommended root); set HF_HUB_CACHE to container's ~/.cache/huggingface for eval dataset downloads"
+    - "Rewrite MODEL in launcher from HF id (org/name) to absolute local path under HF_HUB_CACHE_MOUNT; remove hf download from bench scripts so contract is staged-or-fail"
+    - "Reduce search-space to single (isl=1024, osl=1024, conc=4) point per config to verify model-path wiring end-to-end"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🔴 The new perf-changelog entry at line 3063 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX — the literal placeholder XXXX was never replaced with the real PR number (1537). Every other recent entry in the file uses a real PR id, and per AGENTS.md this file is the audit trail linking config changes back to the PR that triggered them. Please replace XXXX with 1537 before merging.

Extended reasoning...

What the bug is. perf-changelog.yaml line 3063 reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

XXXX is a forgotten-substitution placeholder, not a real PR id. The author clearly intended to fill in a real number — the rest of the new entry (config-keys list and three-line description) is complete and substantive — but the pr-link field was left with its template value.

Why this matters. Every other recent entry in the same file uses a real PR id, e.g.:

line 3037 → pull/1411

line 3044 → pull/1528

line 3052 → pull/1475

AGENTS.md describes perf-changelog.yaml as the chronological audit trail linking benchmark config changes back to the PR that triggered them. An unfilled XXXX permanently breaks that linkage for the four H100 agg configs touched by this change (gptoss-fp4-h100-vllm, minimaxm2.5-fp8-h100-vllm, qwen3.5-fp8-h100-sglang, qwen3.5-fp8-h100-sglang-mtp) once merged — anyone later trying to understand why those four configs were narrowed to a single (isl=1024, osl=1024, conc=4) point will hit a dead 404 link instead of this PR.

Why nothing prevents it. There is no pre-commit or CI check that validates perf-changelog.yaml pr-link URLs resolve to a real PR; it relies on author discipline. The file is structured YAML but XXXX is still a syntactically valid string, so no schema validator will flag it either.

Impact. Once merged, the link https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will 404 forever (GitHub does not have a PR numbered XXXX). The audit trail for these four H100 configs is severed. It is also non-trivial to fix post-merge because (a) perf-changelog.yaml is append-only by convention, and (b) reviewers in 6 months won't know to look for XXXX specifically.

Step-by-step proof.

Look at the diff hunk for perf-changelog.yaml: the new entry ends at line 3063 with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

The PR number is 1537 (from the PR metadata at the top of this review).

Visit https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX — GitHub will return 404 because no PR is numbered XXXX.

Compare with line 3052: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1475 — that one resolves to a real PR. So the convention in this file is clearly to use a real PR id.

How to fix. One-character (well, four-character) substitution at perf-changelog.yaml:3063:

- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1537

github-actions · 2026-05-20T22:49:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26193777469
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26193777469

github-actions · 2026-05-20T22:55:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26193777469
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26193777469

Ankur-singh requested a review from a team May 20, 2026 22:29

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners May 20, 2026 22:29

github-project-automation Bot added this to InferenceMAX Board May 20, 2026

perf-changelog: set PR link to #1537

3ab2f5f

Ankur-singh changed the title ~~h100(agg): migrate model path to /mnt/numa1/shared/models/, drop hf download~~ [NV] H100 (Agg): migrate model path May 20, 2026

Ankur-singh added the sweep-enabled label May 20, 2026

claude Bot reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NV] H100 (Agg): migrate model path#1537

[NV] H100 (Agg): migrate model path#1537
Ankur-singh wants to merge 2 commits into
mainfrom
fix-h100-model-path

Ankur-singh commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

claude Bot May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Ankur-singh commented May 20, 2026

Summary

In scope

Out of scope

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

claude Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant