[NV] H100 (Agg): migrate model path#1537
Conversation
…ownload - launcher: switch HF_HUB_CACHE_MOUNT to /mnt/numa1/shared/models/, export HF_HUB_CACHE to container's ~/.cache/huggingface for eval datasets, and rewrite MODEL from HF id (org/name) to absolute local path under the mount; align the bind-mount target with the host path so the rewritten MODEL resolves inside the container. - bench scripts: drop "hf download $MODEL" from the four in-scope H100 scripts (gptoss, minimaxm2.5, qwen3.5, qwen3.5-mtp); contract is now staged-or-fail. - configs: reduce the same four H100 configs to a single (isl=1024, osl=1024, conc=4) test point to verify model-path wiring end-to-end without burning a full sweep. Out of scope: disagg/multi-node block, agentic configs/bench scripts, and kimik2.5-int4-h100-vllm (agentic-only).
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - "Migrate H100 agg model path to /mnt/numa1/shared/models/ (audit-recommended root); set HF_HUB_CACHE to container's ~/.cache/huggingface for eval dataset downloads" | ||
| - "Rewrite MODEL in launcher from HF id (org/name) to absolute local path under HF_HUB_CACHE_MOUNT; remove hf download from bench scripts so contract is staged-or-fail" | ||
| - "Reduce search-space to single (isl=1024, osl=1024, conc=4) point per config to verify model-path wiring end-to-end" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX |
There was a problem hiding this comment.
🔴 The new perf-changelog entry at line 3063 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX — the literal placeholder XXXX was never replaced with the real PR number (1537). Every other recent entry in the file uses a real PR id, and per AGENTS.md this file is the audit trail linking config changes back to the PR that triggered them. Please replace XXXX with 1537 before merging.
Extended reasoning...
What the bug is. perf-changelog.yaml line 3063 reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXXXXXX is a forgotten-substitution placeholder, not a real PR id. The author clearly intended to fill in a real number — the rest of the new entry (config-keys list and three-line description) is complete and substantive — but the pr-link field was left with its template value.
Why this matters. Every other recent entry in the same file uses a real PR id, e.g.:
- line 3037 →
pull/1411 - line 3044 →
pull/1528 - line 3052 →
pull/1475
AGENTS.md describes perf-changelog.yaml as the chronological audit trail linking benchmark config changes back to the PR that triggered them. An unfilled XXXX permanently breaks that linkage for the four H100 agg configs touched by this change (gptoss-fp4-h100-vllm, minimaxm2.5-fp8-h100-vllm, qwen3.5-fp8-h100-sglang, qwen3.5-fp8-h100-sglang-mtp) once merged — anyone later trying to understand why those four configs were narrowed to a single (isl=1024, osl=1024, conc=4) point will hit a dead 404 link instead of this PR.
Why nothing prevents it. There is no pre-commit or CI check that validates perf-changelog.yaml pr-link URLs resolve to a real PR; it relies on author discipline. The file is structured YAML but XXXX is still a syntactically valid string, so no schema validator will flag it either.
Impact. Once merged, the link https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will 404 forever (GitHub does not have a PR numbered XXXX). The audit trail for these four H100 configs is severed. It is also non-trivial to fix post-merge because (a) perf-changelog.yaml is append-only by convention, and (b) reviewers in 6 months won't know to look for XXXX specifically.
Step-by-step proof.
- Look at the diff hunk for
perf-changelog.yaml: the new entry ends at line 3063 withpr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. - The PR number is 1537 (from the PR metadata at the top of this review).
- Visit
https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX— GitHub will return 404 because no PR is numberedXXXX. - Compare with line 3052:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1475— that one resolves to a real PR. So the convention in this file is clearly to use a real PR id.
How to fix. One-character (well, four-character) substitution at perf-changelog.yaml:3063:
- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1537|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26193777469 |
1 similar comment
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26193777469 |
Summary
runners/launch_h100-dgxc-slurm.shagg block): switchHF_HUB_CACHE_MOUNTto/mnt/numa1/shared/models/(audit-recommended root), exportHF_HUB_CACHEto container's~/.cache/huggingfacefor eval dataset downloads, and rewriteMODELfrom HF id (org/name) to absolute local path under the mount. Bind-mount target aligned to host path so the rewrittenMODELresolves inside the container.hf download "$MODEL"line. Contract is now staged-or-fail.(isl=1024, osl=1024, conc=4)point to verify model-path wiring end-to-end. To revert to full sweep after wiring is confirmed.In scope
gptoss-fp4-h100-vllm,minimaxm2.5-fp8-h100-vllm,qwen3.5-fp8-h100-sglang,qwen3.5-fp8-h100-sglang-mtpOut of scope
agentic/*.shbench scriptskimik2.5-int4-h100-vllm(agentic-only)