[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 #1455
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006190734
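Closing as not viable: same DSV4 transformers issue as #1460/#1450. The generic v0.5.12-cu130 image bundles a transformers that doesn't recognise model_type: "deepseek_v4", so the bench client crashes in AutoTokenizer.from_pretrained with KeyError: 'deepseek_v4'. Keep DSV4 b300 pinned to the SHA-pinned custom image for now. Will reopen when upstream catches up.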
Reopening: leaving sweep labels off so it doesn't auto-trigger while you debug and patch the recipe manually.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006192244
 # Parallelisms and concurrency ranges mirror dsv4-fp4-b200-vllm.
 dsv4-fp4-b300-sglang:
-  image: lmsysorg/sglang:deepseek-v4-b300@sha256:2fec8d7958bb0d53b50d7bf04d6ae6a7de8a35503775826e0550a45dd8c3ee15
+  image: lmsysorg/sglang:v0.5.12-cu130
🔴 Bumping dsv4-fp4-b300-sglang (line 1986) and dsv4-fp4-b300-sglang-mtp (line 2027) from the SHA-pinned lmsysorg/sglang:deepseek-v4-b300@sha256:... custom image to the generic lmsysorg/sglang:v0.5.12-cu130 strips the patched transformers that registers model_type: "deepseek_v4", so AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro") will crash with KeyError: 'deepseek_v4' before the server is even probed. Either hold this change until upstream sglang ships a transformers release with deepseek_v4 support, or have the recipe pip install a patched transformers inside the container before invoking the bench client. The PR author has already acknowledged this in the timeline.
Extended reasoning...
What the bug is
Both modified entries (dsv4-fp4-b300-sglang at line 1986 and dsv4-fp4-b300-sglang-mtp at line 2027) swap out a SHA-pinned custom image for the generic lmsysorg/sglang:v0.5.12-cu130 image. The custom deepseek-v4-b300@sha256:... builds bundle a patched transformers that registers a model type for deepseek_v4 (the config.json of deepseek-ai/DeepSeek-V4-Pro declares model_type: "deepseek_v4"). The generic v0.5.12-cu130 image bundles the upstream transformers release, which has no deepseek_v4 entry in its model-type registry.
Code path that triggers the failure
- Sweep dispatcher launches a container with image: lmsysorg/sglang:v0.5.12-cu130.
- Bench client runs and calls AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro"). transformers downloads the model repo's config.json, reads model_type: "deepseek_v4", then attempts to look it up in CONFIG_MAPPING.
- Upstream transformers in v0.5.12-cu130 does not have deepseek_v4 registered → KeyError: 'deepseek_v4' is raised before the SGLang server is ever probed.
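The lookup in the last two steps can be mimicked with a toy registry. This is a sketch only: TOY_REGISTRY and resolve_config_class are hypothetical names, and the dict merely stands in for transformers' CONFIG_MAPPING, which is much larger. The point it illustrates is that a model_type missing from the registry fails at the lookup itself, before any server is contacted.

```python
import json

# Hypothetical stand-in for transformers' CONFIG_MAPPING (model_type -> config class name).
# The entries are illustrative only; what matters is that "deepseek_v4" is absent.
TOY_REGISTRY = {
    "llama": "LlamaConfig",
    "deepseek_v3": "DeepseekV3Config",
}


def resolve_config_class(config_json: str, registry: dict) -> str:
    """Read model_type from a config.json payload and look it up in the registry."""
    model_type = json.loads(config_json)["model_type"]
    # Plain dict lookup: an unregistered model_type raises KeyError here,
    # mirroring the KeyError: 'deepseek_v4' reported in this review.
    return registry[model_type]


# Minimal config.json payload carrying the model_type that DeepSeek-V4-Pro declares.
DSV4_CONFIG = json.dumps({"model_type": "deepseek_v4"})

try:
    resolve_config_class(DSV4_CONFIG, TOY_REGISTRY)
except KeyError as err:
    print(f"KeyError: {err}")  # prints KeyError: 'deepseek_v4'
```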
Why existing code doesn't prevent it
The recipe scripts for these two configs only change the image tag; nothing in the recipe pipes in a pip install transformers ... upgrade to bring in deepseek-v4 support. The sister entry dsv4-fp4-b200-sglang at line 1699 is still pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc... for exactly this reason, and other DSV4 entries (e.g. trtllm variants at lines 1781/1802/3016/3039) all use specifically-tagged trtllm-deepseek-v4:feat-deepseek_v4-9aa3715 images. Every DSV4 config in this file requires a special image with deepseek_v4 support — the b300 sglang variants are no exception.
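The invariant stated here (every DSV4 entry needs an image with deepseek_v4 support) could be checked mechanically. A minimal sketch, assuming a heuristic that is not part of this repo: an entry whose name starts with dsv4- must use an image whose repository:tag part mentions deepseek-v4 or deepseek_v4. The function name and the in-memory dict are hypothetical; a real check would load .github/configs/nvidia-master.yaml instead.

```python
def flag_generic_dsv4_images(configs: dict[str, str]) -> list[str]:
    """Return the dsv4-* config names whose image tag lacks a DeepSeek-V4 marker."""
    flagged = []
    for name, image in configs.items():
        if not name.startswith("dsv4-"):
            continue
        # Ignore any "@sha256:..." digest; only the repository:tag part is inspected.
        repo_and_tag = image.split("@", 1)[0]
        if "deepseek-v4" not in repo_and_tag and "deepseek_v4" not in repo_and_tag:
            flagged.append(name)
    return sorted(flagged)


# Image values as they stand in this PR (digest of the b200 image omitted).
EXAMPLE_CONFIGS = {
    "dsv4-fp4-b200-sglang": "lmsysorg/sglang:deepseek-v4-blackwell",
    "dsv4-fp4-b300-sglang": "lmsysorg/sglang:v0.5.12-cu130",
    "dsv4-fp4-b300-sglang-mtp": "lmsysorg/sglang:v0.5.12-cu130",
}

print(flag_generic_dsv4_images(EXAMPLE_CONFIGS))
# prints ['dsv4-fp4-b300-sglang', 'dsv4-fp4-b300-sglang-mtp']
```

The trtllm-deepseek-v4:feat-deepseek_v4-9aa3715 images would also pass this heuristic, since their repository name contains deepseek-v4.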
Author confirmation
The PR author (functionstackx) acknowledged this directly in this PR's timeline on 2026-05-18T07:45:18Z: "the generic v0.5.12-cu130 image bundles a transformers that doesn't recognise model_type: "deepseek_v4", so the bench client crashes in AutoTokenizer.from_pretrained with KeyError: 'deepseek_v4'. ... the generic-image bump is NOT viable until sglang ships transformers with deepseek_v4 support." They closed the PR as not viable, then reopened it with sweep labels intentionally disabled to avoid auto-triggering failing runs while they debug.
Step-by-step proof
- Open .github/configs/nvidia-master.yaml at line 1986: the image is now lmsysorg/sglang:v0.5.12-cu130 and the model is deepseek-ai/DeepSeek-V4-Pro.
- Pull the v0.5.12-cu130 image: docker pull lmsysorg/sglang:v0.5.12-cu130.
- Inside the container, run: python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('deepseek-ai/DeepSeek-V4-Pro')".
- Observe: KeyError: 'deepseek_v4' is raised from CONFIG_MAPPING.__getitem__ because upstream transformers in this image has no entry for deepseek_v4.
- Repeat for line 2027 (dsv4-fp4-b300-sglang-mtp): same image, same model, identical failure.
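The container check in the steps above can be wrapped in a small probe that reports the failure instead of printing a traceback. A sketch under assumptions: probe_tokenizer is a hypothetical helper, the loader is injectable so the logic can be exercised without downloading the model, and catching ValueError alongside the reported KeyError is only a precaution in case a transformers version surfaces the unknown model type differently.

```python
from typing import Callable, Optional


def probe_tokenizer(model_id: str, load: Optional[Callable[[str], object]] = None) -> str:
    """Try to load a tokenizer; return "ok" or a short description of the failure."""
    if load is None:
        # Deferred import so the helper itself can be exercised without transformers.
        from transformers import AutoTokenizer
        load = AutoTokenizer.from_pretrained
    try:
        load(model_id)
    except KeyError as err:
        # Failure mode reported in this PR: KeyError: 'deepseek_v4'.
        return f"unsupported model_type: {err.args[0]}"
    except ValueError as err:
        # Assumption: some transformers versions may raise ValueError for unknown model types.
        return f"load failed: {err}"
    return "ok"
```

Inside lmsysorg/sglang:v0.5.12-cu130, calling print(probe_tokenizer("deepseek-ai/DeepSeek-V4-Pro")) performs the same check as the python -c one-liner.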
Impact
Both dsv4-fp4-b300-sglang and dsv4-fp4-b300-sglang-mtp sweep runs will fail at tokenizer load 100% of the time. No benchmarks will be produced. The PR description itself acknowledges this risk: "
How to fix
Either (a) revert the image to the SHA-pinned custom deepseek-v4-b300 builds and wait for upstream sglang to ship a transformers release with deepseek_v4 registered, or (b) keep the generic image bump but have the recipe pip install a transformers build containing deepseek_v4 support inside the container before invoking the bench client. Option (a) is the safer choice and matches what is already done for the b200 sister entry at line 1699.
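Option (b) amounts to a preflight step that runs before the bench client. A minimal sketch, assuming the recipe can execute Python inside the container: ensure_model_type and PATCHED_TRANSFORMERS_SPEC are hypothetical, since the PR does not name a transformers build with deepseek_v4 support, and the pip command is only issued when the registry lacks the model type.

```python
import subprocess
import sys
from typing import Callable, Container, Optional

# Hypothetical placeholder: the PR does not identify a transformers build with deepseek_v4.
PATCHED_TRANSFORMERS_SPEC = "<transformers build with deepseek_v4 support>"


def ensure_model_type(
    model_type: str,
    pip_spec: str,
    registry: Optional[Container[str]] = None,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Install pip_spec if model_type is not registered; return True if an install ran."""
    if registry is None:
        # transformers' lazy CONFIG_MAPPING supports `in` checks on model_type strings.
        from transformers import CONFIG_MAPPING
        registry = CONFIG_MAPPING
    if model_type in registry:
        return False
    # Install into the interpreter running this script. Launch the bench client as a
    # fresh process afterwards so it imports the newly installed transformers.
    run([sys.executable, "-m", "pip", "install", "--upgrade", pip_spec], check=True)
    return True
```

Checking the registry first keeps the step a no-op once upstream sglang ships a transformers release that registers deepseek_v4.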
Handing off to @Oseltamivir: tracked alongside 7 other stuck Klaud-Cold PRs in #1511. /loop will stop auto-retrying this one. AI-generated via Claude Code /loop.
Branch updated from d1c4bee to 7e3166e.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26144320337
Summary
- Update dsv4-fp4-b300-sglang and dsv4-fp4-b300-sglang-mtp from SHA-pinned deepseek-v4-b300@sha256:... custom builds (20 and 18 days old, respectively) to lmsysorg/sglang:v0.5.12-cu130.
- The deepseek-v4-b300 tag is a custom DSV4 build; the generic v0.5.12-cu130 may or may not retain DSV4-specific features. Verify via sweep.
Test plan
- full-sweep-enabled label.
🤖 Generated with Claude Code