[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 by functionstackx · Pull Request #1455 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-17T23:44:44Z

Summary

Bumps dsv4-fp4-b300-sglang and dsv4-fp4-b300-sglang-mtp from SHA-pinned deepseek-v4-b300@sha256:... custom builds (20 and 18 days old, respectively) to lmsysorg/sglang:v0.5.12-cu130.
⚠️ Note: the deepseek-v4-b300 tag is a custom DSV4 build; the generic v0.5.12-cu130 may or may not retain DSV4-specific features. Verify via sweep.

Test plan

Full sweep passes with full-sweep-enabled label.

🤖 Generated with Claude Code

github-actions · 2026-05-17T23:44:51Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-17T23:45:21Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006190734
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26006190734

functionstackx · 2026-05-18T07:45:18Z

Closing as not viable — same DSV4 transformers issue as #1460/#1450: the generic v0.5.12-cu130 image bundles a transformers that doesn't recognise model_type: "deepseek_v4", so the bench client crashes in AutoTokenizer.from_pretrained with KeyError: 'deepseek_v4'. Custom deepseek-v4-b300@sha256:... image bundles a patched transformers; the generic-image bump is NOT viable until sglang ships transformers with deepseek_v4 support (or the recipe ships its own pip install transformers upgrade).

Keep DSV4 b300 pinned to the SHA-pinned custom image for now. Will reopen when upstream catches up.

functionstackx · 2026-05-18T07:47:03Z

Reopening — leaving sweep labels off so it doesn't auto-trigger while you debug + patch the recipe manually.

github-actions · 2026-05-18T07:49:14Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26006192244
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26006192244

claude · 2026-05-18T07:50:49Z

 # Parallelisms and concurrency ranges mirror dsv4-fp4-b200-vllm.
 dsv4-fp4-b300-sglang:
-  image: lmsysorg/sglang:deepseek-v4-b300@sha256:2fec8d7958bb0d53b50d7bf04d6ae6a7de8a35503775826e0550a45dd8c3ee15
+  image: lmsysorg/sglang:v0.5.12-cu130


🔴 Bumping dsv4-fp4-b300-sglang (line 1986) and dsv4-fp4-b300-sglang-mtp (line 2027) from the SHA-pinned lmsysorg/sglang:deepseek-v4-b300@sha256:... custom image to the generic lmsysorg/sglang:v0.5.12-cu130 strips the patched transformers that registers model_type: "deepseek_v4", so AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro") will crash with KeyError: 'deepseek_v4' before the server is even probed. Either hold this change until upstream sglang ships transformers with deepseek_v4 support, or have the recipe pip install a patched transformers inside the container before invoking the bench client. The PR author has already acknowledged this in the timeline.

Extended reasoning...

What the bug is

Both modified entries (dsv4-fp4-b300-sglang at line 1986 and dsv4-fp4-b300-sglang-mtp at line 2027) swap out a SHA-pinned custom image for the generic lmsysorg/sglang:v0.5.12-cu130 image. The custom deepseek-v4-b300@sha256:... builds bundle a patched transformers that registers a model type for deepseek_v4 (the config.json of deepseek-ai/DeepSeek-V4-Pro declares model_type: "deepseek_v4"). The generic v0.5.12-cu130 image bundles the upstream transformers release, which has no deepseek_v4 entry in its model-type registry.

Code path that triggers the failure

Sweep dispatcher launches a container with image: lmsysorg/sglang:v0.5.12-cu130.

Bench client runs and calls AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro").

transformers downloads the model repo's config.json, reads model_type: "deepseek_v4", then attempts to look it up in CONFIG_MAPPING.

Upstream transformers in v0.5.12-cu130 does not have deepseek_v4 registered → KeyError: 'deepseek_v4' is raised before the SGLang server is ever probed.
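
The steps above can be modelled with a toy registry. This is a minimal sketch and not the transformers implementation: `TOY_CONFIG_MAPPING` and `resolve_config_class` are hypothetical stand-ins, and the registered entry is illustrative. Only the `model_type: "deepseek_v4"` value and the resulting `KeyError` come from this thread.

```python
import json

# Hypothetical stand-in for a model-type registry. The real CONFIG_MAPPING
# in transformers is far larger; this dict only illustrates the lookup.
TOY_CONFIG_MAPPING = {"llama": "LlamaConfig"}


def resolve_config_class(config_json: str, mapping: dict) -> str:
    """Read model_type from a config.json payload and look it up."""
    model_type = json.loads(config_json)["model_type"]
    return mapping[model_type]  # raises KeyError for unregistered types


# Only the key relevant to this failure, as described in the review.
dsv4_config = json.dumps({"model_type": "deepseek_v4"})

try:
    resolve_config_class(dsv4_config, TOY_CONFIG_MAPPING)
    outcome = "resolved"
except KeyError as exc:
    outcome = f"KeyError: {exc}"

print(outcome)  # KeyError: 'deepseek_v4'
```

Adding a `deepseek_v4` key to the toy registry makes the same call succeed, which is the role the patched transformers plays in the custom image.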

Why existing code doesn't prevent it

The recipe scripts for these two configs only change the image tag; nothing in the recipe pipes in a pip install transformers ... upgrade to bring in deepseek-v4 support. The sister entry dsv4-fp4-b200-sglang at line 1699 is still pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc... for exactly this reason, and other DSV4 entries (e.g. trtllm variants at lines 1781/1802/3016/3039) all use specifically-tagged trtllm-deepseek-v4:feat-deepseek_v4-9aa3715 images. Every DSV4 config in this file requires a special image with deepseek_v4 support — the b300 sglang variants are no exception.
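
The "every DSV4 config needs a special image" rule could be checked mechanically. The following is a rough sketch under stated assumptions: `find_generic_dsv4_images` is a hypothetical helper, the `dsv4-` name prefix and the two image markers are inferred from the names quoted in this thread, and the `configs` dict is a trimmed-down stand-in for the real file (the truncated b200 digest is kept truncated).

```python
# Markers taken from the image names quoted in this thread; treating them as
# the definition of "DSV4-capable" is an assumption of this sketch.
DSV4_IMAGE_MARKERS = ("deepseek-v4", "deepseek_v4")


def find_generic_dsv4_images(configs: dict) -> list:
    """Return names of dsv4-* entries whose image lacks a DSV4 marker."""
    flagged = []
    for name, entry in configs.items():
        if not name.startswith("dsv4-"):
            continue  # non-DSV4 entries are out of scope
        image = entry.get("image", "")
        if not any(marker in image for marker in DSV4_IMAGE_MARKERS):
            flagged.append(name)
    return flagged


# Hypothetical, trimmed-down view of the config file after this PR's change.
configs = {
    "dsv4-fp4-b200-sglang": {
        "image": "lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc..."
    },
    "dsv4-fp4-b300-sglang": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "dsv4-fp4-b300-sglang-mtp": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
}

print(find_generic_dsv4_images(configs))
# ['dsv4-fp4-b300-sglang', 'dsv4-fp4-b300-sglang-mtp']
```

A string-marker check like this only catches naming mismatches; it cannot confirm that an image actually bundles a transformers build with deepseek_v4 registered.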

Author confirmation

The PR author (functionstackx) acknowledged this directly in this PR's timeline on 2026-05-18T07:45:18Z: "the generic v0.5.12-cu130 image bundles a transformers that doesn't recognise model_type: "deepseek_v4", so the bench client crashes in AutoTokenizer.from_pretrained with KeyError: 'deepseek_v4'. ... the generic-image bump is NOT viable until sglang ships transformers with deepseek_v4 support." They closed the PR as not viable, then reopened it with sweep labels intentionally disabled to avoid auto-triggering failing runs while they debug.

Step-by-step proof

Open .github/configs/nvidia-master.yaml at line 1986 — image is now lmsysorg/sglang:v0.5.12-cu130, model is deepseek-ai/DeepSeek-V4-Pro.

Pull the v0.5.12-cu130 image: docker pull lmsysorg/sglang:v0.5.12-cu130.

Inside the container: python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('deepseek-ai/DeepSeek-V4-Pro')".

Observe: KeyError: 'deepseek_v4' raised from CONFIG_MAPPING.__getitem__ because upstream transformers in this image has no entry for deepseek_v4.

Repeat for line 2027 (dsv4-fp4-b300-sglang-mtp) — same image, same model, identical failure.
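
The same check can be made without downloading the model repo. This is a sketch, assuming the registry is importable as `CONFIG_MAPPING` from `transformers.models.auto.configuration_auto` in the image under test; `model_type_registered` is a hypothetical helper, and the two demo dicts are stand-ins rather than real registries.

```python
def model_type_registered(model_type: str, mapping=None) -> bool:
    """Report whether model_type is present in a config registry.

    With mapping=None the registry is read from the installed transformers
    package; if transformers cannot be imported, the answer is False.
    """
    if mapping is None:
        try:
            from transformers.models.auto.configuration_auto import CONFIG_MAPPING
        except ImportError:
            return False
        mapping = CONFIG_MAPPING
    return model_type in mapping


# Stand-in registries (hypothetical): one without and one with DSV4.
generic_like = {"llama": object}
patched_like = {"llama": object, "deepseek_v4": object}

print(model_type_registered("deepseek_v4", generic_like))  # False
print(model_type_registered("deepseek_v4", patched_like))  # True
```

Inside a container, calling `model_type_registered("deepseek_v4")` with no mapping argument would query whichever transformers that image ships.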

Impact

Both dsv4-fp4-b300-sglang and dsv4-fp4-b300-sglang-mtp sweep runs will fail at tokenizer load 100% of the time. No benchmarks will be produced. The PR description itself acknowledges this risk: "⚠️ Note: the deepseek-v4-b300 tag is a custom DSV4 build; the generic v0.5.12-cu130 may or may not retain DSV4-specific features."

How to fix

Either (a) revert the image to the SHA-pinned custom deepseek-v4-b300 builds and wait for upstream sglang to ship a transformers release with deepseek_v4 registered, or (b) keep the generic image bump but have the recipe pip install a transformers build containing deepseek_v4 support inside the container before invoking the bench client. Option (a) is the safer choice and matches what is already done for the b200 sister entry at line 1699.
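
Option (b) amounts to a conditional install step ahead of the bench client. The sketch below only plans the commands and does not run them; `build_upgrade_command` and `plan_recipe_steps` are hypothetical helpers, and both the requirement string and the bench command are placeholders, since the thread names neither a transformers build with deepseek_v4 support nor the bench client invocation.

```python
import sys


def build_upgrade_command(requirement: str) -> list:
    """Build a pip upgrade command for the given requirement string."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", requirement]


def plan_recipe_steps(dsv4_registered: bool, requirement: str, bench_cmd: list) -> list:
    """Return the commands to run, in order, ending with the bench client."""
    steps = []
    if not dsv4_registered:
        # Only install when the image's transformers lacks deepseek_v4.
        steps.append(build_upgrade_command(requirement))
    steps.append(bench_cmd)
    return steps


# Placeholders: neither value comes from the PR.
REQUIREMENT = "<transformers build with deepseek_v4 support>"
BENCH_CMD = ["<bench client command>"]

for step in plan_recipe_steps(False, REQUIREMENT, BENCH_CMD):
    print(step)
```

Keeping the install conditional means the recipe would become a no-op once upstream images register deepseek_v4, at which point the extra step could be dropped.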

functionstackx · 2026-05-18T19:56:11Z

Handing off to @Oseltamivir — tracked alongside 7 other stuck Klaud-Cold PRs in #1511. /loop will stop auto-retrying this one.

AI-generated via Claude Code /loop.

github-actions · 2026-05-20T10:34:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26144320337
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26144320337

github-actions · 2026-05-22T17:25:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26144320337
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26144320337

github-actions · 2026-05-23T02:44:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26144320337
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26144320337

functionstackx requested a review from a team May 17, 2026 23:44

functionstackx added the full-sweep-enabled label May 17, 2026

functionstackx requested review from jgangani and kedarpotdar-nv as code owners May 17, 2026 23:44

github-project-automation Bot added this to InferenceMAX Board May 17, 2026

functionstackx added a commit that referenced this pull request May 17, 2026

chore: fill pr-link for #1455

d1c4bee

functionstackx changed the title ~~Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130~~ [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 May 17, 2026

functionstackx closed this May 18, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 18, 2026

functionstackx reopened this May 18, 2026

functionstackx removed the full-sweep-enabled label May 18, 2026

claude Bot reviewed May 18, 2026

functionstackx mentioned this pull request May 18, 2026

[AI Generated] [Handoff] out of 70+ image updates, 13 stuck Klaud Cold PRs need upstream coordination / scope decisions #1511

Open

functionstackx changed the title ~~[Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130~~ [Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 May 18, 2026

Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130

7e3166e

functionstackx force-pushed the update-dsv4-fp4-b300-sglang-v0.5.12 branch from d1c4bee to 7e3166e Compare May 20, 2026 05:48

functionstackx changed the title ~~[Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130~~ [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 May 20, 2026

functionstackx added the full-sweep-enabled label May 20, 2026

functionstackx changed the title ~~[Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130~~ [Handoff to @Oseltamivir Claude /loop] [Klaud Cold] Update dsv4-fp4-b300-sglang (+mtp) SGLang image to v0.5.12-cu130 May 20, 2026

functionstackx wants to merge 1 commit into main from update-dsv4-fp4-b300-sglang-v0.5.12
