
fix(checkpoint): ship nemotron_h remote-code in converted HF dirs for vLLM #11

Draft
Kyle1668 wants to merge 1 commit into main from conv-modeling-fix

@Kyle1668


Problem

fixup_hf_output deliberately removes auto_map and the configuration_nemotron_h.py / modeling_nemotron_h.py files after conversion, on the assumption that "transformers >= 5.3.0 has native NemotronH support." That holds for the megatron env (transformers 5.10.x) but breaks every vLLM-based eval/inference consumer: vLLM 0.18.x is hard-pinned to transformers<5,>=4.56.0 (4.57.x), which has no native NemotronH. vLLM's config parse therefore fails with "Transformers does not recognize this architecture" and the model never loads.

This silently broke the GEOD-147 MQ capability evals (few-shot tasks load via vLLM). The manual workaround was to copy the modeling files into each converted dir and set auto_map in its config.json.

Fix

Flip the step from remove to ensure-present: fetch the upstream model's remote code (configuration_nemotron_h.py, modeling_nemotron_h.py) and auto_map via hf_hub_download and write them into the converted dir. transformers >= 5.3.0 ignores the remote code (it uses the native implementation), so the change is safe for both consumers. One file, +41/−16.

Note: the export's HF snapshot has config.json (with auto_map) but not the .py remote code. from_hf_pretrained doesn't set trust_remote_code, so the files are never pulled locally; hence hf_hub_download rather than a snapshot copy.
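For reviewers who want the shape of the ensure-present step without opening the diff, here is a minimal sketch. It is not the code in this PR: the helper name `ensure_remote_code`, its `fetch` parameter, and the choice to read auto_map from the upstream config.json are assumptions made for illustration. The only real library call is `huggingface_hub.hf_hub_download(repo_id=..., filename=...)`, which returns the local path of the cached file.

```python
import json
import shutil
from pathlib import Path
from typing import Callable, Optional

# File names taken from the PR description.
REMOTE_CODE_FILES = ("configuration_nemotron_h.py", "modeling_nemotron_h.py")


def ensure_remote_code(
    out_dir,
    repo_id: str,
    fetch: Optional[Callable[[str, str], str]] = None,
):
    """Hypothetical helper: copy upstream remote code + auto_map into out_dir.

    `fetch(repo_id, filename)` must return a local path to the file. It is
    injectable so the function can be exercised without network access.
    """
    if fetch is None:
        # Imported lazily so the sketch can be read and tested offline.
        from huggingface_hub import hf_hub_download

        def fetch(repo: str, filename: str) -> str:
            return hf_hub_download(repo_id=repo, filename=filename)

    out = Path(out_dir)

    # 1. Ensure the remote-code modules sit next to the converted weights.
    for name in REMOTE_CODE_FILES:
        shutil.copyfile(fetch(repo_id, name), out / name)

    # 2. Copy auto_map from the upstream config into the converted config.
    upstream_cfg = json.loads(Path(fetch(repo_id, "config.json")).read_text())
    cfg_path = out / "config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["auto_map"] = upstream_cfg["auto_map"]
    cfg_path.write_text(json.dumps(cfg, indent=2) + "\n")
    return cfg["auto_map"]
```

Under these assumptions the call site would be `ensure_remote_code(converted_dir, upstream_repo_id)`, where `upstream_repo_id` is the upstream model repo. The function is idempotent, so re-running it on an already patched dir just rewrites the same files.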

Validation

  • hf_hub_download confirmed to fetch both files from nvidia/...Super-120B-A12B-BF16 (config 19.8 KB, modeling 82.3 KB).
  • py_compile clean; ruff adds no new lint (only 2 pre-existing repo-debt errors remain).
  • Not yet run end-to-end through a full conversion; the logic mirrors the manual patch that unblocked the MQ evals.
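Until a full conversion has been run, a cheap structural check on a converted dir can catch the failure mode described under Problem before vLLM is started. The sketch below is an assumption about what such a check could look like, not part of this PR. `remote_code_problems` is a hypothetical name, and it only verifies that auto_map and the two .py files are present, not that the model actually loads.

```python
import json
from pathlib import Path

# File names taken from the PR description.
REQUIRED_FILES = ("configuration_nemotron_h.py", "modeling_nemotron_h.py")


def remote_code_problems(converted_dir):
    """Hypothetical check: list what a transformers<5 consumer would be missing.

    Returns an empty list when the dir has auto_map and both remote-code files.
    """
    root = Path(converted_dir)
    cfg_path = root / "config.json"
    if not cfg_path.is_file():
        return ["config.json is missing"]

    problems = []
    # auto_map is what points transformers at the remote-code classes.
    if not json.loads(cfg_path.read_text()).get("auto_map"):
        problems.append("config.json has no auto_map")
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"{name} is missing")
    return problems
```

An empty return value is necessary but not sufficient: the real confirmation is still a vLLM load of a freshly converted checkpoint.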

🤖 Generated with Claude Code

… vLLM

fixup_hf_output previously REMOVED auto_map + the configuration/modeling_nemotron_h.py
files, assuming transformers >= 5.3.0 has native NemotronH support. That holds for the
megatron env (transformers 5.10.x) but breaks every vLLM consumer: vLLM 0.18.x is
hard-pinned to transformers<5 (4.57.x), which has NO native NemotronH -- so its config
parse fails ("Transformers does not recognize this architecture") and the model never
loads. This silently broke the MQ capability evals.

Flip remove -> ensure-present: copy the upstream model's remote-code modeling files +
auto_map (from the HF cache snapshot the export already downloaded) into the converted
dir. transformers >= 5.3.0 ignores the remote code (uses native), so it's safe for both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>