[CI] Fix gpt_dynamic_inference_tp2_pp2_ep2_gptoss_20b_swa tests#5527
Open
asolergi-nv wants to merge 1 commit into
Open
[CI] Fix gpt_dynamic_inference_tp2_pp2_ep2_gptoss_20b_swa tests#5527asolergi-nv wants to merge 1 commit into
gpt_dynamic_inference_tp2_pp2_ep2_gptoss_20b_swa tests#5527asolergi-nv wants to merge 1 commit into
Conversation
Contributor
|
This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. This will:
See the contribution guide for more details. |
Contributor
Author
|
/claude review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
Summary
The functional test
gpt_dynamic_inference_tp2_pp2_ep2_gptoss_20b_swahas failed onmainsince it was introduced and has never passed inmain. It crashes during model construction with:This PR restores the YaRN configuration so GPT-OSS dynamic inference builds and runs again, and places the wiring at the shared config-construction chokepoint so a future builder refactor cannot silently drop it again.
The problem
GPT-OSS uses YaRN RoPE.
GPTModel.__init__(theposition_embedding_type == 'yarn'branch) andyarn_rotary_pos_embeddingread seven hyperparameters off the config as dynamic attributes, with no default:These
yarn_*names are not declared as fields onTransformerConfig(onlyMLATransformerConfigdeclares the unprefixed equivalents —rotary_scaling_factor,mscale,beta_fast, …). So unless some caller setsconfig.yarn_*explicitly, the firstgetattrraisesAttributeError.The dynamic inference path builds its config through
GPTModelBuilder(gpt_config_from_args(args))→core_transformer_config_from_args, which only copies args that are declared dataclass fields of the config class. None of theyarn_*attributes are fields, so they were never set — and the baregetattrcrashed.Why it happened — a clean-merge / semantic conflict between two PRs
The YaRN-for-GPT-OSS feature reads
config.yarn_*, but the introducing change (ADLR/megatron-lm!4044, "YaRN support for gpt-oss") only ever set those attributes in two places: the ModelOpt/export builder (megatron/post_training/model_builder.py, gated on--enable-gpt-oss) and, later, the legacygpt_buildercallable.The functional test arrived in PR #5249 ("Support SWA and sink attention in dynamic inference", merged 2026-06-24 04:02 UTC). That PR wired YaRN into the config inside
gpt_builders.py:gpt_buildervia a helper_apply_yarn_config_from_args(commitdfdb9e8: "apply YaRN config in gpt_builders when running inference"). At that time the inference entry point routed throughgpt_builder, so the attributes were set and the author's smoke test legitimately generated tokens:About ten hours later, PR #5169 ("Add inference functions… and remove legacy modelbuilder functions", merged 2026-06-24 14:04 UTC) refactored
megatron/inference/utils.py:The new
GPTModelBuilder.build_modelbuildsGPTModeldirectly fromconfig.transformerand never calls_apply_yarn_config_from_args. The two PRs touch different files, so git merged them with no textual conflict — but the YaRN wiring that #5249's test depends on now lives in a function (gpt_builder) that the inference path no longer calls. Neither PR's CI caught it: #5249 passed because the legacy path still set the attributes; #5169 predated the new test and so removing the legacy builder broke nothing it tested. The break only materialized once both were inmain.A secondary latent issue: the original
_apply_yarn_config_from_argsskipped any arg whose value wasNone, leaving the attribute unset — so even on the legacy path, omitting any--yarn-*flag would reintroduce the sameAttributeError.The fix
Move the YaRN-from-args wiring out of the now-vestigial
gpt_builderand into the shared config-construction chokepoint,core_transformer_config_from_argsinmegatron/training/argument_utils.py. Both the legacygpt_builderand the newGPTModelBuilder(viagpt_config_from_args) build their config through this function, so both paths are covered by a single source of truth.The helper:
position_embedding_type == 'yarn'and the model is not MLA (MLA declares the unprefixed fields and consumes them in its own attention path).--rotary-scaling-factor → yarn_rotary_scaling_factor,--mscale → yarn_mscale,--mscale-all-dim → yarn_mscale_all_dim); the remaining--yarn-*flags map by name.YarnRotaryEmbedding's documented defaults when a flag is omitted (original_max_position_embeddings=4096,beta_fast=32.0,beta_slow=1.0,correction_range_round_to_int=True). This closes theNone-skip gap so a missing flag degrades gracefully instead of raisingAttributeError.The redundant
_apply_yarn_config_from_argshelper and its call site are removed fromgpt_builders.py.Files changed
megatron/training/argument_utils.py— add_apply_yarn_config_from_argsand call it fromcore_transformer_config_from_args.gpt_builders.py— remove the duplicate helper and its call (now centralized).Issue tracking
For PRs from open-source community contributors:
Linked issue:
Contribution process
Pre-checks
Code review
Feel free to message or comment @NVIDIA/mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS.Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change
megatron/core, once all expert reviewers have approved, theFinal Reviewlabel is applied automatically and final reviewers are assigned.For PRs outside
megatron/core, this step is skipped.Step 3: Approved
Once all required reviewers have approved, the
Approvedlabel is applied automatically.Merge
Any member of mcore-engineers will be able to merge your PR.