Correct Hybrid FLOP Calculation by Phlip79 · Pull Request #4508 · NVIDIA-NeMo/Megatron-Bridge

Phlip79 · 2026-06-25T20:51:46Z

Summary

This PR splits the HybridModel TFLOPs/MFU accounting fix out of the GPT-OSS HybridModel migration.

HybridModel commonly represents one logical decoder block as multiple physical hybrid layers, for example *E for attention followed by MoE. Bridge's generic FLOP accounting needs to count those physical symbols as the corresponding logical attention, MLP, and MoE work. Without that, GPT-OSS-style *E layouts can report roughly doubled TFLOPs/MFU even when runtime throughput is unchanged.
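The doubling described above can be illustrated with a small, self-contained sketch. It does not use MCore's parser. The symbol meanings below ("*" for attention, "E" for MoE, "-" for dense MLP, "M" for Mamba) are assumptions for illustration, and both helper functions are hypothetical.

```python
from collections import Counter

# Assumed symbol meanings, for illustration only. The authoritative
# definitions live in MCore's Symbols helper, which is not imported here.
ATTENTION, MOE, MLP, MAMBA = "*", "E", "-", "M"


def count_physical_layers(pattern: str) -> Counter:
    """Count how many physical layers of each kind a pattern string contains."""
    return Counter(pattern)


def naive_block_count(pattern: str) -> int:
    """Over-counting view: treat every physical layer as a full decoder block."""
    return len(pattern)


# Four logical decoder blocks, each written as attention followed by MoE.
pattern = "*E" * 4
counts = count_physical_layers(pattern)

# Per-symbol counting sees 4 attention layers and 4 MoE layers, which matches
# the 4 logical blocks. Counting every physical layer as a block sees 8.
print(counts[ATTENTION], counts[MOE], naive_block_count(pattern))
```

Counting per symbol keeps attention and MoE work at four layers each, while the naive count attributes a full block of work to all eight physical layers.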

Changes:

route configs with hybrid_layer_pattern through hybrid FLOP accounting
use MCore's canonical get_hybrid_layer_counts(), parse_hybrid_pattern(), and Symbols helpers for HybridModel pattern handling
split hybrid attention FLOPs into token-linear projection work and core-attention work so seqlen_squared_sum is honored
account for sliding-window attention on physical hybrid attention layers
add parity tests comparing equivalent transformer and HybridModel physical patterns
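The split between token-linear projection work and core-attention work can be sketched as follows. This is a minimal illustration and not the code in flop_utils.py: the function name and parameters are hypothetical, it covers the forward pass only, it applies no causal-mask discount, and it assumes seqlen_squared_sum means the sum of squared per-sequence lengths in a packed batch.

```python
from typing import Sequence, Tuple


def attention_forward_flops(
    seq_lens: Sequence[int],
    hidden_size: int,
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
) -> Tuple[int, int]:
    """Return (projection_flops, core_attention_flops) for one attention layer.

    Forward pass only; one multiply-add is counted as 2 FLOPs.
    """
    total_tokens = sum(seq_lens)
    # Assumption: this is what the PR calls seqlen_squared_sum.
    seqlen_squared_sum = sum(s * s for s in seq_lens)

    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim

    # Q, K, V and output projections: cost grows linearly with token count.
    proj_weights = hidden_size * (q_size + 2 * kv_size) + q_size * hidden_size
    projection = 2 * total_tokens * proj_weights

    # QK^T and scores @ V: each costs 2 * s^2 * head_dim per head per sequence.
    core = 2 * 2 * seqlen_squared_sum * q_size
    return projection, core


# Same token count, packed differently: one sequence of 8 vs. two of 4.
one_long = attention_forward_flops([8], 16, 4, 2, 4)
two_short = attention_forward_flops([4, 4], 16, 4, 2, 4)
print(one_long, two_short)
```

With the same number of tokens, the projection term is identical in both cases, while the core-attention term halves when the batch is packed as two shorter sequences. A formula that only uses total tokens times sequence length cannot capture that difference.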

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com> (cherry picked from commit d3165d6)

copy-pr-bot · 2026-06-25T20:51:50Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Phlip79 · 2026-06-25T21:22:35Z

/ok to test e6f0afe

claude · 2026-06-25T21:30:47Z

Review: Correct Hybrid FLOP Calculation

The math for splitting attention FLOPs into projection (linear in seq_len) and core-attention (quadratic) terms looks correct and is consistent with the transformer path. The SWA accounting properly mirrors the existing transformer SWA logic. Good parity test coverage.

Two items to clarify:

Import path divergence: flop_utils.py now imports from megatron.core.models.hybrid.hybrid_layer_allocation, while the rest of the codebase (hybrid_provider.py, mlm_compat/model.py) still uses megatron.core.ssm.mamba_hybrid_layer_allocation. If this is the new canonical MCore location, the old sites should be updated in a follow-up.
GPT-OSS test assertion: test_gpt_oss_bridges.py now asserts that provider.is_hybrid_model is True, but the GPT-OSS bridge does not set this field and it is not in CONFIG_MAPPING. If this depends on a concurrent PR or MCore change, the test will fail in isolation.

No perf tests impacted.
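For readers unfamiliar with why sliding-window attention changes the core-attention term, here is a rough sketch. It counts query-key pairs under a causal mask, with and without a window. The function is hypothetical, and the exact formula used by Bridge may differ (for example, it may approximate or skip the causal discount).

```python
from typing import Optional


def causal_attended_pairs(seq_len: int, window: Optional[int] = None) -> int:
    """Count (query, key) pairs under a causal mask.

    With a sliding window of size `window`, query i (1-indexed) attends to the
    most recent min(i, window) positions, itself included.
    """
    if window is None or window >= seq_len:
        # Full causal attention: 1 + 2 + ... + seq_len.
        return seq_len * (seq_len + 1) // 2
    # The first `window` queries see a growing prefix; the rest see exactly
    # `window` keys each.
    return window * (window + 1) // 2 + (seq_len - window) * window


full = causal_attended_pairs(8)
windowed = causal_attended_pairs(8, window=4)
print(full, windowed)
```

Core-attention FLOPs scale with the number of attended pairs, so a layer with a window shorter than the sequence does less quadratic work than a full-attention layer, and an accounting that ignores the window over-reports it.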

Phlip79 · 2026-06-25T21:42:05Z

/ok to test 9551726

fix(training): correct hybrid FLOP accounting

2d0a07d

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com> (cherry picked from commit d3165d6)

fix(training): use mcore hybrid layer parser for flops

e6f0afe

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

Phlip79 marked this pull request as ready for review June 25, 2026 21:24

Phlip79 mentioned this pull request Jun 25, 2026 in Migrate GPT-OSS to HybridModel #4476 (open)

Phlip79 requested a review from yaoyu-33 June 25, 2026 21:26

claude Bot reviewed Jun 25, 2026

Comment thread src/megatron/bridge/training/utils/flop_utils.py

claude Bot reviewed Jun 25, 2026

Comment thread tests/unit_tests/models/gpt_oss/test_gpt_oss_bridges.py Outdated

fix(training): keep hybrid flops PR scoped

9551726

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

Phlip79 mentioned this pull request Jun 25, 2026 in PoR: migrate to MCore HybridModel #4510 (open, 5 tasks)

yaoyu-33 added the bug, area:perf, and needs-review labels Jun 25, 2026

Correct Hybrid FLOP Calculation #4508
Phlip79 wants to merge 3 commits into main from philip/hybrid-flop-accounting
