Skip to content

Correct Hybrid FLOP Calculation#4508

Open
Phlip79 wants to merge 3 commits into
mainfrom
philip/hybrid-flop-accounting
Open

Correct Hybrid FLOP Calculation#4508
Phlip79 wants to merge 3 commits into
mainfrom
philip/hybrid-flop-accounting

Conversation

@Phlip79

@Phlip79 Phlip79 commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

This PR splits the HybridModel TFLOPs/MFU accounting fix out of the GPT-OSS HybridModel migration.

HybridModel commonly represents one logical decoder block as multiple physical hybrid layers, for example *E for attention followed by MoE. Bridge's generic FLOP accounting needs to count those physical symbols as the corresponding logical attention, MLP, and MoE work. Without that, GPT-OSS-style *E layouts can report roughly doubled TFLOPs/MFU even when runtime throughput is unchanged.

Changes:

  • route configs with hybrid_layer_pattern through hybrid FLOP accounting
  • use MCore's canonical get_hybrid_layer_counts(), parse_hybrid_pattern(), and Symbols helpers for HybridModel pattern handling
  • split hybrid attention FLOPs into token-linear projection work and core-attention work so seqlen_squared_sum is honored
  • account for sliding-window attention on physical hybrid attention layers
  • add parity tests comparing equivalent transformer and HybridModel physical patterns

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
(cherry picked from commit d3165d6)
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@Phlip79

Phlip79 commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

/ok to test e6f0afe

@Phlip79 Phlip79 marked this pull request as ready for review June 25, 2026 21:24
@Phlip79 Phlip79 requested a review from yaoyu-33 June 25, 2026 21:26
Comment thread src/megatron/bridge/training/utils/flop_utils.py
Comment thread tests/unit_tests/models/gpt_oss/test_gpt_oss_bridges.py Outdated
@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review: Correct Hybrid FLOP Calculation

The math for splitting attention FLOPs into projection (linear in seq_len) and core-attention (quadratic) terms looks correct and is consistent with the transformer path. The SWA accounting properly mirrors the existing transformer SWA logic. Good parity test coverage.

Two items to clarify:

  1. Import path divergence: flop_utils.py now imports from megatron.core.models.hybrid.hybrid_layer_allocation, while the rest of the codebase (hybrid_provider.py, mlm_compat/model.py) still uses megatron.core.ssm.mamba_hybrid_layer_allocation. If this is the new canonical MCore location, the old sites should be updated in a follow-up.

  2. GPT-OSS test assertion: The new assertion provider.is_hybrid_model is True in test_gpt_oss_bridges.py -- the GPT-OSS bridge does not set this field, and it is not in CONFIG_MAPPING. If this depends on a concurrent PR or MCore change, this test will fail in isolation.

No perf tests impacted.

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@Phlip79

Phlip79 commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

/ok to test 9551726

@yaoyu-33 yaoyu-33 added bug Something isn't working area:perf Performance optimizations and benchmarking needs-review PR is ready for code review and waiting on a reviewer labels Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:perf Performance optimizations and benchmarking bug Something isn't working needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants