Skip to content

fix(clip_grads): handle empty grads_for_norm in inf-norm and p-norm paths#5530

Open
Mattral wants to merge 1 commit into
NVIDIA:mainfrom
Mattral:main
Open

fix(clip_grads): handle empty grads_for_norm in inf-norm and p-norm paths#5530
Mattral wants to merge 1 commit into
NVIDIA:mainfrom
Mattral:main

Conversation

@Mattral

@Mattral Mattral commented Jun 28, 2026

Copy link
Copy Markdown

Summary

get_grad_norm_fp32 crashes when called with an empty gradient list,
which occurs in practice when all parameters on a rank are filtered out
(frozen layers, shared params, TP duplicates). Two out of three norm-type
branches are affected; the L2 branch already has a correct guard.

Changes

megatron/core/optimizer/clip_grads.py

Branch Bug Fix
inf norm (line 95) max() over empty generator → ValueError Add default=torch.tensor(0.0) to max()
generic p-norm (line 127) total_norm stays float 0.0; all_reduce receives a non-Tensor → TypeError Initialise total_norm = torch.zeros(1, dtype=torch.float, device='cuda') before loop

No change to the L2 path or any non-empty-list behaviour.

Testing

  • Added unit tests in tests/unit_tests/optimizer/test_clip_grads.py
    covering empty-list calls for all three norm_type paths (inf, 2.0,
    custom p). Tests run without GPU using the local fallback
    implementations.
  • Existing CI passes unchanged.

Checklist

  • Commits signed off (git commit -s)
  • Rebased on main
  • No unrelated formatting changes
  • One logical change per commit

Fixes #5529

@Mattral Mattral requested review from a team as code owners June 28, 2026 11:51
@copy-pr-bot

copy-pr-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft June 28, 2026 11:51
@github-actions

Copy link
Copy Markdown
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@Mattral Mattral marked this pull request as ready for review June 28, 2026 11:52
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team June 28, 2026 11:52
get_grad_norm_fp32 crashes when grads_for_norm is empty:

- inf-norm path (line 95): max() over an empty generator raises
  ValueError. Fix: pass default=torch.tensor(0.0) to max().

- generic p-norm path (lines 127-129): total_norm stays as a Python
  float after the loop body never executes, then
  torch.distributed.all_reduce receives a float instead of a Tensor,
  raising TypeError. Fix: initialise total_norm as a zero CUDA tensor
  before the loop, mirroring the existing L2 path pattern.

The L2 path (norm_type == 2.0) already handles the empty case correctly
via an explicit `if grads_for_norm` guard; this commit applies the same
intent to the remaining two branches without changing the non-empty
behaviour.

Signed-off-by: Mattral <mattralminn@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_grad_norm_fp32 crashes on empty gradient list with inf norm or custom p-norm

2 participants