
Releases: NVIDIA-NeMo/Emerging-Optimizers

v0.3.0

26 May 23:35
b309e2f


New optimizers

  • LaProp — RMSProp with momentum / normalized-SGD-with-momentum variant (#206)
  • MAdam — magnitude-aware Adam that removes eps from the update path (#191)
  • SPEL — Spectral steepest descent on the Stiefel manifold (#106)
  • NorMuon — norm-based adaptive moment estimation with orthogonalized momentum (#107)
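The ordering that separates LaProp from Adam can be shown with a minimal NumPy sketch of one LaProp step, following the original LaProp paper. LaProp first normalizes the gradient by its running RMS and then accumulates momentum over the normalized gradient. Adam does the reverse: it accumulates raw momentum and divides by the RMS afterwards. The function name laprop_step, its defaults, and where bias correction is applied are assumptions made for illustration. This is not the library's implementation.

```python
import numpy as np


def laprop_step(param, grad, exp_avg, exp_avg_sq, *, lr=1e-3,
                betas=(0.9, 0.999), eps=1e-8, step=1):
    """One LaProp step (hypothetical sketch, not the library's code).

    exp_avg and exp_avg_sq are float arrays updated in place; step is the
    1-based step count used for bias correction. Returns the new parameter.
    """
    beta1, beta2 = betas
    # RMSProp-style running second moment of the raw gradient.
    exp_avg_sq *= beta2
    exp_avg_sq += (1 - beta2) * grad * grad
    # Normalize the gradient by its bias-corrected RMS *before* momentum.
    denom = np.sqrt(exp_avg_sq / (1 - beta2 ** step)) + eps
    # Momentum accumulates the normalized gradient (Adam instead divides
    # the accumulated raw momentum by the RMS afterwards).
    exp_avg *= beta1
    exp_avg += (1 - beta1) * grad / denom
    # Bias-correct the momentum and take the step.
    return param - lr * exp_avg / (1 - beta1 ** step)


param = np.array([1.0, -2.0])
grad = np.array([0.5, -3.0])
exp_avg, exp_avg_sq = np.zeros(2), np.zeros(2)
new_param = laprop_step(param, grad, exp_avg, exp_avg_sq, lr=0.1, step=1)
# On the first step this is approximately param - lr * sign(grad) = [0.9, -1.9].
```

The first step makes a convenient sanity check. With bias correction, the normalized gradient is roughly sign(grad), so the step has size lr in every coordinate.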

Newton–Schulz / orthogonalization

  • Batched Newton–Schulz step for distributed/grouped weights (#170)
  • DeepSeek-style NS coefficients (#167)
  • Newton–Schulz benchmark (#172)
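The idea behind a batched step like #170 can be illustrated with a small NumPy sketch. If several weight matrices are stacked along a leading dimension, one batched matmul per iteration orthogonalizes all of them together. For clarity, this sketch uses the textbook cubic Newton–Schulz polynomial, which converges to the exact polar factor U V^T. Production Muon-style code typically uses tuned higher-order polynomials instead, such as the DeepSeek-style coefficients in #167. Those converge faster but only approximately, and their values are not reproduced here. The function name and the iteration count are assumptions.

```python
import numpy as np


def newton_schulz_batched(G, num_iters=30):
    """Approximate the orthogonal polar factor U @ V^T of each matrix in G.

    G has shape (..., m, n); all leading dimensions are treated as a batch.
    Uses the classic cubic iteration X <- 1.5 X - 0.5 X X^T X.
    """
    X = np.asarray(G, dtype=np.float64)
    # Divide each matrix by its Frobenius norm so every singular value is
    # at most 1, inside the cubic iteration's convergence region (0, sqrt(3)).
    norms = np.linalg.norm(X, axis=(-2, -1), keepdims=True)
    X = X / np.maximum(norms, 1e-12)
    for _ in range(num_iters):
        # matmul broadcasts over the leading dims, so each iteration
        # updates every matrix in the batch with the same two calls.
        X = 1.5 * X - 0.5 * (X @ np.swapaxes(X, -1, -2) @ X)
    return X


# Two 2x2 "weights" stacked into one batch of shape (2, 2, 2).
batch = np.array([[[2.0, 1.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, 3.0]]])
Q = newton_schulz_batched(batch)
U, _, Vt = np.linalg.svd(batch)  # batched SVD, used only for comparison
polar = U @ Vt
```

The second matrix is diagonal with a positive diagonal, so its polar factor is the identity. That makes it an easy spot check.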

SOAP

  • The eigenbasis is now updated every step; amortization has been removed (#160). This is a behavioral change for SOAP users:
    the previous precondition_frequency / amortization knobs no longer apply.
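The effect of refreshing the eigenbasis every step can be illustrated with a minimal NumPy sketch of a SOAP-style update for a single 2-D weight. The sketch follows the structure in the SOAP paper, where Adam runs in the eigenbasis of Shampoo's left and right Kronecker factors. The names init_soap_state and soap_like_step, the state layout, the hyperparameter name shampoo_beta, and the omission of bias correction are all assumptions. This is not the library's SOAP code.

```python
import numpy as np


def init_soap_state(m, n):
    """Zero-initialized state for one m x n weight (hypothetical layout)."""
    return {"L": np.zeros((m, m)), "R": np.zeros((n, n)),
            "m": np.zeros((m, n)), "v": np.zeros((m, n))}


def soap_like_step(W, G, state, *, lr=1e-3, betas=(0.9, 0.999),
                   shampoo_beta=0.95, eps=1e-8):
    """One simplified SOAP-style step; the eigenbasis is refreshed every call.

    Hypothetical sketch: bias correction is omitted, and the first moment is
    kept in the original basis while the second moment lives in the eigenbasis.
    """
    beta1, beta2 = betas
    # EMAs of the left and right Kronecker factors (as in Shampoo).
    state["L"] = shampoo_beta * state["L"] + (1 - shampoo_beta) * (G @ G.T)
    state["R"] = shampoo_beta * state["R"] + (1 - shampoo_beta) * (G.T @ G)
    # Recompute both eigenbases on every step: no amortization.
    _, state["QL"] = np.linalg.eigh(state["L"])
    _, state["QR"] = np.linalg.eigh(state["R"])
    QL, QR = state["QL"], state["QR"]
    # First moment in the original basis, then rotate it and the gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * G
    G_rot = QL.T @ G @ QR
    m_rot = QL.T @ state["m"] @ QR
    # Adam-style second moment, tracked in the rotated basis.
    state["v"] = beta2 * state["v"] + (1 - beta2) * G_rot ** 2
    update_rot = m_rot / (np.sqrt(state["v"]) + eps)
    # Rotate the update back to the original basis and apply it.
    return W - lr * (QL @ update_rot @ QR.T)


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
state = init_soap_state(3, 2)
for _ in range(3):
    W = soap_like_step(W, rng.standard_normal((3, 2)), state)
```

Each call performs two dense eigendecompositions, costing O(m^3 + n^3). This is the per-step cost that precondition_frequency used to amortize.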

Breaking API changes

  • calculate_*_update signatures have been reordered, and every argument after the state buffers is now keyword-only. New shape: (grad, <state buffers>, *, betas|momentum, eps, correct_bias, [nesterov], step, <extras>). correct_bias now follows eps, and the integer step argument comes after the hyperparameters, ahead of any extras.
    Update positional callers accordingly.
  • LaProp argument order: correct_bias now comes after betas instead of before it.
  • The __init__ methods of Lion, LaProp, ObliqueSGD, and ObliqueAdam now use a bare * to mark non-core arguments (weight_decay_method, dim,
    correct_bias, etc.) as keyword-only. Existing call sites that already pass these arguments by keyword are unaffected.
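The keyword-only marker is plain Python: a parameter listed after a bare * cannot be passed positionally. The sketch below shows what a positional caller will run into. It uses a hypothetical Adam-style function, calculate_example_update, which is not a function in the library, written in the (grad, <state buffers>, *, betas, eps, correct_bias, step) shape.

```python
import math


def calculate_example_update(grad, exp_avg, exp_avg_sq, *, betas, eps,
                             correct_bias, step):
    """Hypothetical Adam-style update on scalars, using the v0.3.0 call shape.

    Positional: grad and state buffers. Keyword-only (after the bare *):
    betas, eps, correct_bias, then the integer step.
    Returns (update, new_exp_avg, new_exp_avg_sq).
    """
    beta1, beta2 = betas
    exp_avg = beta1 * exp_avg + (1 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
    m_hat, v_hat = exp_avg, exp_avg_sq
    if correct_bias:
        m_hat = m_hat / (1 - beta1 ** step)
        v_hat = v_hat / (1 - beta2 ** step)
    return m_hat / (math.sqrt(v_hat) + eps), exp_avg, exp_avg_sq


# Keyword style: the supported way to pass hyperparameters.
update, exp_avg, exp_avg_sq = calculate_example_update(
    0.5, 0.0, 0.0, betas=(0.9, 0.999), eps=1e-8, correct_bias=True, step=1)

# Positional hyperparameters are rejected with a TypeError.
try:
    calculate_example_update(0.5, 0.0, 0.0, (0.9, 0.999), 1e-8, True, 1)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Because the check happens at call time, a stale positional call site fails immediately. It does not silently bind a value to the wrong hyperparameter.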

Bug fixes

  • NorMuon update norm scaling (#187, #186)
  • Sinkhorn test tolerance relaxed for stable CI runs (#173)

Docs / infra

  • New primer: epsilon in Adam (#189)
  • Eigendecomposition utility improvements (#190)
  • Refactored release workflows (#176); CI cleanup (#185, #171)
  • SECURITY.md added (#163)

v0.2.0

18 Mar 18:36
1effa02


Emerging-Optimizers v0.2.0 Release Notes

Highlights

v0.2.0 adds several new optimizers and a unified optimizer registry, improves test coverage, and upgrades infrastructure, including raising the minimum Python version to 3.12.

New Optimizers

  • Adaptive Muon (AdaMuon / NorMuon) - Adaptive learning rate variants of Muon
  • MOP - Matrix Orthogonalization Preconditioning optimizer with polar grad scaling
  • MuonHyperball - Muon with hyperball-style norm-preserving weight updates (sphere manifold)
  • PolarGrad - Polar decomposition-based orthogonalized optimizer
  • SinkhornMuon - Sinkhorn iteration-based orthogonalization for Muon
  • Lion - Evolved sign momentum optimizer
  • REKLS - A variant of SOAP that uses an up-to-date eigenbasis computed by eigendecomposition
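Lion's update is simple enough to sketch in a few lines of NumPy, following the Lion paper. The step direction is the sign of a beta1-weighted blend of momentum and gradient, so every coordinate moves by exactly lr. The momentum buffer is then updated with a separate beta2. The betas default of (0.9, 0.99) is the paper's recommendation. The function name lion_step and the in-place buffer convention are assumptions, not the library's API.

```python
import numpy as np


def lion_step(param, grad, exp_avg, *, lr=1e-4, betas=(0.9, 0.99),
              weight_decay=0.0):
    """One Lion step (hypothetical sketch). exp_avg is updated in place."""
    beta1, beta2 = betas
    # Interpolate momentum and gradient with beta1, then keep only the sign.
    direction = np.sign(beta1 * exp_avg + (1 - beta1) * grad)
    # Decoupled weight decay plus the signed step.
    new_param = param - lr * (direction + weight_decay * param)
    # The momentum buffer is updated afterwards with a separate beta2.
    exp_avg *= beta2
    exp_avg += (1 - beta2) * grad
    return new_param


param = np.array([1.0, -1.0, 0.5])
grad = np.array([0.2, -0.3, 0.0])
exp_avg = np.zeros(3)
new_param = lion_step(param, grad, exp_avg, lr=0.1)
# Every coordinate with a nonzero direction moves by exactly lr.
```

Lion updates have unit magnitude per coordinate, which is larger than typical Adam steps. For that reason Lion is usually run with a smaller learning rate than AdamW.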

New Features

  • Optimizer Registry - Decorator-based registration system with register_optimizer, get_optimizer_cls, and get_configured_optimizer_cls for easy optimizer lookup and configuration
  • Conv1d utilities - Support for applying orthogonalized optimizers to 1D convolution layers
  • Generalized Newton-Schulz iterator - Newton-Schulz iteration with configurable coefficients for matrix orthogonalization
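A decorator-based registry of this kind can be sketched in plain Python. The sketch is generic and is not the library's code. register, lookup, configured, and ToySGD are hypothetical stand-ins for register_optimizer, get_optimizer_cls, and get_configured_optimizer_cls, whose real signatures may differ. Implementing "configured" by binding default hyperparameters with functools.partial is also an assumption.

```python
import functools
from collections.abc import Callable

_OPTIMIZERS: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records an optimizer class under a string key."""
    def decorator(cls: type) -> type:
        if name in _OPTIMIZERS:
            raise ValueError(f"optimizer {name!r} is already registered")
        _OPTIMIZERS[name] = cls
        return cls
    return decorator


def lookup(name: str) -> type:
    """Return the class registered under name."""
    try:
        return _OPTIMIZERS[name]
    except KeyError:
        known = ", ".join(sorted(_OPTIMIZERS))
        raise KeyError(f"unknown optimizer {name!r}; registered: {known}") from None


def configured(name: str, **defaults) -> functools.partial:
    """Return a factory for the class with default hyperparameters pre-bound."""
    return functools.partial(lookup(name), **defaults)


@register("toy_sgd")
class ToySGD:
    """Stand-in optimizer; a real one would subclass torch.optim.Optimizer."""

    def __init__(self, params, *, lr=0.1, momentum=0.0):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum


make_opt = configured("toy_sgd", lr=0.01)
opt = make_opt([1.0, 2.0])               # lr comes from the pre-bound default
opt_fast = make_opt([1.0, 2.0], lr=0.5)  # call-site kwargs override it
```

Rejecting duplicate names at registration time catches two classes that claim the same key when their module is imported. Without that check, one class would silently replace the other.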

Improvements

  • Refactored SOAP internals and utilities
  • Refactored scalar optimizers package structure
  • Standardized naming to follow PyTorch conventions
  • Reorganized and expanded test suite with improved coverage
  • Cleaned up eigendecomposition utilities (removed deprecated eig_orthogonal_iteration)
  • Reduced logging verbosity in Newton-Schulz iterator

Breaking Changes

  • Minimum Python version raised to 3.12
  • Naming changes to follow PyTorch conventions (see #120)

v0.1.0

12 Nov 03:59
d5363b4


Added support for preconditioning optimizers: Muon, SOAP, and PGSD.

Added distributed Newton-Schulz to support tensor parallelism in Muon.
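One way to distribute Newton-Schulz under tensor parallelism is sketched below. The sketch assumes the weight is split row-wise, with X = [X_1; X_2; ...], and rewrites the cubic step as X (1.5 I - 0.5 X^T X). With that form, each rank needs only one all-reduce of the small n x n Gram matrix per iteration, where X^T X is the sum of X_i^T X_i over ranks. Ranks are simulated here as list entries, and each all-reduce is simulated with sum(). The sharding scheme and the function name are assumptions, not a description of the library's implementation.

```python
import numpy as np


def sharded_newton_schulz(shards, num_iters=30):
    """Cubic Newton-Schulz on a matrix split row-wise across simulated ranks.

    shards: list of arrays X_i with the same number of columns; the full
    matrix is np.vstack(shards). Each "all-reduce" is simulated with sum().
    """
    shards = [np.asarray(s, dtype=np.float64) for s in shards]
    # All-reduce of per-rank squared Frobenius norms gives the global norm.
    fro = np.sqrt(sum(float(np.sum(s * s)) for s in shards))
    shards = [s / max(fro, 1e-12) for s in shards]
    n = shards[0].shape[1]
    for _ in range(num_iters):
        # X X^T X = X (X^T X) and X^T X = sum_i X_i^T X_i: one n x n all-reduce.
        gram = sum(s.T @ s for s in shards)
        # Each rank updates only its own rows using the shared Gram matrix.
        shards = [s @ (1.5 * np.eye(n) - 0.5 * gram) for s in shards]
    return shards


full = np.array([[2.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
out = np.vstack(sharded_newton_schulz([full[:2], full[2:]]))
U, _, Vt = np.linalg.svd(full, full_matrices=False)  # reference polar factor
```

Communication per iteration scales with n^2, the column dimension, and does not depend on the number of rows each rank holds.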