Releases: NVIDIA-NeMo/Emerging-Optimizers
v0.3.0
New optimizers
- LaProp: RMSProp with momentum / normalized-SGD-with-momentum variant (#206)
- MAdam: magnitude-aware Adam that removes `eps` from the update path (#191)
- SPEL: spectral steepest descent on the Stiefel manifold (#106)
- NorMuon: norm-based adaptive moment estimation with orthogonalized momentum (#107)
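LaProp differs from Adam mainly in the order of operations. It first normalizes the gradient by the running second moment and then applies momentum to the normalized gradient, whereas Adam applies momentum first and normalizes afterwards. Below is a minimal NumPy sketch of one step. It assumes Adam-style bias correction for both moments and an illustrative `eps`; the library's handling of these may differ, and `laprop_step` is a hypothetical name.

```python
import numpy as np

def laprop_step(param, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One LaProp step (sketch). `m`, `v` are state arrays updated in place; `step` starts at 1."""
    beta1, beta2 = betas
    # Second moment: same EMA of squared gradients as Adam/RMSProp.
    v[...] = beta2 * v + (1 - beta2) * grad**2
    v_hat = v / (1 - beta2**step)  # assumed bias correction
    # Normalize first, then apply momentum to the normalized gradient.
    m[...] = beta1 * m + (1 - beta1) * grad / (np.sqrt(v_hat) + eps)
    m_hat = m / (1 - beta1**step)  # assumed bias correction
    param -= lr * m_hat
    return param
```

Because the momentum buffer accumulates already-normalized gradients, `beta1` and `beta2` can be tuned more independently than in Adam. This is the decoupling that motivates LaProp.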
Newton–Schulz / orthogonalization
- Batched Newton–Schulz step for distributed/grouped weights (#170)
- DeepSeek-style NS coefficients (#167)
- Newton–Schulz benchmark (#172)
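Batching Newton-Schulz amounts to stacking same-shaped matrices along leading axes and running the iteration with batched matrix multiplies. The following NumPy sketch is an illustration, not the library's implementation. The iteration is X ← aX + b(XXᵀ)X + c(XXᵀ)²X with caller-supplied coefficients. It defaults to the classic cubic (1.5, -0.5, 0), which converges to the exact orthogonal polar factor. Muon's quintic (3.4445, -4.7750, 2.0315) is a common faster choice that only pushes singular values into a band around 1. The DeepSeek-style coefficients from #167 are not reproduced here.

```python
import numpy as np

def newton_schulz(G, coeffs=(1.5, -0.5, 0.0), steps=30, eps=1e-7):
    """Approximate the orthogonal polar factor of each matrix in G.

    G has shape (..., m, n); all leading axes are treated as a batch.
    """
    X = np.asarray(G, dtype=np.float64)
    transposed = X.shape[-2] > X.shape[-1]
    if transposed:
        X = np.swapaxes(X, -1, -2)  # iterate on the smaller Gram matrix
    # Frobenius-normalize each matrix so every singular value is <= 1.
    X = X / (np.linalg.norm(X, axis=(-2, -1), keepdims=True) + eps)
    a, b, c = coeffs
    for _ in range(steps):
        A = X @ np.swapaxes(X, -1, -2)  # batched X X^T
        B = b * A
        if c:
            B = B + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = np.swapaxes(X, -1, -2)
    return X
```

For example, `newton_schulz(np.stack([W1, W2]))` orthogonalizes two equally shaped weight matrices in a single call.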
SOAP
- The eigenbasis is now updated every step, and amortization has been removed (#160). This is a behavioral change for SOAP users: the previous `precondition_frequency` / amortization knobs no longer apply.
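SOAP runs an Adam-style update in the eigenbasis of Shampoo's two Kronecker factors. After #160 that basis is refreshed on every step rather than every `precondition_frequency` steps. The NumPy sketch below covers only the refresh-and-rotate part. It recomputes the basis with a full `np.linalg.eigh`, which is an assumption and may not match how the library performs the refresh. The helper names and the `beta` default are also hypothetical.

```python
import numpy as np

def update_kron_factors(L, R, G, beta=0.95):
    """EMA of the left (G G^T) and right (G^T G) Kronecker factors."""
    L = beta * L + (1 - beta) * (G @ G.T)
    R = beta * R + (1 - beta) * (G.T @ G)
    return L, R

def eigenbasis(L, R):
    """Recompute both eigenbases from scratch; called on every step (no amortization)."""
    _, QL = np.linalg.eigh(L)  # columns are orthonormal eigenvectors
    _, QR = np.linalg.eigh(R)
    return QL, QR

def to_eigenbasis(G, QL, QR):
    """Rotate the gradient into the preconditioner's eigenbasis."""
    return QL.T @ G @ QR

def from_eigenbasis(Gp, QL, QR):
    """Rotate an update back to the original parameter coordinates."""
    return QL @ Gp @ QR.T
```

In a full step, the Adam moments would be maintained on `to_eigenbasis(G, QL, QR)`, and the resulting update would be mapped back with `from_eigenbasis`.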
Breaking API changes
- `calculate_*_update` signatures have been reordered, and their hyperparameters are now keyword-only. New shape: `(grad, <state buffers>, *, betas|momentum, eps, correct_bias, [nesterov], step, <extras>)`. `correct_bias` now follows `eps`, and the scalar `step: int` comes after that hyperparameter group. Callers that pass these arguments positionally must switch to keywords.
- LaProp argument order: `correct_bias` no longer precedes `betas`.
- Optimizer `__init__` methods now use `*,` to mark non-core arguments (`weight_decay_method`, `dim`, `correct_bias`, etc.) as keyword-only for `Lion`, `LaProp`, `ObliqueSGD`, and `ObliqueAdam`. Existing keyword-style call sites are unaffected.
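A hypothetical function that follows the new shape shows what the keyword-only change means for callers. `calculate_toy_update` is invented for illustration as a plain Adam step in NumPy. It is not one of the library's `calculate_*_update` functions; only its argument layout mirrors the shape above.

```python
import numpy as np

def calculate_toy_update(grad, exp_avg, exp_avg_sq, *, betas, eps, correct_bias, step):
    """Hypothetical example: grad and state buffers are positional, the rest keyword-only."""
    beta1, beta2 = betas
    # State buffers are updated in place, as optimizer state usually is.
    exp_avg[...] = beta1 * exp_avg + (1 - beta1) * grad
    exp_avg_sq[...] = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
    m, v = exp_avg, exp_avg_sq
    if correct_bias:
        m = m / (1 - beta1**step)
        v = v / (1 - beta2**step)
    return m / (np.sqrt(v) + eps)
```

A call such as `calculate_toy_update(g, m, v, (0.9, 0.999), 1e-8, True, 1)` now raises `TypeError`. The migrated form names every argument after the `*`: `calculate_toy_update(g, m, v, betas=(0.9, 0.999), eps=1e-8, correct_bias=True, step=1)`. With keywords, reordering `correct_bias` and `step` can no longer silently bind values to the wrong parameter.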
Bug fixes
Docs / infra
v0.2.0
Emerging-Optimizers v0.2.0 Release Notes
Highlights
v0.2.0 brings several significant new optimizers, a unified optimizer registry, improved test coverage, and infrastructure upgrades, including a Python 3.12+ requirement.
New Optimizers
- Adaptive Muon (AdaMuon / NorMuon) - Adaptive learning rate variants of Muon
- MOP - Matrix Orthogonalization Preconditioning optimizer with polar grad scaling
- MuonHyperball - Muon with hyperball-style norm-preserving weight updates (sphere manifold)
- PolarGrad - Polar decomposition-based orthogonalized optimizer
- SinkhornMuon - Sinkhorn iteration-based orthogonalization for Muon
- Lion - Evolved sign momentum optimizer
- REKLS - A variant of SOAP that uses an up-to-date eigenbasis computed by eigendecomposition
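Lion is simple enough to write out in full. The following is a NumPy sketch of the update rule from the original Lion paper with decoupled weight decay. The defaults are illustrative, `lion_step` is a hypothetical name, and this is not the library's implementation.

```python
import numpy as np

def lion_step(param, grad, m, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """One Lion step (sketch). `param` and `m` are float arrays updated in place."""
    beta1, beta2 = betas
    # Direction: sign of an interpolation between momentum and the current gradient.
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    # Decoupled weight decay is applied alongside the sign update.
    param -= lr * (update + weight_decay * param)
    # The momentum buffer itself is tracked with a different coefficient, beta2.
    m[...] = beta2 * m + (1 - beta2) * grad
    return param
```

Every coordinate of the update has magnitude `lr` (or 0). This is why Lion is typically run with a smaller learning rate than AdamW.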
New Features
- Optimizer Registry - Decorator-based registration system with `register_optimizer`, `get_optimizer_cls`, and `get_configured_optimizer_cls` for easy optimizer lookup and configuration
- Conv1d utilities - Support for applying orthogonalized optimizers to 1D convolution layers
- Generalized Newton-Schulz iterator - Newton-Schulz iteration with configurable coefficients for matrix orthogonalization
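The registry follows the familiar decorator-registration pattern. The sketch below uses hypothetical names (`register`, `lookup`, `configured`, `ToySGD`) rather than the library's `register_optimizer`, `get_optimizer_cls`, and `get_configured_optimizer_cls`, whose exact signatures are not given here. It illustrates the pattern only. "Configured" lookup is approximated with `functools.partial`.

```python
import functools

_OPTIMIZERS = {}  # name -> optimizer class

def register(name):
    """Class decorator that records an optimizer class under `name`."""
    def decorator(cls):
        if name in _OPTIMIZERS:
            raise ValueError(f"optimizer {name!r} is already registered")
        _OPTIMIZERS[name] = cls
        return cls
    return decorator

def lookup(name):
    """Return the class registered under `name`."""
    try:
        return _OPTIMIZERS[name]
    except KeyError:
        raise KeyError(f"unknown optimizer {name!r}; known: {sorted(_OPTIMIZERS)}") from None

def configured(name, **defaults):
    """Return a constructor for `name` with some keyword arguments pre-bound."""
    return functools.partial(lookup(name), **defaults)

@register("toy_sgd")
class ToySGD:
    """Stand-in optimizer used only to exercise the registry."""
    def __init__(self, params, lr=0.1, *, weight_decay=0.0):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay
```

Usage: `configured("toy_sgd", lr=0.5)(params)` builds an instance from a string name, which is convenient when the optimizer is chosen from a config file.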
Improvements
- Refactored SOAP internals and utilities
- Refactored scalar optimizers package structure
- Standardized naming to follow PyTorch conventions
- Reorganized and expanded test suite with improved coverage
- Cleaned up eigendecomposition utilities (removed deprecated `eig_orthogonal_iteration`)
- Reduced logging verbosity in Newton-Schulz iterator
Breaking Changes
- Minimum Python version raised to 3.12
- Naming changes to follow PyTorch conventions (see #120)