
Inconsistent p.ndim usage in Muon optimizer #37

Description

@N-damo

There's an inconsistency in how p.ndim is checked in optimize.py. In the get_optimizer function, parameters are routed to Muon with the condition p.ndim >= 2, but the Muon class constructor asserts assert p.ndim == 2, p.ndim, which only accepts 2D parameters.

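As a result, any parameter with more than two dimensions that is not excluded by name (for example, a 4D Conv2d weight of shape (out_channels, in_channels, kernel_h, kernel_w)) passes the filter in get_optimizer but is then rejected by the assertion, so constructing the Muon optimizer fails with an AssertionError.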

In get_optimizer:
muon_params = [
    p
    for name, p in model.named_parameters()
    if p.ndim >= 2 and "classifiers" not in name and "embedding" not in name
]
In the Muon class constructor:
for p in muon_params:
    # Use Muon for every parameter in muon_params which is >= 2D and doesn't look like an embedding or head layer
    assert p.ndim == 2, p.ndim
    self.state[p]["use_muon"] = True
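To make the mismatch concrete, here is a small standalone sketch that replays both checks on parameter shapes alone, without PyTorch. The parameter names and shapes are hypothetical stand-ins for model.named_parameters(); the two helper functions are illustrative only and are not part of optimize.py.

```python
# Hypothetical parameter names and shapes standing in for model.named_parameters();
# only the number of dimensions matters for this check.
PARAM_SHAPES = {
    "embedding.weight": (1000, 64),
    "conv.weight": (32, 3, 3, 3),  # 4D conv filter
    "fc.weight": (64, 128),
    "fc.bias": (64,),
    "classifiers.weight": (10, 64),
}


def selected_for_muon(shapes):
    """Names kept by the get_optimizer filter: ndim >= 2, no classifier/embedding layers."""
    return [
        name
        for name, shape in shapes.items()
        if len(shape) >= 2 and "classifiers" not in name and "embedding" not in name
    ]


def failing_constructor_assert(shapes, selected):
    """Names that pass the filter but would trip `assert p.ndim == 2` in the constructor."""
    return [name for name in selected if len(shapes[name]) != 2]


selected = selected_for_muon(PARAM_SHAPES)
print(selected)                                            # ['conv.weight', 'fc.weight']
print(failing_constructor_assert(PARAM_SHAPES, selected))  # ['conv.weight']
```

The conv filter is the only parameter that is both selected for Muon and rejected by the constructor, which is exactly the case described above.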

Could you suggest how this should be resolved?
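Two possible directions are sketched below, working on shapes only so the logic can be checked without PyTorch. Both helpers (split_params, matrix_view_shape) are hypothetical names, and the sketch assumes that parameters not sent to Muon are handled by AdamW, as is common in Muon setups. Option A makes the filter match the constructor by sending only 2D parameters to Muon. Option B relaxes the assertion to ndim >= 2 and assumes the optimizer step orthogonalizes a 2D view of each update, keeping the first dimension and flattening the rest; some public Muon implementations handle conv filters this way, but whether it suits this repository is for the maintainers to decide.

```python
import math


# Option A (hypothetical helper): keep the filter consistent with `assert p.ndim == 2`
# by sending only 2D parameters to Muon; everything else is assumed to go to AdamW.
def split_params(named_shapes):
    muon, adamw = [], []
    for name, shape in named_shapes:
        if len(shape) == 2 and "classifiers" not in name and "embedding" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Option B (hypothetical helper): accept >= 2D parameters and orthogonalize a 2D view,
# keeping dim 0 (e.g. out_channels) and flattening the remaining dims.
def matrix_view_shape(shape):
    assert len(shape) >= 2, len(shape)  # relaxed form of `assert p.ndim == 2, p.ndim`
    return (shape[0], math.prod(shape[1:]))


print(split_params([("conv.weight", (32, 3, 3, 3)), ("fc.weight", (64, 128))]))
# (['fc.weight'], ['conv.weight'])
print(matrix_view_shape((32, 3, 3, 3)))  # (32, 27)
```

In PyTorch, Option B would roughly mean reshaping the update with g.reshape(g.size(0), -1) before orthogonalization and restoring it with .view_as(p) afterwards. Option A is the smaller change; Option B keeps conv weights under Muon at the cost of treating each filter bank as a matrix.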
