Increasing coord check for the network output #71

@AkshitaB

I'm implementing muP for the OLMo model, and am facing an issue with the coordinate check.

[Figure: sp_trsfmr_adamw_coord]
[Figure: μp_trsfmr_adamw_coord]

The l1 that increases is the one for the network output. Following the docs, I set both the readout init and the query init to zero. I also made sure that the initialization is applied after set_base_shapes is called.
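For reference, here is a minimal toy sketch of the behaviour I expect from the readout. It is not the mup library and not the OLMo code: the function names, `base_width=64`, the learning rate, and the sign-based update (standing in for Adam's first step) are all assumptions made for illustration. It takes a zero-initialized readout, applies one update, and compares how the output magnitude scales with width with and without a 1/width-style output multiplier.

```python
import math
import random


def readout_l1_after_one_step(width, base_width=64, use_mup=False, lr=0.01, seed=0):
    """Toy readout: zero init, one sign-based (Adam-like) step, single input.

    Returns |output| after the update. Hypothetical helper, not a mup API.
    """
    rng = random.Random(seed)
    # Hidden activations with O(1) coordinates (an assumption of this toy).
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    # muP-style readout: scale the output by base_width / width. SP: no scaling.
    mult = base_width / width if use_mup else 1.0
    w = [0.0] * width  # zero readout init
    # With out = mult * sum(w_i * x_i) and upstream gradient 1,
    # grad_w_i = mult * x_i, and the first Adam step is roughly -lr * sign(grad).
    w = [wi - lr * math.copysign(1.0, mult * xi) for wi, xi in zip(w, x)]
    out = mult * sum(wi * xi for wi, xi in zip(w, x))
    return abs(out)


def loglog_slope(widths, values):
    """Least-squares slope of log(value) against log(width)."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


widths = [64, 128, 256, 512, 1024]
sp_l1 = [readout_l1_after_one_step(w, use_mup=False) for w in widths]
mup_l1 = [readout_l1_after_one_step(w, use_mup=True) for w in widths]

sp_slope = loglog_slope(widths, sp_l1)
mup_slope = loglog_slope(widths, mup_l1)
print(f"SP  log-log slope of output l1 vs width: {sp_slope:.2f}")
print(f"muP log-log slope of output l1 vs width: {mup_slope:.2f}")
```

In this toy, the unscaled readout gives a slope close to 1 and the scaled one a slope close to 0. Applying the same `loglog_slope` helper to the per-layer l1 values from a real coordinate check might help narrow down which module starts growing with width first.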

What other things can I check to debug the issue?
