Add sweep for relu2max capped hsnorm peri ln compat by klei22 · Pull Request #838 · ReaLLMASIC/ReaLLM-Forge

klei22 · 2026-06-14T23:52:15Z

No description provided.

Copilot

Pull request overview

This PR adds a new exploration sweep for testing ReLU2Max + Infinite Attention in peri-LN mode using CappedHyperSphereNorm, and extends CappedHyperSphereNorm to optionally apply a learnable gain (to align better with existing HyperSphereNorm configuration patterns).

Changes:

- Add optional `hsnorm_gain` support to `CappedHyperSphereNorm` so it can apply a learnable per-channel gain.
- Add a new YAML sweep (`relu2max_capped_hypersphere_peri_ln.yaml`) to compare capped hypersphere norms vs RMSNorm under peri-LN and pre-LN settings.
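For readers unfamiliar with the two placements the sweep compares, the following is a minimal pure-Python sketch of the usual distinction: pre-LN normalizes only the sublayer input, while peri-LN normalizes both the input and the output of the sublayer before the residual add. The helper names (`rms_norm`, `toy_sublayer`, `pre_ln_block`, `peri_ln_block`) are hypothetical and are not taken from this repository; how `peri_ln` is wired in ReaLLM-Forge is not shown in this PR, so treat this as an illustration of the general pattern only.

```python
import math

def rms_norm(x, eps=1e-8):
    """Plain RMS normalization of a single vector (no learnable gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / (rms + eps) for v in x]

def toy_sublayer(x):
    """Hypothetical stand-in for attention or MLP: inflates activations 10x."""
    return [10.0 * v for v in x]

def pre_ln_block(x):
    """Pre-LN: normalize the input; the sublayer output is added as-is."""
    h = toy_sublayer(rms_norm(x))
    return [a + b for a, b in zip(x, h)]

def peri_ln_block(x):
    """Peri-LN: normalize the input and also the sublayer output."""
    h = rms_norm(toy_sublayer(rms_norm(x)))
    return [a + b for a, b in zip(x, h)]

x = [1.0, -1.0, 1.0, -1.0]
pre_out = pre_ln_block(x)    # residual grows with the sublayer's output scale
peri_out = peri_ln_block(x)  # residual update is renormalized before the add
```

In this toy setup the pre-LN residual update carries the sublayer's 10x scale, while the peri-LN update is brought back to unit RMS. The choice of output norm is therefore a meaningful variable in a peri-LN block.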

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `variations/norm_variations.py` | Extends `CappedHyperSphereNorm` with optional gain, but currently misses other `hsnorm_*` behaviors used by the new sweep. |
| `explorations/relu2max_capped_hypersphere_peri_ln.yaml` | Introduces a new sweep configuration for peri-LN + ReLU2Max + capped hypersphere norms (with a couple of sweep-definition issues to fix). |


+        ndim = config.n_embd
+
+        self.radius = math.sqrt(ndim)
+
+        if config.hsnorm_gain:
+            self.gain = nn.Parameter(torch.ones(ndim))
+        else:
+            self.gain = 1.0

    def forward(self, x):
        norms = x.norm(2, dim=-1, keepdim=True)
        scale = torch.where(norms > self.radius, self.radius / (norms + 1e-8), torch.ones_like(norms))
-        return x * scale
+        return x * scale * self.gain
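The capping rule in the hunk above can be checked by hand on a single vector. The sketch below is a pure-Python re-statement of the same arithmetic (radius `sqrt(n_embd)`, rescale only when the L2 norm exceeds the radius, then multiply by the gain). It is not the repository's module: the function names `capped_hypersphere_norm` and `l2` are hypothetical, and the gain is passed in as a plain list rather than an `nn.Parameter`.

```python
import math

def l2(x):
    """L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))

def capped_hypersphere_norm(x, gain=None, eps=1e-8):
    """Cap the L2 norm of x at sqrt(len(x)), then apply an optional per-channel gain."""
    radius = math.sqrt(len(x))
    norm = l2(x)
    # Vectors inside the hypersphere pass through unchanged (scale = 1).
    scale = radius / (norm + eps) if norm > radius else 1.0
    if gain is None:
        gain = [1.0] * len(x)  # mirrors the `self.gain = 1.0` branch
    return [v * scale * g for v, g in zip(x, gain)]

inside = [0.5, -0.5, 0.5, -0.5]   # norm 1.0, below the radius 2.0
outside = [3.0, 0.0, 4.0, 0.0]    # norm 5.0, above the radius 2.0

capped_inside = capped_hypersphere_norm(inside)
capped_outside = capped_hypersphere_norm(outside)
gained = capped_hypersphere_norm(outside, gain=[2.0, 1.0, 0.5, 1.0])
```

Because the gain is applied after the cap, a gain with entries above 1 can push the output norm past the radius again (here `l2(gained)` is about 2.53 against a radius of 2.0). This is worth keeping in mind when reading results from runs with `hsnorm_gain` enabled.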


+named_variation_groups:
+  - named_group: "wte_norm_var"
+    named_group_alternates: ["capped_rmsnorm", "capped_pair", "rmsnorm"]
+


+    named_group_variations:
+      - "wte_norm_var"
+  # Peri-LN WTE Norm


+    hsnorm_radius_learning: [true]
+    named_group_variations:
+      - "wte_norm_var"


+    hsnorm_radius_learning: [true]
+    named_group_variations:
+      - "wte_norm_var"


+      - "qk_norm"
+      - "peri_ln"
+      - "rotary"
+      - "relu2max"
+      - "infinite"
+      - "hd_150"


+      - "qk_norm"
+      - "peri_ln"
+      - "rotary"
+      - "relu2max"
+      - "infinite"
+      - "hd_150"


+      - "qk_norm"
+      - "pre_ln"
+      - "rotary"
+      - "relu2max"
+      - "infinite"
+      - "hd_150"


klei22 and others added 2 commits June 14, 2026 11:38

Add relu2max capped hypersphere exploration

c968bdb

Add sweep for relu2max and capped norm compat

f90c90c

klei22 requested review from Copilot and gkielian June 14, 2026 23:52

Copilot started reviewing on behalf of klei22 June 14, 2026 23:52

Copilot AI reviewed Jun 14, 2026


Add muon adamw options

b65c3db

klei22 wants to merge 3 commits into ReaLLMASIC:master from klei22:add_sweep_for_relu2max_capped_hsnorm_peri_ln_compat


2 participants
