feat(examples/kws_raw): per-conv LayerNorm (10-seed-validated) replacing fragile end-LN #259

Merged: LeoBuron merged 1 commit into develop from examples-kws-raw-perconv on Jun 29, 2026

@LeoBuron (Member)

Revises the kws_raw model after a 10-seed config sweep showed that the previously shipped end-feature LayerNorm(64) was the worst and least stable choice.

Why

A 10-seed × 3-placement × 3-lr sweep (50 epochs each):

| Placement | test_acc (mean ± std) | Seeds converged |
| --- | --- | --- |
| no LayerNorm | 0.70 ± 0.02 | 10/10 |
| LayerNorm(64) after pooling (was shipped) | 0.47 ± 0.25 | ~6/10 |
| per-conv LayerNorm([C,L]) | 0.72 ± 0.01 | 10/10 |

The shipped end-feature LayerNorm collapses to a one-class reference on ~40% of seeds; the original seed-42 pick was lucky, so the resulting gate was fragile/degenerate. Per-conv LayerNorm, applied once over each conv's full [C,L] feature map before the ReLU, converges on all 10 seeds and reaches the highest mean accuracy. (The no-LayerNorm model also trains fine at 50 epochs; the raw model was never un-trainable, just slow. LayerNorm is kept because it is the framework's only bit-parity-covered normalizer and so that the gate exercises it.)
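To make "one LayerNorm over each conv's full [C,L] feature map" concrete, here is a minimal pure-Python sketch of the normalization step. It is not the framework's implementation: the function name `layer_norm_2d` is hypothetical, the learnable scale and shift are left out (equivalent to assuming weight 1 and bias 0), and only `eps=1e-5` is taken from this PR.

```python
import math

def layer_norm_2d(fmap, eps=1e-5):
    """Normalize one sample's [C, L] feature map over all C*L values jointly.

    Hypothetical helper for illustration; the affine weight/bias are omitted.
    """
    values = [v for row in fmap for v in row]
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance, which is what LayerNorm uses.
    var = sum((v - mean) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * inv_std for v in row] for row in fmap]

# Toy [C=2, L=3] feature map; both channels share one mean and one variance.
fmap = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
normed = layer_norm_2d(fmap)
```

Because the statistics are shared across channels and positions, the output has (approximately) zero mean and unit variance over the whole map, not per channel. This is what distinguishes the [C,L] placement from a LayerNorm(64) applied to the pooled 64-feature vector.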

Change

  • Model: AvgPool1d(16) → 3× [Conv1d(K3,SAME) → LayerNorm([C,L]) → ReLU → MaxPool(4)] → AdaptiveAvgPool1d(1) → Flatten → Linear. LayerNorm shapes [16,1000], [32,250], [64,62]. lr=0.005, 50 epochs.
  • C: MODEL_SIZE 15→17, three layerNormLayerInit(numNormDims=2, eps=1e-5), 7-entry state-dict {conv1,ln1,conv2,ln2,conv3,ln3,fc}.
  • Gate: with BIT_PARITY=1, the C int32 predictions are bit-identical to PyTorch (2483/2483) and diverse across all 6 classes, with test_acc 0.721. This is the first bit-parity exercise of a multi-dim [C,L] LayerNorm in an example.
  • README rewritten with the sweep table.
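The three LayerNorm shapes follow from the pooling arithmetic, which can be traced with a short sketch. The input length of 16000 samples is an assumption inferred from the [16,1000] shape after AvgPool1d(16), not something this PR states; the sketch also assumes each pool uses a stride equal to its kernel size with floor rounding (the PyTorch defaults). `trace_shapes` is a hypothetical helper name.

```python
def trace_shapes(input_len=16000, channels=(16, 32, 64), front_pool=16, block_pool=4):
    """Return the [C, L] shape seen by each LayerNorm and the final length.

    Assumes SAME-padded convs (length unchanged) and floor-mode pooling
    with stride equal to kernel size. input_len is an assumed value.
    """
    length = input_len // front_pool      # AvgPool1d(16)
    ln_shapes = []
    for c in channels:
        ln_shapes.append([c, length])     # Conv1d(K3, SAME) keeps the length
        length //= block_pool             # MaxPool(4) after LayerNorm and ReLU
    return ln_shapes, length

ln_shapes, final_len = trace_shapes()
print(ln_shapes)   # [[16, 1000], [32, 250], [64, 62]]
print(final_len)   # 15, which AdaptiveAvgPool1d(1) then reduces to 1
```

The third shape is [64,62] rather than [64,62.5] because 250 is not divisible by 4 and floor-mode pooling drops the remainder.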

Supersedes the end-LN config from #256. The shipped gate was seed-independent (it loads fixed weights), so this is a quality/robustness fix, not a correctness bug.
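As a rough illustration of what a parity-plus-diversity gate checks, the sketch below compares two lists of predicted class ids and reports whether they match exactly and whether every class appears. It is not the repository's gate: `parity_gate`, its arguments, and the toy data are all hypothetical, and only the class count of 6 comes from this PR.

```python
def parity_gate(c_preds, ref_preds, num_classes=6):
    """Compare predictions from two backends (hypothetical helper).

    c_preds / ref_preds: lists of integer class ids, one per test sample.
    """
    if len(c_preds) != len(ref_preds):
        raise ValueError("prediction lists must have the same length")
    matches = sum(1 for a, b in zip(c_preds, ref_preds) if a == b)
    return {
        "matches": matches,
        "total": len(ref_preds),
        # Pass only if every single prediction agrees.
        "identical": matches == len(ref_preds),
        # A collapsed one-class reference would fail this check.
        "diverse": len(set(ref_preds)) == num_classes,
    }

# Toy data, not the PR's results.
ref = [0, 1, 2, 3, 4, 5, 0, 1]
good = parity_gate(list(ref), ref)
collapsed = parity_gate([0] * 8, [0] * 8)
```

Here `good` passes both checks, while `collapsed` is identical but not diverse. The second case is the degenerate situation described above: a one-class reference can match perfectly and still tell you nothing.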

🤖 Generated with Claude Code

…ing fragile end-LN

A 10-seed x 3-placement x 3-lr sweep (50 epochs) showed the previously-shipped
end-feature LayerNorm(64) is the WORST option: 0.47 +/- 0.25 test_acc, collapsing
to a one-class reference on ~40% of seeds (the original seed-42 pick was lucky).
Per-conv LayerNorm([C,L]) over each conv's full feature map (pre-ReLU) is the best
and most stable: 0.72 +/- 0.01, all 10 seeds converge across all 6 classes. (Plain
no-LayerNorm also trains fine at 50 epochs, 0.70 +/- 0.02 — the raw model was never
un-trainable, just slow; LayerNorm is kept as the framework's bit-parity-covered
normalizer and to exercise it in the gate.)

Model: 3x [Conv1d -> LayerNorm([C,L]) -> ReLU -> MaxPool(4)] (shapes [16,1000],
[32,250], [64,62]), lr=0.005, 50 epochs. C: MODEL_SIZE 15->17, three
layerNormLayerInit(numNormDims=2, eps=1e-5) at model[2]/[6]/[10], 7-entry
state-dict {conv1,ln1,conv2,ln2,conv3,ln3,fc}. Gate PASSES bit-identical (2483/2483)
with diverse predictions across all 6 classes -- first bit-parity exercise of a
multi-dim [C,L] LayerNorm in an example.
@LeoBuron merged commit 062e7a3 into develop on Jun 29, 2026
8 checks passed
@LeoBuron deleted the examples-kws-raw-perconv branch on June 29, 2026 00:00