feat(examples/kws_raw): per-conv LayerNorm (10-seed-validated) replacing fragile end-LN by LeoBuron · Pull Request #259 · es-ude/OnDeviceTraining

LeoBuron · 2026-06-28T23:48:14Z

Revises the kws_raw model after a 10-seed config sweep showed the previously-shipped end-feature LayerNorm(64) was the worst, least-stable choice.

Why

A 10-seed × 3-placement × 3-lr sweep (50 epochs each):

| placement | test_acc (mean ± std) | seeds converged |
|---|---|---|
| no LayerNorm | 0.70 ± 0.02 | 10/10 |
| LayerNorm(64) after pooling (was shipped) | 0.47 ± 0.25 | ~6/10 |
| per-conv `LayerNorm([C,L])` | 0.72 ± 0.01 | 10/10 |

The shipped end-feature LayerNorm collapses to a one-class reference on ~40% of seeds; the original seed-42 pick was lucky, which made the gate fragile and close to degenerate. Per-conv LayerNorm (one LayerNorm over each conv's full [C,L] feature map, applied pre-ReLU) converges on all 10 seeds and reaches the highest mean test_acc. No-LayerNorm also trains fine at 50 epochs: the raw model was never un-trainable, just slow. LayerNorm is kept because it is the framework's only bit-parity-covered normalizer, and so that the example exercises it.
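To make "one LayerNorm over each conv's full [C,L] feature map, pre-ReLU" concrete, here is a minimal pure-Python sketch. It is not the framework's C implementation or PyTorch's kernel. The function names are hypothetical, and the learnable per-element scale and shift that a LayerNorm normally carries are left out for brevity.

```python
import math

def layer_norm_2d(fmap, eps=1e-5):
    """Normalize a [C][L] feature map using ONE mean/variance over all C*L values.

    Illustrative only: the learnable affine (weight/bias) is omitted.
    """
    flat = [v for row in fmap for v in row]
    n = len(flat)
    mean = sum(flat) / n
    # Biased (divide-by-n) variance, which is what LayerNorm uses.
    var = sum((v - mean) ** 2 for v in flat) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * inv_std for v in row] for row in fmap]

def relu(fmap):
    """Element-wise ReLU, applied after the normalization (pre-ReLU LayerNorm)."""
    return [[max(0.0, v) for v in row] for row in fmap]

# Toy [C=2, L=3] feature map standing in for a conv output.
conv_out = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
normed = layer_norm_2d(conv_out)
activated = relu(normed)
```

Because the statistics span both channels and time steps, every element of the feature map is shifted and scaled by the same two numbers. This differs from the end-feature variant, which normalizes only the 64 pooled features.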

Change

- Model: AvgPool1d(16) → 3× [Conv1d(K3,SAME) → LayerNorm([C,L]) → ReLU → MaxPool(4)] → AdaptiveAvgPool1d(1) → Flatten → Linear. LayerNorm shapes [16,1000], [32,250], [64,62]. lr=0.005, 50 epochs.
- C: MODEL_SIZE 15→17, three layerNormLayerInit(numNormDims=2, eps=1e-5), 7-entry state-dict {conv1,ln1,conv2,ln2,conv3,ln3,fc}.
- Gate: with BIT_PARITY=1, the C int32 predictions are bit-identical to PyTorch (2483/2483) and diverse across all 6 classes, with test_acc 0.721. This is the first bit-parity exercise of a multi-dim [C,L] LayerNorm in an example.
- README rewritten with the sweep table.
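The LayerNorm shapes and layer positions listed above can be checked with a little arithmetic. The sketch below assumes a raw input of 16000 samples (not stated in the description, inferred from 1000 × 16) and floor-mode pooling. The function and layer names are hypothetical labels, not identifiers from the repository.

```python
def kws_raw_ln_shapes(input_len=16000, channels=(16, 32, 64),
                      front_pool=16, max_pool=4):
    """Return the [C, L] shape seen by each per-conv LayerNorm.

    Assumes input_len=16000 (inferred, not given in the PR text).
    """
    length = input_len // front_pool      # AvgPool1d(16)
    shapes = []
    for c in channels:
        shapes.append([c, length])        # SAME-padded conv keeps the length
        length //= max_pool               # MaxPool(4), floor division
    return shapes, length

ln_shapes, final_len = kws_raw_ln_shapes()

# Layer order as written in the description, to locate the LayerNorm indices.
layers = ["AvgPool1d"]
for i in (1, 2, 3):
    layers += [f"Conv1d_{i}", f"LayerNorm_{i}", "ReLU", "MaxPool1d"]
layers += ["AdaptiveAvgPool1d", "Flatten", "Linear"]
ln_indices = [i for i, name in enumerate(layers) if name.startswith("LayerNorm")]

print(ln_shapes)    # [[16, 1000], [32, 250], [64, 62]]
print(ln_indices)   # [2, 6, 10]
```

Under these assumptions the last MaxPool leaves 15 time steps, which AdaptiveAvgPool1d(1) reduces to 1 before the Linear layer. The list built here has 16 entries while the description reports MODEL_SIZE 17, so the C model appears to contain one entry that the description does not name.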

Supersedes the end-LN config from #256. The shipped gate was seed-independent (it loads fixed weights), so this is a quality/robustness fix, not a correctness bug.
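As a rough illustration of what the bit-parity gate asserts, the sketch below compares two lists of integer class predictions and checks that every class is predicted at least once. This is a hypothetical helper, not the repository's gate script. Reading "diverse across all 6 classes" as "each class appears at least once" is an assumption, and the sample values are toy data, not results from the PR.

```python
def parity_gate(c_preds, ref_preds, num_classes=6):
    """Compare C predictions against reference (e.g. PyTorch) predictions.

    Hypothetical helper: returns match counts and a simple diversity flag.
    """
    if len(c_preds) != len(ref_preds):
        raise ValueError("prediction lists must have the same length")
    matches = sum(1 for a, b in zip(c_preds, ref_preds) if a == b)
    total = len(c_preds)
    return {
        "matches": matches,
        "total": total,
        "bit_identical": matches == total,
        # Assumption: "diverse" means every class index shows up at least once.
        "diverse": set(c_preds) == set(range(num_classes)),
    }

# Toy data only: seven samples covering all six classes.
toy_c = [0, 1, 2, 3, 4, 5, 0]
toy_ref = [0, 1, 2, 3, 4, 5, 0]
result = parity_gate(toy_c, toy_ref)

# A collapsed (one-class) model would fail the diversity check even at full parity.
collapsed = parity_gate([3] * 7, [3] * 7)
```

The second call shows why parity alone is not enough: a model that collapsed to one class can still match its reference bit for bit, which is the failure mode the sweep exposed for the end-feature LayerNorm.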

🤖 Generated with Claude Code

…ing fragile end-LN

A 10-seed x 3-placement x 3-lr sweep (50 epochs) showed the previously-shipped end-feature LayerNorm(64) is the WORST option: 0.47 +/- 0.25 test_acc, collapsing to a one-class reference on ~40% of seeds (the original seed-42 pick was lucky).

Per-conv LayerNorm([C,L]) over each conv's full feature map (pre-ReLU) is the best and most stable: 0.72 +/- 0.01, all 10 seeds converge across all 6 classes. (Plain no-LayerNorm also trains fine at 50 epochs, 0.70 +/- 0.02: the raw model was never un-trainable, just slow; LayerNorm is kept as the framework's bit-parity-covered normalizer and to exercise it in the gate.)

Model: 3x [Conv1d -> LayerNorm([C,L]) -> ReLU -> MaxPool(4)] (shapes [16,1000], [32,250], [64,62]), lr=0.005, 50 epochs.

C: MODEL_SIZE 15->17, three layerNormLayerInit(numNormDims=2, eps=1e-5) at model[2]/[6]/[10], 7-entry state-dict {conv1,ln1,conv2,ln2,conv3,ln3,fc}.

Gate PASSES bit-identical (2483/2483) with diverse predictions across all 6 classes. This is the first bit-parity exercise of a multi-dim [C,L] LayerNorm in an example.

LeoBuron merged commit 062e7a3 into develop Jun 29, 2026
8 checks passed

LeoBuron deleted the examples-kws-raw-perconv branch June 29, 2026 00:00

LeoBuron merged 1 commit into develop from examples-kws-raw-perconv
