feat(layer): parametrize factory init, default to PyTorch parity (Issue C) by LeoBuron · Pull Request #250 · es-ude/OnDeviceTraining

LeoBuron · 2026-06-26T09:43:40Z

What

Parametrizes the factory weight/bias initialization and changes the default to match PyTorch's. Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init), i.e. the gain=√2 mismatch that the factory code already flagged as "requires Issue C".

Why

conv1dLayerInit / linearLayerInit / conv1dTransposedLayerInit hardcoded KAIMING_UNIFORM with gain=√2 (He) for weights and zeroed the biases. PyTorch's default is kaiming_uniform_(a=√5) for weights (bound 1/√fan_in) plus uniform(±1/√fan_in) for the bias, so its weight bound is √6 ≈ 2.45× narrower than the He bound √(6/fan_in). The framework targets PyTorch parity, so the default should match PyTorch.
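For reference, the √6 factor can be derived from the Kaiming-uniform bound as PyTorch defines it (bound = gain·√(3/fan_in), with the leaky-ReLU gain √(2/(1+a²))). The symbols b_He and b_PT are labels introduced here for the old and new default bounds:

```latex
\begin{aligned}
b(\text{gain}) &= \text{gain}\cdot\sqrt{\frac{3}{\text{fan\_in}}} \\
\text{He: } \text{gain}=\sqrt{2}
  &\;\Rightarrow\; b_{\text{He}} = \sqrt{\frac{6}{\text{fan\_in}}} \\
\text{PyTorch: } \text{gain}=\sqrt{\frac{2}{1+a^{2}}}\,\Big|_{a=\sqrt{5}} = \sqrt{\tfrac{1}{3}}
  &\;\Rightarrow\; b_{\text{PT}} = \frac{1}{\sqrt{\text{fan\_in}}} \\
\frac{b_{\text{He}}}{b_{\text{PT}}} &= \sqrt{6} \approx 2.45
\end{aligned}
```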

How

New weightInit_t { initScheme_t scheme; float gain; } field on each init struct (zero-init → INIT_DEFAULT, mirroring the existing bias_t idiom). All three factories route weight+bias allocation through shared helpers initWeightTensor / initBiasTensor (new compiled src/userApi/LayerCommon.c):

| scheme | weight |
| --- | --- |
| `INIT_DEFAULT` (0) | `kaimingUniform(√(1/3), fan_in)` = `uniform(±1/√fan_in)` = PyTorch `a=√5` |
| `INIT_KAIMING_UNIFORM` | He, gain `√2` (overridable via `.gain`) |
| `INIT_XAVIER_UNIFORM` | Glorot, gain `1` (overridable) |

Bias is always uniform(±1/√fan_in) (PyTorch convention). Fan modes match PyTorch's _calculate_fan_in_and_fan_out per layout (Conv1d in·k, Linear in, ConvT out·k). LayerCommon changed from a header-only INTERFACE target to a compiled static lib to host the shared helpers.

TDD / verification

Seeded statistical value tests (default bound + explicit He override) added to UnitTestConv1dApi, UnitTestLinear, UnitTestConv1dTransposedApi; mutation-verified non-vacuous (gain→√2 and bias→0 both flip them red).
Existing structural factory tests unaffected.
62/62 ctest (unit_test_debug); examples build clean against the static-lib change.
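The idea behind a seeded statistical bound test can be illustrated as below. This is not the code added to the unit tests: the LCG stands in for the framework's seeded generator, and every name, the sample count, and the 0.9 tightness threshold are assumptions. The check passes only when samples stay within ±bound and come close to it, so both a wider He-scaled distribution and an all-zero tensor fail it:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in RNG: a 32-bit LCG, deterministic for a given seed. */
static uint32_t sketchState;

void sketchSeed(uint32_t seed) { sketchState = seed; }

/* Draws one value from uniform[-bound, +bound). */
float sketchUniform(float bound) {
    sketchState = sketchState * 1664525u + 1013904223u;
    float u = (float)(sketchState >> 8) / 16777216.0f;   /* top 24 bits -> [0, 1) */
    return (2.0f * u - 1.0f) * bound;
}

/* 1 if all samples lie within +/-bound and the largest magnitude is
 * at least 90% of it; 0 otherwise. */
int sketchMatchesBound(const float *v, size_t n, float bound) {
    float maxAbs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(v[i]);
        if (a > maxAbs) maxAbs = a;
    }
    return (maxAbs <= bound) && (maxAbs >= 0.9f * bound);
}

/* Draws 4096 samples with drawBound and checks them against expectBound. */
int sketchSampleAndCheck(uint32_t seed, float drawBound, float expectBound) {
    enum { SKETCH_N = 4096 };
    static float samples[SKETCH_N];
    sketchSeed(seed);
    for (size_t i = 0; i < SKETCH_N; ++i) {
        samples[i] = sketchUniform(drawBound);
    }
    return sketchMatchesBound(samples, SKETCH_N, expectBound);
}
```

With fan-in 16, drawing at the default bound 0.25 should satisfy the check, whereas drawing at the He bound √(6/16) or at bound 0 (a zeroed tensor) should not, which mirrors the two mutations described above.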

Note

This is a framework-wide default init change (all factory-built models now init like PyTorch). It does not affect bit-parity (which loads trained weights). It surfaced while making the examples' train-from-scratch demos comparable. Notably, matching the init did not close the ECG demo gap, which reveals a separate C-vs-PyTorch training-dynamics divergence (the bit-parity tests cover inference only). That finding is written up separately for investigation; it does not block this change.

🤖 Generated with Claude Code

… parity

The Conv1d/Linear/Conv1dTransposed factories hardcoded KAIMING_UNIFORM gain=sqrt(2) (He) for weights and zeroed biases - flagged in-code as deferred to 'Issue C'. That could not reproduce PyTorch's default init, so train-from-scratch parity demos diverged.

Add a weightInit_t {initScheme_t scheme; float gain;} field to each init struct (zero-init -> INIT_DEFAULT, mirroring the existing bias_t idiom). Route all three factories' weight+bias allocation through shared helpers initWeightTensor/initBiasTensor (new src/userApi/LayerCommon.c, now a compiled static lib).

Schemes:
- INIT_DEFAULT -> kaimingUniform(gain=sqrt(1/3), fan_in) = uniform(+/-1/sqrt(fan_in)), exactly PyTorch's kaiming_uniform_(a=sqrt(5)) weight default
- bias (all schemes) -> uniform(+/-1/sqrt(fan_in)) per PyTorch
- INIT_KAIMING_UNIFORM -> He (gain sqrt(2) default, overridable)
- INIT_XAVIER_UNIFORM -> Glorot (gain 1 default, overridable)

Fan modes match PyTorch _calculate_fan_in_and_fan_out per layout (Conv1d in*k, Linear in, ConvT out*k).

TDD: seeded statistical default-bound + override value tests added to UnitTestConv1dApi / UnitTestLinear / UnitTestConv1dTransposedApi; mutation-verified non-vacuous. Existing structural tests unaffected; 62/62 ctest.

Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LeoBuron merged commit 233ee5e into develop Jun 26, 2026
8 checks passed

LeoBuron deleted the issueC-init-parametrization branch June 26, 2026 09:46

This was referenced Jun 26, 2026

refactor(examples): consolidate v1/v2 into one factory-API trainer per example #251

Merged

ci(examples): build full 'all' target in bit-parity job (rot guard) #252

Merged

LeoBuron merged 1 commit into develop from issueC-init-parametrization
