feat(layer): parametrize factory init, default to PyTorch parity (Issue C) #250

Merged: LeoBuron merged 1 commit into develop from issueC-init-parametrization on Jun 26, 2026

@LeoBuron

Member

What

Parametrizes the factory weight/bias initialization and changes the default to match PyTorch's. This implements the deferred Issue C (distribution parametrization for PyTorch-compatible init) and resolves the gain=√2 mismatch that the factory code already flagged as "requires Issue C".

Why

conv1dLayerInit / linearLayerInit / conv1dTransposedLayerInit hardcoded KAIMING_UNIFORM with gain=√2 (He) for weights and zeroed the biases. PyTorch's default is kaiming_uniform_(a=√5) for weights (bound 1/√fan_in) plus uniform(±1/√fan_in) for biases. The He bound is √(6/fan_in), so PyTorch's weight scale is √6 ≈ 2.45× narrower. The framework targets PyTorch parity, so the default should match PyTorch.
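
The √6 factor follows from the standard Kaiming-uniform formulas that PyTorch's kaiming_uniform_ uses with a leaky-ReLU gain. The arithmetic below is a sketch, with $a$ the negative-slope parameter:

$$
\mathrm{gain}(a) = \sqrt{\frac{2}{1 + a^2}}, \qquad \mathrm{bound} = \mathrm{gain} \cdot \sqrt{\frac{3}{\mathrm{fan\_in}}}
$$

$$
a = \sqrt{5}: \quad \mathrm{gain} = \sqrt{\tfrac{2}{6}} = \sqrt{\tfrac{1}{3}}, \qquad \mathrm{bound}_{\mathrm{PyTorch}} = \sqrt{\tfrac{1}{3}} \cdot \sqrt{\frac{3}{\mathrm{fan\_in}}} = \frac{1}{\sqrt{\mathrm{fan\_in}}}
$$

$$
\mathrm{gain} = \sqrt{2}: \quad \mathrm{bound}_{\mathrm{He}} = \sqrt{\frac{6}{\mathrm{fan\_in}}}, \qquad \frac{\mathrm{bound}_{\mathrm{He}}}{\mathrm{bound}_{\mathrm{PyTorch}}} = \sqrt{6} \approx 2.449
$$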

How

New weightInit_t { initScheme_t scheme; float gain; } field on each init struct (zero-init → INIT_DEFAULT, mirroring the existing bias_t idiom). All three factories route weight+bias allocation through shared helpers initWeightTensor / initBiasTensor (new compiled src/userApi/LayerCommon.c):

| scheme | weight |
| --- | --- |
| INIT_DEFAULT (0) | kaimingUniform(√(1/3), fan_in) = uniform(±1/√fan_in) = PyTorch a=√5 |
| INIT_KAIMING_UNIFORM | He, gain √2 (overridable via .gain) |
| INIT_XAVIER_UNIFORM | Glorot, gain 1 (overridable) |
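
The scheme-to-gain mapping in the table can be sketched roughly as below. This is an illustration only, not the code in LayerCommon.c. The helper name effectiveGain is hypothetical. The enum values other than INIT_DEFAULT = 0 are assumptions, as are the convention that gain == 0 means "use the scheme's default" and that INIT_DEFAULT ignores .gain.

```c
#include <assert.h>

/* Field names follow the PR text; numeric values other than
 * INIT_DEFAULT = 0 are assumptions made for this sketch. */
typedef enum {
    INIT_DEFAULT = 0,      /* PyTorch-parity default */
    INIT_KAIMING_UNIFORM,  /* He */
    INIT_XAVIER_UNIFORM    /* Glorot */
} initScheme_t;

typedef struct {
    initScheme_t scheme;
    float gain; /* assumption: 0.0f means "use the scheme's default gain" */
} weightInit_t;

/* Hypothetical helper: resolves the gain that the table assigns to a scheme. */
static float effectiveGain(weightInit_t init) {
    assert(init.gain >= 0.0f);
    switch (init.scheme) {
    case INIT_KAIMING_UNIFORM:
        return init.gain > 0.0f ? init.gain : 1.41421356f; /* sqrt(2) */
    case INIT_XAVIER_UNIFORM:
        return init.gain > 0.0f ? init.gain : 1.0f;
    case INIT_DEFAULT:
    default:
        /* sqrt(1/3); assumed here to ignore any .gain override */
        return 0.57735027f;
    }
}
```

With this shape, a zero-initialized init struct (`weightInit_t w = {0};`) lands on INIT_DEFAULT without the caller naming a scheme, which is the property the bias_t idiom relies on.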

Bias is always uniform(±1/√fan_in) (PyTorch convention). Fan modes match PyTorch's _calculate_fan_in_and_fan_out per layout (Conv1d in·k, Linear in, ConvT out·k). LayerCommon changed from a header-only INTERFACE target to a compiled static lib to host the shared helpers.
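
The per-layout fan-in rule can be illustrated with a few small helpers. These function names are hypothetical and are not the framework's API. The sketch assumes PyTorch's weight layouts with groups = 1: Conv1d [out, in, k], Linear [out, in], ConvTranspose1d [in, out, k]. PyTorch takes fan_in as dimension 1 times the receptive-field size, which is why the transposed convolution uses out·k.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers mirroring PyTorch's fan_in convention (groups = 1). */

/* Conv1d weight [out, in, k]: fan_in = in * k */
static size_t fanInConv1d(size_t inChannels, size_t kernelSize) {
    assert(inChannels > 0 && kernelSize > 0);
    return inChannels * kernelSize;
}

/* Linear weight [out, in]: fan_in = in */
static size_t fanInLinear(size_t inFeatures) {
    assert(inFeatures > 0);
    return inFeatures;
}

/* ConvTranspose1d weight [in, out, k]: dimension 1 is out, so fan_in = out * k */
static size_t fanInConv1dTransposed(size_t outChannels, size_t kernelSize) {
    assert(outChannels > 0 && kernelSize > 0);
    return outChannels * kernelSize;
}
```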

TDD / verification

  • Seeded statistical value tests (default bound + explicit He override) added to UnitTestConv1dApi, UnitTestLinear, UnitTestConv1dTransposedApi; mutation-verified non-vacuous (gain→√2 and bias→0 both flip them red).
  • Existing structural factory tests unaffected.
  • 62/62 ctest (unit_test_debug); examples build clean against the static-lib change.
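
A seeded statistical bound check of the kind described above might look like the following sketch. It is not the code in the unit tests. fillUniform is a hypothetical stand-in for a seeded initializer, built on a simple 32-bit LCG, and matchesUniformBound is a hypothetical predicate with arbitrarily chosen tolerances (0.9 and 0.1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical seeded initializer: fills w with uniform(-bound, +bound)
 * samples drawn from a 32-bit LCG (Numerical Recipes constants). */
static void fillUniform(float *w, size_t n, float bound, uint32_t seed) {
    assert(w != NULL && bound > 0.0f);
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++) {
        state = state * 1664525u + 1013904223u;
        /* top 24 bits mapped to [0, 1) */
        float u = (float)(state >> 8) * (1.0f / 16777216.0f);
        w[i] = (2.0f * u - 1.0f) * bound;
    }
}

/* Hypothetical predicate: true if the samples look like uniform(+/-bound).
 * No sample may exceed the bound, the largest magnitude must come close
 * to it, and the mean must sit near zero. */
static bool matchesUniformBound(const float *w, size_t n, float bound) {
    assert(w != NULL && n > 0 && bound > 0.0f);
    float maxAbs = 0.0f;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        float a = w[i] < 0.0f ? -w[i] : w[i];
        if (a > maxAbs) {
            maxAbs = a;
        }
        sum += w[i];
    }
    double mean = sum / (double)n;
    double absMean = mean < 0.0 ? -mean : mean;
    return maxAbs <= bound && maxAbs > 0.9f * bound && absMean < 0.1 * bound;
}
```

The upper check rejects a He-scaled tensor (about √6 times too wide), and the "comes close to the bound" check rejects an all-zero tensor. Those are the two mutations the PR names as flipping the tests red.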

Note

This is a framework-wide change to the default init: all factory-built models now initialize like PyTorch. It does not affect bit-parity, which loads trained weights. The mismatch surfaced while making the examples' train-from-scratch demos comparable. Notably, matching the init did not close the ECG demo gap, which reveals a separate C-vs-PyTorch training-dynamics divergence (bit-parity tests cover inference only). That finding is written up separately for investigation; it does not block this change.

🤖 Generated with Claude Code

… parity

The Conv1d/Linear/Conv1dTransposed factories hardcoded KAIMING_UNIFORM gain=sqrt(2) (He) for weights and zeroed biases - flagged in-code as deferred to 'Issue C'. That could not reproduce PyTorch's default init, so train-from-scratch parity demos diverged.

Add a weightInit_t {initScheme_t scheme; float gain;} field to each init struct (zero-init -> INIT_DEFAULT, mirroring the existing bias_t idiom). Route all three factories' weight+bias allocation through shared helpers initWeightTensor/initBiasTensor (new src/userApi/LayerCommon.c, now a compiled static lib).

Schemes: INIT_DEFAULT -> kaimingUniform(gain=sqrt(1/3), fan_in) = uniform(+/-1/sqrt(fan_in)), exactly PyTorch's kaiming_uniform_(a=sqrt(5)) weight default; bias (all schemes) -> uniform(+/-1/sqrt(fan_in)) per PyTorch; INIT_KAIMING_UNIFORM -> He (gain sqrt(2) default, overridable); INIT_XAVIER_UNIFORM -> Glorot (gain 1 default, overridable). Fan modes match PyTorch _calculate_fan_in_and_fan_out per layout (Conv1d in*k, Linear in, ConvT out*k).

TDD: seeded statistical default-bound + override value tests added to UnitTestConv1dApi / UnitTestLinear / UnitTestConv1dTransposedApi; mutation-verified non-vacuous. Existing structural tests unaffected; 62/62 ctest. Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LeoBuron LeoBuron merged commit 233ee5e into develop Jun 26, 2026
8 checks passed
@LeoBuron LeoBuron deleted the issueC-init-parametrization branch June 26, 2026 09:46