feat(layer): parametrize factory init, default to PyTorch parity (Issue C) #250
Merged
… parity
The Conv1d/Linear/Conv1dTransposed factories hardcoded KAIMING_UNIFORM gain=sqrt(2) (He) for weights and zeroed biases - flagged in-code as deferred to 'Issue C'. That could not reproduce PyTorch's default init, so train-from-scratch parity demos diverged.
Add a weightInit_t {initScheme_t scheme; float gain;} field to each init struct (zero-init -> INIT_DEFAULT, mirroring the existing bias_t idiom). Route all three factories' weight+bias allocation through shared helpers initWeightTensor/initBiasTensor (new src/userApi/LayerCommon.c, now a compiled static lib).
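The zero-init idiom described above can be sketched as follows. This is a minimal illustration, not the project's code: only `weightInit_t`, `initScheme_t`, its `scheme`/`gain` members and the three `INIT_*` names come from this change. The struct `linearInitSketch_t`, the helper `effectiveGain`, the enum values of the non-default schemes, and the convention that `gain == 0` means "use the scheme default" are all assumptions made for the example.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef enum {
    INIT_DEFAULT = 0,      /* selected by a zero-initialized struct */
    INIT_KAIMING_UNIFORM,  /* numeric value is an assumption */
    INIT_XAVIER_UNIFORM    /* numeric value is an assumption */
} initScheme_t;

/* The whole idiom relies on the default scheme being the zero value. */
static_assert(INIT_DEFAULT == 0, "zero-init must select INIT_DEFAULT");

typedef struct {
    initScheme_t scheme;
    float gain; /* assumption: 0.0f means "use the scheme's default gain" */
} weightInit_t;

/* Hypothetical init struct, NOT the project's real Linear init struct. */
typedef struct {
    size_t inFeatures;
    size_t outFeatures;
    weightInit_t weightInit;
} linearInitSketch_t;

/* Hypothetical helper: resolve the gain that the factory would use. */
float effectiveGain(weightInit_t init) {
    switch (init.scheme) {
    case INIT_KAIMING_UNIFORM:
        return init.gain != 0.0f ? init.gain : sqrtf(2.0f);
    case INIT_XAVIER_UNIFORM:
        return init.gain != 0.0f ? init.gain : 1.0f;
    case INIT_DEFAULT:
    default:
        /* PyTorch-parity default: gain is fixed at sqrt(1/3). */
        return sqrtf(1.0f / 3.0f);
    }
}
```

Under these assumptions, a caller that writes `linearInitSketch_t cfg = { .inFeatures = 64, .outFeatures = 10 };` gets the PyTorch-parity default without naming a scheme, while `{ .weightInit = { .scheme = INIT_KAIMING_UNIFORM } }` opts back into He initialization.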
Schemes: INIT_DEFAULT -> kaimingUniform(gain=sqrt(1/3), fan_in) = uniform(+/-1/sqrt(fan_in)), exactly PyTorch's kaiming_uniform_(a=sqrt(5)) weight default; bias (all schemes) -> uniform(+/-1/sqrt(fan_in)) per PyTorch; INIT_KAIMING_UNIFORM -> He (gain sqrt(2) default, overridable); INIT_XAVIER_UNIFORM -> Glorot (gain 1 default, overridable). Fan modes match PyTorch _calculate_fan_in_and_fan_out per layout (Conv1d in*k, Linear in, ConvT out*k).
TDD: seeded statistical default-bound + override value tests added to UnitTestConv1dApi / UnitTestLinear / UnitTestConv1dTransposedApi; mutation-verified non-vacuous. Existing structural tests unaffected; 62/62 ctest. Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init).
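To illustrate what a seeded statistical bound test can look like, here is a self-contained sketch. It is not the code in `UnitTestConv1dApi`/`UnitTestLinear`/`UnitTestConv1dTransposedApi`: the xorshift32 generator, the function names and the 0.9/0.1 tolerances are assumptions chosen for the example, and the project's own RNG and thresholds may differ.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: deterministic for a fixed non-zero seed. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill `out` with samples from uniform(-bound, +bound). */
void fillUniform(float *out, size_t n, float bound, uint32_t seed) {
    uint32_t state = seed ? seed : 1u; /* xorshift32 must not start at 0 */
    for (size_t i = 0; i < n; i++) {
        /* Top 24 bits map to [0, 1), then to [-bound, +bound). */
        float unit = (float)(xorshift32(&state) >> 8) / 16777216.0f;
        out[i] = (2.0f * unit - 1.0f) * bound;
    }
}

/* Returns 1 if the samples stay inside the bound, nearly reach it,
 * and are roughly centered on zero; returns 0 otherwise. */
int looksLikeUniformBound(const float *w, size_t n, float bound) {
    assert(n > 0);
    float maxAbs = 0.0f;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > maxAbs) maxAbs = a;
        sum += w[i];
    }
    float mean = (float)(sum / (double)n);
    return maxAbs <= bound && maxAbs > 0.9f * bound && fabsf(mean) < 0.1f * bound;
}
```

A check of this shape is non-vacuous in the sense the commit describes: weights drawn with the old He gain exceed the `1/sqrt(fan_in)` bound and fail the upper limit, while an all-zero bias never approaches the bound and fails the lower limit.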
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 26, 2026
What
Parametrizes the factory weight/bias initialization and changes the default to match PyTorch's. Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init), i.e. the `gain=√2` mismatch the factory code already flagged as "requires Issue C".

Why
`conv1dLayerInit`/`linearLayerInit`/`conv1dTransposedLayerInit` hardcoded `KAIMING_UNIFORM gain=√2` (He) for weights and zeroed biases. PyTorch's default is `kaiming_uniform_(a=√5)` weights (bound `1/√fan_in`) + `uniform(±1/√fan_in)` bias, a `√6 ≈ 2.45×` narrower weight scale. The framework targets PyTorch parity, so the default should match PyTorch.

How
New `weightInit_t { initScheme_t scheme; float gain; }` field on each init struct (zero-init → `INIT_DEFAULT`, mirroring the existing `bias_t` idiom). All three factories route weight+bias allocation through shared helpers `initWeightTensor`/`initBiasTensor` (new compiled `src/userApi/LayerCommon.c`):

| Scheme | Weight init / default gain |
|---|---|
| `INIT_DEFAULT` (0) | `kaimingUniform(√(1/3), fan_in)` = `uniform(±1/√fan_in)` = PyTorch `a=√5` |
| `INIT_KAIMING_UNIFORM` | `√2` (overridable via `.gain`) |
| `INIT_XAVIER_UNIFORM` | `1` (overridable) |

Bias is always `uniform(±1/√fan_in)` (PyTorch convention). Fan modes match PyTorch's `_calculate_fan_in_and_fan_out` per layout (Conv1d `in·k`, Linear `in`, ConvT `out·k`). `LayerCommon` changed from a header-only INTERFACE target to a compiled static lib to host the shared helpers.

TDD / verification
- Seeded statistical default-bound + override value tests added to `UnitTestConv1dApi`, `UnitTestLinear`, `UnitTestConv1dTransposedApi`; mutation-verified non-vacuous (gain→√2 and bias→0 both flip them red).
- 62/62 ctest (`unit_test_debug`); examples build clean against the static-lib change.

Note
This is a framework-wide default init change (all factory-built models now init like PyTorch). It does not affect bit-parity (which loads trained weights). Surfaced while making the examples' train-from-scratch demos comparable — and notably, matching the init did not close the ECG demo gap, revealing a separate C-vs-PyTorch training-dynamics divergence (bit-parity tests inference only). That finding is written up separately for investigation; it does not block this change.
🤖 Generated with Claude Code