SYM completeness: requant primitives + Quantization layer + layer SYM arms #210

Description

@LeoBuron

Goal

Make full-SYM training a composable property of the framework. Requant primitives plus a generic Quantization layer close the accumulator-range gap between SYM layers: Linear and LayerNorm emit raw accumulator-range mantissas, which violates the int16 inter-layer norm. Per-layer SYM arms (conv, pools, CE) complete the kernel inventory.

Fixed entry point: the SYM_INT32 -> SYM_INT32 requant primitive with int16-range mantissas (qMaxBits <= 16). M1 acceptance: the chain Linear -> Quant -> LayerNorm -> Quant -> Linear as a full-SYM training step (#192).
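
For orientation, a minimal sketch of what a dynamic SYM_INT32 -> SYM_INT32 requant could look like. This is not the framework's implementation. The `sym_int32_t` struct and the `requant_dynamic` and `div_round_half_away` names are hypothetical. The sketch assumes a tensor is a buffer of int32 mantissas with one shared float scale (real value = mantissa * scale), and it assumes round-half-away-from-zero, which the #188 branch name hints at but this issue does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tensor view: real value = mantissa[i] * scale. */
typedef struct {
    int32_t *mantissa;
    size_t   len;
    float    scale;
} sym_int32_t;

/* Integer division with rounding half away from zero; d must be > 0. */
static int32_t div_round_half_away(int64_t n, int64_t d) {
    assert(d > 0);
    return (int32_t)(n >= 0 ? (n + d / 2) / d : -((-n + d / 2) / d));
}

/* Dynamic requant (assumed semantics): rescale so the largest |mantissa|
 * lands on qmax = 2^(q_max_bits - 1) - 1, and fold the factor into scale.
 * Storage stays int32; only the value range shrinks. */
static void requant_dynamic(sym_int32_t *t, unsigned q_max_bits) {
    assert(q_max_bits >= 2 && q_max_bits <= 16);
    const int64_t qmax = ((int64_t)1 << (q_max_bits - 1)) - 1;

    /* Find the largest magnitude; int64 avoids overflow on INT32_MIN. */
    int64_t max_abs = 0;
    for (size_t i = 0; i < t->len; ++i) {
        int64_t a = t->mantissa[i];
        if (a < 0) a = -a;
        if (a > max_abs) max_abs = a;
    }
    if (max_abs <= qmax) return; /* already inside the target range */

    /* m' = round(m * qmax / max_abs); the product fits in int64. */
    for (size_t i = 0; i < t->len; ++i) {
        t->mantissa[i] =
            div_round_half_away((int64_t)t->mantissa[i] * qmax, max_abs);
    }
    t->scale *= (float)max_abs / (float)qmax;
}
```

With `q_max_bits = 16`, accumulator-range mantissas `{100000, -50000, 25000, 0}` become `{32767, -16384, 8192, 0}` and the scale grows by `100000 / 32767`. A fixed-scale variant would take the target scale as an input instead of deriving it from the data.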
Research anchor: M. Deutel, F. Hannig, C. Mutschler, J. Teich, "On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers", IEEE TCAD 44(4) 2025, doi:10.1109/TCAD.2024.3484354, arXiv:2407.10734. Design decisions recorded 2026-06-11 (interactive design review, 4 rounds).

Roadmap

| Stage | Content | PRs |
| --- | --- | --- |
| Immediate (parallel) | #188 rename, conversionMatrix hygiene, CE-fwd guard, UBSan CI job | 188-rename-half-away, conversion-matrix-hygiene, ce-forward-dispatch-guard, ci-ubsan-overflow |
| M1 = #192 | requant primitives (dynamic + fixed-scale) + goldgen module + gold suite; then Quantization layer + factory + opt-in validator + chain test | 192-requant-primitives then 192-quantization-layer |
| M2 = #45 | #189 shared rescale helper first; Conv1d/ConvT1d SYM via own plan (LayerNorm pattern) | plan, then 2-3 PRs |
| M3 | full-SYM ECG example (first end-to-end proof) | 1 PR |
| M4 | pools SYM (Max select / Avg scale-fold / Adaptive int-div) | 1 PR |
| M5 | CE SYM arm + Softmax stabilization + full-SYM HAR with accuracy measurement | 2 PRs |
| parallel | docs (mutation catalog, padding note, integer-only open problem, pool mechanics) | 1 PR |

Ordering: #188 and the matrix hygiene land before the requant primitives (clean names + guarded matrix); primitives before the layer; M2 needs the #189 helper; M3 needs M1+M2; M4/M5/docs parallelize against M2+.
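
The M4 entry names the pool mechanics only in shorthand ("Max select", "Avg scale-fold"). The sketch below is one possible reading, not a confirmed design. It assumes max pooling can select directly on mantissas because the window shares one positive scale, and that average pooling can sum mantissas and fold the `1/k` factor into the scale instead of dividing. The function names and the 1-D, window == stride layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Max select" (assumed): with one shared positive scale, the largest
 * mantissa is also the largest real value, so the scale is unchanged. */
static void max_pool1d_sym(const int32_t *in, size_t len, size_t k,
                           int32_t *out) {
    assert(k > 0 && len % k == 0);
    for (size_t w = 0; w < len / k; ++w) {
        int32_t best = in[w * k];
        for (size_t j = 1; j < k; ++j) {
            if (in[w * k + j] > best) best = in[w * k + j];
        }
        out[w] = best;
    }
}

/* "Avg scale-fold" (assumed): write the window SUM as the mantissa and
 * return scale / k as the output scale, so no integer division is needed. */
static float avg_pool1d_sym(const int32_t *in, size_t len, size_t k,
                            float scale, int32_t *out) {
    assert(k > 0 && len % k == 0);
    for (size_t w = 0; w < len / k; ++w) {
        int64_t acc = 0;
        for (size_t j = 0; j < k; ++j) acc += in[w * k + j];
        /* The sum must still fit the int32 mantissa storage. */
        assert(acc >= INT32_MIN && acc <= INT32_MAX);
        out[w] = (int32_t)acc;
    }
    return scale / (float)k;
}
```

Under this reading the averaged output leaves the int16 mantissa range once sums grow, so it would need a requant step afterwards. Adaptive pooling has windows of varying size, which may be why the table lists integer division for it rather than a single folded scale.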

Task list

Immediate tier (parallel):

M1 (#192):

M2 (#45):

M3–M5:

Parallel / unscheduled:

Scope boundary (agreed 2026-06-11)

This program builds framework infrastructure: per-layer SYM kernels (fwd+bwd), requant primitives, the Quantization layer, the overflow policy, and the opt-in validator. The FQT research stages — layer-selection and bitwidth questions (#140), forward-pass quantization composition (#141), fully quantized training evaluation (#142), gradient checkpointing (#4/#138/#139) — are Jan's thesis work and CONSUME these primitives. This program enables #140/#141/#142 and never closes them. #189 stays linked under #137 as Leo-authored framework hardening.

Relates to #137, #192.

🤖 Generated with Claude Code
