Goal
Make full-SYM training a composable property of the framework: requant primitives + a generic Quantization layer close the accumulator-range gap between SYM layers (Linear and LayerNorm emit raw accumulator-range mantissas, violating the int16 inter-layer norm); per-layer SYM arms (conv, pools, CE) complete the kernel inventory.
Fixed entry point: the SYM_INT32 -> SYM_INT32 requant primitive with int16-range mantissas (qMaxBits <= 16). M1 acceptance: the chain Linear -> Quant -> LayerNorm -> Quant -> Linear as a full-SYM training step (#192).
Research anchor: M. Deutel, F. Hannig, C. Mutschler, J. Teich, "On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers", IEEE TCAD 44(4) 2025, doi:10.1109/TCAD.2024.3484354, arXiv:2407.10734. Design decisions recorded 2026-06-11 (interactive design review, 4 rounds).
Roadmap
| Stage |
Content |
PRs |
| Immediate (parallel) |
#188 rename, conversionMatrix hygiene, CE-fwd guard, UBSan CI job |
188-rename-half-away, conversion-matrix-hygiene, ce-forward-dispatch-guard, ci-ubsan-overflow |
| M1 = #192 |
requant primitives (dynamic + fixed-scale) + goldgen module + gold suite; then Quantization layer + factory + opt-in validator + chain test |
192-requant-primitives then 192-quantization-layer |
| M2 = #45 |
#189 shared rescale helper first; Conv1d/ConvT1d SYM via own plan (LayerNorm pattern) |
plan, then 2-3 PRs |
| M3 |
full-SYM ECG example (first end-to-end proof) |
1 PR |
| M4 |
pools SYM (Max select / Avg scale-fold / Adaptive int-div) |
1 PR |
| M5 |
CE SYM arm + Softmax stabilization + full-SYM HAR with accuracy measurement |
2 PRs |
| parallel |
docs (mutation catalog, padding note, integer-only open problem, pool mechanics) |
1 PR |
Ordering: #188 and the matrix hygiene land before the requant primitives (clean names + guarded matrix); primitives before the layer; M2 needs the #189 helper; M3 needs M1+M2; M4/M5/docs parallelize against M2+.
Task list
Immediate tier (parallel):
M1 (#192):
M2 (#45):
M3–M5:
Parallel / unscheduled:
Scope boundary (agreed 2026-06-11)
This program builds framework infrastructure: per-layer SYM kernels (fwd+bwd), requant primitives, the Quantization layer, the overflow policy, and the opt-in validator. The FQT research stages — layer-selection and bitwidth questions (#140), forward-pass quantization composition (#141), fully quantized training evaluation (#142), gradient checkpointing (#4/#138/#139) — are Jan's thesis work and CONSUME these primitives. This program enables #140/#141/#142 and never closes them. #189 stays linked under #137 as Leo-authored framework hardening.
Relates to #137, #192.
🤖 Generated with Claude Code
Goal
Make full-SYM training a composable property of the framework: requant primitives + a generic Quantization layer close the accumulator-range gap between SYM layers (Linear and LayerNorm emit raw accumulator-range mantissas, violating the int16 inter-layer norm); per-layer SYM arms (conv, pools, CE) complete the kernel inventory.
Fixed entry point: the SYM_INT32 -> SYM_INT32 requant primitive with int16-range mantissas (qMaxBits <= 16). M1 acceptance: the chain
Linear -> Quant -> LayerNorm -> Quant -> Linearas a full-SYM training step (#192).Research anchor: M. Deutel, F. Hannig, C. Mutschler, J. Teich, "On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers", IEEE TCAD 44(4) 2025, doi:10.1109/TCAD.2024.3484354, arXiv:2407.10734. Design decisions recorded 2026-06-11 (interactive design review, 4 rounds).
Roadmap
188-rename-half-away,conversion-matrix-hygiene,ce-forward-dispatch-guard,ci-ubsan-overflow192-requant-primitivesthen192-quantization-layerOrdering: #188 and the matrix hygiene land before the requant primitives (clean names + guarded matrix); primitives before the layer; M2 needs the #189 helper; M3 needs M1+M2; M4/M5/docs parallelize against M2+.
Task list
Immediate tier (parallel):
HTE->HALF_AWAY,roundHTE->roundHalfAwayM1 (#192):
M2 (#45):
M3–M5:
Parallel / unscheduled:
Scope boundary (agreed 2026-06-11)
This program builds framework infrastructure: per-layer SYM kernels (fwd+bwd), requant primitives, the Quantization layer, the overflow policy, and the opt-in validator. The FQT research stages — layer-selection and bitwidth questions (#140), forward-pass quantization composition (#141), fully quantized training evaluation (#142), gradient checkpointing (#4/#138/#139) — are Jan's thesis work and CONSUME these primitives. This program enables #140/#141/#142 and never closes them. #189 stays linked under #137 as Leo-authored framework hardening.
Relates to #137, #192.
🤖 Generated with Claude Code