SYM completeness: requant primitives + Quantization layer + layer SYM arms #210

## Goal

Make full-SYM training a composable property of the framework. Requant primitives plus a generic Quantization layer close the accumulator-range gap between SYM layers: Linear and LayerNorm emit raw accumulator-range mantissas, violating the norm that inter-layer mantissas stay in int16 range. Per-layer SYM arms (conv, pools, CE) complete the kernel inventory.

**Fixed entry point:** the SYM_INT32 -> SYM_INT32 requant primitive with int16-range mantissas (qMaxBits <= 16). **M1 acceptance:** the chain `Linear -> Quant -> LayerNorm -> Quant -> Linear` as a full-SYM training step (#192).
**Research anchor:** M. Deutel, F. Hannig, C. Mutschler, J. Teich, "On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers", IEEE TCAD 44(4) 2025, doi:10.1109/TCAD.2024.3484354, arXiv:2407.10734. Design decisions recorded 2026-06-11 (interactive design review, 4 rounds).
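
A minimal sketch of what the dynamic SYM_INT32 -> SYM_INT32 requant could look like, assuming a symmetric tensor is a buffer of int32 mantissas with one shared float scale (real value = mantissa * scale). The names `sym_tensor_t`, `div_round_half_away`, and `requant_dynamic` are hypothetical and are not the framework's API. The rounding mode is assumed to be half-away-from-zero, in line with the #188 naming, and leaving already-in-range tensors untouched is a choice of this sketch, not a recorded design decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container (assumption): real value = mantissa[i] * scale. */
typedef struct {
    int32_t *mantissa;
    size_t len;
    float scale;
} sym_tensor_t;

/* Integer division num / den with den > 0, ties rounded away from zero. */
int64_t div_round_half_away(int64_t num, int64_t den) {
    assert(den > 0);
    int64_t half = den / 2;
    return num >= 0 ? (num + half) / den : -((-num + half) / den);
}

/* Dynamic requant: rescale mantissas so the largest magnitude maps to
 * q_max = 2^(q_max_bits - 1) - 1, and fold the factor into the scale.
 * Tensors that already fit are left unchanged (choice of this sketch). */
void requant_dynamic(sym_tensor_t *t, int q_max_bits) {
    assert(q_max_bits >= 2 && q_max_bits <= 16);
    const int64_t q_max = ((int64_t)1 << (q_max_bits - 1)) - 1;

    /* Find the largest magnitude in 64-bit so that -INT32_MIN is safe. */
    int64_t max_abs = 0;
    for (size_t i = 0; i < t->len; ++i) {
        int64_t m = t->mantissa[i];
        int64_t a = m < 0 ? -m : m;
        if (a > max_abs) max_abs = a;
    }
    if (max_abs <= q_max) return; /* also covers the all-zero tensor */

    /* m * q_max fits in int64: |m| <= 2^31 and q_max < 2^15. */
    for (size_t i = 0; i < t->len; ++i) {
        int64_t m = t->mantissa[i];
        t->mantissa[i] = (int32_t)div_round_half_away(m * q_max, max_abs);
    }
    /* Represented values are preserved up to rounding. */
    t->scale *= (float)max_abs / (float)q_max;
}
```

For mantissas `{100000, -50000, 25000, 0}` at scale 1.0 and `q_max_bits = 16`, this sketch yields `{32767, -16384, 8192, 0}` and multiplies the scale by 100000/32767. A fixed-scale variant would presumably take the target scale as an input and saturate to the int16 range instead of searching for the maximum.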

## Roadmap

| Stage | Content | PRs |
|---|---|---|
| Immediate (parallel) | #188 rename, conversionMatrix hygiene, CE-fwd guard, UBSan CI job | `188-rename-half-away`, `conversion-matrix-hygiene`, `ce-forward-dispatch-guard`, `ci-ubsan-overflow` |
| **M1 = #192** | requant primitives (dynamic + fixed-scale) + goldgen module + gold suite; then Quantization layer + factory + opt-in validator + chain test | `192-requant-primitives` then `192-quantization-layer` |
| **M2 = #45** | #189 shared rescale helper first; Conv1d/ConvT1d SYM via own plan (LayerNorm pattern) | plan, then 2-3 PRs |
| **M3** | full-SYM ECG example (first end-to-end proof) | 1 PR |
| **M4** | pools SYM (Max select / Avg scale-fold / Adaptive int-div) | 1 PR |
| **M5** | CE SYM arm + Softmax stabilization + full-SYM HAR with accuracy measurement | 2 PRs |
| parallel | docs (mutation catalog, padding note, integer-only open problem, pool mechanics) | 1 PR |

Ordering: #188 and the matrix hygiene land before the requant primitives (clean names + guarded matrix); primitives before the layer; M2 needs the #189 helper; M3 needs M1+M2; M4/M5/docs parallelize against M2+.

## Task list

**Immediate tier (parallel):**
- [ ] #188 — rename `HTE` -> `HALF_AWAY`, `roundHTE` -> `roundHalfAway`
- [ ] #199 — conversionMatrix hygiene
- [ ] #200 — CrossEntropy forward fail-fast guard
- [ ] #204 — CI UBSan job

**M1 (#192):**
- [ ] #192 — requant primitives, Quantization layer, factory, opt-in validator, full-SYM chain test

**M2 (#45):**
- [ ] #189 — overflow policy + shared rescale helper + flag-gated semantic guards (prerequisite)
- [ ] #45 — Conv1d/ConvT1d SYM forward + backward (own plan)

**M3–M5:**
- [ ] #207 — full-SYM ECG example
- [ ] #205 — pools SYM arms
- [ ] #206 — CrossEntropy SYM arm + accuracy measurement
- [ ] #201 — Softmax SYM max-subtraction (at the latest in M5)
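
As an illustration of the max-subtraction behind #201: softmax is invariant under subtracting a constant from every input, so with a shared positive scale the shift can be done exactly on the mantissas before any exponentiation. The sketch below shows only that shift step; the function name `softmax_shift_by_max` is hypothetical, and how the framework evaluates the exponential afterwards is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift mantissas so the largest one becomes 0 and all others are <= 0.
 * With a shared positive scale s, the exponent arguments s * out[i] are
 * then <= 0, so every exponential lies in (0, 1] and cannot overflow. */
void softmax_shift_by_max(const int32_t *in, int32_t *out, size_t len) {
    assert(len > 0);

    int32_t max = in[0];
    for (size_t i = 1; i < len; ++i) {
        if (in[i] > max) max = in[i];
    }

    for (size_t i = 0; i < len; ++i) {
        /* Subtract in 64-bit: in[i] - max can leave the int32 range
         * for raw accumulator-range inputs. */
        int64_t d = (int64_t)in[i] - (int64_t)max;
        assert(d >= INT32_MIN); /* holds for int16-range mantissas */
        out[i] = (int32_t)d;
    }
}
```

With int16-range inputs the differences fit comfortably in 17 bits, which is one reason to place a requant step in front of Softmax rather than feeding it raw accumulator-range mantissas.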

**Parallel / unscheduled:**
- [ ] #209 — CONVENTIONS.md docs batch
- [ ] #208 — goldgen migration + C test helpers + MORE_SOURCES removal
- [ ] #186 — Serialize/Deserialize QUANTIZATION case (third missing case)
- [ ] #202 — qMaxBits=32 UB cast (unscheduled)
- [ ] #203 — sgdStepM grad write-back asymmetry (unscheduled)

## Scope boundary (agreed 2026-06-11)

This program builds **framework infrastructure**: per-layer SYM kernels (fwd+bwd), requant primitives, the Quantization layer, the overflow policy, and the opt-in validator. The FQT research stages — layer-selection and bitwidth questions (#140), forward-pass quantization composition (#141), fully quantized training evaluation (#142), gradient checkpointing (#4/#138/#139) — are Jan's thesis work and CONSUME these primitives. **This program enables #140/#141/#142 and never closes them.** #189 stays linked under #137 as Leo-authored framework hardening.

Relates to #137, #192.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

