Conceptual finding
SYM_INT32 is the framework's compute representation: an int32 mantissa plus one per-tensor float scale. It is used because integer kernels (matmul / conv / backward) are the only integer-math path we support. It is not a storage format: it occupies the same 4 bytes/element as FLOAT32 but represents a fixed-point approximation (one scale for the whole tensor), so small-magnitude values lose precision that FLOAT32 (per-value exponent) keeps.
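Storing a gradient as SYM_INT32 is therefore dominated on both axes:
vs FLOAT32: no fidelity gain (same 4 bytes/element, but one shared scale instead of a per-value exponent).
vs SYM / ASYM: no memory saving (those sub-byte-pack; SYM_INT32 is full int32).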
The only place a gradient legitimately takes SYM_INT32 form is transiently, as an operand wire during backprop (dx/agrad feeding the next layer's integer backward — allocated int12 in initGradTensor, freed after the pass). The persistent parameter gradients (weightGrad/biasGrad) have no such reason.
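The precision argument above can be illustrated with a small numerical sketch. It is written in Python purely for illustration and is not the framework's code. It assumes a max-abs symmetric scale rule (scale = max|v| / qmax), which the issue does not state; only the 16-bit width is taken from ODT_SYM_GRAD_QMAXBITS. The gradient values and all function names are hypothetical.

```python
import struct

QMAX_BITS = 16  # mirrors ODT_SYM_GRAD_QMAXBITS from the issue


def to_float32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def quantize_per_tensor(values, qmax_bits=QMAX_BITS):
    """Symmetric quantization with ONE scale for the whole tensor.

    Assumption: scale = max|v| / qmax. The framework's real rule may differ.
    """
    qmax = (1 << (qmax_bits - 1)) - 1  # 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def rel_err(approx, exact):
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(approx - exact) / abs(exact)


# Hypothetical gradient values spanning several orders of magnitude.
grads = [0.5, 1e-3, 1e-6]
mantissas, scale = quantize_per_tensor(grads)
fixed_point = [m * scale for m in mantissas]   # what SYM_INT32 storage keeps
as_float32 = [to_float32(g) for g in grads]    # what FLOAT32 storage keeps

for g, q, f in zip(grads, fixed_point, as_float32):
    print(f"{g:g}: fixed-point error {rel_err(q, g):.1e}, "
          f"float32 error {rel_err(f, g):.1e}")
```

Under these assumptions the 1e-6 entry rounds to mantissa 0 because the shared scale is set by the 0.5 entry, while float32 keeps all three values to within its usual rounding error.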
Current state
gradInitSymInt32 (src/userApi/tensor/TensorApi.c:281) stores parameter grads as SYM_INT32 (ODT_SYM_GRAD_QMAXBITS = 16).
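The SGD SYM_INT32 path (src/optimizer/Sgd.c) already dequantizes grad→float, steps in float, and requantizes. It treats the grad as float internally, so the SYM_INT32 storage buys nothing and adds a lossy round-trip.
The SYM_INT32 SGD variants disagree on whether to requantize the grad back, precisely because "the grad in SYM_INT32 after a step" has no well-defined meaning.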
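The round trip described above can be sketched as follows. This is an illustrative Python model, not a transcription of src/optimizer/Sgd.c. The function names, the max-abs scale rule, the learning rate and the sample values are all assumptions, and parameters are kept as plain floats to keep the focus on the gradient.

```python
def quantize(values, qmax_bits=16):
    """Per-tensor symmetric quantization (assumed rule: scale = max|v| / qmax)."""
    qmax = (1 << (qmax_bits - 1)) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(mantissas, scale):
    """Recover float values from integer mantissas and the shared scale."""
    return [m * scale for m in mantissas]


def sgd_step_quantized_grad(params, grad_mantissas, grad_scale, lr):
    """Models the described path: dequantize grad -> float step -> requantize."""
    grad_f = dequantize(grad_mantissas, grad_scale)   # the step runs in float anyway
    new_params = [p - lr * g for p, g in zip(params, grad_f)]
    requantized = quantize(grad_f)                    # grad goes back to integer form
    return new_params, requantized


def sgd_step_float_grad(params, grad, lr):
    """Same update with the gradient kept in float throughout."""
    return [p - lr * g for p, g in zip(params, grad)]


true_grad = [0.5, 1e-6]   # hypothetical values
params = [1.0, 1.0]

g_m, g_s = quantize(true_grad)                        # storage as mantissa + scale
via_symint32, _ = sgd_step_quantized_grad(params, g_m, g_s, lr=0.1)
via_float = sgd_step_float_grad(params, true_grad, lr=0.1)

print("stored mantissas:", g_m)
print("update via SYM_INT32 grad:", via_symint32)
print("update via float grad:   ", via_float)
```

In this sketch the second parameter never moves on the SYM_INT32 path, because its gradient was already rounded to zero when it was stored. The arithmetic of the step itself is identical in both functions.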
Proposed direction (to design)
Gradients should be stored as FLOAT32 (same size, better fidelity) or SYM/ASYM (if compression is wanted); the integer math stays a transient SYM_INT32 step. Likely:
FLOAT32 storage (retire / repurpose gradInitSymInt32's SYM_INT32 default);
the optimizer consumes float grads directly (no grad dequant/requant), keeping the param-side quant handling;
the two-width operand/grad contract (int12 operands / int16 grads) largely dissolves: if terminal grads are FLOAT32, the grad-width question disappears.
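One possible shape for an optimizer step that consumes float grads directly is sketched below. It is a Python illustration of the data flow only, since the actual design is still open. The function names, the max-abs scale rule and the sample values are hypothetical, and the 12-bit parameter width is borrowed from the int12 operand figure above as an assumption.

```python
def quantize(values, qmax_bits):
    """Per-tensor symmetric quantization (assumed rule: scale = max|v| / qmax)."""
    qmax = (1 << (qmax_bits - 1)) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(mantissas, scale):
    """Recover float values from integer mantissas and the shared scale."""
    return [m * scale for m in mantissas]


def sgd_step(param_mantissas, param_scale, grad, lr, qmax_bits):
    """SGD step with a FLOAT32-style gradient and quantized parameters.

    The gradient is used as-is: no grad dequant/requant.
    Only the parameter side keeps its quantization handling.
    """
    params_f = dequantize(param_mantissas, param_scale)
    stepped = [p - lr * g for p, g in zip(params_f, grad)]
    return quantize(stepped, qmax_bits)


weights = [0.3, -0.5]          # hypothetical parameter values
grad = [0.5, 1e-6]             # gradient held in float, never quantized

w_m, w_s = quantize(weights, qmax_bits=12)
new_m, new_s = sgd_step(w_m, w_s, grad, lr=0.1, qmax_bits=12)
new_weights = dequantize(new_m, new_s)

print("old mantissas:", w_m)
print("new mantissas:", new_m)
print("new weights:  ", new_weights)
```

The gradient never takes an integer form here, so no gradient bit-width has to be chosen. Whether the parameter requantization should stay as shown is part of the open questions.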
Open design questions: bit-width for SYM/ASYM grad storage if compression is chosen; whether param storage is affected (a separate concern: params are forward operands); feasibility of an integer optimizer step. To be resolved in a dedicated design pass.
Subsumes #203.
Relations
Part of the SYM completeness program #210; framework hardening under #137. Related: #221 (backwardQ not honored by grad-tensor allocation), #203 (symptom).
🤖 Generated with Claude Code