optimizer: gradients should not be stored as SYM_INT32 (compute-format ≠ storage-format)

## Conceptual finding

`SYM_INT32` is the framework's **compute** representation: an int32 mantissa plus one per-tensor float scale. It exists because the integer kernels (matmul / conv / backward) are the only integer-math path we support. It is **not a storage format**: it occupies the same 4 bytes/element as `FLOAT32` but is a fixed-point approximation with one scale for the whole tensor, so small-magnitude values lose precision that `FLOAT32`, with its per-value exponent, keeps.
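To make the precision argument concrete, here is a minimal sketch of per-tensor symmetric quantization. The struct and function names are hypothetical stand-ins, not the framework's API, and the range convention (signed, ±(2^(bits-1) − 1)) is an assumption about how a 16-bit qmax is interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a SYM_INT32 tensor: int32 mantissas plus one
 * float scale shared by the whole tensor. Not the framework's real type. */
typedef struct {
    int32_t *q;
    float scale;
    size_t n;
} sym_i32_sketch_t;

/* Round half away from zero without depending on libm. */
static int32_t roundToI32Sketch(float x)
{
    return (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Quantize src[0..dst->n) into dst using a single scale derived from the
 * largest magnitude in the tensor (assumed signed range of qmaxBits bits). */
static void symQuantizeSketch(const float *src, sym_i32_sketch_t *dst, int qmaxBits)
{
    float maxAbs = 0.0f;
    for (size_t i = 0; i < dst->n; i++) {
        float a = src[i] < 0.0f ? -src[i] : src[i];
        if (a > maxAbs) maxAbs = a;
    }
    int32_t qmax = (int32_t)((1 << (qmaxBits - 1)) - 1); /* 32767 for 16 bits */
    dst->scale = (maxAbs > 0.0f) ? maxAbs / (float)qmax : 1.0f;
    for (size_t i = 0; i < dst->n; i++) {
        dst->q[i] = roundToI32Sketch(src[i] / dst->scale);
    }
}

/* Recover the float value of element i. */
static float symDequantizeSketch(const sym_i32_sketch_t *t, size_t i)
{
    return (float)t->q[i] * t->scale;
}
```

Under these assumptions, quantizing the two-element tensor `{1.0f, 1e-6f}` at 16 bits gives a scale of about 3.05e-5, so the second element rounds to mantissa 0 and dequantizes to exactly 0. `FLOAT32` would have kept it.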

Storing a **gradient** as `SYM_INT32` is therefore dominated on both axes:

- vs **FLOAT32**: identical footprint, strictly worse fidelity (one scale, fixed-point relative error).
- vs **SYM / ASYM**: no memory saving (those formats are sub-byte packed; `SYM_INT32` is a full int32 per element).
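The footprint side of the comparison can be sketched as plain byte arithmetic. The function names are hypothetical, and the packed layout (tight bit-packing plus one float scale per tensor) is an assumption for illustration. An `ASYM` tensor would presumably carry a zero point as well, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* FLOAT32 storage: one float per element, no side metadata. */
static size_t float32BytesSketch(size_t n)
{
    return n * sizeof(float);
}

/* SYM_INT32 storage: one int32 mantissa per element plus one float scale. */
static size_t symInt32BytesSketch(size_t n)
{
    return n * sizeof(int32_t) + sizeof(float);
}

/* Packed SYM storage (assumed layout): `bits` bits per element, tightly
 * packed and rounded up to whole bytes, plus one float scale. */
static size_t symPackedBytesSketch(size_t n, unsigned bits)
{
    return (n * (size_t)bits + 7u) / 8u + sizeof(float);
}
```

For 1000 elements on a platform with 4-byte `float`, this gives 4000 bytes for `FLOAT32`, 4004 for `SYM_INT32`, and 504 for a 4-bit packed format. `SYM_INT32` is never the smallest option.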

The only place a gradient legitimately takes `SYM_INT32` form is **transiently**, as an operand wire during backprop: dx/agrad feeding the next layer's integer backward, allocated as int12 in `initGradTensor` and freed after the pass. The **persistent parameter gradients** (weightGrad/biasGrad) have no such reason to be `SYM_INT32`.

## Current state

- `gradInitSymInt32` (`src/userApi/tensor/TensorApi.c:281`) stores parameter grads as `SYM_INT32` (`ODT_SYM_GRAD_QMAXBITS = 16`).
- The SGD `SYM_INT32` path (`src/optimizer/Sgd.c`) already dequantizes the grad to float, steps in float, and requantizes. It treats the grad as float internally, so the `SYM_INT32` storage buys nothing and adds a lossy round-trip.
- **#203** (sgdStepM grad write-back asymmetry) is a *symptom*: the two `SYM_INT32` SGD variants disagree on whether to requantize the grad back, precisely because "the grad in `SYM_INT32` after a step" has no well-defined meaning.

## Proposed direction (to design)

Gradients should be stored as `FLOAT32` (same size, better fidelity) or `SYM`/`ASYM` (if compression is wanted); the integer math stays a transient `SYM_INT32` step. Likely:

- parameter grads default to `FLOAT32` storage (retire / repurpose `gradInitSymInt32`'s `SYM_INT32` default);
- the optimizer consumes float grads directly (no grad dequant/requant), keeping the param-side quant handling;
- the two-width operand/grad contract (int12 operands / int16 grads) largely dissolves — if terminal grads are `FLOAT32`, the grad-width question disappears.
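As a rough illustration of the second bullet, the sketch below shows an SGD step that reads the gradient directly as `float` and keeps quantization handling on the parameter side only. All names are hypothetical, the parameter is modelled as int32 mantissas with one scale, and the rescale-on-every-step policy is an assumption, not a description of `Sgd.c`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quantized parameter: int32 mantissas plus one float scale. */
typedef struct {
    int32_t *q;
    float scale;
    size_t n;
} sym_param_sketch_t;

/* Round half away from zero without depending on libm. */
static int32_t roundParamSketch(float x)
{
    return (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* One plain SGD step, w <- w - lr * g, with g supplied as float.
 * Only the parameter is dequantized and requantized; the gradient is
 * never quantized, so there is no grad round-trip. */
static void sgdStepFloatGradSketch(sym_param_sketch_t *w, const float *grad,
                                   float lr, int qmaxBits)
{
    /* Pass 1: find the largest stepped magnitude to pick the new scale. */
    float maxAbs = 0.0f;
    for (size_t i = 0; i < w->n; i++) {
        float v = (float)w->q[i] * w->scale - lr * grad[i];
        float a = v < 0.0f ? -v : v;
        if (a > maxAbs) maxAbs = a;
    }
    int32_t qmax = (int32_t)((1 << (qmaxBits - 1)) - 1);
    float newScale = (maxAbs > 0.0f) ? maxAbs / (float)qmax : w->scale;

    /* Pass 2: recompute each stepped value and requantize in place. */
    for (size_t i = 0; i < w->n; i++) {
        float v = (float)w->q[i] * w->scale - lr * grad[i];
        w->q[i] = roundParamSketch(v / newScale);
    }
    w->scale = newScale;
}
```

The two passes recompute the stepped value rather than buffering it, which trades a little arithmetic for no temporary allocation. Whether that trade suits the real optimizer is part of the design pass.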

Open design questions: bit-width for `SYM`/`ASYM` grad storage if compression is chosen; whether param storage is affected (separate concern — params are forward operands); feasibility of an integer optimizer step. To be resolved in a dedicated design pass.

**Subsumes #203.**

## Relations

Part of the SYM completeness program #210; framework hardening under #137. Related: #221 (backwardQ not honored by grad-tensor allocation), #203 (symptom).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

optimizer: gradients should not be stored as SYM_INT32 (compute-format ≠ storage-format) #261
