Part of WS1 — Full Batch-Invariant Forward Chain (epic: #)
Why
The input embedding lookup and the final vocab projection bracket the network. The LM head is a large matmul directly upstream of the logprob computation, so any batch-dependent reduction order there shows up in the logprobs immediately. The embedding lookup is simpler, but it must still be confirmed not to branch on batch shape. Both must be on the batch-invariant path; otherwise the chain is not actually closed.
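To make the concern concrete, here is a minimal sketch of why reduction order matters at all. It is not taken from the codebase: `reduce_left_to_right` is a hypothetical helper, and the three values are chosen by hand to make float32 rounding visible. A kernel that changes its accumulation order with batch shape can produce this kind of difference in the logits.

```python
import numpy as np

def reduce_left_to_right(terms):
    """Accumulate terms in float32, strictly in the order given."""
    acc = np.float32(0.0)
    for t in terms:
        acc = np.float32(acc + np.float32(t))
    return acc

# Hand-picked illustrative values: 1e8 is exactly representable in float32,
# but the spacing between float32 values near 1e8 is 8, so adding 1.0 is lost.
a, b, c = 1e8, -1e8, 1.0

order_a = reduce_left_to_right([a, b, c])  # (1e8 + -1e8) + 1.0 -> 1.0
order_b = reduce_left_to_right([b, c, a])  # (-1e8 + 1.0) + 1e8 -> 0.0

print(order_a, order_b)
```

The same three terms give two different sums, which is why a fixed accumulation order is needed for bitwise-stable logprobs.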
Scope
Confirm the input embedding and the LM-head projection run batch-invariantly.
Confirm the embedding lookup (gather) produces identical vectors for a token regardless of batch size, position, or padding (no shape-dependent path).
Route the LM-head vocab projection through the batch-invariant GEMM (the matmul issue), with a fixed K-accumulation order over the hidden dimension.
Confirm tied-embedding weight sharing (if used by the model) does not change the reduction path between the two.
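The checks above could be prototyped along the following lines. This is a sketch under stated assumptions, not the project's test suite: NumPy stands in for the real kernels, `embed`, `lm_head_fixed_k`, and `bitwise_equal` are hypothetical names, pad id 0 is assumed, and the per-`k` loop only illustrates a fixed accumulation order over the hidden dimension. It does not represent the batch-invariant GEMM from the matmul issue.

```python
import numpy as np

def bitwise_equal(a, b):
    """True only if both arrays have the same shape and identical bytes."""
    return a.shape == b.shape and a.tobytes() == b.tobytes()

def embed(table, token_ids):
    """Plain gather: each output vector depends only on its own token id."""
    return table[token_ids]

def lm_head_fixed_k(hidden, weight):
    """Project hidden [B, H] onto the vocab using weight [V, H].

    Every logit is accumulated over the hidden dimension in ascending k,
    and that order never depends on B, so a row's logits cannot depend on
    which other rows share its batch.
    """
    batch, hidden_dim = hidden.shape
    logits = np.zeros((batch, weight.shape[0]), dtype=np.float32)
    for k in range(hidden_dim):
        # [B, 1] * [V] broadcasts to [B, V]; each element is rounded on its own.
        logits += hidden[:, k:k + 1] * weight[:, k]
    return logits

rng = np.random.default_rng(0)
vocab, hidden_dim = 50, 32  # toy sizes, chosen only for illustration
table = rng.standard_normal((vocab, hidden_dim)).astype(np.float32)

# 1. Embedding: same token alone vs. at other positions in a padded batch.
tok = 7
alone = embed(table, np.array([[tok]]))[0, 0]
padded = embed(table, np.array([[0, 0, tok], [3, tok, 0], [1, 2, 3]]))
embedding_ok = bitwise_equal(alone, padded[0, 2]) and bitwise_equal(alone, padded[1, 1])

# 2. LM head: one row projected alone vs. inside a batch of 8.
hidden = rng.standard_normal((8, hidden_dim)).astype(np.float32)
full = lm_head_fixed_k(hidden, table)
single = lm_head_fixed_k(hidden[3:4], table)
lm_head_ok = bitwise_equal(full[3], single[0])

# 3. Tied weights: sharing the embedding table vs. using a separate copy.
tied_ok = bitwise_equal(full, lm_head_fixed_k(hidden, table.copy()))

print(embedding_ok, lm_head_ok, tied_ok)
```

Comparing raw bytes rather than using a tolerance is deliberate: batch invariance is a bitwise property, so an approximate comparison would hide exactly the drift this issue is meant to rule out.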
Out of scope
Acceptance criteria
Notes
Planned PRs