[WS1][kernels] Batch-invariant embedding + LM head projection #151

Part of WS1 — Full Batch-Invariant Forward Chain (epic: #<WS1 tracking issue>)

## Why

The input embedding lookup and the final vocab projection bracket the network. The LM head is a large matmul sitting directly upstream of logprob, so any batch-dependent reduction there lands in the logprobs immediately. The embedding lookup is simpler but must be confirmed not to branch on batch shape. Both must be on the batch-invariant path or the chain is not actually closed.
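The underlying concern is that floating-point addition is not associative, so a reduction whose grouping changes with batch shape can yield different logits for the same token. A minimal NumPy illustration follows; the values are picked to make the rounding visible and are not taken from any model.

```python
import numpy as np

# Illustrative float32 values only; chosen so the rounding is easy to see.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)

# Same three terms, two groupings of the reduction.
cancel_first = (a + b) + c   # large terms cancel, then the small term is added
absorb_first = a + (b + c)   # the small term is rounded away before cancelling

print(cancel_first, absorb_first)
```

The two groupings give 1.0 and 0.0 respectively. A kernel that picks its split or tile order from the batch shape changes the grouping in the same way, just with smaller differences.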

## Scope

Confirm the input embedding and the LM-head projection run batch-invariantly.

- Confirm the embedding lookup (gather) produces identical vectors for a token regardless of batch size, position, or padding (no shape-dependent path).
- Route the LM-head vocab projection through the batch-invariant GEMM (the matmul issue), with a fixed K-accumulation order over the hidden dimension.
- Confirm tied-embedding weight sharing (if used by the model) does not change the reduction path between the two.
- Validate both against the #108 harness across the standard sweep, with particular attention to the LM-head -> logprob handoff.
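As a rough reference for what the checks above mean, here is a NumPy sketch, not the production kernel. The names `embed` and `lm_head_fixed_k`, the shapes, and the padding layout are all assumptions made for illustration. The projection deliberately avoids a library matmul and accumulates one K index at a time, so each logit is summed in the order k = 0, 1, ..., K-1 no matter how many rows share the batch.

```python
import numpy as np

def embed(table, token_ids):
    """Pure gather: each output row depends only on its own token id."""
    return table[token_ids]

def lm_head_fixed_k(hidden, weight):
    """Logits = hidden @ weight.T, accumulated over K in a fixed order.

    hidden: [rows, K] float32, weight: [vocab, K] float32.
    Every step is elementwise, so the per-logit summation order does not
    depend on the number of rows in the batch.
    """
    rows, k_dim = hidden.shape
    logits = np.zeros((rows, weight.shape[0]), dtype=np.float32)
    for k in range(k_dim):
        logits += hidden[:, k:k + 1] * weight[:, k][None, :]
    return logits

rng = np.random.default_rng(0)
HIDDEN = 32
table = rng.standard_normal((50, HIDDEN)).astype(np.float32)  # [vocab, hidden]
PAD = 0  # hypothetical padding token id

single = np.array([[7]])                       # batch=1, no padding
padded = np.array([[PAD, 7], [3, 9], [7, 7]])  # batch=3, one left-padded row

# Embedding: token 7 should yield the same bytes wherever it appears.
e_single = embed(table, single)[0, 0]
e_padded = embed(table, padded)[0, 1]
embedding_matches = e_single.tobytes() == e_padded.tobytes()

# LM head with tied weights (assumed): same table, same reduction order.
l_single = lm_head_fixed_k(embed(table, single).reshape(-1, HIDDEN), table)
l_padded = lm_head_fixed_k(embed(table, padded).reshape(-1, HIDDEN), table)
logits_match = l_single[0].tobytes() == l_padded[1].tobytes()

print(embedding_matches, logits_match)
```

The comparison uses raw bytes rather than a tolerance, which is the strict form of the check. The real LM head would go through the batch-invariant GEMM from the matmul issue instead of this Python loop.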

## Out of scope

- The GEMM kernel itself (covered by the matmul issue; this issue consumes it).
- The logprob reduction (covered by the logprob issue; this issue feeds it).
- Vocab-parallel distributed LM head; weight-tying policy changes.
- FP8.

## Acceptance criteria

- Embedding output for a fixed token is bitwise-identical across all batch configs in the sweep.
- LM-head logits for a fixed position are identical (within #108 tolerance) across batch=1 vs. batch=N, chunked prefill on/off, and padding layouts.
- The LM-head -> logprob path is verified jointly: the logits handed to logprob do not drift with batch shape.
- Embedding / LM-head backward paths (incl. tied-weight gradient accumulation, if used) pass the shared gradient-invariance check from the WS1 backward-consistency issue.
- Both ops pass the #108 shared test helper.
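One possible shape for a batch=1 vs. batch=N comparison is sketched below. This is not the #108 helper, whose interface is not described here. `bitwise_equal` and `check_batch_invariance` are hypothetical names, and the two toy ops exist only to show a passing and a failing case.

```python
import numpy as np

def bitwise_equal(a, b):
    """True only if shape, dtype, and raw bytes all match."""
    return a.shape == b.shape and a.dtype == b.dtype and a.tobytes() == b.tobytes()

def check_batch_invariance(op, batch):
    """Return the row indices whose batch=1 result differs from the batch=N result."""
    full = op(batch)
    mismatched = []
    for i in range(batch.shape[0]):
        alone = op(batch[i:i + 1])  # same row, run as a batch of one
        if not bitwise_equal(alone[0], full[i]):
            mismatched.append(i)
    return mismatched

batch = np.arange(12, dtype=np.float32).reshape(4, 3)

def row_wise(x):
    # Each row is processed independently of the others.
    return x * np.float32(2.0)

def batch_dependent(x):
    # Subtracting the batch mean couples every row to the batch contents.
    return x - x.mean(axis=0, keepdims=True)

invariant_failures = check_batch_invariance(row_wise, batch)
dependent_failures = check_batch_invariance(batch_dependent, batch)
print(invariant_failures, dependent_failures)
```

Returning the failing row indices, rather than a single boolean, makes it easier to see whether drift is tied to a particular position or padding layout in the sweep.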

## Notes

- Depends on #108 and on the batch-invariant matmul issue (the LM head routes through it).
- Pairs tightly with the logprob issue: drift here is the most direct cause of logprob drift, so validate the two together.

## Planned PRs

- [ ] Embedding-lookup invariance tests (fixed token/position -> identical vector)
- [ ] Route the LM-head vocab projection through the deterministic GEMM
- [ ] Tied-weight sharing consistency check (if the model uses tied embeddings)
- [ ] LM-head -> logprob handoff test
- [ ] Embedding / LM-head backward invariance via the shared gradient check; wire through #108 harness
