Finding
The native backend's selling point is logprob consistency, yet no test asserts areno_varlen_causal_attention / areno_paged_causal_attention_decode match flash-attn or SDPA numerically. The only test on the used path stubs fake_varlen and checks shapes, not math.
Acceptance
- Add a GPU-guarded equivalence test (skips when no GPU) comparing used kernels against a reference.
- CPU CI remains unaffected.
Finding
The native backend's selling point is logprob consistency, yet no test asserts
areno_varlen_causal_attention/areno_paged_causal_attention_decodematch flash-attn or SDPA numerically. The only test on the used path stubsfake_varlenand checks shapes, not math.Acceptance