Add GPU-guarded numerical equivalence tests for native attention kernels #81

Description

@adohe

Finding

The native backend's selling point is logprob consistency, yet no test asserts that areno_varlen_causal_attention or areno_paged_causal_attention_decode matches flash-attn or SDPA numerically. The only test on the path actually used stubs fake_varlen and checks output shapes, not the math.

Acceptance

  • Add a GPU-guarded equivalence test (skipped when no GPU is available) comparing the kernels in use against a reference implementation.
  • CPU CI remains unaffected.
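
One possible shape for such a test, as a minimal sketch rather than the project's actual test: it compares a kernel against a naive float32 reference and skips itself when no CUDA device is present, so CPU CI only records a skip. The names reference_causal_attention, kernel_under_test, and CausalAttentionEquivalenceTest are hypothetical. Because the signatures of areno_varlen_causal_attention and areno_paged_causal_attention_decode are not given in this issue, PyTorch's scaled_dot_product_attention stands in as the kernel under test. The shapes and tolerances are assumptions to tune.

```python
import math
import unittest

try:
    import torch
    import torch.nn.functional as F
except ImportError:  # keep the module importable on minimal CPU CI images
    torch = None

# True only when PyTorch is installed and a CUDA device is visible.
HAS_CUDA = torch is not None and torch.cuda.is_available()


def reference_causal_attention(q, k, v):
    """Naive causal attention in float32; q, k, v are (batch, heads, seq, head_dim)."""
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q.float(), k.float().transpose(-1, -2)) * scale
    seq = q.shape[-2]
    # Mask out strictly-upper-triangular entries (future positions).
    future = torch.ones(seq, seq, device=q.device).triu(1).bool()
    scores = scores.masked_fill(future, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v.float())


def kernel_under_test(q, k, v):
    # Hypothetical placeholder: SDPA stands in for the native kernel.
    # Swap in the real call once its signature is known.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


@unittest.skipUnless(HAS_CUDA, "requires a CUDA GPU")
class CausalAttentionEquivalenceTest(unittest.TestCase):
    def test_matches_reference(self):
        torch.manual_seed(0)
        shape = (2, 4, 128, 64)  # assumed sizes, chosen only for illustration
        q, k, v = (
            torch.randn(shape, device="cuda", dtype=torch.float16)
            for _ in range(3)
        )
        out = kernel_under_test(q, k, v).float()
        ref = reference_causal_attention(q, k, v)
        # Assumed fp16 tolerances; tighten or loosen per kernel and dtype.
        torch.testing.assert_close(out, ref, atol=1e-2, rtol=1e-2)
```

The sketch uses unittest's skipUnless so that it needs nothing beyond the standard library to be collected. pytest also runs unittest-style cases, and a pytest.mark.skipif guard on the same HAS_CUDA flag would serve equally well if the suite prefers that style.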

Labels

  • area/accel: Issues or PRs related to CUDA kernels and fused operators
  • area/testing: Issues or PRs related to the test suite and test infrastructure
  • kind/feature: Categorizes issue or PR as related to a new feature
