
Clean up native attention nits (cc threshold comment, duplicate reason, stale docstrings) #84

Description

@adohe

Items

  • FLASH_ATTENTION_MIN_CUDA_CAPABILITY = (8, 0) disables flash attention on all Turing GPUs (sm_75, e.g. T4), even though the flash-attn 2.x forward pass supports some head dims on Turing. Add a comment explaining why the threshold is 8.0 rather than 7.5.
  • flash_attention_unsupported_model_reason can append the "qk head dim ..." reason twice when a model sets both head_dim and qk_nope_head_dim/qk_rope_head_dim. Deduplicate the reason list.
  • The top-level docstring in common.py still mentions converting between "FlashAttention and SDPA" conventions, but sdpa_window_size has been removed. Clean up the stale text.
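A minimal sketch of the first two items, assuming a simplified setup. Only FLASH_ATTENTION_MIN_CUDA_CAPABILITY and the config field names come from this issue. ModelConfig, qk_head_dims, unsupported_model_reason (a hypothetical stand-in for flash_attention_unsupported_model_reason), MAX_FLASH_HEAD_DIM, the nope + rope sum, and all reason strings are illustrative assumptions, not the project's code. The dedupe uses dict.fromkeys, which drops exact duplicates while keeping first-seen order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative rationale comment (assumed wording, not the project's):
# flash-attn 2.x kernels are built and tested for Ampere (sm_80) and newer.
# Turing (sm_75, e.g. T4) is excluded even if some head dims happen to work,
# so the gate is (8, 0) rather than (7, 5).
FLASH_ATTENTION_MIN_CUDA_CAPABILITY = (8, 0)

# Hypothetical upper bound on the qk head dim accepted by the flash path.
MAX_FLASH_HEAD_DIM = 256


@dataclass
class ModelConfig:
    # Hypothetical stand-in for a model config; field names follow the issue.
    head_dim: Optional[int] = None
    qk_nope_head_dim: Optional[int] = None
    qk_rope_head_dim: Optional[int] = None


def qk_head_dims(config: ModelConfig) -> List[int]:
    """Collect every qk head dim the config implies (assumed: nope + rope)."""
    dims: List[int] = []
    if config.head_dim is not None:
        dims.append(config.head_dim)
    if config.qk_nope_head_dim is not None and config.qk_rope_head_dim is not None:
        dims.append(config.qk_nope_head_dim + config.qk_rope_head_dim)
    return dims


def unsupported_model_reason(
    config: ModelConfig, capability: Tuple[int, int]
) -> Optional[str]:
    """Return a '; '-joined reason string, or None if flash looks usable."""
    reasons: List[str] = []
    min_major, min_minor = FLASH_ATTENTION_MIN_CUDA_CAPABILITY
    # Tuple comparison: (7, 5) < (8, 0) is True, (8, 0) < (8, 0) is False.
    if capability < FLASH_ATTENTION_MIN_CUDA_CAPABILITY:
        reasons.append(
            f"compute capability {capability[0]}.{capability[1]} "
            f"< {min_major}.{min_minor}"
        )
    for dim in qk_head_dims(config):
        if dim > MAX_FLASH_HEAD_DIM:
            reasons.append(f"qk head dim {dim} > {MAX_FLASH_HEAD_DIM}")
    # Drop exact duplicates (e.g. head_dim equal to nope + rope) while
    # preserving the order in which reasons were first added.
    unique = list(dict.fromkeys(reasons))
    return "; ".join(unique) if unique else None
```

Deduplicating the final list, rather than the head dims, also covers any other check that might emit the same message twice, and it leaves the set of rejected models unchanged, which matches the "no behavior change" criterion below.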

Acceptance

  • Comments and docstrings reflect current behavior, and the reason list is deduplicated. No behavior change.

Metadata


Assignees

No one assigned

    Labels

      • area/accel: Issues or PRs related to CUDA kernels and fused operators
      • area/dx: Issues or PRs related to developer experience (error messages, ergonomics, onboarding)
      • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
