Items
- `FLASH_ATTENTION_MIN_CUDA_CAPABILITY = (8, 0)` disables flash attention on all Turing GPUs (sm_75, e.g. T4), even though flash-attn 2.x forward supports some head dims on Turing. Add a comment explaining why the floor is 8.0 rather than 7.5.
- `flash_attention_unsupported_model_reason` can append the `"qk head dim ..."` reason twice when a model config sets both `head_dim` and `qk_nope_head_dim`/`qk_rope_head_dim`. Dedupe the reason list.
- The top docstring of `common.py` still mentions converting between "FlashAttention and SDPA" conventions, but `sdpa_window_size` was removed. Clean up the stale text.
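For the first item, here is a minimal sketch of how the documented constant could look. It is written in Python, which the codebase appears to use. The comment wording and the rationale it gives are assumptions to confirm with the maintainers. `meets_flash_min_capability` is a hypothetical helper, not an existing function:

```python
from typing import Tuple

# Minimum CUDA compute capability (major, minor) for the FlashAttention path.
# ASSUMED rationale (placeholder wording, confirm before committing):
# the flash-attn 2.x kernels used here are assumed to target sm_80 (Ampere)
# and newer, so the gate starts at (8, 0). Turing (sm_75, e.g. T4) is
# excluded on purpose, even if a subset of head dims could run there, to keep
# one predictable cutoff instead of per-head-dim special cases.
FLASH_ATTENTION_MIN_CUDA_CAPABILITY: Tuple[int, int] = (8, 0)


def meets_flash_min_capability(capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if (major, minor) is at or above the floor.

    Tuples compare lexicographically, so (7, 5) < (8, 0) <= (8, 6) <= (9, 0).
    """
    return tuple(capability) >= FLASH_ATTENTION_MIN_CUDA_CAPABILITY
```

In a real check, the tuple would typically come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`. Comparing whole tuples keeps the gate to a single lexicographic comparison, so a future sm_75 exception would need to be an explicit, commented special case.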
Acceptance
Comments and docstrings reflect current behavior; the reason list is deduped. No other behavior change.
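Here is a sketch of the dedupe, assuming the reasons are collected into a list of strings. `unsupported_reasons`, `_qk_head_dim_reason`, and the 256 limit are hypothetical stand-ins, not the real `flash_attention_unsupported_model_reason`:

```python
from types import SimpleNamespace
from typing import List, Optional

# ASSUMED limit, for illustration only; the real check may differ.
MAX_FLASH_HEAD_DIM = 256


def _qk_head_dim_reason(dim: int) -> Optional[str]:
    """Hypothetical per-dimension check returning a reason string or None."""
    if dim > MAX_FLASH_HEAD_DIM:
        return f"qk head dim {dim} is not supported by FlashAttention"
    return None


def unsupported_reasons(config) -> List[str]:
    """Hypothetical stand-in for flash_attention_unsupported_model_reason."""
    reasons: List[str] = []

    # Check 1: plain head_dim, if the config defines it.
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        reason = _qk_head_dim_reason(head_dim)
        if reason:
            reasons.append(reason)

    # Check 2: split qk head dim (nope + rope), if both parts are defined.
    nope = getattr(config, "qk_nope_head_dim", None)
    rope = getattr(config, "qk_rope_head_dim", None)
    if nope is not None and rope is not None:
        reason = _qk_head_dim_reason(nope + rope)
        if reason:
            reasons.append(reason)

    # Order-preserving dedupe: dicts keep insertion order (Python 3.7+), so
    # identical reasons collapse to their first occurrence and nothing moves.
    return list(dict.fromkeys(reasons))
```

With `SimpleNamespace(head_dim=320, qk_nope_head_dim=256, qk_rope_head_dim=64)`, both checks produce the same string, and the function returns it once. Because `dict.fromkeys` keeps first-seen order, distinct reasons (for example `head_dim=300` alongside a 320 split) are still both reported, which keeps the change limited to removing exact duplicates.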