Consider using pre-built Flash Attention kernels via kernels #158

@sayakpaul

Hey,

I am Sayak from the Kernels team at Hugging Face. I noticed that this project uses Flash Attention, which takes a long time to build from source. We ship pre-built binaries whose outputs are bit-exact with upstream, so you can use Flash Attention without compiling it yourself.

Using FA3 on a supported machine is as easy as:

# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel

kernel_module = get_kernel("kernels-community/flash-attn3")
flash_attn_func = kernel_module.flash_attn_func

flash_attn_func(...)
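
As a rough sketch of what an integration could look like (this is not taken from your repo): the helper below prefers the pre-built Hub kernel and falls back to a locally built `flash_attn` package if the Hub kernel cannot be loaded. The name `load_flash_attn_func` is hypothetical, and the sketch assumes the fallback package exposes `flash_attn_func` at its top level. Return values and keyword arguments can differ between FA2 and FA3, so call sites may need adjusting.

```python
import importlib
from typing import Callable, Optional


def load_flash_attn_func(
    repo_id: str = "kernels-community/flash-attn3",
) -> Optional[Callable]:
    """Return a flash_attn_func, preferring the pre-built Hub kernel.

    Hypothetical helper: falls back to a locally built `flash_attn`
    package, and returns None if neither source is available so the
    caller can choose its own default attention implementation.
    """
    try:
        # Pre-built binary from the Hub; needs `pip install -U kernels`
        # and a supported machine.
        from kernels import get_kernel

        return get_kernel(repo_id).flash_attn_func
    except Exception:
        # Deliberately broad: a missing package, an unsupported GPU, or a
        # failed download should all lead to the fallback path.
        pass

    try:
        # Assumption: the source-built package exposes `flash_attn_func`.
        return importlib.import_module("flash_attn").flash_attn_func
    except Exception:
        return None
```

A caller would run `flash_attn_func = load_flash_attn_func()` once at import time and treat `None` as "use the existing attention path".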

Let us know if you'd be interested in this, and we'd be happy to provide a draft of how it would look in your repo.
