Consider using pre-built Flash Attention kernels via `kernels`

Hey,

I am Sayak from the [Kernels](https://github.com/huggingface/kernels) team at Hugging Face. I noticed that this project uses Flash Attention, which takes a long time to build from source. We ship pre-built binaries whose outputs are bit-exact with upstream, so you can use Flash Attention without the build step.

Using FA3 on a supported machine is as easy as:

```py
# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel

kernel_module = get_kernel("kernels-community/flash-attn3")
flash_attn_func = kernel_module.flash_attn_func

flash_attn_func(...)
```
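
If the dependency should stay optional, one possible shape for the integration is a small loader that returns `None` when the kernel cannot be fetched, so the caller can keep its existing attention path. This is only a rough sketch: the helper name `load_flash_attn_func` is hypothetical and not part of `kernels`, and catching a broad `Exception` is an assumption about how load failures surface (e.g. unsupported hardware or no network).

```py
import importlib


def load_flash_attn_func(repo_id="kernels-community/flash-attn3"):
    """Return `flash_attn_func` from the Hub kernel, or None if it is unavailable.

    Hypothetical helper for illustration; not part of the `kernels` API.
    """
    try:
        # `kernels` is optional here: a missing install just means "no kernel".
        kernels = importlib.import_module("kernels")
    except ImportError:
        return None
    try:
        kernel_module = kernels.get_kernel(repo_id)
    except Exception:
        # Assumption: any load failure should fall back rather than crash.
        return None
    return getattr(kernel_module, "flash_attn_func", None)


flash_attn_func = load_flash_attn_func()
if flash_attn_func is None:
    # Keep using whatever attention implementation the project already has.
    pass
```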

Let us know if you'd be interested in this, and we'd be happy to provide a draft of how it would look in your repo.