Hey,
I am Sayak from the Kernels team at Hugging Face. I noticed that this project uses Flash Attention which includes a long build time. We ship pre-built binaries (which provide bit-exact outputs as the upstream) and thereby, we make it easy to use.
Using FA3 on a supported machine is as easy as:
# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel
kernel_module = get_kernel("kernels-community/flash-attn3")
flash_attn_func = kernel_module.flash_attn_func
flash_attn_func(...)
Let us know if you'd be interested in this and and we'd be happy to provide a draft of how it would look in your repo.
Hey,
I am Sayak from the Kernels team at Hugging Face. I noticed that this project uses Flash Attention which includes a long build time. We ship pre-built binaries (which provide bit-exact outputs as the upstream) and thereby, we make it easy to use.
Using FA3 on a supported machine is as easy as:
Let us know if you'd be interested in this and and we'd be happy to provide a draft of how it would look in your repo.