Thank you for your work—this library is extremely useful.
Before discovering Sparse_SageAttention_API, I implemented something similar and noticed that combining sparsity with quantization introduces significant accuracy loss.
Have you considered adding per-thread or per-token quantization (similar to what SageAttention2 does)? That would make the approach even more practical!
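
To make the request concrete, here is a minimal NumPy sketch of what I mean by per-token quantization. It is only an illustration under my own assumptions: `quantize_int8`, `dequantize`, and the 64 x 128 block with one outlier token are hypothetical and are not taken from Sparse_SageAttention_API or SageAttention2. It compares a single shared scale against one scale per token row.

```python
import numpy as np

def quantize_int8(x, axis=None):
    """Symmetric INT8 quantization (hypothetical helper).

    axis=None -> one scale shared by the whole array
    axis=-1   -> one scale per row, i.e. per token
    """
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.maximum(max_abs, 1e-8) / 127.0  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to float32 using the stored scale(s)."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)

# Assumed example data: 64 tokens x 128 head dims, with one outlier token.
q_block = rng.standard_normal((64, 128)).astype(np.float32)
q_block[0] *= 50.0

q_shared, s_shared = quantize_int8(q_block, axis=None)
q_token, s_token = quantize_int8(q_block, axis=-1)

# Mean absolute reconstruction error for each granularity.
shared_err = np.abs(dequantize(q_shared, s_shared) - q_block).mean()
per_token_err = np.abs(dequantize(q_token, s_token) - q_block).mean()

print(f"shared scale error:    {shared_err:.4f}")
print(f"per-token scale error: {per_token_err:.4f}")
```

With a shared scale, the single outlier token sets the step size for every other token, so the ordinary rows lose most of their resolution. My guess is that something similar happens when sparsity leaves fewer blocks to average the error out, but I have not verified that.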