Is there any plan to support Per-Thread / Per-Token quantization?

Thank you for your work; this library is extremely useful.

Before discovering Sparse_SageAttention_API, I implemented something similar, but I noticed that combining sparsity with quantization introduces significant accuracy loss.

Have you considered adding per-thread or per-token quantization (similar to what SageAttention2 does)? That would make the approach even more practical!
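
To make the request concrete, here is a minimal NumPy sketch of what I mean by per-token quantization, compared against a single coarse scale for the whole tensor. It is only an illustration under my own assumptions: `quantize_int8` and `dequantize` are hypothetical helpers, not functions from this library, and the sketch does not attempt per-thread quantization, which depends on the GPU kernel layout.

```python
import numpy as np

def quantize_int8(x, axis=None):
    """Symmetric INT8 quantization (hypothetical helper, not the library's API).

    axis=None -> one scale for the whole tensor (coarse).
    axis=-1   -> one scale per row, i.e. per token.
    """
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.maximum(max_abs, 1e-8) / 127.0  # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128)).astype(np.float32)  # (tokens, head_dim)
x[3] *= 50.0  # a single outlier token inflates any shared scale

q_tensor, s_tensor = quantize_int8(x)         # one scale for everything
q_token, s_token = quantize_int8(x, axis=-1)  # one scale per token

err_tensor = np.abs(dequantize(q_tensor, s_tensor) - x).mean()
err_token = np.abs(dequantize(q_token, s_token) - x).mean()
print(f"per-tensor mean abs error: {err_tensor:.5f}")
print(f"per-token  mean abs error: {err_token:.5f}")
```

With a shared scale, one outlier token degrades the precision of every other token, whereas per-token scales confine the damage to the outlier row. I suspect a similar effect contributes to the accuracy loss I saw when combining sparsity with quantization, though I have not verified that.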