
Is there any plan to support Per-Thread / Per-Token quantization? #1

Description

@Lyxien

Thank you for your work—this library is extremely useful.

Before discovering Sparse_SageAttention_API I implemented something similar, but I noticed that combining sparsity with quantization introduces significant accuracy loss.

Have you considered adding per-thread or per-token quantization (similar to what SageAttention2 does)? That would make the approach even more practical!
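
To illustrate why granularity matters here, below is a minimal sketch in plain Python comparing one shared INT8 scale per block of tokens against one scale per token. It is only a toy under stated assumptions: the helper names (`quantize_int8`, `mean_abs_error`) and the synthetic data are hypothetical, it is not how Sparse_SageAttention_API or SageAttention2 implement quantization, and it does not model per-thread quantization, which depends on the GPU kernel's data layout.

```python
import random

def quantize_int8(rows, group_size):
    """Symmetric INT8 quantize-dequantize with one scale per group of rows.

    group_size=1 gives per-token scales; a larger value shares one scale
    across a block of tokens (coarser granularity).
    """
    out = []
    for start in range(0, len(rows), group_size):
        block = rows[start:start + group_size]
        max_abs = max(abs(x) for row in block for x in row)
        # Map the largest magnitude in the group to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        for row in block:
            out.append([round(x / scale) * scale for x in row])
    return out

def mean_abs_error(a, b):
    """Mean absolute elementwise difference between two equally shaped matrices."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

random.seed(0)
# Hypothetical activations: 64 tokens x 32 channels, with one outlier token.
tokens = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(64)]
tokens[5] = [x * 50.0 for x in tokens[5]]

err_per_block = mean_abs_error(tokens, quantize_int8(tokens, group_size=64))
err_per_token = mean_abs_error(tokens, quantize_int8(tokens, group_size=1))
print(f"per-block error: {err_per_block:.4f}")
print(f"per-token error: {err_per_token:.4f}")
```

In this toy setup a single outlier token inflates the shared block scale, so every other token in the block loses precision, while per-token scales confine the damage to the outlier row. I suspect a similar effect contributes to the accuracy loss I saw when combining sparsity with coarse quantization, though I have not verified that in this library.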
