Sparse attention for frozen LLMs. Train a tiny router that picks the top-K keys per query, then swap dense attention for sparse attention without retraining the model.
transformers inference pytorch attention-mechanism huggingface sparse-attention llm long-context qwen nare-labs
Updated May 26, 2026 - Python
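
The description above (a router picks the top-K keys per query, and attention runs only over those keys) can be illustrated with a minimal sketch. This is not the repository's code: the function names `route_top_k` and `sparse_attention` are hypothetical, the "router" is stood in for by a dot product over low-dimensional features that are assumed to be given, and the example uses plain Python lists for a single query so it runs without dependencies. A real implementation would presumably use batched PyTorch tensors.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_q, router_keys, k):
    """Return the indices of the k keys with the highest router score.

    Assumption: the router scores a key by a dot product between small
    query and key feature vectors. The actual router may differ.
    """
    scores = [dot(router_q, rk) for rk in router_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(q, keys, values, selected):
    """Scaled dot-product attention restricted to the selected key indices."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) * scale for i in selected])
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]


# Toy data: one query, four keys and values from a frozen model.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Stand-in router features: the first coordinate only (an assumption).
router_q = q[:1]
router_keys = [key[:1] for key in keys]

selected = route_top_k(router_q, router_keys, k=2)
out = sparse_attention(q, keys, values, selected)
print(selected, out)
```

Passing every key index as `selected` reduces `sparse_attention` to ordinary dense attention, which is why the frozen model's weights need no change in this sketch: only the set of keys each query attends to is narrowed.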