# nare-labs

Here is 1 public repository matching this topic...

narelabs / ossa

Sparse attention for frozen LLMs. Train a tiny router that picks the top-K keys per query, then swap dense attention for sparse attention, with no retraining of the model.

transformers inference pytorch attention-mechanism huggingface sparse-attention llm long-context qwen nare-labs

Updated May 26, 2026
Python
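
The repository description above names the core idea: a small router chooses the top-K keys for each query, and attention is computed only over those keys. The following is a minimal sketch of that idea, not the ossa API. All function names (`route_top_k`, `sparse_attention`, `dense_attention`) and the low-rank router matrices `Wq` and `Wk` are hypothetical, the router here is random rather than trained, and plain Python lists stand in for the PyTorch tensors a real implementation would use.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(q, keys, Wq, Wk, k):
    """Hypothetical router: score every key in a small projected space
    and return the indices of the k highest-scoring keys, in ascending order."""
    rq = project(Wq, q)
    scores = [dot(rq, project(Wk, key)) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def sparse_attention(q, keys, values, selected):
    """Scaled dot-product attention restricted to the selected key indices."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) * scale for i in selected])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, selected))
            for d in range(dim)]


def dense_attention(q, keys, values):
    """Ordinary attention: every key is selected."""
    return sparse_attention(q, keys, values, list(range(len(keys))))


# Toy usage: 16 keys of width 8, a rank-2 router, top-4 selection.
rng = random.Random(0)
d_model, d_router, n_keys = 8, 2, 16
rand_vec = lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)]
keys = [rand_vec(d_model) for _ in range(n_keys)]
values = [rand_vec(d_model) for _ in range(n_keys)]
query = rand_vec(d_model)
Wq = [rand_vec(d_model) for _ in range(d_router)]  # untrained, illustration only
Wk = [rand_vec(d_model) for _ in range(d_router)]

selected = route_top_k(query, keys, Wq, Wk, k=4)
print(selected, sparse_attention(query, keys, values, selected))
```

In this sketch the frozen model's queries, keys, and values are untouched; only the router matrices would be trained, presumably so that the selected keys approximate the ones dense attention weights most heavily. Setting K to the number of keys recovers dense attention exactly.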
