attention-sinks

Here are 2 public repositories matching this topic...

Annusha / refam

Official implementation of RefAM: Attention Magnets for Zero-Shot Referral Segmentation

zero-shot rios rvos referral-segmentation attention-sinks

Updated Feb 6, 2026
Jupyter Notebook

atandra2000 / GPT-OSS-Lite

Faithful from-scratch PyTorch reproduction of OpenAI's GPT-OSS architecture (sliding/full attention alternation, learned attention sinks, YaRN 128K, top-2-of-8 MoE), scaled to Chinchilla-optimal 502M total / 247M active training on a single A100 80GB

yarn pytorch from-scratch mixture-of-experts llm sliding-window-attention gpt-oss attention-sinks

Updated Jun 29, 2026
Python
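
For readers unfamiliar with the term, here is a minimal sketch of what a "learned attention sink" can look like, assuming the common formulation in which each attention head adds one extra learned logit to the softmax denominator. It is not taken from either repository listed here; the function name `softmax_with_sink` and the example values are hypothetical, and plain Python is used instead of PyTorch to keep it self-contained.

```python
import math


def softmax_with_sink(scores, sink_logit):
    """Softmax over `scores` with one extra (learned) logit in the denominator.

    The sink's own probability is discarded, so the returned weights sum to
    less than 1: the sink absorbs the attention mass the head does not need.
    `sink_logit` stands in for a per-head learned parameter (an assumption).
    """
    m = max(max(scores), sink_logit)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]


# Illustrative attention scores for three keys and an illustrative sink value.
weights = softmax_with_sink([2.0, 1.0, 0.5], sink_logit=1.5)
sink_mass = 1.0 - sum(weights)  # probability mass absorbed by the sink
```

A very negative `sink_logit` recovers the ordinary softmax, while a large one lets the head attend to almost nothing.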

