# The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

[NeurIPS 2026 Submission] | arXiv (full version)
Transformers fail at counting not because they can't represent counts, but because the output pathway can't route the answer: linear probes recover counts with R² > 0.99 from layer 2 onward, yet generation accuracy stays low until the output pathway is repaired (see the results table below).
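The probe analysis is straightforward to approximate. Below is a minimal sketch assuming a HuggingFace causal LM and a toy counting task; the model name (facebook/opt-125m), probe layer, and task format are illustrative placeholders, and the paper's actual setup is in supplement/code.

```python
import torch
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "facebook/opt-125m"  # placeholder base model, not the paper's
LAYER = 2                    # probe layer; the paper reports R^2 > 0.99 from layer 2+

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final token at the probed layer."""
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
    return out.hidden_states[LAYER][0, -1]

# Toy counting examples: (prompt, ground-truth count).
examples = [("a " * n + "How many a's?", n) for n in range(1, 50)]
X = torch.stack([last_token_state(p) for p, _ in examples]).numpy()
y = [n for _, n in examples]

probe = Ridge(alpha=1.0).fit(X[:40], y[:40])                    # train split
print("held-out R^2:", r2_score(y[40:], probe.predict(X[40:])))
```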
## Paper

- arXiv (full, 26 pages): arxiv.org/abs/2605.03258 | paper/main.tex
- NeurIPS 2026 (9 pages, anonymized): paper/main_neurips.pdf | paper/main_neurips.tex
## Reproducing the results

All experiments can be reproduced from the supplement. See supplement/README.md for the full guide.
```bash
cd supplement/code
python data_generation.py                   # Generate benchmark
python run_phase112_fullvocab_all_tasks.py  # 9-row repair
python run_phase118_lora_generation.py      # LoRA Q/V generation
python run_phase122_cot.py                  # CoT baseline
```

Results match the paper exactly, verified on the same TPU VM (PyTorch, CPU mode).
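For reference, a few-shot chain-of-thought prompt of the kind run_phase122_cot.py evaluates might look like the sketch below; the exact template, tasks, and shot count are defined in the script, so this is purely illustrative.

```python
# Illustrative few-shot CoT prompt for a counting task; the actual
# template used by run_phase122_cot.py may differ.
COT_PROMPT = """\
Q: How many times does 'b' appear in 'a b b a b'?
A: Scanning token by token: a (0), b (1), b (2), a (2), b (3). The answer is 3.

Q: How many times does 'x' appear in '{sequence}'?
A:"""

print(COT_PROMPT.format(sequence="x y x x y"))
```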
## Key results

| Experiment | Setting | Result |
|---|---|---|
| Probe R² | Layer 2+ | >0.99 |
| 9-row repair | Constrained | 60.7–100.0% |
| LoRA Q/V generation | 5 seeds | 83.1% ± 7.2% |
| CoT baseline | Few-shot | 20.2% ± 1.9% |
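The LoRA Q/V experiment adapts only the attention query and value projections. A minimal sketch with the peft library is below; the base model (facebook/opt-125m here), rank, and alpha are placeholders, and the paper's actual settings live in supplement/code/run_phase118_lora_generation.py.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,                                  # low-rank update dimension (placeholder)
    lora_alpha=16,                        # scaling factor (placeholder)
    target_modules=["q_proj", "v_proj"],  # adapt only the Q and V projections
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()        # only the Q/V adapters are trainable
```

Restricting the adapters to Q and V leaves the base weights, including the output head, frozen, which is what makes the experiment a targeted test of the output-routing hypothesis rather than general fine-tuning.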
## Repository structure

```
.
├── paper/           # LaTeX source, PDFs, figures, checklist
├── supplement/
│   ├── code/        # 16 Python experiment scripts
│   ├── results/     # 14 primary result JSONs
│   └── figures/     # All paper figures (PDF + PNG)
└── README.md
```
## Citation

```bibtex
@inproceedings{garcia2026right,
  title={The Right Answer, the Wrong Direction: Why Transformers
         Fail at Counting and How to Fix It},
  author={Garcia, Gabriel},
  booktitle={Advances in Neural Information Processing Systems},
  year={2026}
}
```

## License

Code: MIT. Paper, figures, and text: CC BY 4.0.