When more sampling stops helping: a reasoning model can generate a right answer long before it can pick one. The modal and correlation ceilings of test-time scaling, with paper, figures, and code (Bay & Yearick).
machine-learning reproducible-research self-consistency best-of-n effective-sample-size scaling-laws large-language-models inference-time-compute reasoning-models test-time-scaling arxiv-paper repeated-sampling
Updated Jun 30, 2026 - TeX