A from-scratch reimplementation of nanochat, built end-to-end on a single consumer GPU (RTX 5060 Ti, 16 GiB) for the local stages, then scaled to 8×H100 cloud for the GPT-2 speedrun.
The goal isn't to beat nanochat — it's to deeply understand every layer of the modern LLM stack by rewriting it. nanochat's numbers serve as the correctness oracle: ArcherChat-d8 should match nanochat-d8 within noise, ArcherChat-d12 should match nanochat-d12, and the final d24 cloud run should clear GPT-2's CORE score (0.2565) on 8×H100.
Run karpathy's nanochat unchanged on a 5060 Ti at d8 and d12 to establish reference numbers ArcherChat will be measured against.
Outputs:
runs/d8_local.shandruns/d12_local.shadapted from nanochat'sruns/speedrun.shfor single-GPU consumer Blackwell (no FA3, no FP8,--window-pattern=L).- A small patch to nanochat's
get_peak_flopstable so MFU logs a real number on the 5060 Ti. - Reference numbers locked in for both depths — see STAGE1.md.
Rewrite the core modules from scratch:
- Muon optimizer + compute-optimal scaling (the auto-derivation of token horizon, batch size, LR, weight decay from one
--depthdial). - Sliding-window attention (FA3 path + SDPA fallback for consumer hardware).
- Tokenizing distributed dataloader with epoch / shard state.
- KV-cache inference engine with batched decoding.
- SFT pipeline including the packing-to-padding transition for variable-length chat data.
Train ArcherChat-d8 locally; compare against nanochat-d8 baseline. If within noise, train ArcherChat-d12 as the second data point — proves the rewrite scales correctly across depth, not just that one configuration happened to work.
Rent 8×H100, run the ArcherChat speedrun script, target the nanochat leaderboard's GPT-2 threshold (0.2565 CORE). Final weights + metrics sync from cloud to home box via Endlex. The cloud instance can be torn down the moment training finishes — nothing of value lives on it.
- Endlex — self-hosted wandb replacement (metrics dashboard + checkpoint sync). ArcherChat will use Endlex as its only telemetry layer from day one. Must exist before Stage 2 starts.
Stage 1 complete. Stage 2 starts after Endlex MVP lands.
TBD.