nanochat support (this branch)

This branch (nanochat) of the llama.cpp fork adds support for Karpathy's nanochat architecture, specifically the nanochat-d34 checkpoint.

For pre-converted GGUFs and full usage docs, see:

https://huggingface.co/ulanch/nanochat-d34-GGUF

What's in the branch

One new file plus a handful of small edits on top of upstream:

src/llama-arch.h         +1    LLM_ARCH_NANOCHAT enum value
src/llama-arch.cpp       +1    { LLM_ARCH_NANOCHAT, "nanochat" } in the arch-name map
src/llama-vocab.h        +1    LLAMA_VOCAB_PRE_TYPE_NANOCHAT enum value
src/llama-vocab.cpp      +12   match "nanochat" → pre-type, plus the BPE split regex
src/models/models.h      +13   llama_model_nanochat forward declaration
src/llama-model.cpp      +3    dispatch + NEOX rope_type
src/models/nanochat.cpp  +172  (new) the actual model implementation

master on this fork is byte-identical to ggml-org/llama.cpp master; everything above lives on nanochat.
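
For orientation, the registration side of these edits is a handful of one-liners. A minimal sketch of their shape is below; the table and member names follow upstream llama.cpp conventions at the time of writing and are illustrative rather than copied from the diff:

// src/llama-arch.h: new architecture id
enum llm_arch {
    // ... existing architectures ...
    LLM_ARCH_NANOCHAT,
    LLM_ARCH_UNKNOWN,
};

// src/llama-arch.cpp: map the id to the "nanochat" string stored in the GGUF
static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
    // ... existing entries ...
    { LLM_ARCH_NANOCHAT, "nanochat" },
};

// src/llama-vocab.cpp: pick the pre-tokenizer type by the name the converter writes
} else if (tokenizer_pre == "nanochat") {
    pre_type = LLAMA_VOCAB_PRE_TYPE_NANOCHAT;
    // plus the nanochat BPE split regex, registered alongside the other pre-types
}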

Build

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release \
    -DLLAMA_CURL=OFF -DLLAMA_BUILD_SERVER=ON \
    -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_TESTS=OFF
cmake --build build -j 8 --target llama-cli llama-completion llama-server llama-quantize

Metal on Apple Silicon and AVX2/AVX-512 on x86 are auto-detected. LLAMA_BUILD_SERVER=ON is required even for llama-cli — it's gated on the server build upstream.

Converting your own nanochat checkpoint

The converter lives at the root of this repo as convert_nanochat_to_gguf.py (it's a standalone Python script, no install needed beyond torch, gguf, and tiktoken). It reads model_*.pt + meta_*.json + tokenizer.pkl from a nanochat checkpoint directory and writes a GGUF with arch="nanochat". Default output is bf16 — see the HF page for why fp16 is deprecated for this architecture.

python convert_nanochat_to_gguf.py --src /path/to/checkpoint --out model.gguf
./build/bin/llama-quantize model.gguf model-Q4_K_M.gguf Q4_K_M

A couple of notes

  • d34 was trained at nanochat commit 2c4473d (Jan 11 2026). Current master of nanochat has diverged significantly — smear gates, value embeddings, residual lambdas. None of that is in d34. Don't try to match this code against the current gpt.py.
  • The RoPE in this arch uses an inverted-sin convention vs ggml's NEOX. The graph compensates by passing freq_scale = -1.0 to ggml_rope_ext; that's the only non-obvious thing in nanochat.cpp (see the sketch below).
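
A rough sketch of that call, with placeholder tensor and hyperparameter names (only the NEOX mode and the negated freq_scale are the point here; the rest of the plumbing is illustrative, not copied from the branch):

// src/models/nanochat.cpp (sketch): NEOX-style RoPE with the sign-flipped frequency scale.
// Qcur, inp_pos and the surrounding hparams values are placeholders.
Qcur = ggml_rope_ext(
        ctx0, Qcur, inp_pos, nullptr,
        n_rot, GGML_ROPE_TYPE_NEOX, n_ctx_orig,
        freq_base, /*freq_scale=*/-1.0f,
        ext_factor, attn_factor, beta_fast, beta_slow);

A negative freq_scale negates every rotation angle, which flips the sign of the sin term while leaving the cos term unchanged, so the inverted-sin convention is recovered without touching the NEOX kernel itself.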

License

MIT, inherited from upstream llama.cpp and from karpathy/nanochat.
