My attempt at rebuilding nanochat
https://github.com/karpathy/nanochat/tree/master
and combining it with minimind
https://github.com/jingyaogong/minimind
PyTorch has a lot of syntax that makes it hard to even tell what underlying math is being done. To cut through the jargon, this code contains far more comments than nanochat, and the dimensions of nearly every tensor are commented.
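To give a feel for the commenting style, here is a minimal sketch of causal self-attention with the shape of every tensor written next to it. It is an illustration only, not the code in gpt.py: the sizes `B`, `T`, `C`, `H` and the name `qkv_proj` are made up for this example.

```python
import math
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this illustration.
B, T, C, H = 2, 5, 16, 4   # batch, sequence length, embedding dim, number of heads
D = C // H                 # per-head dim

torch.manual_seed(0)
x = torch.randn(B, T, C)                         # (B, T, C)
qkv_proj = nn.Linear(C, 3 * C, bias=False)       # weight: (3C, C)

qkv = qkv_proj(x)                                # (B, T, 3C)
q, k, v = qkv.split(C, dim=-1)                   # each (B, T, C)

# Split the embedding dim into heads and move heads next to the batch dim.
q = q.view(B, T, H, D).transpose(1, 2)           # (B, H, T, D)
k = k.view(B, T, H, D).transpose(1, 2)           # (B, H, T, D)
v = v.view(B, T, H, D).transpose(1, 2)           # (B, H, T, D)

# Math: scores = Q K^T / sqrt(D), one (T, T) matrix per head.
scores = q @ k.transpose(-2, -1) / math.sqrt(D)  # (B, H, T, T)

# Causal mask: position t may only look at positions <= t.
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))   # (T, T)
scores = scores.masked_fill(~mask, float("-inf"))       # (B, H, T, T)

weights = scores.softmax(dim=-1)                 # (B, H, T, T), each row sums to 1
out = weights @ v                                # (B, H, T, D)

# Merge the heads back into a single embedding dim.
out = out.transpose(1, 2).contiguous().view(B, T, C)    # (B, T, C)
```

Reading only the comments on the right gives the whole data flow: project, split into heads, score, mask, normalize, mix values, merge heads.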
- Directory structure, README.md, .gitignore, initial files
- Tokenizer implementation in Rust (rustbpe/) + Python bindings + tokenizer test/eval scripts (scripts/tok_eval.py)
- gpt.py: model implementation (attention, MLP, transformer block, GPT model); engine.py: KV cache
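
For readers new to the tokenizer item above, the core idea of byte-pair encoding fits in a few lines of plain Python. This is a simplified sketch and not the rustbpe implementation: the function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical, and it skips the regex pre-splitting and performance work that production tokenizers typically include.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; returns {(left_id, right_id): new_id}."""
    ids = list(text.encode("utf-8"))        # start from raw bytes, ids 0..255
    merges = {}
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count adjacent id pairs
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                       # nothing repeats, so stop early
            break
        new_id = 256 + len(merges)          # new tokens come after the 256 bytes
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():     # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand every token id back into bytes, then into text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


sample = "low lower lowest low low"
sample_merges = train_bpe(sample, num_merges=5)
sample_ids = encode(sample, sample_merges)
print(len(sample.encode("utf-8")), "bytes ->", len(sample_ids), "tokens")
```

Each merge adds one token to the vocabulary and shortens the encoded sequence, which is the trade-off a tokenizer eval script would typically measure.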