wildstyl3r/rustgpt

RustGPT

A simple testbed project covering various optimization tricks used in modern transformer architectures. It was abandoned for a while but is now under active development. Its origin traces back to the "Let's build GPT:..." video by Andrej Karpathy, and the default startup parameters are close to those featured at some point in the video.

The project is focused primarily on tiny models and training/inference on CPU.

general stuff

  • model weights saving and loading (May 2026)
  • enum dispatch based model construction managed from CLI (May 13, 2026)
  • RMSnorm (May 13, 2026)
  • BPE tokenizer
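
For readers unfamiliar with RMSnorm from the list above, here is a minimal sketch of the operation on a single vector. It is an illustration only, not the code used in this repository; the function name `rms_norm` and its signature are assumptions.

```rust
/// RMS normalization of one vector (hypothetical helper, not from the repo):
/// y_i = gain_i * x_i / sqrt(mean(x^2) + eps).
/// Unlike LayerNorm, no mean is subtracted and no bias is added.
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), gain.len(), "gain must match the input length");
    // Mean of squared activations.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal root mean square; eps guards against division by zero.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv_rms * g).collect()
}
```

With unit gain, the output has a root mean square of approximately 1 regardless of the input scale.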

transformer block modifications

  • parallel transformer block (May 13, 2026)
  • rotary position embedding
  • polar position embedding
  • Mixture-of-Experts
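
As a rough illustration of rotary position embedding from the list above, the sketch below rotates consecutive pairs of a query or key vector by a position-dependent angle. The name `apply_rope`, the interleaved pair layout, and the `base` parameter are assumptions made for this example and may differ from what the repository does.

```rust
/// Rotate each pair (x[2i], x[2i+1]) by the angle pos * base^(-2i/d).
/// Hypothetical helper: the pairing layout is one common convention,
/// not necessarily the one used in this project.
fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    for i in 0..d / 2 {
        // Lower pair indices rotate faster, higher ones slower.
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

Because every pair undergoes a pure rotation, the vector norm is preserved, and the dot product between a rotated query and a rotated key depends only on the difference of their positions.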

linear attention

  • random feature attention
  • Taylor series based softmax approximation

quasilinear attention

  • sliced ReLU attention

generic attention tricks

  • QK-norm
  • MQA
  • GQA
  • MLA
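
MQA and GQA differ from standard multi-head attention mainly in how query heads share key/value heads. The sketch below shows that mapping under the usual contiguous-group convention; the function `kv_head_for` is a hypothetical name for this example, not an identifier from the repository.

```rust
/// Index of the key/value head that serves a given query head
/// (hypothetical helper, contiguous grouping assumed).
/// n_kv_heads == n_q_heads gives standard multi-head attention,
/// n_kv_heads == 1 gives MQA, anything in between gives GQA.
fn kv_head_for(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_kv_heads > 0 && n_q_heads % n_kv_heads == 0,
            "query heads must divide evenly into KV groups");
    assert!(q_head < n_q_heads, "query head index out of range");
    let group_size = n_q_heads / n_kv_heads;
    q_head / group_size
}
```

For example, with 8 query heads and 2 key/value heads, query heads 0 to 3 read from KV head 0 and query heads 4 to 7 read from KV head 1.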

inference

  • KV-caching
  • speculative decoding
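
To make the KV-caching entry above concrete, here is a minimal single-head sketch: each decoding step appends the new token's key and value to a cache and attends over everything stored so far, so earlier keys and values are never recomputed. The `KvCache` type and its `attend` method are assumptions for illustration, not the repository's API.

```rust
/// Hypothetical single-head KV cache used for autoregressive decoding.
struct KvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        KvCache { keys: Vec::new(), values: Vec::new() }
    }

    /// Store the new token's key/value, then return the attention output
    /// of query `q` over all cached positions (scaled dot-product softmax).
    fn attend(&mut self, q: &[f32], k: Vec<f32>, v: Vec<f32>) -> Vec<f32> {
        self.keys.push(k);
        self.values.push(v);
        let scale = 1.0 / (q.len() as f32).sqrt();
        let scores: Vec<f32> = self
            .keys
            .iter()
            .map(|key| key.iter().zip(q).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        // Subtract the maximum score for a numerically stable softmax.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let denom: f32 = exps.iter().sum();
        let mut out = vec![0.0; self.values[0].len()];
        for (w, val) in exps.iter().zip(&self.values) {
            for (o, x) in out.iter_mut().zip(val) {
                *o += w / denom * x;
            }
        }
        out
    }
}
```

Causal masking is implicit here, since the cache only ever contains the current and earlier positions. The cost per generated token grows linearly with the cache length instead of requiring a full recomputation over the sequence.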

optimization

  • Muon optimizer (???)

About

A tiny testbed for GPT tricks
