Posaidon

Posaidon, God of the Seven Seas and the 2048-token vocabulary, rises from the foam clutching the trident of gradient descent. A Llama by birth, a deity by training run. 🔱🦙

Posaidon is a small GPT language model built from scratch in MLX, grown step by step from a char-level toy into a Llama-style model trained on a corpus of its own.

Quickstart

python3 -m venv .venv && source .venv/bin/activate
pip install mlx
python build_greek_corpus.py   # downloads public-domain Greek myth into input.txt
python minigpt_mlx.py

Trains a tiny transformer on input.txt and prints a sample of generated text at the end. Tweak the model with flags, e.g. --n_layer 8 --n_head 8.

By default it tokenizes per character. Pass --tokenizer bpe (needs pip install tokenizers) to learn a byte-level BPE vocab from the corpus instead. Common letter sequences become single tokens, so a fixed --block_size covers roughly 3-4× more text and the samples form real words:

python minigpt_mlx.py --tokenizer bpe --vocab_size 2048
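
To make "common letter sequences become single tokens" concrete, here is a minimal pure-Python sketch of how BPE learns merges. It is an illustration only, not the `tokenizers` implementation that `--tokenizer bpe` relies on: it works on characters rather than bytes, and the names `learn_bpe`, `apply_merge`, `encode` and the toy text are assumptions made up for this example.

```python
from collections import Counter

def apply_merge(tokens, a, b):
    """Replace every adjacent (a, b) pair with the single token a+b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def learn_bpe(text, num_merges):
    """Learn merge rules by repeatedly fusing the most frequent adjacent pair."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so stop early
            break
        merges.append((a, b))
        tokens = apply_merge(tokens, a, b)
    return merges

def encode(text, merges):
    """Tokenize by replaying the learned merges in order."""
    tokens = list(text)
    for a, b in merges:
        tokens = apply_merge(tokens, a, b)
    return tokens

# Toy text (hypothetical, not taken from input.txt).
corpus = "the gods and the heroes and the sea " * 4
merges = learn_bpe(corpus, num_merges=20)
char_tokens = list(corpus)
bpe_tokens = encode(corpus, merges)
print(len(char_tokens), "char tokens ->", len(bpe_tokens), "BPE tokens")
```

The same text needs fewer tokens after merging, which is why a fixed --block_size reaches further back into the text with BPE than with per-character tokens.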

Architecture

The defaults are a Llama-like stack (RMSNorm + RoPE + SwiGLU). Each piece can be flipped back to the GPT-2-style baseline to measure its effect:

| Flag | modern (default) → baseline | what it is |
| --- | --- | --- |
| `--norm` | `rmsnorm` → `layernorm` | cheaper normalization (no mean/bias) |
| `--pos` | `rope` → `learned` | rotary position encoding inside attention |
| `--mlp` | `swiglu` → `gelu` | gated feed-forward (hidden width kept param-matched) |
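
Two of the rows above can be illustrated with a little arithmetic. The sketch below is plain Python, not the MLX code in `minigpt_mlx.py`. It shows what RMSNorm leaves out relative to LayerNorm, and what "param-matched" could mean for the SwiGLU hidden width. The 4× expansion for the GELU baseline and the helper names are assumptions; the learned gain and bias are omitted for brevity.

```python
import math

def layernorm(x, eps=1e-5):
    """Baseline: subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rmsnorm(x, eps=1e-5):
    """Default: divide by the root mean square only, with no mean subtraction."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

def swiglu_hidden(n_embd, expansion=4):
    """Hidden width at which a 3-matrix SwiGLU MLP has the same weight count
    as a 2-matrix GELU MLP with the given expansion (assumed to be 4x here).
    GELU:   2 * n_embd * (expansion * n_embd)
    SwiGLU: 3 * n_embd * hidden
    """
    return (2 * expansion * n_embd) // 3

x = [1.0, 2.0, 3.0, 4.0]
print(layernorm(x))        # zero-mean output
print(rmsnorm(x))          # same signs as the input, unit RMS
print(swiglu_hidden(384))  # 1024
```

Whether the script rounds the hidden width the same way is not stated here, so treat the 1024 as one plausible reading of "param-matched".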

Scaled training

The Greek-myth corpus is ~1.58M BPE tokens, big enough to train a ~15.7M-param model that generalizes rather than memorizes. Validation loss bottoms out at 3.97 (around iter 1,500); train loss keeps falling afterwards as the model starts to overfit, so early stopping ends the run, and the saved checkpoint is always the best-val one:

python minigpt_mlx.py --tokenizer bpe --vocab_size 2048 \
  --n_layer 8 --n_head 8 --n_embd 384 --block_size 256 --batch_size 32 \
  --max_iters 10000 --eval_interval 250 \
  --dropout 0.1 --weight_decay 0.1 --early_stop_patience 10 --no_attn_bias
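
As a rough check on the ~15.7M figure, the parameter count for this configuration can be tallied by hand. The sketch below assumes an untied output head, a SwiGLU hidden width of 1024, no biases anywhere, and no learned parameters for RoPE. These are assumptions about the layout, not facts read from `minigpt_mlx.py`, and `count_params` is a hypothetical helper.

```python
def count_params(vocab_size, n_embd, n_layer, mlp_hidden, tied_head=False):
    """Tally weights for a bias-free Llama-style stack (assumed layout)."""
    embed = vocab_size * n_embd                       # token embedding table
    head = 0 if tied_head else vocab_size * n_embd    # output projection
    attn = 4 * n_embd * n_embd                        # q, k, v, o projections
    mlp = 3 * n_embd * mlp_hidden                     # gate, up, down projections
    norms = 2 * n_embd                                # two RMSNorm gains per block
    final_norm = n_embd
    return embed + head + n_layer * (attn + mlp + norms) + final_norm

total = count_params(vocab_size=2048, n_embd=384, n_layer=8, mlp_hidden=1024)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 15,735,168 parameters (~15.7M)
```

Under those assumptions the total agrees with the size quoted above; a tied head or a different hidden width would shift it.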

--dropout and --weight_decay fight overfitting; --early_stop_patience N stops the run once val loss hasn't improved for N consecutive evals. --no_attn_bias drops the attention biases (matching Llama exactly, at no measurable loss cost) so the result exports cleanly to GGUF, as described under "Publish: Hugging Face + Ollama".
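
The patience rule can be sketched in a few lines. This is a generic illustration of early stopping with a best-checkpoint policy, not the loop in `minigpt_mlx.py`; the function name and the loss values are made up for the example.

```python
def early_stop_run(val_losses, patience):
    """Return (best_eval_index, best_loss, stopped_at_index) for a series of evals."""
    best_loss = float("inf")
    best_idx = -1
    bad_evals = 0
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: this is where a checkpoint would be saved.
            best_loss, best_idx, bad_evals = loss, idx, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_idx, best_loss, idx  # stop; keep the earlier checkpoint
    return best_idx, best_loss, len(val_losses) - 1

# Made-up validation losses: improvement stalls after the third eval.
toy_losses = [3.0, 2.5, 2.2, 2.3, 2.4, 2.6]
print(early_stop_run(toy_losses, patience=3))  # (2, 2.2, 5)
```

The run stops at eval 5, but the checkpoint that survives is the one from eval 2, which is why the final weights are not simply the last ones trained.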

Generate from a checkpoint

--generate skips training and samples from a saved checkpoint. The architecture and tokenizer are read from <out_dir>/config.json, so you don't repeat the model flags; just point at the directory (out/ by default) and give a prompt:

python minigpt_mlx.py --generate --prompt "Zeus " --max_new_tokens 200

Generated text goes to stdout (the load info line to stderr), so it pipes cleanly.
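
If you drive generation from another Python program, the stdout/stderr split means the sample can be captured without the load message mixed in. The sketch below shows the pattern with a stand-in child process, so it runs without a checkpoint; the stand-in's messages are invented, and in real use the command list would be the `minigpt_mlx.py --generate ...` invocation above.

```python
import subprocess
import sys

# Stand-in for: [sys.executable, "minigpt_mlx.py", "--generate", "--prompt", "Zeus "]
# It writes one info line to stderr and the "generated" text to stdout.
child_cmd = [
    sys.executable,
    "-c",
    "import sys; print('loaded checkpoint', file=sys.stderr); print('Zeus spoke')",
]

result = subprocess.run(child_cmd, capture_output=True, text=True, check=True)
generated = result.stdout  # only the sample
info = result.stderr       # only the load info line
print(repr(generated))
print(repr(info))
```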

Publish: Hugging Face + Ollama

publish_hf.py stages the checkpoint into a Hugging Face repo layout (weights + config + tokenizer + MODEL_CARD.md as the README + the loader) and can push it:

python publish_hf.py                              # stage into hf_repo/
python publish_hf.py --push --repo you/Posaidon   # needs `huggingface-cli login`

For Ollama, export_gguf.py converts the checkpoint to GGUF. It requires a biasless model (llama.cpp's llama arch has no attention bias), which is what the scaled-training recipe above produces with --no_attn_bias:

python export_gguf.py --gguf posaidon.gguf
ollama create posaidon -f Modelfile && ollama run posaidon "Zeus "

The exporter permutes the q/k weights from MLX's RoPE layout to GGUF's, so greedy (temperature 0) output matches --generate token for token until floating-point differences between the two engines cause the outputs to diverge.
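
Why a permutation is enough can be shown on a single head vector. The sketch below assumes the MLX side rotates pairs (i, i + d/2) and the GGUF side rotates adjacent pairs (2i, 2i + 1); both layouts, and all function names, are assumptions for illustration rather than a copy of `export_gguf.py`. Reordering the output rows of the q/k projection reorders the entries of q and k in the same way, so the check is done on vectors.

```python
import math

def rope_split_half(x, pos, base=10000.0):
    """Rotate pairs (i, i + d/2): the layout assumed here for the MLX side."""
    d, h = len(x), len(x) // 2
    out = list(x)
    for i in range(h):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + h] * s
        out[i + h] = x[i] * s + x[i + h] * c
    return out

def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (2i, 2i + 1): the layout assumed here for the GGUF side."""
    d, h = len(x), len(x) // 2
    out = list(x)
    for i in range(h):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out

def to_interleaved(x):
    """Reorder [a0..a(h-1), b0..b(h-1)] into [a0, b0, a1, b1, ...]."""
    h = len(x) // 2
    return [x[j // 2 + (j % 2) * h] for j in range(len(x))]

q = [0.1 * (k + 1) for k in range(8)]  # one toy head vector, head_dim = 8
rotated_then_permuted = to_interleaved(rope_split_half(q, pos=5))
permuted_then_rotated = rope_interleaved(to_interleaved(q), pos=5)
layouts_agree = all(
    math.isclose(u, v) for u, v in zip(rotated_then_permuted, permuted_then_rotated)
)
print(layouts_agree)
```

Because q and k receive the same reordering, their dot products are unchanged, so attention scores come out the same in either layout.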

A training run writes a checkpoint (model.safetensors + config.json + tokenizer.json), a loss_curve.png, and a report_card.md to --out_dir (default out/).

Dataset

The corpus gives Posaidon its voice: build_greek_corpus.py assembles public-domain English retellings and translations of Greek myth and the Homeric epics (Bulfinch, Homer's Iliad/Odyssey, Hesiod, …) from Project Gutenberg. Swap in any input.txt to retrain on a different style. For the original toy run, use tinyshakespeare instead:

curl -L https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -o input.txt

Attribution

Inspired by Andrej Karpathy's nanoGPT and nanochat.
