Failed Star (`fs`)

A small, self-contained LLM inference engine for Apple Silicon —
built from scratch, in the open, to learn and teach how inference engineering works.

📖 Read the learning site →

A failed star (a brown dwarf) is smaller than a dwarf star: not enough mass to sustain fusion. This project is the smaller sibling of Dwarf Star (ds4), antirez's self-contained inference engine for DeepSeek-V4. Where ds4 targets big MoE models on 96GB+ Macs, Failed Star runs a tiny model on a 64GB MacBook Pro (M5) — and trades raw capability for something else: every line is meant to be read, understood, and learned from.

Why this exists

The goal is understanding inference, by building it. Reading about attention is one thing; writing the kernel that computes it and watching tokens stream out of your own code is another. This repo is the second thing.

Three sources form its spine, cross-referenced throughout the docs. (The prerequisites point to a wider set of optional brush-up and go-deeper resources — those fill gaps; these three are what the docs lean on.)

The concepts — Inference Engineering by Philip Kiely (Baseten, 2026). The "why" and the vocabulary. (Peruse the free interactive guide, or get your own copy from Baseten Books. The repo root expects a local Inference Engineering.pdf, but that file is git-ignored, so bring your own copy.)
A real implementation — ds4, cloned into reference/ds4/. The "how a pro does it." Working code doesn't lie.
Architecture context — Sebastian Raschka's free articles: the architecture comparison, gallery, and workflow for understanding LLMs. (His book is a good optional extra, not a dependency — see the prerequisites.)

How it's built

Host language: Rust. Model loading, tokenizer, orchestration, sampling, KV cache — all Rust.
GPU kernels: MSL (Metal Shading Language) — hand-written, one operation per file, just like ds4's metal/ shaders.
Metal via raw FFI / the Objective-C runtime — no convenience wrapper crate. We send messages to Metal ourselves so nothing is hidden. Tight, like ds4.
First model: Qwen3-0.6B — a tiny dense model with GQA, RoPE, SwiGLU, and RMSNorm; small enough to inspect and debug while still looking like a real modern LLM.
Correctness via golden vectors: match logits from the model's official implementation. (Python appears only as a one-shot oracle, never as a second engine.)
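
To make the golden-vector idea concrete, here is a minimal sketch in plain Rust. It is not code from this repo: the names `rms_norm` and `max_abs_diff`, the epsilon, and the tolerance are all assumptions chosen for illustration, and the "golden" values are hand-computed stand-ins for what the Python oracle would export.

```rust
/// RMSNorm over one vector: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// (Hypothetical helper, not the repo's actual function.)
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "x and weight must have the same length");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Largest element-wise absolute difference between two equally sized slices.
fn max_abs_diff(got: &[f32], want: &[f32]) -> f32 {
    assert_eq!(got.len(), want.len(), "length mismatch against golden vector");
    got.iter()
        .zip(want)
        .map(|(g, w)| (g - w).abs())
        .fold(0.0_f32, f32::max)
}

fn main() {
    // Hand-computed stand-in for a golden vector; real fixtures would come from the oracle.
    let x = [3.0_f32, 4.0];
    let weight = [1.0_f32, 1.0];
    let golden = [0.848_528_1_f32, 1.131_370_8];

    let got = rms_norm(&x, &weight, 1e-6);
    let diff = max_abs_diff(&got, &golden);
    println!("max abs diff = {:e}", diff);
    // The tolerance is an assumption; a real test would pick one per dtype.
    assert!(diff < 1e-5, "RMSNorm drifted from the golden vector");
}
```

The same comparison scales up unchanged: swap the two-element input for a full logits vector and the check still reduces to one number that must stay under a tolerance.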

Repo layout

fs/
├── README.md                  ← you are here
├── PLAN.md                    ← the milestone curriculum (M0 … M7+)
├── PROGRESS.md                ← running session log; start here each session
├── Inference Engineering.pdf  ← local copy of the book (ignored; bring your own)
├── src/                       ← Rust engine + thin CLI
├── scripts/                   ← uv-managed Python oracle/data scripts
├── tests/golden/              ← committed golden fixtures for verification
├── tools/                     ← site/sync helper scripts
├── docs/                      ← the learning site + notes (served at /fs via Pages)
│   ├── index.html             ← learning-site landing page (rich HTML)
│   ├── prerequisites.md       ← what to know before diving in (read this first)
│   ├── 00-map.md              ← THE BIG PICTURE of an inference engine
│   ├── 01-tokenizer.md        ← M0 writeup (.md + rich .html version)
│   ├── dev-loop.md            ← how to resume work after a break
│   ├── testing.md             ← verification strategy and golden-vector plan
│   ├── diagrams.html          ← shared diagram gallery
│   ├── RESOURCES.md           ← cross-reference index (book §§, ds4 files, Raschka)
│   ├── learnings/             ← bite-sized notes on what we figured out & why
│   └── assets/                ← logo + site assets
├── reference/ds4/             ← antirez's ds4 — pinned git submodule (read-only ref)
└── models/                    ← downloaded model assets (ignored; generated locally)

Where to start

Read docs/prerequisites.md — the honest "what to know before you dive in" (spoiler: inference is the forward pass only — no training, no calculus), with brush-up resources and a knowledge-map.
Read docs/00-map.md — the end-to-end picture of an inference engine, with an "abstraction ladder" so you can stop digging at whatever depth interests you.
Skim PLAN.md — the milestones.
Each session, open PROGRESS.md to see what's next.
If resuming development, use docs/dev-loop.md and docs/testing.md for the local checks and verification strategy.

Status

🌱 M0 — Tokenizer: ✅ done. M1 — Load the weights: in progress. The byte-level BPE tokenizer is implemented and verified — fs tokenize / fs detokenize run end-to-end against Qwen3-0.6B, loading vocab + merges + regex + special tokens from the single tokenizer.json (14/14 golden cases pass; see docs/01-tokenizer.md). Next step: parse the safetensors weights and config.json so fs inspect model/ can print the architecture and tensor table.
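
As a rough illustration of what the safetensors step involves, the sketch below splits a buffer into its JSON header and raw tensor bytes using only the standard library. A safetensors file starts with an 8-byte little-endian length, followed by that many bytes of UTF-8 JSON, followed by the tensor data. The function name `split_safetensors` and the tiny in-memory file are hypothetical; parsing the JSON itself (and how this repo actually does it) is left out.

```rust
/// Splits a safetensors buffer into (JSON header text, raw tensor data).
/// Layout: 8-byte little-endian u64 header length N, N bytes of UTF-8 JSON, then tensor bytes.
/// (Hypothetical helper for illustration only.)
fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("file too short for the 8-byte length prefix".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let header_len = u64::from_le_bytes(len_bytes);

    let rest = &bytes[8..];
    if header_len > rest.len() as u64 {
        return Err(format!(
            "header claims {} bytes but only {} remain",
            header_len,
            rest.len()
        ));
    }
    let (header, data) = rest.split_at(header_len as usize);
    let header = std::str::from_utf8(header)
        .map_err(|e| format!("header is not UTF-8: {}", e))?;
    Ok((header, data))
}

fn main() {
    // Build a tiny in-memory file: one F32 tensor named "t" holding the value 1.0.
    let header = r#"{"t":{"dtype":"F32","shape":[1],"data_offsets":[0,4]}}"#;
    let mut file = Vec::new();
    file.extend_from_slice(&(header.len() as u64).to_le_bytes());
    file.extend_from_slice(header.as_bytes());
    file.extend_from_slice(&1.0_f32.to_le_bytes());

    let (json, data) = split_safetensors(&file).expect("valid buffer");
    println!("header: {}", json);
    println!("tensor bytes: {}", data.len());
}
```

Each tensor's `data_offsets` are relative to the start of the data region returned here, which is why the split comes before any per-tensor work.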

Milestones (the full curriculum, with cross-links, lives in PLAN.md):

M0 — Tokenizer — text ↔ token IDs, verified against the real vocab
M1 — Load the weights ← current
M2 — Forward pass → logits
M3 — Sampling → generation
M4 — KV cache
M5 — Quantization
M6 — Metal acceleration
M7+ — Stretch goals
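
For a feel of how M2 hands off to M3, the sketch below turns a logits vector into a next-token choice. It is an assumption-laden illustration, not the planned implementation: `argmax` and `softmax` are hypothetical names, the three-element logits are made up, and a real sampler would add a random draw (plus options such as top-k or top-p) on top of the probabilities.

```rust
/// Index of the largest logit (greedy decoding). Returns None for an empty slice.
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Numerically stable softmax with temperature.
/// Subtracting the max before exponentiating avoids overflow;
/// a smaller temperature sharpens the distribution.
fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Made-up logits over a three-token vocabulary.
    let logits = [1.0_f32, 3.0, 2.0];

    let greedy = argmax(&logits).expect("non-empty logits");
    let probs = softmax(&logits, 1.0);

    println!("greedy token id: {}", greedy);
    println!("probabilities: {:?}", probs);
}
```

Greedy decoding is deterministic, which makes it the natural first target for golden-vector checks before any randomness is introduced.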

This is a slow, multi-session learning project. It is not (yet) fast, capable, or finished — that's the point. Local models keep getting better; the bet is that a clean, well-documented small engine becomes more useful to more people over time.
