LayerRun is a Rust workspace designed for experimenting with raw safetensors-level Large Language Model (LLM) loading, custom tokenizer plumbing, model probing, greedy/sampled text generation, and optimization into a per-layer model layout.
Unlike high-level inference frameworks, LayerRun implements model execution, tensor deserialization, attention logic, and server interfaces from scratch.
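Greedy and sampled generation differ only in how the next token is chosen from the model's output logits. The following is a minimal, dependency-free sketch of both strategies. It is not LayerRun's actual sampler, and the function names are hypothetical. Greedy decoding takes the argmax. Temperature sampling applies a softmax to logits divided by the temperature, then inverts the cumulative distribution at a uniform draw `u`, which is passed in explicitly so the example stays deterministic.

```rust
/// Greedy decoding: index of the largest logit (first one wins on ties).
fn greedy(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best
}

/// Temperature sampling: softmax(logits / temperature), then invert the CDF at `u`.
/// `u` must be a uniform draw in [0, 1); a real sampler would obtain it from an RNG.
fn sample(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(temperature > 0.0, "use greedy() for temperature 0");
    // Subtract the max logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();
    let mut cumulative = 0.0;
    for (i, w) in weights.iter().enumerate() {
        cumulative += w / total;
        if u < cumulative {
            return i;
        }
    }
    logits.len() - 1 // guard against rounding when u is very close to 1
}

fn main() {
    let logits = [1.0_f32, 3.0, 2.0];
    println!("greedy: {}", greedy(&logits));
    println!("sampled: {}", sample(&logits, 0.8, 0.42));
}
```

Lower temperatures concentrate probability on the highest logit, so sampling approaches greedy decoding as the temperature tends to zero.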
The repository is organized as a Cargo workspace containing the following crates:
- layerrun-core: The shared runtime covering configuration, memory-mapped tensor loading, tokenization, compute backend abstraction, and mathematical kernels.
- layerrun-cli: Command-line interface for inspection, tokenizer execution, validation fixtures, and offline layout optimization.
- layerrun-server: OpenAI-compatible and Ollama-style HTTP server for local model serving.
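A common way to structure a compute backend abstraction, such as the one in layerrun-core, is a trait that exposes one method per kernel. Model code then depends only on the trait, so a CPU or MLX implementation can be swapped in. The sketch below assumes that design. `Backend`, `CpuBackend`, and `project` are hypothetical names, not layerrun-core's real API, and the kernel is a naive row-major matrix multiply.

```rust
/// Hypothetical backend trait: one entry point per kernel the model code needs.
trait Backend {
    /// Row-major matmul: `a` is m x k, `b` is k x n, the result is m x n.
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32>;
}

/// Hypothetical reference CPU implementation.
struct CpuBackend;

impl Backend for CpuBackend {
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        assert_eq!(a.len(), m * k);
        assert_eq!(b.len(), k * n);
        let mut out = vec![0.0; m * n];
        // i-p-j loop order walks `b` and `out` row by row (cache friendly).
        for i in 0..m {
            for p in 0..k {
                let a_ip = a[i * k + p];
                for j in 0..n {
                    out[i * n + j] += a_ip * b[p * n + j];
                }
            }
        }
        out
    }
}

/// Model-side code sees only `&dyn Backend`, never a concrete backend.
fn project(backend: &dyn Backend, x: &[f32], w: &[f32], k: usize, n: usize) -> Vec<f32> {
    backend.matmul(x, w, 1, k, n)
}

fn main() {
    let x = [1.0, 2.0];
    let w = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]; // 2 x 3, row-major
    println!("{:?}", project(&CpuBackend, &x, &w, 2, 3));
}
```

A production backend would also need to manage tensor storage and device memory. The trait here only illustrates the dispatch boundary.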
Detailed guides are available in the following documents:
- 📖 CLI Reference Guide: How to run inspections, tokenize text, generate text, convert checkpoints, and run validation.
- 🖥️ Server API Guide: Endpoints reference, sampling options, model-specific chat templates, and integration examples.
- 🐳 Docker Usage Guide: Build the image, run with Docker or Compose, and verify the server.
- 📐 Architecture Overview: Deep dive into the workspace crates, tensor representation, CPU/MLX math kernels, and data flow.
- 🎯 Stabilization & Features Plan: Current roadmap, correctness validation strategy, and upcoming feature enhancements.
Prerequisites:
- Rust Toolchain: Stable Rust compiler with Cargo.
- Local Model Files: PyTorch/Hugging Face standard .safetensors files or optimized per-layer LayerRun directories.
- Hugging Face Hub Access: Gated or private models require a valid token, supplied through the HF_TOKEN environment variable or the --hf-token argument.
- MLX Backend (Optional): Apple Silicon macOS with cmake and full Xcode installed (ensure xcrun -find metal succeeds).
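Loading at the "raw safetensors level" starts from the file layout itself. A .safetensors file begins with an 8-byte little-endian u64 giving the header length N. That is followed by N bytes of UTF-8 JSON describing each tensor (dtype, shape, data_offsets), and then by the raw tensor bytes, with data_offsets measured from the end of the header. The following is a std-only sketch of the first step, assuming the file is already in memory or memory-mapped (for example with a crate such as memmap2). `split_header` is a hypothetical helper, not LayerRun's loader.

```rust
/// Split a safetensors buffer into its JSON header and the offset where tensor data starts.
fn split_header(buf: &[u8]) -> Result<(&str, usize), String> {
    if buf.len() < 8 {
        return Err("file shorter than the 8-byte length prefix".into());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&buf[..8]);
    let n = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| "header length does not fit in usize")?;
    // Reject headers that claim to extend past the end of the buffer.
    let end = 8usize
        .checked_add(n)
        .filter(|&e| e <= buf.len())
        .ok_or("header length exceeds file size")?;
    let header = std::str::from_utf8(&buf[8..end]).map_err(|e| e.to_string())?;
    Ok((header, end))
}

fn main() {
    // Build a tiny in-memory file holding one F32 tensor of shape [2].
    let header = br#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut buf = (header.len() as u64).to_le_bytes().to_vec();
    buf.extend_from_slice(header);
    buf.extend_from_slice(&1.0f32.to_le_bytes());
    buf.extend_from_slice(&2.0f32.to_le_bytes());

    let (json, data_start) = split_header(&buf).unwrap();
    println!("header: {json}");
    println!("tensor bytes start at offset {data_start}");
}
```

Parsing the JSON itself (typically with a JSON crate) and slicing each tensor's bytes at data_start plus its data_offsets would be the next steps.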
Build the workspace in release mode:

    cargo build --release

Initialize the local configuration:

    cargo run -p layerrun-cli -- init --models-dir models

This writes $HOME/.layerrun-conf, creates the models directory, and can save a Hugging Face token for later --hf-repo commands.
Run the test suite:

    cargo test

Show the CLI help:

    cargo run -p layerrun-cli -- --help

Build the CLI with the optional MLX backend:

    cargo build -p layerrun-cli --features mlx

Note: Ensure full Xcode is selected if compiling MLX native dependencies:
    sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

Start the server:

    cargo run --release -p layerrun-cli -- serve

The workspace comes with local models under the models/ directory:
- models/qwen: Local standard single-file safetensors model.
- models/qwen-layered: LayerRun optimized per-layer model directory.
- models/llama-3.2-1b-instruct: Local standard sharded safetensors model.
- models/llama-3.2-1b-instruct-layered: LayerRun optimized per-layer model directory.
- models/mistral: Local model files and configuration.
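The "-layered" directories hold the same weights regrouped per transformer layer. This README does not describe LayerRun's actual on-disk layout. The sketch below only illustrates the grouping step, assuming Hugging Face Llama/Qwen-style tensor names such as model.layers.0.self_attn.q_proj.weight. `layer_of` and `group_by_layer` are hypothetical helpers. Tensors without a layer index (embeddings, the final norm, lm_head) go into a shared bucket.

```rust
use std::collections::BTreeMap;

/// Some(layer index) for per-layer weights, None for shared weights.
fn layer_of(name: &str) -> Option<usize> {
    let rest = name.strip_prefix("model.layers.")?;
    let idx = rest.split('.').next()?;
    idx.parse().ok()
}

/// Group tensor names by transformer layer (hypothetical per-layer layout).
fn group_by_layer<'a>(names: &[&'a str]) -> (BTreeMap<usize, Vec<&'a str>>, Vec<&'a str>) {
    let mut layers: BTreeMap<usize, Vec<&str>> = BTreeMap::new();
    let mut shared = Vec::new();
    for &name in names {
        match layer_of(name) {
            Some(i) => layers.entry(i).or_default().push(name),
            None => shared.push(name),
        }
    }
    (layers, shared)
}

fn main() {
    let names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
        "model.layers.1.self_attn.q_proj.weight",
        "model.norm.weight",
    ];
    let (layers, shared) = group_by_layer(&names);
    // BTreeMap iterates in ascending layer order.
    for (i, tensors) in &layers {
        println!("layer {i}: {tensors:?}");
    }
    println!("shared: {shared:?}");
}
```

Grouping by layer makes it possible to load or stream one layer's weights at a time instead of the whole checkpoint, which is presumably the motivation for a per-layer layout.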
LayerRun is licensed under the GNU General Public License v3.0.
Copyright (C) 2026 SYIGEN (PRIVATE) LIMITED.

