Skip to content

ceylonai/layerrun

Repository files navigation

LayerRun

LayerRun Logo

LayerRun is a Rust workspace designed for experimenting with raw safetensors-level Large Language Model (LLM) loading, custom tokenizer plumbing, model probing, greedy/sampled text generation, and optimization into a per-layer model layout.

Unlike high-level inference frameworks, LayerRun implements model execution, tensor deserialization, attention logic, and server interfaces directly from scratch.

LayerRun Architecture Infographic


Workspace Layout

The repository is organized as a Cargo workspace containing the following crates:

  • layerrun-core: The shared runtime covering configurations, memory-mapped tensor loading, tokenization, compute backend abstraction, and mathematical kernels.
  • layerrun-cli: Command-line interface for inspection, tokenizer execution, validation fixtures, and offline layout optimizations.
  • layerrun-server: OpenAI-compatible and Ollama-style HTTP server providing local model serving.

Documentation Index

Detailed guides are available in the following documents:

  • 📖 CLI Reference Guide: How to run inspections, tokenize text, generate text, convert checkpoints, and run validation.
  • 🖥️ Server API Guide: Endpoints reference, sampling options, model-specific chat templates, and integration examples.
  • 🐳 Docker Usage Guide: Build the image, run with Docker or Compose, and verify the server.
  • 📐 Architecture Overview: Deep dive into the workspace crates, tensor representation, CPU/MLX math kernels, and data flow.
  • 🎯 Stabilization & Features Plan: Current roadmap, correctness validation strategy, and upcoming feature enhancements.

Requirements

  • Rust Toolchain: Stable compiler with cargo.
  • Local Model Files: PyTorch/Hugging Face standard .safetensors files or optimized per-layer LayerRun directories.
  • Hugging Face Hub Access: Gated/private models require a valid HF_TOKEN environment variable or --hf-token argument.
  • MLX Backend (Optional): Apple Silicon macOS with cmake and full Xcode installed (ensure xcrun -find metal succeeds).

Common Commands

1. Build the Workspace

cargo build --release

2. Initialize Local Config

cargo run -p layerrun-cli -- init --models-dir models

This writes $HOME/.layerrun-conf, creates the models directory, and can save a Hugging Face token for later --hf-repo commands.

3. Run Workspace Tests

cargo test

4. Run the CLI Help

cargo run -p layerrun-cli -- --help

5. Build with MLX Acceleration

cargo build -p layerrun-cli --features mlx

Note: Ensure full Xcode is selected if compiling MLX native dependencies:

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

6. Launch the Server

cargo run --release -p layerrun-cli -- serve

Installed Example Models

The workspace comes with local models under the models/ directory:

  • models/qwen: Local standard single-file safetensors model.
  • models/qwen-layered: LayerRun optimized per-layer model directory.
  • models/llama-3.2-1b-instruct: Local standard sharded safetensors model.
  • models/llama-3.2-1b-instruct-layered: LayerRun optimized per-layer directory.
  • models/mistral: Local model files configuration.

License

LayerRun is licensed under the GNU General Public License v3.0.

Copyright (C) 2026 SYIGEN (PRIVATE) LIMITED.

About

LayerRun is a Rust-based local LLM runtime for memory-aware model execution, layer-wise loading, model inspection, and flexible inference serving.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors