⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, and a macOS + iOS (iPhone) app.
Updated May 19, 2026 - Swift
Optimize LLM inference with near-optimal 4-bit weight quantization and on-the-fly dequantization, for lower memory use and faster matrix multiplication.
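To make the idea concrete, here is a minimal sketch of 4-bit weight quantization with on-the-fly dequantization. It is not the repository's method: the "near-optimal" scheme is not described here, so plain group-wise min/max (affine) quantization stands in for it, and it is written in Python rather than Swift/MLX. The names `GROUP_SIZE`, `quantize_4bit`, and `dot_dequantized` are hypothetical.

```python
from typing import List, Tuple

GROUP_SIZE = 8  # hypothetical; chosen small for illustration


def quantize_4bit(weights: List[float]) -> Tuple[bytes, List[Tuple[float, float]]]:
    """Quantize each group to 4-bit codes; return packed codes and (scale, minimum) per group."""
    assert GROUP_SIZE % 2 == 0 and len(weights) % GROUP_SIZE == 0
    packed = bytearray()
    params = []
    for start in range(0, len(weights), GROUP_SIZE):
        group = weights[start:start + GROUP_SIZE]
        lo, hi = min(group), max(group)
        # 16 levels span 15 steps; fall back to 1.0 for a constant group
        scale = (hi - lo) / 15 or 1.0
        codes = [min(15, max(0, round((w - lo) / scale))) for w in group]
        # Pack two 4-bit codes per byte: even index in the low nibble
        for a, b in zip(codes[0::2], codes[1::2]):
            packed.append(a | (b << 4))
        params.append((scale, lo))
    return bytes(packed), params


def dot_dequantized(packed: bytes, params: List[Tuple[float, float]], x: List[float]) -> float:
    """Dot product that decodes each weight as it is used, without a full-precision copy."""
    total = 0.0
    for i, xi in enumerate(x):
        byte = packed[i // 2]
        code = (byte >> 4) if i % 2 else (byte & 0x0F)
        scale, lo = params[i // GROUP_SIZE]
        total += (code * scale + lo) * xi
    return total
```

The packed weights take half a byte each plus two floats per group, and every reconstructed weight is within half a quantization step (`scale / 2`) of the original. A production kernel would fuse the decode into a vectorized matrix multiply instead of looping per element.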