Rapid-MLX: faster Apple Silicon backend for Goose (2-4x vs Ollama) #8436
raullenchai started this conversation in Show and tell
Summary
Rapid-MLX is an OpenAI-compatible inference server built on Apple's MLX framework. It works with Goose via the Ollama provider: just point `OLLAMA_HOST` at Rapid-MLX.

Setup
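The setup amounts to redirecting Goose's Ollama provider at the Rapid-MLX server. A minimal sketch, where the host and port are placeholders (the post does not document which address Rapid-MLX listens on):

```shell
# Point Goose's Ollama provider at Rapid-MLX instead of Ollama.
# The address below is a placeholder; use wherever your Rapid-MLX server listens.
export OLLAMA_HOST="http://localhost:11434"
echo "Ollama-compatible endpoint: $OLLAMA_HOST"
```

With the variable set, Goose's existing Ollama provider needs no other changes.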
Why faster?
Rapid-MLX uses Apple's MLX framework with native Metal compute kernels, purpose-built for unified memory. In benchmarks on a Mac Studio M3 Ultra against Ollama 0.20.4 (MLX backend), generation was 2-4x faster.
Tested with Goose, with `OLLAMA_HOST` pointing to Rapid-MLX. Also includes a prompt cache (sub-100ms TTFT on follow-up turns), 17 tool-call parsers, and reasoning separation for thinking models.
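Since the server speaks an OpenAI-compatible protocol, any standard chat-completions client payload should work against it. A sketch of the request body such a client would send (the model id and endpoint path here are assumptions, not values confirmed by the post):

```python
import json

# Hypothetical endpoint; OpenAI-compatible servers conventionally expose this path.
ENDPOINT = "http://localhost:11434/v1/chat/completions"

# Standard OpenAI-style chat request body; the model id is a placeholder.
payload = {
    "model": "some-mlx-model",
    "messages": [{"role": "user", "content": "Summarize this repo."}],
    "stream": True,  # streaming keeps perceived time-to-first-token low
}

body = json.dumps(payload)
print(body)
```

Posting that body to the endpoint with any HTTP client would exercise the server directly once it is running.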