Here is your own alien rock friend inspired by Andy Weir's Project Hail Mary. He communicates in musical chords.
This project implements a Dual-Device "Voice Box & Brain" Architecture to bring Rocky to life on a Raspberry Pi Zero 2W with near-instant responsiveness.
Important
Looking to build your own Rocky?
Check out the Step-by-Step Installation Guide (BUILD_YOUR_OWN_ROCKY.md)! It covers setting up a fresh Raspberry Pi Zero 2W, installing the PiSugar Whisplay HAT and audio drivers, setting up uv, configuring Google Gemini vs Local LM Studio, and running offline Piper TTS voice models.
Note
Update (May 17, 2026): Piper TTS Integration!
Rocky can now speak with a human voice in parallel with his Eridanian musical chords! This upgrade is completely opt-in, fully configurable via your .env file, and allows you to load custom .onnx voice models. Check out the new demo video below!
- Lore-Accurate Eridanian Synthesis: Generates polyphonic chords in real-time. Specific words map to emotional chords (e.g., "amaze" is a bright E Major), while unknown words are mathematically hashed to permanent, unique 3-frequency signatures.
- Dual-Brain Architecture:
- Voice Box (Pi Zero 2W): Handles recording, hardware LEDs, LCD display, and Eridanian voice synthesis.
- Brain (Mac Hub): Uses Apple Silicon-optimized
mlx-whisper(specifically theWhisper-Tinymodel) for near-instant STT and LM Studio for local LLM inference (Gemma 4).
- Interactive Visuals: Custom LCD boot screen (
rocky_boot_screen.png) and a dynamic "thinking bubble" animation that appears while Rocky is processing. - Piper TTS Integration (Opt-in): High-quality verbal speech synthesis running in parallel with musical chords. By default, Rocky only speaks in his Eridanian musical chords, but human TTS can be enabled via
.env. Supports custom fine-tuned models. - Sequential Demo Mode: Advanced demo mode that cycles through a list of phrases from a text file, perfect for presentations or testing without a network connection.
- Hardware Integration: Full support for the PiSugar Whisplay HAT (LCD, Button, RGB LED, and WM8960 Audio).
To achieve low-latency AI interactions on a Raspberry Pi Zero 2W, we split the workload:
- The Pi records audio and
POSTs the raw bytes to the Mac Hub. - The Mac Hub intercepts the audio, transcribes it instantly using
mlx-whisper(Whisper-Tiny) and pings LM Studio. - LM Studio generates the Rocky-persona response.
- The Pi receives the text, renders the response on the LCD, and synthesizes the Eridanian chords.
| Interaction Phase | Local (STT Whisper-Tiny + gemma-4-e2b) |
Cloud (Gemini API gemini-2.5-flash-lite) |
|---|---|---|
| Handshake & Transcribe | ~0.8s | ~1.2s |
| LLM Reasoning & Response | ~1.2s | ~0.8s |
| Total Latency | ~2.0s | ~2.0s |
Note: The Local setup matches Cloud speeds while running entirely offline/privately on your local network.
Tip
For a detailed, comprehensive walkthrough, read the Step-by-Step Hardware & Software Setup Guide (BUILD_YOUR_OWN_ROCKY.md).
Skip virtual environments and use pre-compiled hardware drivers:
sudo apt-get update
sudo apt-get install python3-pygame python3-spidev python3-pil python3-lgpio
SDL_AUDIODRIVER=alsa python3 rocky.py# Install build headers
sudo apt-get install libsdl2-2.0-0 libsdl2-mixer-2.0-0 python3-dev libgpiod-dev liblgpio-dev swig
uv sync
uv run rocky.py- Start LM Studio: Load a 4-bit quantized model (e.g.,
Gemma 4-e2b-it-GGUF) and start the Local Server on port1234. - Run STT Proxy:
uv sync --extra mac-server uv run stt_server.py
| Variable | Description | Default |
|---|---|---|
GEMINI_API_KEY |
Your Google AI Studio API Key (for Cloud mode). | None |
USE_LOCAL_LLM |
Set to True to use the Mac STT Server. |
False |
LM_STUDIO_URL |
The endpoint for your local LM Studio instance. | http://localhost:1234/v1 |
SHOW_BOOT_SCREEN |
Show Rocky's splash screen on startup. | True |
USE_PIPER |
Enable Piper TTS for verbal speech. | False |
PIPER_MODEL |
Piper voice model file (e.g. en_US-lessac-low.onnx). |
en_US-lessac-low.onnx |
PIPER_RATE |
Sample rate matching the Piper model. | 16000 |
DEMO_MODE |
Bypass LLM and use pre-defined responses. | False |
DEMO_FILE |
Path to text file for sequential demo phrases. | demo_phrases.txt |
- Show & Speak: Test TTS and LCD visuals. Supports loading specific model epochs:
# Load 'rocky' voice with specific epoch checkpoint (e.g. 2607) python lab/show_and_speak.py rocky 2607 - Astromech Test: R2-D2 style sound generator (
uv run r2d2). - Gemma Model: The original 10GB unquantized inference server (
uv run rocky-server).
MIT License. Feel free to build your own Eridanian friend!
