A self-hosted AI assistant for Ray-Ban smart glasses. Replaces Meta AI with a fully local pipeline — no cloud, no Meta account required.
Tested on Ray-Ban Meta glasses + Pixel 9 Pro running Android 14.
GlassesAI supports two inference modes selectable in-app:
┌─────────────────────────────────────────────────────────────────┐
│ Ray-Ban Glasses │
│ Bluetooth A2DP (audio) / SCO (voice call) │
└───────────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────────────┐
│ Android Phone (GlassesAI) │
│ │
│ ┌─────────────────┐ wake word ┌──────────────────────┐ │
│ │ Phone Mic │ ───────────────► │ Vosk (on-device) │ │
│ │ (always-on) │ │ grammar mode, offline│ │
│ └─────────────────┘ └──────────┬───────────┘ │
│ │ detected │
│ ┌──────────────────────────────────────────────▼───────────┐ │
│ │ GlassesAIService (FGS) │ │
│ │ • SCO connects → glasses mic becomes active │ │
│ │ • Streams PCM audio to server via WebSocket │ │
│ │ • Plays server audio response via AudioTrack (SCO) │ │
│ │ • Pauses media (Spotify, etc.) during conversation │ │
│ └──────────────────────────┬────────────────────────────────┘ │
└─────────────────────────────┼───────────────────────────────────┘
│ WebSocket (wss://)
│ PCM audio up / PCM audio down
┌─────────────────────────────▼───────────────────────────────────┐
│ Linux Server (server.py) │
│ │
│ ┌──────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ faster-whisper │ │ Mistral 7B │ │ Kokoro TTS │ │
│ │ (CUDA, int8) │──►│ via Ollama │──►│ (CPU) │ │
│ │ STT ~0.2s │ │ LLM inference │ │ 24kHz PCM │ │
│ └──────────────────┘ └─────────────────┘ └─────────────┘ │
│ │
│ Implements the Gemini Live WebSocket protocol │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Android Phone (GlassesAI) │
│ │
│ ┌──────────────┐ wake word ┌──────────────────────────────┐ │
│ │ Phone Mic │ ──────────► │ Vosk (grammar, offline) │ │
│ └──────────────┘ └───────────────┬──────────────┘ │
│ │ detected │
│ ┌────────────────────────────────────────────▼──────────────┐ │
│ │ GlassesAIService (FGS) │ │
│ │ │ │
│ │ Vosk free-form STT ──► Gemma 3 1B INT4 (MediaPipe) │ │
│ │ │ │ │
│ │ Piper TTS (bundled) ◄─────┘ │ │
│ │ 22050 Hz PCM → AudioTrack (SCO) │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
No network required after initial model download.
- Two inference modes — Server (your self-hosted Linux box) or On-Device (fully offline)
- Fully offline wake word detection (OpenWakeWord, on-device) — wake word: "Hey Prism"
- Bluetooth SCO for full-duplex voice via glasses mic + speaker
- Configurable server URL, auth token, and TTS voice
- 11 Kokoro voices in server mode; Piper TTS bundled for on-device mode
- Media pause/resume during conversations (Spotify, podcasts, etc.)
- In-app status display — mirrors the notification, always in sync
- Auto-reconnects on glasses Bluetooth connect
- Bearer token auth on WebSocket connection (token configured in Settings)
- Feature toggles for optional permissions (Phone Calls, SMS)
- Voice confirmation for calls and SMS — Prism reads back the action and listens for yes/no before executing
- Multi-turn follow-up — brief re-listen window after each response with rolling conversation context
Phone:
- Android 12+ (API 31)
- Bluetooth-capable (SCO support)
Glasses:
- Ray-Ban Meta smart glasses (any generation)
- Paired to the phone via the Meta View app (pairing only — Meta app does not need to run)
Server (Server mode only):
- Linux machine with NVIDIA GPU (CUDA) recommended for Whisper
- Python 3.10+
- server.py — separate repo
On-Device mode:
- ~800 MB free storage for the Gemma model
- A HuggingFace account with access to
litert-community/Gemma3-1B-IT(free, requires accepting terms)
Clone and open in Android Studio:
git clone https://github.com/ryon137/GlassesAI.git
cd GlassesAI
cp secrets.properties.example secrets.propertiesEdit secrets.properties (optional — sets the default server URL pre-filled in Settings):
DEFAULT_SERVER_URL=ws://your-server-ip:9073Use
wss://if your server is behind a TLS reverse proxy (recommended for remote access).
Build and install via Android Studio, or:
./gradlew installDebugOn first launch, open Settings and enter your server URL and auth token.
See GlassesAI-server for server setup instructions.
The app downloads the Gemma model (~800 MB) from HuggingFace on first use. You need a read token with access to litert-community/Gemma3-1B-IT:
- Accept the model terms at huggingface.co/litert-community/Gemma3-1B-IT
- Generate a read token at huggingface.co/settings/tokens
- Enter the token in the app when prompted
The Piper TTS voice model is bundled in the APK — no separate download needed.
- Open the app and grant microphone and Bluetooth permissions
- Select Server or On-Device mode in Settings
- Set your server URL + auth token (Server mode) or download the Gemma model (On-Device mode), tap Save
- Tap Start — the Vosk wake word model downloads on first launch (~40 MB)
- Say your wake word — you'll hear a chime when it's listening
| Setting | Where | Description | Default |
|---|---|---|---|
| Server URL | Settings → Server | WebSocket URL of your server | ws://YOUR_SERVER_IP:9073 |
| Auth Token | Settings → Server | Bearer token matching your server config | — |
| Inference mode | Settings → Mode | Server or On-Device | Server |
| Voice | Settings → Server | 11 Kokoro voices (server mode only) | Heart (American, female) |
Wake word is "Hey Prism" — hardcoded to eliminate false-trigger variance.
app/src/main/java/com/ryoncook/glassesai/
├── GlassesAIService.kt # Core foreground service — audio, BT, WebSocket, on-device inference
├── MainActivity.kt # Navigation host (bottom nav, permission requests)
├── HomeFragment.kt # Status display, Start/Stop button
├── SettingsFragment.kt # All settings — mode, server, wake word, voice, features
├── InfoFragment.kt # App info, how-to, privacy/permissions summary
├── InferenceManager.kt # On-device LLM via MediaPipe (Gemma 3 1B INT4)
├── TtsManager.kt # On-device TTS via sherpa-onnx + Piper (asset extraction + synthesis)
├── ResponseParser.kt # Strips markdown code fences from LLM output
├── ActionParser.kt # Parses structured JSON actions (call, SMS) from LLM response
├── FeatureGate.kt # Gates feature execution behind Settings toggles
├── ConfigValidator.kt # Validates server config before service start
├── ModelDownloadManager.kt # Downloads Gemma model from HuggingFace with progress reporting
├── ModelDownloadService.kt # Foreground service wrapper for model download
├── WakeWordMatcher.kt # Vosk grammar-mode wake word detection helper
├── AudioUtils.kt # PCM audio helpers
├── BluetoothReceiver.kt # BroadcastReceiver — auto-reconnect on BT connect
└── Config.kt # App-wide constants
app/src/main/assets/
└── tts-model/ # Bundled Piper TTS (vits-piper-en_US-lessac-medium-int8)
- Phone actions — make calls and send SMS by voice
- Expanded voice commands — volume control, timers/alarms
- End-of-turn descending beep — audio cue when AI finishes responding
- Status transition timing — fix "Responding…" lingering after AI voice stops
- On-device mode — fully offline with Gemma 3 1B + Piper TTS
- Per-user auth token — configured in Settings, not baked into the build
- Play Store basics — app icon, R8 minification, privacy policy
- Default phone and clock app — pre-select in Settings to avoid disambiguation popups
- Custom wake word — "Hey Prism" OpenWakeWord model with in-app threshold calibration
- UI redesign — system-settings style, value-driven descriptions, on-device as default mode
- Call/SMS confirmation — Prism reads back "Call Mom?" and listens immediately for verbal yes/no before executing; auto-cancels on timeout
- Multi-turn follow-up — stay in a brief listening window after a response; short conversation context for the LLM
- Model response improvements — tune system prompt and inference parameters for on-device Gemma
- Separate media volume control — "set volume" controls Prism's voice (TTS track); add distinct command for glasses media (A2DP) stream
- Read notifications and calendar by voice
- OpenClaw integration — smart home, WhatsApp, web search
MIT