GlassesAI

A self-hosted AI assistant for Ray-Ban smart glasses. Replaces Meta AI with a fully local pipeline — no cloud, no Meta account required.

Tested on Ray-Ban Meta glasses + Pixel 9 Pro running Android 14.

Architecture

GlassesAI supports two inference modes selectable in-app:

Server Mode (default)

┌─────────────────────────────────────────────────────────────────┐
│                        Ray-Ban Glasses                          │
│              Bluetooth A2DP (audio) / SCO (voice call)         │
└───────────────────────────┬─────────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────────┐
│                    Android Phone (GlassesAI)                    │
│                                                                 │
│  ┌─────────────────┐    wake word     ┌──────────────────────┐  │
│  │   Phone Mic     │ ───────────────► │  Vosk (on-device)    │  │
│  │  (always-on)    │                  │  grammar mode, offline│  │
│  └─────────────────┘                  └──────────┬───────────┘  │
│                                                  │ detected     │
│  ┌──────────────────────────────────────────────▼───────────┐  │
│  │                  GlassesAIService (FGS)                   │  │
│  │  • SCO connects → glasses mic becomes active              │  │
│  │  • Streams PCM audio to server via WebSocket              │  │
│  │  • Plays server audio response via AudioTrack (SCO)       │  │
│  │  • Pauses media (Spotify, etc.) during conversation       │  │
│  └──────────────────────────┬────────────────────────────────┘  │
└─────────────────────────────┼───────────────────────────────────┘
                              │ WebSocket (wss://)
                              │ PCM audio up / PCM audio down
┌─────────────────────────────▼───────────────────────────────────┐
│                      Linux Server (server.py)                   │
│                                                                 │
│  ┌──────────────────┐   ┌─────────────────┐   ┌─────────────┐  │
│  │  faster-whisper  │   │   Mistral 7B    │   │  Kokoro TTS │  │
│  │  (CUDA, int8)    │──►│  via Ollama     │──►│  (CPU)      │  │
│  │  STT  ~0.2s      │   │  LLM inference  │   │  24kHz PCM  │  │
│  └──────────────────┘   └─────────────────┘   └─────────────┘  │
│                                                                 │
│  Implements the Gemini Live WebSocket protocol                  │
└─────────────────────────────────────────────────────────────────┘
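
The server-side flow in the diagram (speech-to-text, then the LLM, then text-to-speech) can be pictured as three chained stages. What follows is a minimal Python sketch of that shape, not the actual server.py: the `Pipeline` class, the stage signatures, and the stand-in stage functions are all hypothetical and do no real inference.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real server.py may be organised differently.
SttFn = Callable[[bytes], str]   # PCM audio in, transcript out
LlmFn = Callable[[str], str]     # transcript in, reply text out
TtsFn = Callable[[str], bytes]   # reply text in, PCM audio out


@dataclass
class Pipeline:
    stt: SttFn
    llm: LlmFn
    tts: TtsFn

    def handle_utterance(self, pcm_in: bytes) -> bytes:
        """Run one utterance through STT -> LLM -> TTS and return audio bytes."""
        transcript = self.stt(pcm_in)
        if not transcript.strip():
            return b""  # nothing recognised, so send no audio back
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without a GPU or any models installed.
def fake_stt(pcm: bytes) -> str:
    return "what time is it" if pcm else ""


def fake_llm(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder bytes, not real audio


pipeline = Pipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
```

In a real deployment each stand-in would be replaced by a call into faster-whisper, Ollama, and Kokoro respectively; keeping the stages as plain callables makes it possible to swap or test them in isolation.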

On-Device Mode (fully offline)

┌─────────────────────────────────────────────────────────────────┐
│                    Android Phone (GlassesAI)                    │
│                                                                 │
│  ┌──────────────┐  wake word  ┌──────────────────────────────┐  │
│  │  Phone Mic   │ ──────────► │  Vosk (grammar, offline)     │  │
│  └──────────────┘             └───────────────┬──────────────┘  │
│                                               │ detected        │
│  ┌────────────────────────────────────────────▼──────────────┐  │
│  │                  GlassesAIService (FGS)                    │  │
│  │                                                            │  │
│  │  Vosk free-form STT  ──►  Gemma 3 1B INT4 (MediaPipe)    │  │
│  │                                         │                  │  │
│  │               Piper TTS (bundled) ◄─────┘                  │  │
│  │               22050 Hz PCM → AudioTrack (SCO)              │  │
│  └────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

No network required after initial model download.
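
The two modes produce PCM at different sample rates (24 kHz from Kokoro in Server mode, 22050 Hz from Piper in On-Device mode), so playback buffers have to be sized per mode. The arithmetic is sketched below in Python, assuming 16-bit mono samples; the README does not state the sample width or channel count, so treat both constants as assumptions.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM
CHANNELS = 1          # assumption: mono


def pcm_bytes_for(seconds: float, sample_rate: int) -> int:
    """Number of bytes needed to hold `seconds` of audio at `sample_rate`."""
    return int(seconds * sample_rate) * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_seconds(num_bytes: int, sample_rate: int) -> float:
    """Playback duration of a PCM buffer of `num_bytes` bytes."""
    return num_bytes / (BYTES_PER_SAMPLE * CHANNELS * sample_rate)
```

Under these assumptions, one second of Server-mode audio is 48000 bytes and one second of On-Device audio is 44100 bytes.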

Features

Two inference modes — Server (your self-hosted Linux box) or On-Device (fully offline)
Fully offline wake word detection (Vosk grammar mode, on-device); wake word: "Hey Prism"
Bluetooth SCO for full-duplex voice via glasses mic + speaker
Configurable server URL, auth token, and TTS voice
11 Kokoro voices in server mode; Piper TTS bundled for on-device mode
Media pause/resume during conversations (Spotify, podcasts, etc.)
In-app status display — mirrors the notification, always in sync
Auto-reconnects on glasses Bluetooth connect
Bearer token auth on WebSocket connection (token configured in Settings)
Feature toggles for optional permissions (Phone Calls, SMS)
Voice confirmation for calls and SMS — Prism reads back the action and listens for yes/no before executing
Multi-turn follow-up — brief re-listen window after each response with rolling conversation context
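
The "rolling conversation context" behind multi-turn follow-up can be pictured as a bounded history of recent turns that is prepended to each new request. This is an illustrative Python sketch only: `RollingContext`, its four-turn default, and the `User:`/`Assistant:` prompt layout are assumptions, not the app's actual implementation.

```python
from collections import deque


class RollingContext:
    """Keeps only the most recent turns (hypothetical helper, not app code)."""

    def __init__(self, max_turns: int = 4):
        # Each entry is one (user message, assistant reply) pair.
        # A deque with maxlen silently drops the oldest pair when full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self._turns.append((user_text, assistant_text))

    def build_prompt(self, new_user_text: str) -> str:
        """Render the retained history followed by the new user message."""
        lines = []
        for user_text, assistant_text in self._turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {assistant_text}")
        lines.append(f"User: {new_user_text}")
        return "\n".join(lines)
```

Bounding the history this way keeps prompt size predictable, which matters most for the small on-device model.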

Requirements

Phone:

Android 12+ (API 31)
Bluetooth-capable (SCO support)

Glasses:

Ray-Ban Meta smart glasses (any generation)
Paired to the phone via the Meta View app (pairing only — Meta app does not need to run)

Server (Server mode only):

Linux machine; an NVIDIA GPU (CUDA) is recommended for Whisper
Python 3.10+
server.py, which lives in the separate GlassesAI-server repo

On-Device mode:

~800 MB free storage for the Gemma model
A HuggingFace account with access to litert-community/Gemma3-1B-IT (free, requires accepting terms)

Setup

1. Android App

Clone and open in Android Studio:

git clone https://github.com/ryon137/GlassesAI.git
cd GlassesAI
cp secrets.properties.example secrets.properties

Edit secrets.properties (optional — sets the default server URL pre-filled in Settings):

DEFAULT_SERVER_URL=ws://your-server-ip:9073

Use wss:// if your server is behind a TLS reverse proxy (recommended for remote access).

Build and install via Android Studio, or:

./gradlew installDebug

On first launch, open Settings and enter your server URL and auth token.
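
Before saving, it is worth checking that the URL really is a WebSocket URL and that the token will be sent in the expected form. Here is a minimal Python sketch of such a check, in the spirit of what a config validator might do; the function names are hypothetical, and sending the token as an `Authorization: Bearer` header is an assumption about how the server expects it.

```python
from urllib.parse import urlparse


def validate_server_url(url: str) -> list[str]:
    """Return a list of problems; an empty list means the URL looks usable."""
    problems = []
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("ws", "wss"):
        problems.append("URL must start with ws:// or wss://")
    if not parsed.hostname:
        problems.append("URL is missing a host")
    try:
        parsed.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        problems.append("port is not a valid number")
    return problems


def auth_headers(token: str) -> dict[str, str]:
    """Headers for the WebSocket handshake (assumed header-based bearer auth)."""
    return {"Authorization": f"Bearer {token}"}
```

For example, `validate_server_url("ws://192.168.1.20:9073")` returns an empty list, while an `http://` URL is reported as a problem.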

2. Server (Server mode only)

See GlassesAI-server for server setup instructions.

3. On-Device Model (On-Device mode only)

The app downloads the Gemma model (~800 MB) from HuggingFace on first use. You need a read token with access to litert-community/Gemma3-1B-IT:

Accept the model terms at huggingface.co/litert-community/Gemma3-1B-IT
Generate a read token at huggingface.co/settings/tokens
Enter the token in the app when prompted

The Piper TTS voice model is bundled in the APK — no separate download needed.
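
To check a token outside the app, the download can be approximated with a plain authenticated HTTP request against the Hugging Face `resolve` URL. This Python sketch only builds the request and does not send it. `MODEL_FILENAME` is a hypothetical placeholder, since the README does not name the file the app fetches, and the app's own download logic may differ.

```python
import urllib.request

HF_REPO = "litert-community/Gemma3-1B-IT"
MODEL_FILENAME = "model.task"  # hypothetical placeholder, not the real file name


def build_model_request(token: str, filename: str = MODEL_FILENAME) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for a gated model file."""
    url = f"https://huggingface.co/{HF_REPO}/resolve/main/{filename}"
    # Gated repos require the read token as a bearer credential.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned request to `urllib.request.urlopen` would start the download; a 401 or 403 response usually means the token is wrong or the model terms have not been accepted yet.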

4. First Run

Open the app and grant microphone and Bluetooth permissions
Select Server or On-Device mode in Settings
Set your server URL + auth token (Server mode) or download the Gemma model (On-Device mode), tap Save
Tap Start — the Vosk wake word model downloads on first launch (~40 MB)
Say the wake word ("Hey Prism"). You'll hear a chime when it's listening

Configuration

| Setting | Where | Description | Default |
| --- | --- | --- | --- |
| Server URL | Settings → Server | WebSocket URL of your server | `ws://YOUR_SERVER_IP:9073` |
| Auth Token | Settings → Server | Bearer token matching your server config | (none) |
| Inference mode | Settings → Mode | Server or On-Device | Server |
| Voice | Settings → Server | Kokoro TTS voice, 11 available (server mode only) | `Heart (American, female)` |

Wake word is "Hey Prism" — hardcoded to eliminate false-trigger variance.
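
Grammar-mode detection restricts the recognizer to a tiny vocabulary, which is why a fixed phrase works well. Vosk accepts a grammar as a JSON array of phrases and returns each final result as a JSON string with a `text` field. The Python sketch below covers only that grammar-building and matching logic; the helper names are hypothetical and this is not the app's `WakeWordMatcher`.

```python
import json

WAKE_PHRASE = "hey prism"


def build_grammar(phrase: str = WAKE_PHRASE) -> str:
    """JSON grammar for a Vosk recognizer; "[unk]" absorbs all other speech."""
    return json.dumps([phrase, "[unk]"])


def is_wake_word(result_json: str, phrase: str = WAKE_PHRASE) -> bool:
    """True if a Vosk final-result JSON string contains the wake phrase."""
    text = json.loads(result_json).get("text", "")
    return phrase in text.lower()
```

The string from `build_grammar()` is what would be passed as the grammar argument when constructing the recognizer, and `is_wake_word` would be applied to each final result it emits.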

Project Structure

app/src/main/java/com/ryoncook/glassesai/
├── GlassesAIService.kt     # Core foreground service — audio, BT, WebSocket, on-device inference
├── MainActivity.kt         # Navigation host (bottom nav, permission requests)
├── HomeFragment.kt         # Status display, Start/Stop button
├── SettingsFragment.kt     # All settings — mode, server, wake word, voice, features
├── InfoFragment.kt         # App info, how-to, privacy/permissions summary
├── InferenceManager.kt     # On-device LLM via MediaPipe (Gemma 3 1B INT4)
├── TtsManager.kt           # On-device TTS via sherpa-onnx + Piper (asset extraction + synthesis)
├── ResponseParser.kt       # Strips markdown code fences from LLM output
├── ActionParser.kt         # Parses structured JSON actions (call, SMS) from LLM response
├── FeatureGate.kt          # Gates feature execution behind Settings toggles
├── ConfigValidator.kt      # Validates server config before service start
├── ModelDownloadManager.kt # Downloads Gemma model from HuggingFace with progress reporting
├── ModelDownloadService.kt # Foreground service wrapper for model download
├── WakeWordMatcher.kt      # Vosk grammar-mode wake word detection helper
├── AudioUtils.kt           # PCM audio helpers
├── BluetoothReceiver.kt    # BroadcastReceiver — auto-reconnect on BT connect
└── Config.kt               # App-wide constants
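
The roles of ResponseParser.kt and ActionParser.kt can be illustrated together: strip any markdown code fence the model wrapped around its output, then try to read the remainder as a JSON action. The sketch is written in Python for brevity rather than Kotlin, and the JSON field names (`action`, `contact`) are hypothetical because the README does not document the action schema.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
_FENCE_RE = re.compile(
    r"^" + FENCE + r"[\w-]*\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the text inside a single surrounding code fence, if there is one."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1).strip() if match else stripped


def parse_action(text: str):
    """Return the action dict for a call/SMS request, or None for plain speech."""
    try:
        data = json.loads(strip_code_fences(text))
    except json.JSONDecodeError:
        return None  # ordinary conversational reply, not a structured action
    if isinstance(data, dict) and data.get("action") in ("call", "sms"):
        return data
    return None
```

A reply that parses to an action would then go through the feature toggles and the voice confirmation step before anything is executed; any other reply is simply spoken.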

app/src/main/assets/
└── tts-model/              # Bundled Piper TTS (vits-piper-en_US-lessac-medium-int8)

Roadmap

License

MIT
