Voice-Operated eXecution for macOS
System-wide speech-to-text that runs entirely locally. Hold a hotkey, speak, release — transcribed text appears wherever your cursor is.
Vox turns your voice into text in any application. The entire pipeline runs locally — no cloud services, no API keys required for core dictation.
Hold your hotkey, speak naturally, release — transcribed text is pasted at your cursor. Works in editors, browsers, terminals, chat apps, anywhere.
Hold hotkey → Record mic → Whisper transcribes → Classify → [AI Process] → Text pasted at cursor
- Transcribe: WAV audio is sent to a local whisper.cpp server. Auto-detects endpoint format, applies custom vocabulary hints.
- Filter: Detects blank audio, whisper hallucinations (`[BLANK_AUDIO]`), and empty transcriptions. Cancels the pipeline early if there's nothing to process.
- Classify: Fast prefix matching (no API call) routes the transcription into one of three modes:
  - Dictation: Normal speech-to-text (default)
  - Prompt: Voice-to-Claude shortcuts ("summarize my clipboard", "translate to Spanish", "explain this error")
  - Command: Shell execution via voice ("create PR", "git status", "query flag <name>")
- Post-process (Dictation): Optional AI cleanup via Claude for grammar, punctuation, and context-aware formatting based on the frontmost app (terse for terminals, conversational for chat).
- Prompt (Prompt mode): Sends the classified action to Claude with appropriate system prompts. Operates on clipboard contents or spoken subjects.
- Command (Command mode): Routes to a registry of shell commands: `gh`, `git`, `ldcli`, `go test`, `open`.
- Inject: Snapshots the clipboard, writes text via `pbcopy`, simulates Cmd+V via CGEvent, then restores the original clipboard.
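The Filter and Classify steps can be sketched in a few lines of Go. This is a minimal illustration, not the code in `internal/classify/`: the prefix tables are hypothetical and built only from the example phrases quoted above, and the names `IsBlank` and `Classify` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode is the route chosen for a transcription.
type Mode int

const (
	Dictation Mode = iota // default: paste the text as spoken
	Prompt                // forward the request to Claude
	Command               // run a registered shell command
)

func (m Mode) String() string {
	return [...]string{"Dictation", "Prompt", "Command"}[m]
}

// Hypothetical prefix tables, taken from the README's example phrases.
var (
	promptPrefixes  = []string{"summarize", "translate", "explain"}
	commandPrefixes = []string{"create pr", "git status", "query flag"}
)

// IsBlank reports whether a transcription carries nothing to process.
func IsBlank(text string) bool {
	t := strings.TrimSpace(text)
	return t == "" || t == "[BLANK_AUDIO]"
}

// hasWordPrefix matches p only at a word boundary, so "summarized" is
// not mistaken for the "summarize" shortcut.
func hasWordPrefix(t, p string) bool {
	return t == p || strings.HasPrefix(t, p+" ")
}

// Classify routes text by case-insensitive prefix; no network call is made.
func Classify(text string) Mode {
	t := strings.ToLower(strings.TrimSpace(text))
	for _, p := range commandPrefixes {
		if hasWordPrefix(t, p) {
			return Command
		}
	}
	for _, p := range promptPrefixes {
		if hasWordPrefix(t, p) {
			return Prompt
		}
	}
	return Dictation
}

func main() {
	for _, s := range []string{"Summarize my clipboard", "git status", "see you at noon"} {
		if IsBlank(s) {
			continue // the pipeline would cancel here
		}
		fmt.Printf("%q -> %s\n", s, Classify(s))
	}
}
```

Anything that matches no prefix falls through to Dictation, which is why ordinary speech is never sent to Claude or a shell by accident in this sketch.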
Requirements: macOS, Homebrew, Go 1.24+
git clone https://github.com/mattthewong/vox.git
cd vox
make start

`make start` handles everything:
- Installs missing system deps (`sox`, `whisper-cpp`) via Homebrew
- Downloads the default Whisper model (~150 MB) to `~/.local/share/whisper-cpp/`
- Builds `bin/Vox.app` and ad-hoc codesigns it
- Launches Vox detached; it manages `whisper-server` itself
The first launch triggers two macOS permission prompts (Microphone and Accessibility); grant both and you're done.
brew install sox whisper-cpp
mkdir -p ~/.local/share/whisper-cpp
curl -L -o ~/.local/share/whisper-cpp/ggml-base.en.bin \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"

make build # outputs bin/vox (bare binary)
make app # outputs bin/Vox.app (macOS bundle, ad-hoc signed)
make install # installs bin/vox to /usr/local/bin/vox

make start # ensures deps, builds, launches detached
make stop # stops Vox (whisper child exits with it)
make status # shows whether Vox is running

On first run, macOS prompts for two permissions. Grant them to Vox (Vox.app in System Settings):
- Microphone — System Settings > Privacy & Security > Microphone
- Accessibility — System Settings > Privacy & Security > Accessibility
The .app bundle uses a stable CFBundleIdentifier (dev.vox.menubar), so permissions survive rebuilds.
All via environment variables:
| Variable | Default | Description |
|---|---|---|
| `VOX_HOTKEY` | `option+space` | Hotkey to trigger recording. Comma-separated for multiple. |
| `VOX_WHISPER_MODEL_ID` | `base.en` | Model ID (`tiny.en`, `base.en`, `small.en`, `medium.en`, `large-v3-turbo`) |
| `VOX_HOLD_TO_TALK` | `true` | `true` = hold to record, `false` = toggle on/off |
| `VOX_LANGUAGE` | (auto) | BCP-47 language code (e.g. `en`, `es`) |
| `VOX_VERBOSE` | `false` | Debug logging |
Menubar toggles (mode, sounds, auto-paste, hotkey, model) persist to ~/Library/Application Support/Vox/preferences.json. Env vars > preferences > defaults.
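The precedence rule (env vars > preferences > defaults) can be illustrated for the model setting. This is a sketch under assumptions: the `Preferences` struct and its `ModelID` field are hypothetical stand-ins for whatever `preferences.json` actually stores, and `ResolveModelID` is not a function from the Vox codebase.

```go
package main

import (
	"fmt"
	"os"
)

// Preferences is a hypothetical view of preferences.json.
// An empty ModelID means the menubar toggle was never set.
type Preferences struct {
	ModelID string
}

// ResolveModelID applies env var > preferences > default.
// lookup has the same signature as os.LookupEnv so it can be faked in tests.
func ResolveModelID(lookup func(string) (string, bool), prefs Preferences) string {
	if v, ok := lookup("VOX_WHISPER_MODEL_ID"); ok && v != "" {
		return v // an explicit environment variable always wins
	}
	if prefs.ModelID != "" {
		return prefs.ModelID // then the persisted menubar choice
	}
	return "base.en" // documented default
}

func main() {
	prefs := Preferences{ModelID: "small.en"}
	fmt.Println(ResolveModelID(os.LookupEnv, prefs))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps each layer easy to test in isolation.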
VOX_HOTKEY="fn" # Fn / Globe key
VOX_HOTKEY="cmd+shift" # Modifier-only
VOX_HOTKEY="option+space" # Modifier + key
VOX_HOTKEY="ctrl+shift+d" # Multiple modifiers + key
VOX_HOTKEY="fn,cmd+shift" # Multiple hotkeys (either triggers)

Modifiers: ctrl, shift, option/alt, cmd/command
Keys: a-z, 0-9, f1-f20, space, return, escape, tab, delete, arrow keys
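As a rough illustration of the grammar above, a `VOX_HOTKEY` value can be split on commas into bindings and each binding on `+` into modifiers and at most one key. The `Hotkey` type and `ParseHotkeys` function are hypothetical, not the parser in `internal/config/`; treating `fn` as a modifier is an assumption, and key names are not validated here.

```go
package main

import (
	"fmt"
	"strings"
)

// Hotkey is one parsed binding: modifiers plus an optional key.
type Hotkey struct {
	Modifiers []string // canonical names: ctrl, shift, option, cmd, fn
	Key       string   // empty for modifier-only bindings
}

// canonical maps each accepted modifier spelling to a single name.
var canonical = map[string]string{
	"ctrl": "ctrl", "shift": "shift",
	"option": "option", "alt": "option",
	"cmd": "cmd", "command": "cmd",
	"fn": "fn", // assumption: fn is handled like a modifier
}

// ParseHotkeys splits spec on commas, then each binding on "+".
func ParseHotkeys(spec string) ([]Hotkey, error) {
	var out []Hotkey
	for _, binding := range strings.Split(spec, ",") {
		var hk Hotkey
		for _, part := range strings.Split(binding, "+") {
			tok := strings.ToLower(strings.TrimSpace(part))
			switch {
			case tok == "":
				return nil, fmt.Errorf("empty token in %q", binding)
			case canonical[tok] != "":
				hk.Modifiers = append(hk.Modifiers, canonical[tok])
			case hk.Key != "":
				return nil, fmt.Errorf("more than one key in %q", binding)
			default:
				hk.Key = tok // key names are not validated in this sketch
			}
		}
		out = append(out, hk)
	}
	return out, nil
}

func main() {
	hks, err := ParseHotkeys("fn,cmd+shift")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", hks)
}
```

Normalizing `alt` to `option` and `command` to `cmd` at parse time means the rest of the program only has to compare against one spelling.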
AI-powered features require an Anthropic API key. Set via ~/.vox/config.yaml:
anthropic_api_key: sk-ant-...

Or distribute keys to a team via LaunchDarkly feature flags:
| Flag | Controls |
|---|---|
| `vox-ai-postprocess` | AI grammar/punctuation cleanup |
| `vox-prompt-mode` | Voice-to-Claude prompt shortcuts |
| `vox-voice-commands` | Shell command execution via voice |
| `vox-context-aware` | App-aware formatting hints |
| `vox-ai-model` | Which Claude model to use |
| `vox-anthropic-key` | Team-managed API key distribution |
| `vox-streaming-overlay` | Floating transcription overlay |
Flag precedence: LaunchDarkly > env var > config file > default.
cmd/vox/main.go — Entrypoint, event loops, signal/menubar shutdown wiring
internal/hotkey/ — CGEventTap-based global hotkey (modifier-only, fn, modifier+key)
internal/audio/ — Mic recording via ffmpeg/sox subprocess
internal/transcribe/ — Whisper HTTP client (multipart upload, auto endpoint detection)
internal/classify/ — Intent classifier (prefix matching → Dictation/Prompt/Command)
internal/claude/ — Anthropic Claude Messages API client
internal/prompt/ — Prompt mode executor (summarize, explain, rewrite, translate)
internal/commands/ — Voice command registry (gh, git, ldcli, go test, open)
internal/pipeline/ — Generic stage pipeline
internal/inject/ — Text injection (pbcopy + CGEvent Cmd+V + clipboard restore)
internal/ui/ — Menubar status item (Cocoa via cgo)
internal/config/ — Env var config + hotkey parsing
internal/flags/ — LaunchDarkly Go Server SDK v7 integration
internal/appctx/ — Frontmost app detection (NSWorkspace)
internal/format/ — Context-aware formatting hints per app category
Threading: The main goroutine owns NSApp's run loop. CGEventTap registers on the same loop, so menubar clicks and hotkey events are dispatched on the same thread. Recording, transcription, and injection run in goroutines.
make build # Build bare binary (bin/vox)
make app # Wrap into bin/Vox.app (ad-hoc codesigned)
make test # Run all tests
make test-short # Skip integration tests
make lint # go vet
make fmt # gofmt
make run # Build and run

I was using Wispr Flow for speech-to-text but kept hitting rate limits on their free plan. Vox does the same thing (system-wide dictation with a hold-to-talk hotkey) but runs entirely on your machine, with no cloud service involved in core dictation.
MIT
