VOX

Voice-Operated eXecution for macOS

System-wide speech-to-text that runs entirely locally. Hold a hotkey, speak, release — transcribed text appears wherever your cursor is.

Install • How it works • Configuration • Development

What it does

Vox turns your voice into text in any application. The entire pipeline runs locally — no cloud services, no API keys required for core dictation.

Hold your hotkey, speak naturally, release — transcribed text is pasted at your cursor. Works in editors, browsers, terminals, chat apps, anywhere.

How it works

Hold hotkey → Record mic → Whisper transcribes → Classify → [AI Process] → Text pasted at cursor

Transcribe — WAV audio is sent to a local whisper.cpp server. Auto-detects endpoint format, applies custom vocabulary hints.
Filter — Detects blank audio, whisper hallucinations ([BLANK_AUDIO]), and empty transcriptions. Cancels the pipeline early if there's nothing to process.
Classify — Fast prefix matching (no API call) routes the transcription into one of three modes:
- Dictation — Normal speech-to-text (default)
- Prompt — Voice-to-Claude shortcuts ("summarize my clipboard", "translate to Spanish", "explain this error")
- Command — Shell execution via voice ("create PR", "git status", "query flag <name>")
Post-process (Dictation) — Optional AI cleanup via Claude for grammar, punctuation, and context-aware formatting based on the frontmost app (terse for terminals, conversational for chat).
Prompt (Prompt mode) — Sends the classified action to Claude with appropriate system prompts. Operates on clipboard contents or spoken subjects.
Command (Command mode) — Routes to a registry of shell commands: gh, git, ldcli, go test, open.
Inject — Snapshots the clipboard, writes text via pbcopy, simulates Cmd+V via CGEvent, then restores the original clipboard.
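
The Filter and Classify stages above can be sketched in a few lines of Go. This is a minimal illustration, not Vox's actual classifier: the `Mode` type, the `isBlank` and `classify` helpers, and the prefix tables are hypothetical, with the prefixes taken from the example phrases in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode is the route chosen for a transcription.
type Mode int

const (
	Dictation Mode = iota // default: plain speech-to-text
	Prompt                // voice-to-Claude shortcut
	Command               // shell command via voice
)

func (m Mode) String() string {
	return [...]string{"Dictation", "Prompt", "Command"}[m]
}

// Hypothetical prefix tables, seeded from the examples in this README.
var (
	promptPrefixes  = []string{"summarize", "translate", "explain"}
	commandPrefixes = []string{"create pr", "git status", "query flag"}
)

// isBlank reports whether a transcription carries nothing worth processing.
func isBlank(text string) bool {
	t := strings.TrimSpace(text)
	return t == "" || t == "[BLANK_AUDIO]"
}

// classify routes text by case-insensitive prefix match, with no API call.
func classify(text string) Mode {
	t := strings.ToLower(strings.TrimSpace(text))
	for _, p := range commandPrefixes {
		if strings.HasPrefix(t, p) {
			return Command
		}
	}
	for _, p := range promptPrefixes {
		if strings.HasPrefix(t, p) {
			return Prompt
		}
	}
	return Dictation
}

func main() {
	for _, s := range []string{"[BLANK_AUDIO]", "git status", "Summarize my clipboard", "hello world"} {
		if isBlank(s) {
			fmt.Printf("%q: skipped\n", s)
			continue
		}
		fmt.Printf("%q: %s\n", s, classify(s))
	}
}
```

A bare prefix check also matches ordinary dictation that happens to start with one of these words, so a real classifier would probably want word-boundary checks or longer trigger phrases.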

Install

Quick start (one command)

Requirements: macOS, Homebrew, Go 1.24+

git clone https://github.com/mattthewong/vox.git
cd vox
make start

make start handles everything:

- Installs missing system deps (sox, whisper-cpp) via Homebrew
- Downloads the default Whisper model (~150 MB) to ~/.local/share/whisper-cpp/
- Builds bin/Vox.app and ad-hoc codesigns it
- Launches Vox detached; Vox manages whisper-server itself

The first launch triggers two macOS permission prompts (Microphone and Accessibility); grant both and you're done.

Manual setup

brew install sox whisper-cpp
mkdir -p ~/.local/share/whisper-cpp
curl -L -o ~/.local/share/whisper-cpp/ggml-base.en.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"

Build

make build      # outputs bin/vox (bare binary)
make app        # outputs bin/Vox.app (macOS bundle, ad-hoc signed)
make install    # installs bin/vox to /usr/local/bin/vox

Lifecycle

make start    # ensures deps, builds, launches detached
make stop     # stops Vox (whisper child exits with it)
make status   # shows whether Vox is running

macOS permissions

On first run, macOS prompts for two permissions. Grant them to Vox (Vox.app in System Settings):

Microphone — System Settings > Privacy & Security > Microphone
Accessibility — System Settings > Privacy & Security > Accessibility

The .app bundle uses a stable CFBundleIdentifier (dev.vox.menubar), so permissions survive rebuilds.

Configuration

All via environment variables:

Variable	Default	Description
`VOX_HOTKEY`	`option+space`	Hotkey to trigger recording. Comma-separated for multiple.
`VOX_WHISPER_MODEL_ID`	`base.en`	Model ID (`tiny.en`, `base.en`, `small.en`, `medium.en`, `large-v3-turbo`)
`VOX_HOLD_TO_TALK`	`true`	`true` = hold to record, `false` = toggle on/off
`VOX_LANGUAGE`	(auto)	BCP-47 language code (e.g. `en`, `es`)
`VOX_VERBOSE`	`false`	Debug logging

Menubar toggles (mode, sounds, auto-paste, hotkey, model) persist to ~/Library/Application Support/Vox/preferences.json. Precedence: environment variables override saved preferences, which override built-in defaults.

Hotkey formats

VOX_HOTKEY="fn"                 # Fn / Globe key
VOX_HOTKEY="cmd+shift"          # Modifier-only
VOX_HOTKEY="option+space"       # Modifier + key
VOX_HOTKEY="ctrl+shift+d"       # Multiple modifiers + key
VOX_HOTKEY="fn,cmd+shift"       # Multiple hotkeys (either triggers)

Modifiers: ctrl, shift, option/alt, cmd/command

Keys: a-z, 0-9, f1-f20, space, return, escape, tab, delete, arrow keys
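
To make the format concrete, here is a rough sketch of how a `VOX_HOTKEY` value might be parsed. The `Hotkey` struct and `parseHotkeys` function are hypothetical, and treating `fn` like a modifier is an assumption of this sketch; the real parser in internal/config may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Hotkey is one parsed combination: zero or more modifiers plus an optional key.
type Hotkey struct {
	Modifiers []string // canonical names: ctrl, shift, option, cmd, fn
	Key       string   // empty for modifier-only hotkeys
}

// canonical maps accepted modifier spellings to one name each.
// Treating "fn" as a modifier is an assumption of this sketch.
var canonical = map[string]string{
	"ctrl": "ctrl", "shift": "shift", "fn": "fn",
	"option": "option", "alt": "option",
	"cmd": "cmd", "command": "cmd",
}

// parseHotkeys splits a VOX_HOTKEY value on commas, then each combo on "+".
func parseHotkeys(spec string) ([]Hotkey, error) {
	var out []Hotkey
	for _, combo := range strings.Split(spec, ",") {
		var hk Hotkey
		for _, part := range strings.Split(combo, "+") {
			p := strings.ToLower(strings.TrimSpace(part))
			if p == "" {
				return nil, fmt.Errorf("empty token in %q", combo)
			}
			if mod, ok := canonical[p]; ok {
				hk.Modifiers = append(hk.Modifiers, mod)
				continue
			}
			if hk.Key != "" {
				return nil, fmt.Errorf("more than one key in %q", combo)
			}
			hk.Key = p // key names are not validated in this sketch
		}
		out = append(out, hk)
	}
	return out, nil
}

func main() {
	hks, err := parseHotkeys("fn,ctrl+shift+d")
	if err != nil {
		panic(err)
	}
	for _, hk := range hks {
		fmt.Printf("modifiers=%v key=%q\n", hk.Modifiers, hk.Key)
	}
}
```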

AI features

AI-powered features require an Anthropic API key. Set it in ~/.vox/config.yaml:

anthropic_api_key: sk-ant-...

Or toggle AI features and distribute keys to a team via LaunchDarkly feature flags:

Flag	Controls
`vox-ai-postprocess`	AI grammar/punctuation cleanup
`vox-prompt-mode`	Voice-to-Claude prompt shortcuts
`vox-voice-commands`	Shell command execution via voice
`vox-context-aware`	App-aware formatting hints
`vox-ai-model`	Which Claude model to use
`vox-anthropic-key`	Team-managed API key distribution
`vox-streaming-overlay`	Floating transcription overlay

Flag precedence: LaunchDarkly > env var > config file > default.

Architecture

cmd/vox/main.go          — Entrypoint, event loops, signal/menubar shutdown wiring
internal/hotkey/          — CGEventTap-based global hotkey (modifier-only, fn, modifier+key)
internal/audio/           — Mic recording via ffmpeg/sox subprocess
internal/transcribe/      — Whisper HTTP client (multipart upload, auto endpoint detection)
internal/classify/        — Intent classifier (prefix matching → Dictation/Prompt/Command)
internal/claude/          — Anthropic Claude Messages API client
internal/prompt/          — Prompt mode executor (summarize, explain, rewrite, translate)
internal/commands/        — Voice command registry (gh, git, ldcli, go test, open)
internal/pipeline/        — Generic stage pipeline
internal/inject/          — Text injection (pbcopy + CGEvent Cmd+V + clipboard restore)
internal/ui/              — Menubar status item (Cocoa via cgo)
internal/config/          — Env var config + hotkey parsing
internal/flags/           — LaunchDarkly Go Server SDK v7 integration
internal/appctx/          — Frontmost app detection (NSWorkspace)
internal/format/          — Context-aware formatting hints per app category
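
As an illustration of the multipart upload that internal/transcribe performs, the sketch below builds such a request with the standard library. The `/inference` path and the `file` and `response_format` field names follow the whisper.cpp example server; the `newTranscribeRequest` helper and the server address are assumptions, and Vox's own client (with its endpoint auto-detection) may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newTranscribeRequest builds a multipart POST carrying WAV bytes.
// Path and field names follow the whisper.cpp example server (assumed here).
func newTranscribeRequest(baseURL string, wav []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", "audio.wav")
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(wav); err != nil {
		return nil, err
	}
	if err := w.WriteField("response_format", "json"); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // writes the closing boundary
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/inference", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// The address is a placeholder for wherever whisper-server is listening.
	req, err := newTranscribeRequest("http://127.0.0.1:8080", []byte("RIFF"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` would send it; the sketch stops before that so it runs without a server.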

Threading: The main goroutine owns NSApp's run loop. CGEventTap registers on the same loop, so menubar clicks and hotkey events are dispatched on the same thread. Recording, transcription, and injection run in goroutines.
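
A stripped-down sketch of that threading arrangement, without any Cocoa calls, might look like this. `runPipeline` is a hypothetical stand-in for the record, transcribe, and inject work; the only real mechanism shown is `runtime.LockOSThread` called from `init`, which Go documents as causing `main` to run on the startup thread.

```go
package main

import (
	"fmt"
	"runtime"
)

func init() {
	// Pin the main goroutine to the startup OS thread, which AppKit
	// expects to own the application run loop.
	runtime.LockOSThread()
}

// runPipeline stands in for recording, transcription, and injection.
func runPipeline(id int, done chan<- string) {
	done <- fmt.Sprintf("utterance %d processed", id)
}

func main() {
	done := make(chan string)
	// Work is handed to goroutines so the main thread is free for UI events.
	for i := 1; i <= 2; i++ {
		go runPipeline(i, done)
	}
	// A real app would enter NSApp's run loop here via cgo; this sketch
	// simply collects the workers' results on the locked main thread.
	for i := 0; i < 2; i++ {
		fmt.Println(<-done)
	}
}
```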

Development

make build        # Build bare binary (bin/vox)
make app          # Wrap into bin/Vox.app (ad-hoc codesigned)
make test         # Run all tests
make test-short   # Skip integration tests
make lint         # go vet
make fmt          # gofmt
make run          # Build and run

Why

I was using Whisper Flow for speech-to-text but kept hitting rate limits on their free plan. Vox does the same thing (system-wide dictation with a hold-to-talk hotkey), but core dictation runs entirely on your machine with no cloud services.

License

MIT
