Esper

Real-time voice transcription for macOS

Captures microphone audio, detects speech with Silero VAD, transcribes with Whisper large-v3-turbo via MLX, and optionally streams text to Telegram. Transcription runs entirely on-device: no cloud service and no internet connection are required.

Install

1. Download Esper.dmg from the latest release
2. Open the DMG and drag Esper to Applications
3. Open Esper from Applications or Launchpad

Requirements: macOS 14+ (Sonoma), Apple Silicon (M1/M2/M3/M4)

Developer Setup

CLI

# 1. Set up environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Run
python -m src.realtime_demo

Select your mic from the device picker, then speak; transcriptions appear in the console.

SwiftUI App (from source)

Open EsperApp/EsperApp.xcodeproj in Xcode and hit Cmd+R.

Global Hotkey

Press Option+Space from any app to toggle transcription on/off. No need to switch to the Esper window.

Customize the shortcut in Settings > Shortcuts.

How It Works

  Microphone
      |  16kHz mono, 512-sample frames
      v
  AudioCapture ──> audio_q ──> VadThread (Silero VAD)
                                    |
                                    | speech detected, silence sealed
                                    v
                                speech_q ──> WhisperTranscriber
                                                 |
                                                 | spawn-context subprocess
                                                 | (MLX Metal isolation)
                                                 v
                                            mlx-whisper
                                            large-v3-turbo
                                                 |
                                                 v
                                        TranscriptionUpdate
                                            |           |
                                            v           v
                                        Console    Telegram
                                        (CLI)      (optional)
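
The diagram above can be read as a chain of worker threads connected by queues. Below is a minimal sketch of that wiring using only the standard library. The names `vad_stage`, `transcribe_stage`, `run_pipeline`, and the injected `is_speech` and `transcribe` callables are hypothetical stand-ins, not Esper's actual classes, and the speech grouping is deliberately simplified.

```python
import queue
import threading

SAMPLE_RATE = 16_000   # Hz, from the diagram
FRAME_SAMPLES = 512    # samples per frame, from the diagram
FRAME_MS = FRAME_SAMPLES * 1000 // SAMPLE_RATE  # 32 ms per frame

_STOP = object()  # sentinel that tells a stage to shut down


def vad_stage(audio_q, speech_q, is_speech):
    """Group consecutive speech frames into utterances (simplified gating)."""
    current = []
    while True:
        frame = audio_q.get()
        if frame is _STOP:
            break
        if is_speech(frame):
            current.append(frame)
        elif current:
            speech_q.put(current)  # a non-speech frame seals the utterance
            current = []
    if current:
        speech_q.put(current)
    speech_q.put(_STOP)  # propagate shutdown downstream


def transcribe_stage(speech_q, results, transcribe):
    """Turn each sealed utterance into text."""
    while True:
        utterance = speech_q.get()
        if utterance is _STOP:
            break
        results.append(transcribe(utterance))


def run_pipeline(frames, is_speech, transcribe):
    """Push frames through both stages and return the transcribed texts."""
    audio_q, speech_q, results = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=vad_stage, args=(audio_q, speech_q, is_speech)),
        threading.Thread(target=transcribe_stage, args=(speech_q, results, transcribe)),
    ]
    for worker in workers:
        worker.start()
    for frame in frames:
        audio_q.put(frame)
    audio_q.put(_STOP)
    for worker in workers:
        worker.join()
    return results
```

Decoupling the stages with queues means a slow transcription does not block audio capture; frames simply accumulate upstream until the consumer catches up.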

Pipeline

Stage	What it does
AudioCapture	Continuous mic input via sounddevice (16kHz mono, 32ms frames)
VadThread	Silero VAD scores each frame. 300ms pre-buffer on speech onset. 300ms silence seals utterance.
WhisperTranscriber	Whisper large-v3-turbo in isolated subprocess (MLX Metal safety). 15s watchdog. Auto-restart on crash.
Hallucination filter	Discards high `no_speech_prob` or extreme `compression_ratio` outputs
Output	Per-utterance text to console, SwiftUI transcript view, and/or Telegram
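
The VadThread row describes a small state machine: keep a rolling pre-buffer while idle, start an utterance on speech onset, and seal it after enough consecutive silent frames. Below is a sketch under stated assumptions. `UtteranceSegmenter` and `frames_for` are hypothetical names, durations are rounded up to whole 32ms frames, trailing silence is kept in the sealed utterance, and the minimum-duration and energy checks are omitted.

```python
from collections import deque

FRAME_MS = 32            # 512 samples at 16 kHz
PRE_BUFFER_MS = 300      # audio kept from before speech onset
SILENCE_MS = 300         # silence that seals an utterance
SPEECH_THRESHOLD = 0.3   # default VAD_SPEECH_THRESHOLD


def frames_for(ms):
    """Number of whole frames needed to cover `ms` (ceiling division)."""
    return -(-ms // FRAME_MS)


class UtteranceSegmenter:
    def __init__(self):
        self.pre_buffer = deque(maxlen=frames_for(PRE_BUFFER_MS))
        self.utterance = []
        self.silent_frames = 0
        self.in_speech = False

    def push(self, frame, speech_prob):
        """Feed one frame; return a sealed utterance (list of frames) or None."""
        is_speech = speech_prob >= SPEECH_THRESHOLD
        if not self.in_speech:
            if is_speech:
                # Speech onset: prepend the buffered audio so the first
                # syllable is not clipped.
                self.utterance = list(self.pre_buffer) + [frame]
                self.pre_buffer.clear()
                self.in_speech = True
                self.silent_frames = 0
            else:
                self.pre_buffer.append(frame)
            return None
        self.utterance.append(frame)
        self.silent_frames = 0 if is_speech else self.silent_frames + 1
        if self.silent_frames >= frames_for(SILENCE_MS):
            sealed, self.utterance = self.utterance, []
            self.in_speech = False
            return sealed
        return None
```

With 32ms frames, both 300ms windows round up to 10 frames (320ms), so actual timings in this sketch are slightly longer than the nominal values.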

Requirements

Requirement	Details
OS	macOS 14+ (Sonoma or later)
Chip	Apple Silicon (M1/M2/M3/M4)
Python	3.11+ via pyenv
Xcode	15+ (SwiftUI app only)

CLI Usage

python -m src.realtime_demo                    # Interactive device picker
python -m src.realtime_demo --device 0         # Specific mic
python -m src.realtime_demo --list-devices     # Show audio devices
python -m src.realtime_demo --telegram         # Send to Telegram
python -m src.realtime_demo --record           # Save speech audio to WAV
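
For orientation, the flags above could be declared with `argparse` roughly as follows. This is an illustrative sketch only: the actual parser in `src/realtime_demo.py` may differ, and `build_parser` plus the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(prog="python -m src.realtime_demo")
    parser.add_argument("--device", type=int, default=None,
                        help="audio device index; omit for the interactive picker")
    parser.add_argument("--list-devices", action="store_true",
                        help="show audio devices and exit")
    parser.add_argument("--telegram", action="store_true",
                        help="send each utterance to Telegram")
    parser.add_argument("--record", action="store_true",
                        help="save speech audio to WAV")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```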

Telegram Setup

Copy the example config and fill in your credentials:

cp .env.example .env

Edit .env with your bot token (issued by @BotFather) and the ID of the chat that should receive the messages:

TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_CHAT_ID=your-chat-id

Run with --telegram, or configure in the SwiftUI app settings.

Reliability: Messages retry up to 3x with exponential backoff. Rate limits (429) are respected automatically. Non-retryable errors (401/403) fail immediately. Messages over 4096 chars are truncated.
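
The retry policy described above can be sketched as a small function. Several things here are assumptions: "up to 3x" is read as three attempts in total, the base delay is one second, and `send` is a hypothetical callable that returns the HTTP status and the server's suggested wait (Telegram's 429 responses carry a `retry_after` value). This is not the code in `telegram_sender.py`.

```python
import time

MAX_ATTEMPTS = 3            # assumption: three attempts in total
MAX_MESSAGE_CHARS = 4096    # Telegram message length limit
NON_RETRYABLE = {401, 403}  # auth errors: retrying cannot help


def send_with_retry(send, text, sleep=time.sleep, base_delay=1.0):
    """Call send(text) -> (status, retry_after_or_None); return True on success."""
    text = text[:MAX_MESSAGE_CHARS]  # truncate over-long messages
    for attempt in range(MAX_ATTEMPTS):
        status, retry_after = send(text)
        if status == 200:
            return True
        if status in NON_RETRYABLE:
            return False  # fail immediately
        if attempt == MAX_ATTEMPTS - 1:
            break  # no point sleeping after the final attempt
        if status == 429 and retry_after is not None:
            delay = retry_after  # respect the server's rate-limit hint
        else:
            delay = base_delay * 2 ** attempt  # exponential backoff: 1s, 2s, ...
        sleep(delay)
    return False
```

Injecting `send` and `sleep` keeps the policy testable without network access or real delays.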

SwiftUI App

Menu bar app with waveform icon. Click to start/stop listening.

Feature	Description
Device picker	Dropdown with refresh button for Bluetooth hot-connect
Audio level meter	Real-time RMS visualization
Transcript view	Scrolling per-utterance transcript
Telegram	Configure bot token + chat ID in settings
Auto-restart	Python process auto-restarts on crash (up to 3x)
Mic permission	Prompts for microphone access with clear error if denied
Command timeout	30s watchdog — auto-restarts if Python becomes unresponsive
Floating overlay	Always-on-top transcription text over any window (configurable)
Auto-updates	Sparkle 2 — checks every 24h, EdDSA-verified, installs and relaunches automatically

Auto-Updates

Feature	Details
Framework	Sparkle 2
Check interval	Every 24 hours (configurable in Settings)
Manual check	Menu bar > "Check for Updates..." or Settings > Updates
Verification	EdDSA signature verification
Install	Downloads, verifies, replaces, relaunches automatically

Floating Overlay

A transparent floating panel that shows live transcription text on top of all windows — no need to switch to the app to verify what was said.

Enable: Settings → Overlay → toggle ON, or click "Show Overlay" in the menu bar.

Setting	Options
Placement	Draggable (drag anywhere) or Fixed (6 preset positions)
Text Size	Small / Medium / Large
Text Color	5 presets + custom color picker
Lines	1–9 visible lines
Opacity	30–100%

The overlay is click-through in fixed mode — clicks pass to windows below. In draggable mode, grab and reposition it anywhere on screen. Position is remembered between sessions.

IPC: The SwiftUI app spawns python -m src.server as a subprocess. Commands go over stdin and events come back over stdout, both as newline-delimited JSON (protocol v1). The bridge is thread-safe (NSLock), keeps a bounded event buffer (200 events), and cleans up zombie processes with a SIGKILL fallback.
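
To make the newline-delimited JSON framing concrete, here is a minimal Python sketch of both directions. The message fields (`type`, `text`, `n`) and the function names are hypothetical, since the real protocol v1 schema is not shown here, and dropping the oldest event when the buffer is full is an assumption about how "bounded" behaves.

```python
import io
import json
from collections import deque

MAX_BUFFERED_EVENTS = 200  # matches the bounded buffer size mentioned above


def encode_message(message):
    """Serialize one message as a single JSON line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def read_events(stream, buffer):
    """Parse one JSON object per line into `buffer`, skipping bad lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            buffer.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # ignore non-JSON output (e.g. stray log lines)
    return buffer


if __name__ == "__main__":
    events = deque(maxlen=MAX_BUFFERED_EVENTS)  # oldest events drop first
    fake_stdout = io.StringIO(
        encode_message({"type": "transcription", "text": "hello"})
    )
    print(list(read_events(fake_stdout, events)))
```

One JSON object per line means the reader never needs a length prefix: a complete line is always a complete message.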

Model


Model	Whisper large-v3-turbo
Source	`mlx-community/whisper-large-v3-turbo`
Params	809M
Format	MLX (Metal-optimized)
Size	~1.5GB
Location	`models/whisper/` (local, gitignored)
Inference	~1-2s per utterance (M1 Max)
Model load	~2-3s (warm)
Compute	Apple Silicon GPU via Metal

No internet required at runtime. Model ships with the project.

Configuration

All tunables live in src/config.py:

Setting	Default	Purpose
`VAD_SPEECH_THRESHOLD`	0.3	Silero speech probability threshold
`VAD_SILENCE_THRESHOLD_MS`	300	Silence duration to seal utterance
`VAD_MIN_SPEECH_DURATION_MS`	100	Minimum utterance length
`VAD_MIN_ENERGY`	0.003	RMS floor for quiet speech
`WHISPER_LANGUAGE`	en	Transcription language
`WHISPER_SUBPROCESS_TIMEOUT_S`	15.0	Inference watchdog timeout
`WHISPER_NO_SPEECH_THRESHOLD`	0.8	Hallucination filter sensitivity
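
As an illustration of how `WHISPER_NO_SPEECH_THRESHOLD` might gate output, the sketch below filters Whisper-style segment dictionaries. The compression-ratio cutoff of 2.4 is an assumption borrowed from Whisper's own default and is not listed in the table above; `keep_segment` and `filter_text` are hypothetical names.

```python
WHISPER_NO_SPEECH_THRESHOLD = 0.8  # default from the table above
MAX_COMPRESSION_RATIO = 2.4        # assumption: Whisper's usual default cutoff


def keep_segment(segment):
    """Return True if a segment looks like real speech rather than a hallucination."""
    if segment.get("no_speech_prob", 0.0) > WHISPER_NO_SPEECH_THRESHOLD:
        return False  # model itself thinks this is silence
    if segment.get("compression_ratio", 1.0) > MAX_COMPRESSION_RATIO:
        return False  # highly repetitive text, a common hallucination sign
    return bool(segment.get("text", "").strip())


def filter_text(segments):
    """Join the text of all segments that pass the filter."""
    return " ".join(s["text"].strip() for s in segments if keep_segment(s))
```

Raising the no-speech threshold lets more borderline segments through; lowering it discards more aggressively at the risk of dropping quiet speech.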

Project Structure

src/
  config.py                All tunables (single source of truth)
  audio_capture.py         Mic input via sounddevice
  vad.py                   Silero VAD thread (speech gating)
  transcriber.py           WhisperTranscriber + subprocess management
  whisper_worker.py        Whisper inference subprocess (MLX)
  telegram_sender.py       Per-utterance Telegram sender with 429 retry
  server.py                JSON-line server for SwiftUI app
  realtime_demo.py         CLI entry point

EsperApp/
  EsperApp/
    EsperApp.swift               App entry (MenuBarExtra + WindowGroup + OverlayController)
    ProcessBridge.swift          Python subprocess management (NSLock, bounded stream)
    TranscriptionEngine.swift    @Observable state + event consumption + watchdog
    TranscriptPanel.swift        Floating NSPanel (vibrancy, click-through, draggable)
    GlobalHotkey.swift           KeyboardShortcuts name definition (Option+Space default)
    Helpers/
      KeychainHelper.swift       Keychain read/write (for future use with Developer ID)
    Models/
      Protocol.swift             Event types + JSON parsing (protocol v1)
      AppSettings.swift          @AppStorage preferences (Telegram + Overlay + dev paths)
      OverlayPosition.swift      6-position enum with screen coordinate math
    Views/
      MainWindowView.swift       Primary window
      MenuBarView.swift          Menu bar controls (incl. overlay toggle)
      TranscriptView.swift       Scrolling transcript
      TranscriptOverlayView.swift  Floating overlay SwiftUI content + OverlayViewModel
      AudioLevelMeter.swift      Real-time audio meter
      StatusBadge.swift          Status indicator
      SettingsView.swift         App settings (6 tabs, sidebar navigation)
      ShortcutsTab.swift         Global hotkey configuration (KeyboardShortcuts.Recorder)
      OverlaySettingsTab.swift   Overlay config (position, appearance, preview)
  EsperAppTests/
    ProtocolTests.swift          25 XCTests for JSON event parsing
    OverlayPositionTests.swift   8 XCTests for position coordinate math
    AppSettingsOverlayTests.swift  6 XCTests for overlay settings defaults

models/
  silero_vad.onnx           Silero VAD model (2.2MB, tracked in git)
  whisper/                  Whisper large-v3-turbo (1.5GB, gitignored)

tests/                      119 Python tests
  test_config.py            Configuration constants + validation
  test_server_ipc.py        IPC protocol (--protocol-fd)
  test_server_commands.py   Server command handlers
  test_vad.py               VAD state machine
  test_vad_model.py         Silero ONNX wrapper
  test_audio_capture.py     Microphone capture + queue
  test_whisper_transcriber.py  Whisper subprocess lifecycle
  test_whisper_worker.py    Pipe protocol (JSON + numpy framing)
  test_telegram_sender.py   Telegram retry/truncation/validation
  test_integration.py       Full pipeline (VAD -> Whisper -> Telegram)
  test_cleanup.py           Dead code assertions
  test_frozen_paths.py      PyInstaller path resolution
