Whisper Local

Free, Open-Source, 100% Offline AI Dictation for Windows & macOS

Press a hotkey. Speak. Your words appear at the cursor. No cloud. No subscription. No telemetry. Powered by OpenAI Whisper.

Tests · Release · PyPI · Python 3.11+ · License: MIT · Platform · Code of Conduct

Quick Start · Features · vs. Wispr Flow / Dragon · Voice Commands · Contributing

Whisper Local: press, speak, type

Want a real screen-recording demo here? See docs/demo-recording.md, then drop a docs/demo.gif in and uncomment the line below.

Whisper Local is a free, open-source, fully offline alternative to Wispr Flow, Dragon, and Otter for power users who want AI dictation without sending audio to the cloud. Built on faster-whisper (CTranslate2), it delivers push-to-talk speech-to-text in any application: chat apps, code editors, browsers, terminals, design tools, anywhere a cursor blinks. Self-hosted, hackable, MIT-licensed.

Looking for: Wispr Flow alternative, offline voice typing, local Whisper dictation, free Dragon NaturallySpeaking alternative, privacy-first speech-to-text, Windows voice dictation without cloud, macOS push-to-talk transcription. You found it.


🌟 Why this exists

Most AI dictation tools are great, until you check the privacy policy. Your audio goes to a server, gets processed, and (sometimes) stored. You pay a monthly fee or get cut off.

Whisper Local exists because you shouldn't have to choose between accuracy and privacy.

  • 🔒 Your voice never leaves your machine, not even metadata
  • 🆓 Free forever: no account, no API key, no subscription
  • 🔌 Works offline, air-gapped, after the internet is gone
  • 🛠️ Fork it, hack it, ship your own version (MIT licensed)
  • 💡 Same Whisper model quality as cloud services, running on your own GPU

This is a community tool, not a product. There's no support SLA, no roadmap committee, no marketing. If it's useful to you, great. If something's broken, PRs are welcome.

A note from the maintainer: I built this for myself, then realised it might help others. So I'm releasing it for anyone who wants it, no strings attached. Use it. Fork it. Rebrand it. Ship your own version. The only thing I ask is that you keep the LICENSE attribution intact (to Pin Wang, the original upstream author, and to me as the fork maintainer). If you build something cool on top of it, I'd love to hear about it via a Discussion, but you don't owe anyone anything.

- Rohit Burani


✨ Why Whisper Local?

| Feature | Whisper Local | Wispr Flow | Dragon / Dragon Anywhere | Otter.ai | Windows Speech Recognition |
|---|---|---|---|---|---|
| Runs 100% offline | ✅ | ❌ | ❌ (Anywhere) | ❌ | ✅ |
| Audio never leaves your machine | ✅ | ❌ | ❌ | ❌ | partial |
| Free / open source | ✅ | ❌ | ❌ ($$$/yr) | ❌ ($$/mo) | ✅ |
| Modern AI accuracy (Whisper) | ✅ | ✅ | partial | ✅ | ❌ |
| Works in any app via hotkey | ✅ | ✅ | partial | ❌ | partial |
| Customisable voice commands | ✅ | partial | ✅ | ❌ | ❌ |
| Push-to-talk + auto-paste + auto-send | ✅ | ✅ | partial | ❌ | ❌ |
| GPU acceleration (NVIDIA & AMD) | ✅ | n/a | n/a | n/a | ❌ |
| AI rephrase / transforms (Ollama) | ✅ | ✅ | ❌ | ❌ | ❌ |
| Hackable / MIT licensed | ✅ | ❌ | ❌ | ❌ | ❌ |
| No account required | ✅ | ❌ | ❌ | ❌ | ✅ |

🎯 Features

  • 🎙️ Global push-to-talk hotkey: start recording from any app with Ctrl+Win (Windows) or Fn+Ctrl (macOS)
  • ⚡ Pre-roll buffer + warmup: captures the 500 ms before you press the key and pre-loads Whisper at boot, so the first word is never clipped and the first recording feels instant
  • 🔵 Floating level overlay: a small pill at the screen edge shows you're being heard, with the transcript appearing next to the level bar (Wispr Flow–style). Optional real-time streaming preview shows words as you speak.
  • 📝 Inline voice formatting: say "comma", "period", "question mark", "new paragraph", "open quote", etc. mid-sentence
  • 🤖 AI rephrase: dedicated Ctrl+Shift+Win hotkey. Select text, hold, speak your instruction, release, and local Ollama rewrites it in place
  • 🌐 Translation mode: speak any language, get English; tray → Profile → Translate
  • 🔁 Continuous dictation mode: for long-form notes, the app auto-restarts recording after each delivery
  • 📋 Fallback window: if no text field is focused, the transcript appears in a small window (pre-selected, copy button, already on clipboard)
  • ⏸ Pause-all hotkey: Ctrl+Alt+Win disables every Whisper Local hotkey until you press it again
  • 📋 Auto-paste at cursor: transcript lands wherever you're typing, optionally followed by Enter (auto-send)
  • 🔒 100% local & private: no network calls during use; Whisper models cached on disk
  • 🚀 GPU acceleration: NVIDIA CUDA and AMD ROCm supported, CPU works out of the box
  • 🗣️ Voice commands: say a trigger phrase to send a hotkey, type pre-written text, or run a shell command
  • 🔁 Hot-reload: edit commands.yaml and your change applies on the next transcription, no restart
  • 🩺 Built-in diagnostics: whisper-local --doctor checks audio devices, model cache, hotkeys, and recent errors
  • 🎛️ Profiles: switch between Dictation / Chat / Code / Notes presets from the tray
  • 🪟 Per-app rules: different behaviour per foreground app (auto-send in Slack, copy-only in VS Code, suppress in 1Password)
  • 🧹 Optional LLM cleanup: pipe transcripts through a local Ollama model for punctuation / capitalisation polish (off by default, fully local)
  • 📜 Recent transcriptions: last 10 results in the tray menu, click to copy back
  • 🔧 Settings backup/restore: --export-settings / --import-settings for portability
  • 🖥️ Settings UI: whisper-local --settings opens a GUI settings window (no YAML editing required)
  • 📜 Transcript history: whisper-local --history opens a searchable log of everything you've dictated
  • 🔔 Opt-in update notifications: daily GitHub release check, fully offline by default (update_check.enabled: true to opt in)
  • 🎚️ Noise suppression: spectral gating via noisereduce, off by default (pip install 'whisper-local[noise]')
  • 🩺 --selftest: one-command sanity check (mic, model, transcription, clipboard), perfect for first launch
  • 🎯 Hotkey cheat sheet: whisper-local --cheat-sheet or tray menu; shows your current configured hotkeys at a glance
  • 📦 --bundle-logs: zip up redacted logs + diagnostics for bug reports with one command
  • 🌐 Local OpenAI-compatible API: whisper-local --serve exposes POST /v1/audio/transcriptions on localhost:7777 for Cursor, Open WebUI, anything that speaks OpenAI Whisper API
  • 🛡️ Auto-recovery: silently reconnects when a USB mic is unplugged mid-recording
  • 🛡️ Crash reports: uncaught errors write a self-contained dump to disk
  • 🪟 System tray UI: model selection, mic selection, profile switch, diagnostics
  • 🍎 Cross-platform: Windows 10+, macOS
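
The pre-roll buffer from the list above can be pictured as a small ring buffer that is fed continuously and only read when the hotkey goes down. Here is a minimal sketch, assuming 16 kHz mono samples arriving in fixed-size blocks. The `PreRollBuffer` class and its method names are hypothetical and are not taken from Whisper Local's source.

```python
from collections import deque


class PreRollBuffer:
    """Keep only the most recent `seconds` of audio samples (hypothetical helper)."""

    def __init__(self, sample_rate: int = 16_000, seconds: float = 0.5):
        # A deque with maxlen silently discards the oldest samples once full.
        self._samples = deque(maxlen=int(sample_rate * seconds))

    def feed(self, block):
        """Call for every incoming audio block, whether or not a recording is active."""
        self._samples.extend(block)

    def snapshot(self):
        """Return the buffered audio, to be prepended when the hotkey is pressed."""
        return list(self._samples)


buf = PreRollBuffer(sample_rate=16_000, seconds=0.5)

# Two seconds of fake audio, delivered in 100 ms blocks of 1,600 samples.
for start in range(0, 32_000, 1_600):
    buf.feed(range(start, start + 1_600))

pre_roll = buf.snapshot()
print(len(pre_roll), pre_roll[0], pre_roll[-1])
```

Only the last 8,000 samples (500 ms at 16 kHz) survive, so older audio never accumulates in memory.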

🚀 Quick Start

Install (Python 3.11–3.13)

git clone https://github.com/drajb/whisper-local.git
cd whisper-local
pip install -e .

Launch

  • Terminal: whisper-local (or wl for short)
  • Double-click: whisper-local.cmd (Windows)
  • Start on login: tick Start on login in the tray menu (or the first-run welcome), or run whisper-local --enable-autostart. Disable anytime the same way.

First launch downloads the default base Whisper model (~141 MB) into your Hugging Face cache. After that, everything runs offline. (Prefer a smaller/faster download? Set whisper.model: tiny, ~75 MB.)

Use it

| Action | Windows | macOS |
|---|---|---|
| Hold to record | Ctrl+Win | Fn+Ctrl |
| Stop & paste | release key (push-to-talk) or Ctrl | release or Fn |
| Stop & auto-send (Enter) | Alt | Option |
| Cancel | Esc | Shift |
| Voice command mode | Alt+Win | Fn+Command |

Verify everything works

whisper-local --doctor

Runs through Python version, dependencies, config validation, audio devices, model cache, hotkey backend, and recent log errors. Exit 0 = clean.
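
Because a clean run exits with 0, the check is easy to script, for example before launching a long dictation session. This is a rough sketch using only the standard library. The `is_healthy` helper is hypothetical, and the demo calls use the Python interpreter as a stand-in so the snippet runs even where `whisper-local` is not installed.

```python
import subprocess
import sys


def is_healthy(command):
    """Return True if `command` runs and exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return False
    return result.returncode == 0


# Stand-in commands; in practice you would pass ["whisper-local", "--doctor"].
print(is_healthy([sys.executable, "-c", "raise SystemExit(0)"]))
print(is_healthy([sys.executable, "-c", "raise SystemExit(3)"]))
```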


🗣️ Voice Commands

Speak a trigger to run keyboard shortcuts, type snippets, or launch programs. Defined in:

  • Windows: %APPDATA%\whisperkey\commands.yaml
  • macOS: ~/.whisperkey/commands.yaml

commands:
  # Send a keyboard shortcut
  - trigger: "undo"
    hotkey: "ctrl+z"

  # Deliver pre-written text
  - trigger: "my email"
    type: "user@example.com"

  # Run a shell command
  - trigger: "open notepad"
    run: 'notepad.exe'

Edits hot-reload: no app restart required. See docs/voice-commands.md for the full guide.
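
One common way to get this kind of hot-reload is to compare the file's modification time before each use and re-read only when it changes. The sketch below illustrates that idea. It is an assumption about the general technique, not a description of how Whisper Local implements it, and `ReloadingFile` is a hypothetical name.

```python
import os
import tempfile


class ReloadingFile:
    """Re-read a text file only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._text = None

    def read(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # File is new or was edited since the last call: load it again.
            with open(self.path, encoding="utf-8") as fh:
                self._text = fh.read()
            self._mtime = mtime
        return self._text


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "commands.yaml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("commands: []\n")
    os.utime(path, ns=(1_000_000_000, 1_000_000_000))

    watched = ReloadingFile(path)
    first = watched.read()

    with open(path, "w", encoding="utf-8") as fh:
        fh.write('commands:\n  - trigger: "undo"\n')
    # Set the timestamp explicitly so the demo does not depend on clock resolution.
    os.utime(path, ns=(2_000_000_000, 2_000_000_000))
    second = watched.read()

print(first != second)
```

Calling `read()` right before each transcription is matched keeps the check cheap: one `stat` call, and a re-parse only after an edit.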

⚠️ Voice commands with run: execute through your system shell with your user privileges. Only add commands you trust.
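
To make the three command kinds concrete, here is a small sketch of how a transcript could be matched against the example entries. It assumes exact matching after lowercasing and stripping punctuation; the real matcher may be more forgiving. `normalize` and `match_command` are hypothetical helpers, and the commands are written as Python dicts so the snippet needs no YAML library.

```python
import string

# Mirrors the commands.yaml example, already parsed into Python dicts.
COMMANDS = [
    {"trigger": "undo", "hotkey": "ctrl+z"},
    {"trigger": "my email", "type": "user@example.com"},
    {"trigger": "open notepad", "run": "notepad.exe"},
]


def normalize(text):
    """Lowercase and drop punctuation and extra whitespace a transcript may carry."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def match_command(transcript, commands=COMMANDS):
    """Return (action, payload) for the first trigger equal to the transcript, else None."""
    spoken = normalize(transcript)
    for command in commands:
        if normalize(command["trigger"]) == spoken:
            for action in ("hotkey", "type", "run"):
                if action in command:
                    return action, command[action]
    return None


print(match_command("Undo."))
print(match_command("My email"))
print(match_command("hello world"))
```

Returning the action kind separately from its payload lets the caller treat `run` entries with extra care, in line with the warning above.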


⚑ GPU Acceleration

On first launch, Whisper Local detects your GPU and offers one-press install of the required runtime libraries. Supports NVIDIA CUDA and AMD ROCm.

For manual setup or AMD RDNA 1, see docs/gpu-setup.md.


🌐 Local OpenAI-Compatible API

Whisper Local doubles as a drop-in local replacement for the OpenAI Whisper API, fully offline. Point any tool that speaks POST /v1/audio/transcriptions at it (Cursor, VS Code Continue, Open WebUI, n8n, custom scripts, anything else).

whisper-local --serve            # listens on http://127.0.0.1:7777
whisper-local --serve --serve-port 8080

# Drop-in compatible with OpenAI API clients:
curl -X POST http://127.0.0.1:7777/v1/audio/transcriptions \
  -F file=@audio.wav -F model=whisper-1 -F response_format=text

Same Whisper model you use for dictation. Same GPU. No API key. No rate limit. No outgoing traffic.
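
The same request can be made from Python without extra dependencies. The sketch below sends the form fields used in the curl example (`file`, `model`, `response_format`) with the standard library. `build_multipart` and `transcribe` are hypothetical helper names, and the default URL assumes the server is on its default port.

```python
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://127.0.0.1:7777"):
    """POST an audio file to the local server and return the plain-text transcript."""
    with open(path, "rb") as fh:
        audio = fh.read()
    body, content_type = build_multipart(
        {"model": "whisper-1", "response_format": "text"}, "file", "audio.wav", audio
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With `whisper-local --serve` running, `print(transcribe("audio.wav"))` should print the transcript; nothing in the snippet contacts any host other than `base_url`.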


🎛️ Profiles

Switch between presets from the tray icon → Profile:

| Profile | Behaviour |
|---|---|
| Dictation | General-purpose voice typing, auto-paste on |
| Chat | Push-to-talk, auto-paste + auto-send via Alt |
| Code | Copy-only mode for editors, never auto-sends |
| Notes | Quiet copy-to-clipboard, voice commands disabled |

Edit or add new profiles in %APPDATA%\whisperkey\profiles.yaml.


🪟 Per-app rules

Different apps want different behaviour. Whisper Local detects the foreground window before delivering each transcription and matches it against rules in %APPDATA%\whisperkey\app_rules.yaml:

rules:
  # Chat apps: send the message immediately
  - match: ["slack.exe", "discord.exe"]
    auto_send: true

  # Code editors: never auto-send, copy only
  - match: ["code.exe", "cursor.exe"]
    auto_paste: false

  # Password managers: skip delivery entirely
  - match: ["1password.exe", "bitwarden.exe"]
    suppress: true

Rules hot-reload: edit the file and the next transcription picks up the change.
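
As a rough illustration of how such rules could be resolved, the sketch below looks up the foreground process name and overlays the first matching rule on a set of defaults. The first-match-wins order, the case-insensitive comparison, and the `DEFAULTS` values are all assumptions made for the example, and `resolve_behaviour` is a hypothetical name.

```python
# Mirrors the app_rules.yaml example, already parsed into Python dicts.
RULES = [
    {"match": ["slack.exe", "discord.exe"], "auto_send": True},
    {"match": ["code.exe", "cursor.exe"], "auto_paste": False},
    {"match": ["1password.exe", "bitwarden.exe"], "suppress": True},
]

# Hypothetical baseline used when no rule matches.
DEFAULTS = {"auto_paste": True, "auto_send": False, "suppress": False}


def resolve_behaviour(process_name, rules=RULES, defaults=DEFAULTS):
    """Return delivery settings for the foreground process (first match wins)."""
    name = process_name.lower()
    for rule in rules:
        if name in (pattern.lower() for pattern in rule["match"]):
            overrides = {key: value for key, value in rule.items() if key != "match"}
            return {**defaults, **overrides}
    return dict(defaults)


print(resolve_behaviour("Slack.exe"))
print(resolve_behaviour("code.exe"))
print(resolve_behaviour("notepad.exe"))
```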

🧹 Optional LLM cleanup

If you have Ollama running locally, Whisper Local can pipe each transcript through a small local model for punctuation and capitalisation polish. Off by default and fully local; set postprocess.ollama.enabled: true in user_settings.yaml to enable.

postprocess:
  capitalize_first: true        # works without Ollama
  ensure_punctuation: true      # works without Ollama
  strip_filler_words: true      # works without Ollama
  ollama:
    enabled: false              # set true to opt in
    endpoint: http://localhost:11434
    model: llama3.2
    timeout: 5
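
The three options marked "works without Ollama" are plain text transforms. The sketch below shows one plausible reading of each, applied in sequence. The filler-word list and the exact rules are assumptions for illustration, so the shipped behaviour may differ.

```python
import re

# Hypothetical filler list; the words Whisper Local actually strips may differ.
FILLERS = ("um", "uh", "erm")


def strip_filler_words(text):
    """Remove standalone filler words plus a trailing comma or period."""
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE).strip()


def capitalize_first(text):
    """Upper-case the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


def ensure_punctuation(text):
    """Append a period unless the text is empty or already ends a sentence."""
    if not text or text.endswith((".", "!", "?")):
        return text
    return text + "."


def postprocess(text):
    for step in (strip_filler_words, capitalize_first, ensure_punctuation):
        text = step(text)
    return text


print(postprocess("um, so the build uh passed"))
```

Stripping fillers first matters here: otherwise the capital letter could land on a word that is about to be removed.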

⚙️ Configuration

Local settings live at:

  • Windows: %APPDATA%\whisperkey\user_settings.yaml
  • macOS: ~/.whisperkey/user_settings.yaml

Delete the file and restart to reset to defaults. Highlights:

| Option | Default | Notes |
|---|---|---|
| whisper.model | base | Any model from whisper.models. tiny = smallest/fastest, larger = more accurate/slower |
| whisper.device | cpu | cpu or cuda (NVIDIA/AMD) |
| whisper.compute_type | int8 | int8/float16/float32 |
| whisper.language | auto | Auto-detect or specific language code |
| whisper.hotwords | [] | Words the model should favour, such as names and jargon |
| hotkey.recording_hotkey | ctrl+win | Configurable |
| hotkey.recording_mode | push_to_talk | push_to_talk (hold to talk) or toggle |
| vad.vad_realtime_enabled | true | Auto-stop on silence |
| clipboard.auto_paste | true | false = copy only |
| clipboard.delivery_method | paste | paste (Ctrl+V) or type (direct injection) |
| voice_commands.enabled | true | Enable command mode |
| audio.host | null | WASAPI recommended on Windows for low latency |

Full reference: config.defaults.yaml.
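
For readers curious how the whisper.* options relate to the underlying library, here is a sketch that maps them onto faster-whisper's `WhisperModel` constructor and `transcribe` call. The mapping is an assumption for illustration, not Whisper Local's actual wiring. `model_kwargs` and `transcribe_file` are hypothetical helpers, and the second one needs `pip install faster-whisper`.

```python
def model_kwargs(settings):
    """Translate the whisper.* options into WhisperModel constructor arguments."""
    whisper = settings.get("whisper", {})
    return {
        "model_size_or_path": whisper.get("model", "base"),
        "device": whisper.get("device", "cpu"),
        "compute_type": whisper.get("compute_type", "int8"),
    }


def transcribe_file(path, settings):
    """Transcribe one audio file with faster-whisper and return the joined text."""
    # Imported lazily so model_kwargs() stays usable without the dependency.
    from faster_whisper import WhisperModel

    language = settings.get("whisper", {}).get("language", "auto")
    model = WhisperModel(**model_kwargs(settings))
    segments, _info = model.transcribe(
        path,
        # None asks the model to detect the language itself.
        language=None if language == "auto" else language,
    )
    return " ".join(segment.text.strip() for segment in segments)


print(model_kwargs({"whisper": {"model": "tiny"}}))
```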


🛠️ CLI Reference

whisper-local                      # Run the app (or use `wl`)
whisper-local --setup              # Interactive setup wizard (model, mode, mic)
whisper-local --doctor             # Run diagnostics
whisper-local --stats              # Transcription history & time saved
whisper-local --version            # Print version
whisper-local --quit               # Stop the running instance
whisper-local --export-settings DIR        # Back up user_settings + commands
whisper-local --import-settings DIR        # Restore from a backup
whisper-local --export-transcripts FILE    # Dump history (.txt/.md/.csv)
whisper-local --import-vocab FOLDER        # Mine a folder for hotwords
whisper-local --settings           # Open the settings GUI (no YAML editing required)
whisper-local --history            # Browse and search transcript history
whisper-local --cheat-sheet        # Show your currently configured hotkeys
whisper-local --selftest           # Run an automated self-test (mic, model, transcription)
whisper-local --bundle-logs        # Create a redacted diagnostic zip for bug reports
whisper-local --serve              # Run a local OpenAI-compatible Whisper API on :7777
whisper-local --enable-autostart   # Launch automatically at login (--disable-autostart to undo)
whisper-local --test               # Run a separate test instance (own mutex)

Launching while an instance is already running takes over: the old one is replaced cleanly, no manual quit needed.


🏗️ How it works

┌─────────────────────┐  ┌──────────────────┐  ┌─────────────────────┐
│  global-hotkeys /   │  │   sounddevice +  │  │  faster-whisper /   │
│  NSEvent (macOS)    │─▶│  500ms ring buf  │─▶│  ctranslate2 (GPU)  │
└─────────────────────┘  │  + TEN VAD       │  └──────────┬──────────┘
                         └──────────────────┘             │
                                                          ▼
                         ┌──────────────────┐  ┌─────────────────────┐
                         │  Voice command   │◀─│  Transcribed text   │
                         │  matcher         │  │                     │
                         └──────────────────┘  └──────────┬──────────┘
                                                          ▼
                                               ┌─────────────────────┐
                                               │  ctypes SendInput / │
                                               │  Quartz CGEvent     │
                                               │  → cursor           │
                                               └─────────────────────┘

🔒 Privacy pledge

Whisper Local makes the following network calls and no others:

  1. First launch only: downloads the Whisper model from huggingface.co into your local cache.
  2. GPU onboarding (opt-in): if you accept the GPU setup prompt, pip install pulls CUDA / ROCm runtime packages from PyPI / repo.radeon.com.
  3. Update check (opt-in, off by default): if you set update_check.enabled: true, a daily check for new GitHub releases.

After setup, with the update check left off, there is zero network traffic. Confirm by running whisper-local --doctor and inspecting the source: the setup-time network entry points live in onboarding.py and are gated behind explicit user prompts.


📦 Tech stack

faster-whisper · ctranslate2 · sounddevice · ten-vad · pyperclip · pystray · ruamel.yaml · playsound3

Windows-only: global-hotkeys · pywin32 · ctypes SendInput

macOS-only: pyobjc-framework-Quartz · pyobjc-framework-ApplicationServices


📚 Documentation

Hit a wall? Run whisper-local --doctor or whisper-local --selftest first; they catch 90% of issues.


🤝 Contributing

Contributions of all kinds are welcome: bug fixes, new features, docs improvements, or just opening an issue with a clear reproduction. This project is maintained on a best-effort basis with no SLA; please be patient with response times.

git clone https://github.com/drajb/whisper-local.git
cd whisper-local
pip install -e .
python -m unittest tests.test_smoke   # smoke suite; should report OK

See CONTRIBUTING.md for the full guide and CODE_OF_CONDUCT.md for community standards. By contributing you agree your code will be MIT licensed. Found a security issue? See SECURITY.md; please don't open a public issue.

Good first issues are tagged here. The full credit list is in AUTHORS.md.


☕ Support

Whisper Local is free and always will be. If it saves you time or a monthly subscription, consider starring the repo and sharing it with people who'd find it useful; it helps the project grow. No pressure either way.


🙏 Credit

Forked from whisper-key-local by Pin Wang. Huge thanks to the original work that made this fork possible. The full list of credits, including every open-source library Whisper Local builds on, is in AUTHORS.md.

MIT licensed; original copyright preserved in LICENSE.


⭐ If you find this useful, please star the repo; it helps others discover it.

Maintained by Rohit Burani (@drajb)

Website · GitHub · Discussions · Report a bug · Request a feature

Tags: whisper · dictation · speech-to-text · voice-typing · transcription · ai-dictation · local-ai · offline · push-to-talk · voice-recognition · accessibility · faster-whisper · privacy · self-hosted · wispr-flow-alternative · dragon-naturallyspeaking-alternative · otter-alternative · ollama · voice-commands · windows · macos · python
