Toggle voice dictation for Linux / Wayland / KDE Plasma via OpenAI Whisper. Press your hotkey → speak → press it again → text is copied to your clipboard and auto-pasted into the focused window.
- Stateless — ~250 lines of bash, no daemons of our own, no GUI. The KDE Shortcut runs the script, the script exits. Next press runs it again.
- Clipboard first, auto-paste second: the transcript always goes into the clipboard via `wl-copy`. On top of that, `ydotool` simulates Ctrl+V in the focused window for zero-friction pasting. If the focus was elsewhere, the text is still one Ctrl+V away.
- Works anywhere on Wayland: VS Code, Claude Code CLI, browser, Slack, terminal. `ydotool` goes through `/dev/uinput` at the kernel level, bypassing Wayland's input-injection restrictions.
- Layout-independent: sends raw Ctrl+V keycodes; apps pull Unicode text from the clipboard. Russian, English, and mixed text all work the same.
- Three-stage audio feedback — pip-pip-pip on start, pup-pup-pup on stop, ding-dong when the transcript is ready. You always know what's happening without looking.
- Persistent status bubble — one notification transforms Listening → Processing → Ready, never stacks.
- Never loses your speech — on API/network failure the WAV is preserved and a retry command is copied into your clipboard.
- Whisper hallucination filter — a post-processing regex scrubs the subtitle artifacts Whisper leaks on short audio («Редактор субтитров», «Subtitles by», «Thanks for watching»).
Tested on: Kubuntu 25.10 + KDE Plasma 6 + PipeWire. Should work on any Linux with Wayland + wl-clipboard + libnotify-bin.

    Hotkey (KDE Custom Shortcut)
      └─ kwispr.sh toggle
           ├─ start: ffmpeg -f pulse → ~/.cache/kwispr/TS.wav (via FIFO)
           │         notify-send "🎙 Listening" (persistent)
           └─ stop:  write 'q' to FIFO → ffmpeg flushes WAV
                     notify-send replace → "⏳ Processing"
                     curl POST https://api.openai.com/v1/audio/transcriptions
                       model=whisper-1, temperature=0
                     sed-filter subtitle hallucinations
                     wl-copy $transcript (always)
                     ydotool key 29:1 47:1 47:0 29:0 (Ctrl+V, if enabled)
                     notify-send replace → "✅ Pasted" or "Ready (in clipboard)"

No daemons of our own. The script is stateless, launched from the KDE Shortcut and exits after each press. State lives in lockfiles under ~/.cache/kwispr/. The only persistent component is the optional ydotoold systemd service (for auto-paste).
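
The request step in the flow above could look roughly like the sketch below. The endpoint, `model=whisper-1`, `temperature=0`, and the `KWISPR_LANGUAGE` setting come from this README; the function name `transcribe`, the temp-file handling, and the exact error text are assumptions for illustration, not the actual `kwispr.sh` code.

```shell
# Sketch: send one WAV to the transcription endpoint and print the text.
# transcribe() is a hypothetical helper name.
transcribe() {
  local wav body code
  wav=$1
  body=$(mktemp)
  # Positional parameters double as a portable argument list for curl.
  set -- -sS -o "$body" -w '%{http_code}' \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -F "file=@${wav}" -F "model=whisper-1" -F "temperature=0"
  # An empty KWISPR_LANGUAGE leaves language detection to the API.
  if [ -n "${KWISPR_LANGUAGE:-}" ]; then
    set -- "$@" -F "language=${KWISPR_LANGUAGE}"
  fi
  # curl prints only the HTTP status code; the JSON body goes to $body.
  code=$(curl "$@" https://api.openai.com/v1/audio/transcriptions) || true
  if [ "$code" != "200" ]; then
    rm -f "$body"
    echo "API ${code:-error}" >&2
    return 1
  fi
  jq -r '.text' "$body"
  rm -f "$body"
}
```

Calling `transcribe ~/.cache/kwispr/TS.wav` would print the transcript on success and return non-zero with a message such as `API 401` on failure, which is the point where a caller could keep the WAV for a later retry.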
| Package | Purpose | Source |
|---|---|---|
| `ffmpeg` | record from pulse (via FIFO + 'q' for proper flush) | APT |
| `curl` | OpenAI API request | APT |
| `jq` | parse JSON response | APT |
| `wl-clipboard` | `wl-copy` for clipboard | APT |
| `libnotify-bin` | `notify-send` for the persistent status bubble | APT |
| `pipewire-pulse` | pulse-compat layer on PipeWire | ships with Kubuntu 25.10 |
| `ydotool` v1.0.4 (optional) | auto-paste Ctrl+V via `/dev/uinput` | GitHub release → `~/.local/bin/` |

Why ydotool is downloaded from GitHub instead of APT: the Ubuntu APT version is an ancient 0.1.8 without the key command or daemon. We need 1.0.4+.

    git clone git@github.com:MaksBoi/kwispr.git
    cd kwispr
    ./setup.sh    # installs deps, prompts for ydotool install
    cp .env.example .env
    chmod 600 .env
    # Put your OPENAI_API_KEY into .env

When ydotool is installed by setup.sh:
- v1.0.4 is downloaded into `~/.local/bin/`
- a udev rule, `/etc/udev/rules.d/80-uinput.rules`, makes `/dev/uinput` owned by the `input` group
- your user is added to the `input` group (re-login required after this)
- a system-level systemd service, `/etc/systemd/system/ydotoold.service`, is created

After re-login, `systemctl status ydotoold` should show `active (running)`.
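
For readers who want to see what the udev step amounts to, here is a minimal sketch. The rule text is an assumption (a common way to give a group access to `/dev/uinput`); the file that `setup.sh` actually writes may differ. Both helper names are hypothetical.

```shell
# Sketch: write a udev rule that hands /dev/uinput to the "input" group.
# The rule text is an assumption; setup.sh may write something different.
write_uinput_rule() {
  printf '%s\n' 'KERNEL=="uinput", GROUP="input", MODE="0660"' > "$1"
}

# Sketch: check that the current user can open the device for writing.
# Defaults to /dev/uinput when no path is given.
uinput_writable() {
  [ -w "${1:-/dev/uinput}" ]
}
```

One way to use it: write the rule to a temporary file with `write_uinput_rule /tmp/80-uinput.rules`, copy it into `/etc/udev/rules.d/` with `sudo`, reload with `sudo udevadm control --reload-rules && sudo udevadm trigger`, and after re-login confirm access with `uinput_writable`.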
Pick any key or combination — a regular F5, a modifier combo, a multimedia key. Bind it through KDE:
- System Settings → Shortcuts → Shortcuts → Add New → Command/URL Shortcut
- Trigger: press the key (or combo) you want to use
- Action: `<absolute_path>/kwispr.sh`
- Apply
Some keyboards have a "multimedia mode" that remaps F-keys to send different keysyms (e.g. Meta+H instead of F5). KDE won't capture them as plain F-keys — you need to press them in that mode when setting the trigger. If KDE reports unusual keysyms, capture the real one first:

    sudo apt install -y wev
    wev

Press the key inside the `wev` window, read the `sym ...` line, then close the window.
The same physical key can send different keysyms depending on the active layout (e.g. Meta+H on en vs. Meta+Р on ru). Add a second trigger for each layout — all pointing to the same script.
| Step | What happens |
|---|---|
| Press hotkey | Recording starts. Persistent "🎙 Listening" notification. |
| (speak) | ffmpeg writes to ~/.cache/kwispr/TS.wav |
| Press hotkey again | ffmpeg shuts down gracefully (FIFO + 'q' → valid WAV). Notification → "⏳ Processing" |
| ~1-3 s | Whisper transcribes, hallucination filter cleans subtitle artifacts |
| done | Text lands in the clipboard, and ydotool simulates Ctrl+V in the focused window. Notification → "✅ Pasted" (auto-paste worked) or "Ready (in clipboard)" (paste manually with Ctrl+V) |
The text is always in the clipboard — even if the wrong window was focused, or auto-paste is disabled, Ctrl+V still works.
Minimum 1 second of audio is required — otherwise "⚠ Too short" (Whisper reliably hallucinates on <1s).
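
The README does not say how the script measures the recording length. One plausible way, sketched here with hypothetical helper names, is to ask `ffprobe` (installed alongside `ffmpeg`) for the duration and compare it against the threshold with `awk`, since the shell itself has no floating-point comparison.

```shell
# Sketch: read a file's duration in seconds via ffprobe.
wav_duration() {
  ffprobe -v error -show_entries format=duration -of csv=p=0 "$1"
}

# Sketch: succeed only if the duration reaches the minimum (default 1 second).
# An empty or non-numeric duration is treated as 0 and therefore rejected.
long_enough() {
  awk -v d="$1" -v min="${2:-1}" 'BEGIN { exit !(d + 0 >= min + 0) }'
}
```

A caller could then write `long_enough "$(wav_duration "$wav")" || echo "Too short"` before making the API request.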
- `kwispr.sh` (or `kwispr.sh toggle`): start/stop recording
- `kwispr.sh retry <path.wav>`: retry transcription of an old WAV file

Settings live in `.env`:

    OPENAI_API_KEY=sk-...          # required
    KWISPR_LANGUAGE=               # empty = autodetect (ok for mixed ru/en); or 'ru', 'en'
    KWISPR_AUTOPASTE=1             # 1 = auto Ctrl+V; 0 = clipboard only
    KWISPR_SOUNDS=1                # 1 = audio cues; 0 = silent
    # KWISPR_SOUND_START=/path.wav # optional custom sounds
    # KWISPR_SOUND_STOP=/path.wav
    # KWISPR_SOUND_READY=/path.wav

All recordings and transcripts are kept in `~/.cache/kwispr/`. Files older than 30 days are deleted automatically on each run (only `*.wav` and `*.txt`; service files are not touched).
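
The 30-day cleanup described above can be expressed as a single `find` call. This is a sketch under the assumption that the archive is flat (no subdirectories); `prune_cache` is a hypothetical name.

```shell
# Sketch: delete recordings and transcripts older than N days (default 30).
# Only *.wav and *.txt directly inside the directory are considered, so
# lockfiles and other service files stay untouched.
prune_cache() {
  find "$1" -maxdepth 1 -type f \( -name '*.wav' -o -name '*.txt' \) \
    -mtime "+${2:-30}" -delete
}
```

Running `prune_cache ~/.cache/kwispr` at the start of each invocation keeps the archive bounded without needing a timer or cron job, which fits the stateless design.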
On API failure:
- the WAV stays in the archive
- `last-failed.txt` holds the retry command
- the retry command is also copied into the clipboard (paste it into a terminal)
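
The failure path listed above might be implemented along these lines. The helper name `save_retry`, its argument order, and the quoting of the path are assumptions; only the `last-failed.txt` file name and the `retry` subcommand come from this README.

```shell
# Sketch: record how to retry a failed transcription.
# save_retry() and its arguments are hypothetical names for illustration.
save_retry() {
  local script wav cache cmd
  script=$1
  wav=$2
  cache=$3
  cmd="$script retry '$wav'"
  printf '%s\n' "$cmd" > "$cache/last-failed.txt"
  # Clipboard copy is best effort in this sketch: skip it when wl-copy
  # is missing and ignore a failure (e.g. no Wayland session).
  if command -v wl-copy >/dev/null 2>&1; then
    printf '%s' "$cmd" | wl-copy || true
  fi
}
```

For example, `save_retry "$0" "$wav" ~/.cache/kwispr` would leave a ready-to-paste command both on disk and, where possible, in the clipboard.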
Why FIFO + 'q' instead of SIGTERM for ffmpeg:
ffmpeg -f pulse on Wayland/PipeWire sometimes ignores SIGINT (ffmpeg trac #8369) and can leave a 0-byte WAV on SIGTERM (no trailer written). The documented graceful shutdown is writing q to stdin — done here via a FIFO held open by a background sleep.
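
A minimal sketch of the FIFO technique follows. The function names, the `.pid` and `.holder` side files, the `default` source name, and the 5-second wait cap are all assumptions made for illustration; `kwispr.sh` may track its state differently.

```shell
# Sketch of the FIFO + 'q' shutdown. Function names and the .pid/.holder
# side files are hypothetical.
start_recording() {
  local wav fifo
  wav=$1
  fifo=$2
  mkfifo "$fifo"
  # A background sleep keeps a writer attached to the FIFO, so ffmpeg's
  # stdin never sees EOF between the two hotkey presses.
  sleep infinity > "$fifo" &
  echo "$!" > "$fifo.holder"
  ffmpeg -loglevel error -y -f pulse -i default "$wav" < "$fifo" &
  echo "$!" > "$fifo.pid"
}

stop_recording() {
  local fifo pid tries
  fifo=$1
  pid=$(cat "$fifo.pid")
  # 'q' on stdin asks ffmpeg to finish the file and exit cleanly.
  printf 'q' > "$fifo"
  # The stop normally runs in a new process, so poll instead of using wait.
  tries=0
  while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
  done
  # Release the FIFO holder and clean up the side files.
  kill "$(cat "$fifo.holder")" 2>/dev/null || true
  rm -f "$fifo" "$fifo.pid" "$fifo.holder"
}
```

The first hotkey press would call `start_recording "$wav" "$fifo"` and exit; the second press, in a fresh process, would call `stop_recording "$fifo"` and find a complete WAV once it returns.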
Why ydotool (not wtype / xdotool):
- `wtype` doesn't work on KDE Plasma Wayland: KWin doesn't support the virtual-keyboard protocol
- `xdotool` is X11 only
- `ydotool` uses `/dev/uinput` at the kernel level, bypassing Wayland's input injection restrictions
Why keyboard layouts don't break:
ydotool sends raw keycodes (29=Ctrl, 47=V) — a stable hotkey regardless of layout. The target app pulls the text from wl-copy's clipboard, where the correct Unicode already lives. We never "type" the text through ydotool — so the known unicode-type bug doesn't apply.
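
To make the `29:1 47:1 47:0 29:0` sequence less opaque: each item is `keycode:state`, with 1 for press and 0 for release, and the keys are released in the reverse of the order they were pressed. The helper below, a hypothetical `key_chord`, builds such a sequence from a list of Linux input keycodes.

```shell
# Sketch: build a ydotool key sequence for a chord.
# Keys are pressed in the given order and released in reverse order.
# key_chord() is a hypothetical helper; codes are Linux input keycodes.
key_chord() {
  local press release code seq
  press=""
  release=""
  for code in "$@"; do
    press="$press $code:1"
    release=" $code:0$release"
  done
  seq="$press$release"
  # Drop the single leading space left by the loop.
  printf '%s\n' "${seq# }"
}
```

With this helper, `ydotool key $(key_chord 29 47)` sends the same Ctrl+V sequence shown in the flow diagram (the substitution is left unquoted on purpose so each item becomes its own argument).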
Prompt-less transcription: Whisper was tested with various prompts. A bilingual prompt (e.g. "Voice dictation. Голосовая диктовка.") made the model occasionally translate speech into the prompt's language instead of transcribing as-is. We removed the prompt entirely — Whisper transcribes what it hears, and known hallucinations («Редактор субтитров», «Subtitles by ...», «Thanks for watching») are scrubbed by a post-processing regex.
Plus a minimum 1 second of audio before the API call (below that — immediate "Too short", saving an API round-trip).
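
The post-processing filter could be as small as the sketch below. It covers only the three phrases named in this README and assumes the artifact runs to the end of the line; the regex actually used in `kwispr.sh` may be broader or stricter. `scrub_hallucinations` is a hypothetical name.

```shell
# Sketch: strip known Whisper subtitle artifacts from a transcript on stdin.
# Assumes the first two artifacts run to the end of the line.
scrub_hallucinations() {
  sed -E \
    -e 's/Редактор субтитров.*$//' \
    -e 's/Subtitles by.*$//' \
    -e 's/Thanks for watching[.!?]*//g' \
    -e 's/[[:space:]]+$//' \
    -e 's/^[[:space:]]+//'
}
```

It reads from stdin, so it slots into a pipeline, for example `transcript=$(printf '%s\n' "$raw" | scrub_hallucinations)`. A transcript that consisted only of an artifact comes back empty, which a caller can treat the same way as a too-short recording.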
| Symptom | Cause | Fix |
|---|---|---|
| "No .env" | Config not created | cp .env.example .env; chmod 600 .env |
| "OPENAI_API_KEY not set" | Placeholder instead of a key | Put a real sk-... into .env |
| "Too short" on normal speech | pulse hadn't opened yet (0.05s sleep too short) | Increase the sleep in start_recording |
| Records but doesn't paste | ydotoold not running or /dev/uinput not accessible | `systemctl status ydotoold` + `ls -la /dev/uinput` (should be `crw-rw---- root input`) |
| Pasted into wrong window | Focus was elsewhere when you pressed the hotkey | Place cursor in the target before pressing the hotkey to stop |
| "API 401" | Wrong API key | Verify on platform.openai.com |
| "API 429" | Rate limit / billing | Top up OpenAI balance |
| Empty clipboard after ✅ | Wayland clipboard glitch | systemctl --user restart xdg-desktop-portal |
| No notifications | libnotify-bin missing | `sudo apt install libnotify-bin` |
| Recording doesn't start | ffmpeg can't see mic | pactl list sources short — check default |
| Green mic LED stays on after a crash | Stale ffmpeg process | pkill -f "ffmpeg.*pulse" |

To disable auto-paste, set `KWISPR_AUTOPASTE=0` in `.env`. Text stays in the clipboard; paste manually with Ctrl+V.

To remove ydotool and its system integration:

    sudo systemctl disable --now ydotoold
    sudo rm /etc/systemd/system/ydotoold.service
    sudo rm /etc/udev/rules.d/80-uinput.rules
    sudo gpasswd -d "$USER" input
    rm ~/.local/bin/ydotool ~/.local/bin/ydotoold

Then set `KWISPR_AUTOPASTE=0` in `.env`.

- Local Whisper as a fallback (cloud-only for now)
- Push-to-talk mode (toggle only)
- GUI / tray icon (the single persistent notification is enough)
Note: KDE Custom Shortcuts can't bind to mouse buttons directly. If you want a mouse-button trigger, use a tool like `input-remapper` to remap the button to a keyboard shortcut, then bind that shortcut to kwispr.
MIT — see LICENSE.
