Skip to content

adjacentai/cream-typer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

4 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Cream Typer

CI License: MIT Python 3.10+ Code style: ruff macOS

๐ŸŽ™๏ธ Voice translation in any direction. Locally on Apple Silicon.

Tap Caps Lock, speak any language, get any other. Whisper.cpp, no cloud, no GPU rental.

demo.mp4

๐ŸŒ What it does

You speak in any language. Whatever language you picked in the menu bar โ€” that's what comes out. Real examples:

You said (any language goes) Active mode What got pasted at the cursor
๐Ÿ‡ท๐Ÿ‡บ ยซะŸั€ะธะฒะตั‚, ะบะฐะบ ัƒ ั‚ะตะฑั ะดะตะปะฐ?ยป ๐Ÿ‡ฌ๐Ÿ‡ง en Hello, how are you doing?
๐Ÿ‡ฌ๐Ÿ‡ง "Let's ship it on Friday" ๐Ÿ‡ท๐Ÿ‡บ ru ะ”ะฐะฒะฐะน ะฒั‹ะบะฐั‚ะธะผ ะฒ ะฟัั‚ะฝะธั†ัƒ
๐Ÿ‡ฉ๐Ÿ‡ช "Kรถnnen wir morgen reden?" ๐Ÿ‡ฏ๐Ÿ‡ต ja ๆ˜Žๆ—ฅ่ฉฑใ›ใพใ™ใ‹๏ผŸ
๐Ÿ‡ฐ๐Ÿ‡ท "์•ˆ๋…•ํ•˜์„ธ์š”, ๋งŒ๋‚˜์„œ ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค" ๐Ÿ‡ธ๐Ÿ‡ฆ ar ู…ุฑุญุจู‹ุงุŒ ุชุดุฑูุช ุจู„ู‚ุงุฆูƒ
๐Ÿ‡ฏ๐Ÿ‡ต ใ€Œใ‚ณใƒผใƒ‰ใƒฌใƒ“ใƒฅใƒผใ‚ใ‚ŠใŒใจใ†ใ€ ๐Ÿ‡บ๐Ÿ‡ฆ uk ะ”ัะบัƒัŽ ะทะฐ ั€ะตะฒ'ัŽ ะบะพะดัƒ
anything ๐ŸŒ โ†’ English always English โ€” flagship mode

Why this works at all. Whisper's encoder produces a language-agnostic representation of audio โ€” meaning, not words. The decoder writes that meaning down in whichever language you asked for. Swap the language token, get a different output language. Same speech in, different writing out.

This is something the native task=translate flag can't do on large-v3-turbo โ€” that model was fine-tuned without translation data and the flag is broken. We sidestep it.

16 modes in the menu bar: 15 target languages + the flagship ๐ŸŒ โ†’ English (from any) shortcut. Click to switch, next dictation lands in the new language.


โœจ Where this beats typing

  • Multilingual teams โ€” speak Russian to your dev chat, English to your PR description, German to your designer Slack โ€” without changing keyboard layouts.
  • Coding while talking โ€” narrate the logic out loud, get clean prose in your PR, RFC, commit message, or Notion doc.
  • Faster than typing for non-English natives โ€” your brain composes in your native language, the text lands in whichever language the chat needs.
  • Voice notes during meetings โ€” instant text in Notes / Obsidian, no ยซrecord now, transcribe laterยป loop.
  • Translating quotes / tweets / headlines โ€” read out loud in any language, get any other language back.

โšก Architecture

Caps Lock (tap)  โ†’  ๐ŸŽ™๏ธ recordingโ€ฆ
Caps Lock (tap)  โ†’  whisper.cpp (localhost:8080)  โ†’  clipboard  โ†’  Cmd+V

Tap Caps Lock โ†’ speak โ†’ tap again โ†’ text appears at your cursor in any app: Slack, Notes, VS Code, your browser, terminal. The previous clipboard contents are saved and restored automatically.

Stage What happens
Hotkey Caps Lock (keycode 57) via Quartz CGEventTap. Toggle: 1st tap starts, 2nd tap stops. Doesn't block input to other applications.
Capture sounddevice at 16 kHz mono float32. WAV stays in memory (io.BytesIO) โ€” never written to disk.
STT POST to localhost:8080/inference (whisper.cpp). The large-v3-turbo-q5_0 model runs on Metal GPU.
Paste pbcopy + Cmd+V via CGEvent. The previous clipboard is saved and restored.

Latency: ~0.3-0.5 s for 10 s of speech on Apple Silicon (Metal GPU). Privacy: zero network egress โ€” audio never leaves your Mac. Cost: zero โ€” model is downloaded once (~550 MB), inference is free forever.

Platform: macOS (Apple Silicon flagship; Intel Mac works without Metal). Windows / Linux backends โ€” TBD.


Install

Requires Python 3.10+, cmake, and git. Everything else (whisper.cpp + the model) is installed by one command:

cd cream_typer
make setup

What make setup does:

  1. Creates venv/ and installs the package in editable mode with macOS- and dev-extras (pip install -e '.[macos,dev]').
  2. Clones whisper.cpp into vendor/whisper.cpp and builds whisper-server via cmake (Metal is enabled automatically on Apple Silicon).
  3. Downloads the ggml-large-v3-turbo-q5_0.bin model (~550 MB) into vendor/whisper.cpp/models/.

Already have whisper.cpp installed elsewhere? Override the paths via env (export them or pass them to make):

WHISPER_DIR=~/code/whisper.cpp make whisper
# or per-file:
WHISPER_SERVER=/path/to/whisper-server WHISPER_MODEL=/path/to/ggml.bin make whisper

Subtargets if something specific failed: make install, make whisper-build, make whisper-model. Full wipe โ€” make distclean.


Run

In two terminals:

# Terminal 1 โ€” whisper server (run once, keep it up)
make whisper

# Terminal 2 โ€” the app itself
make run

A ๐ŸŽ™ icon appears in the menu bar. Press Caps Lock, speak, press again โ€” text is pasted wherever the cursor is.


Development

make lint    # ruff check + format check
make fmt     # ruff format + ruff check --fix
make test    # pytest

CI runs the same commands on every PR (see .github/workflows/ci.yml). Architecture notes and how to add your own backend live in CONTRIBUTING.md.


macOS permissions

Permission Where to enable Why
Input Monitoring Settings โ†’ Privacy โ†’ Input Monitoring CGEventTap (Caps Lock interception)
Microphone Settings โ†’ Privacy โ†’ Microphone Audio capture
Accessibility Settings โ†’ Privacy โ†’ Accessibility CGEventPost (Cmd+V simulation)

Add Terminal (or iTerm) โ€” not Python itself โ€” since the app inherits permissions from its parent.

macOS will pop up the permission dialogs automatically the first time you run the app.


Project layout

cream_typer/
โ”œโ”€โ”€ src/                   # imported as `cream_typer` (see pyproject.toml)
โ”‚   โ”œโ”€โ”€ __init__.py        # __version__
โ”‚   โ”œโ”€โ”€ __main__.py        # `python -m cream_typer`
โ”‚   โ”œโ”€โ”€ app.py             # business logic, NO platform-specific code
โ”‚   โ”œโ”€โ”€ config.py          # constants and transcription modes
โ”‚   โ”œโ”€โ”€ recorder.py        # sounddevice โ†’ WAV in memory (io.BytesIO)
โ”‚   โ”œโ”€โ”€ transcriber.py     # HTTP client for the whisper.cpp server
โ”‚   โ””โ”€โ”€ backend/           # platform adapters (hotkey / paste / tray)
โ”‚       โ”œโ”€โ”€ __init__.py    # dispatch by sys.platform
โ”‚       โ”œโ”€โ”€ _base.py       # Protocol contracts for contributors
โ”‚       โ”œโ”€โ”€ _macos.py      # Quartz CGEventTap + Cmd+V + rumps  โœ…
โ”‚       โ”œโ”€โ”€ _windows.py    # pynput + pystray                   ๐Ÿšง TBD
โ”‚       โ””โ”€โ”€ _linux.py      # pynput + pystray (X11)             ๐Ÿšง TBD
โ”œโ”€โ”€ tests/                 # pytest smoke + transcriber mocks
โ”œโ”€โ”€ scripts/
โ”‚   โ””โ”€โ”€ whisper_server.sh  # alternative to `make whisper`
โ”œโ”€โ”€ .github/
โ”‚   โ”œโ”€โ”€ workflows/ci.yml   # lint + tests on macOS
โ”‚   โ”œโ”€โ”€ ISSUE_TEMPLATE/    # bug / feature
โ”‚   โ””โ”€โ”€ PULL_REQUEST_TEMPLATE.md
โ”œโ”€โ”€ pyproject.toml         # build / deps / ruff / pytest config
โ”œโ”€โ”€ Makefile               # setup / run / lint / fmt / test / whisper / distclean
โ”œโ”€โ”€ CHANGELOG.md           # Keep a Changelog
โ”œโ”€โ”€ CONTRIBUTING.md        # how to contribute
โ””โ”€โ”€ LICENSE                # MIT

Available languages

Pick a mode by clicking inside the ๐ŸŒ Languages submenu in the menu bar. The active one is checkmarked. The current set:

๐ŸŒ โ†’ English (from any)        โ† flagship "translate anything to English" shortcut
๐Ÿ‡ฌ๐Ÿ‡ง English        ๐Ÿ‡บ๐Ÿ‡ฆ ะฃะบั€ะฐั—ะฝััŒะบะฐ     ๐Ÿ‡ช๐Ÿ‡ธ Espaรฑol
๐Ÿ‡ฉ๐Ÿ‡ช Deutsch        ๐Ÿ‡ซ๐Ÿ‡ท Franรงais       ๐Ÿ‡ฎ๐Ÿ‡น Italiano
๐Ÿ‡ต๐Ÿ‡น Portuguรชs      ๐Ÿ‡ณ๐Ÿ‡ฑ Nederlands     ๐Ÿ‡ต๐Ÿ‡ฑ Polski
๐Ÿ‡ฏ๐Ÿ‡ต ๆ—ฅๆœฌ่ชž         ๐Ÿ‡จ๐Ÿ‡ณ ไธญๆ–‡           ๐Ÿ‡ฐ๐Ÿ‡ท ํ•œ๊ตญ์–ด
๐Ÿ‡น๐Ÿ‡ท Tรผrkรงe         ๐Ÿ‡น๐Ÿ‡ญ เน„เธ—เธข            ๐Ÿ‡ป๐Ÿ‡ณ Tiแบฟng Viแป‡t
๐Ÿ‡ธ๐Ÿ‡ฆ ุงู„ุนุฑุจูŠุฉ         ๐Ÿ‡ท๐Ÿ‡บ ะ ัƒััะบะธะน

Want a different set? Edit MODES, MODE_LABELS, and MENU_MODES in src/config.py. Whisper supports 99 languages โ€” adding any of them is a single line in three dicts, no UI code required.

Note on Thai (th): the large-v3-turbo model has noticeably degraded performance on Thai compared to large-v3 (Whisper's original quality matrix). It still works, but expect more errors. Vietnamese is fine.


Configuration

Everything lives in src/config.py:

HOTKEY_KEYCODE = 57              # Caps Lock. 60=Right Shift, 61=Right Option, 54=Right Cmd
SAMPLE_RATE    = 16000           # whisper.cpp expects 16 kHz
WHISPER_URL    = "http://localhost:8080/inference"
DEFAULT_MODE   = "en"            # "en" / "ru" / "translate" / "uk" / ...
MIN_RECORDING_SEC = 0.3          # shorter taps are ignored
CLIPBOARD_RESTORE_DELAY = 0.15   # delay before the previous clipboard is restored

The whisper-server's default language lives in the Makefile under WHISPER_LANG, but it's only a fallback: the client always passes language explicitly from MODES.


Troubleshooting

Symptom Cause Fix
Hotkey doesn't fire No Input Monitoring permission Settings โ†’ Privacy โ†’ Input Monitoring โ†’ Terminal
Text isn't pasted No Accessibility permission Settings โ†’ Privacy โ†’ Accessibility โ†’ Terminal
โš ๏ธ Whisper not running in the menu Server isn't up make whisper in a separate terminal
Empty transcription / โš ๏ธ Silence Too quiet, or shorter than MIN_RECORDING_SEC Speak louder / hold the tap longer than 0.3s
โš ๏ธ Too short Caps Lock tap shorter than MIN_RECORDING_SEC Hold longer โ€” this guards against accidental taps
Double-fire on Shift+Caps macOS clears AlphaShift on shift+caps Already handled in src/backend/_macos.py โ€” events with Shift are ignored
Wrong output language Wrong mode active Menu bar โ†’ ๐ŸŒ Languages โ†’ pick the right one

How this differs from other OSS apps

The space is crowded; here's what I'd pick depending on what you need:

Project Stack Hotkey Backend Notable
cream_typer (this) Python + rumps Caps Lock toggle whisper.cpp HTTP On-the-fly translation by swapping language instead of translate=true
foges/whisper-dictation Python + rumps Cmd+Option (toggle) openai-whisper (PyTorch) The well-known reference โ€” but it loads the model into RAM every time
pindrop Swift native hold-to-talk WhisperKit (Core ML) Fully native, the best perf/battery story
vocamac Swift hold-to-talk WhisperKit Tiny model bundled in the box, works out of the box
open-wispr Electron hold Globe (๐ŸŒ) whisper.cpp Friendly onboarding, but Electron
openwhispr Electron, cross-platform hold local + cloud (BYOK) Mac/Windows/Linux in one binary
GoWhisper Go hold whisper.cpp Built around terminal / Claude Code
AudioWhisper Swift hold OpenAI API / Gemini Not local โ€” sends audio to the cloud

Pick cream_typer when:

  • You want a toggle (tap-talk-tap) instead of holding a key.
  • You already have a whisper.cpp build and don't want yet another process bundled.
  • You want dictation in any language with auto-translation to English (and back).
  • You care about codebase size โ€” this is ~300 lines, the whole thing reads in 10 minutes.

Pick something else when:

  • You want a native macOS app without Python and cmake โ†’ pindrop or vocamac.
  • You need Windows/Linux today โ†’ openwhispr.
  • You'd rather not deal with config โ†’ open-wispr (better onboarding).

Acknowledgments

This project stands on the shoulders of others:

  • ggerganov/whisper.cpp โ€” the C++ inference engine that makes local Whisper fast enough on consumer hardware. The entire reason this project is possible.
  • OpenAI Whisper โ€” the speech-recognition model architecture and the language-agnostic encoder we exploit for translation.
  • rumps โ€” Pythonic macOS menu-bar bindings.
  • The dictation OSS scene โ€” foges/whisper-dictation, pindrop, open-wispr, vocamac โ€” for showing the path.

โญ Star history

Star History Chart


Made by

Built by NeCL โ€” AI engineering studio shipping local-first AI: production RAG, real-time voice agents, on-prem deployment.

Need something custom? neclco.com ยท Telegram @ownerai ยท neclcompany@gmail.com

About

๐ŸŽ™๏ธ Voice translation in any direction. Locally on Apple Silicon. Tap Caps Lock, speak your language, get any other. Whisper.cpp, no cloud, no GPU rental.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors