Speak to Type and Edit
Voice keyboard for iOS. Works in every app.
On-device, cloud, or self-hosted - no limits.
Website • Self-Hosting Guide • Privacy Policy
- Works in every app. Tap the mic, speak, watch text land in whatever app you're in - Telegram, Mail, Notes, the search bar, anywhere a keyboard appears.
- Self-hosted in minutes.
docker compose up -dand paste your server's IP. Your hardware, your models, your data. - Works with any Whisper-compatible server. The gateway speaks the OpenAI transcription API (
POST /v1/audio/transcriptions). Point it at any endpoint that implements it. - On-device. Whisper runs locally on your iPhone via WhisperKit. No network, no server, nothing leaves the device.
- AI transcript cleanup. Wire any OpenAI-compatible LLM - OpenAI, Groq, Ollama, Anthropic - into the gateway to strip filler words and fix punctuation before text reaches the app. BYO prompt.
- End-to-end encrypted. AES-256-GCM with X25519 key exchange between the app and the gateway. Same primitives used by Signal and WireGuard.
- Zero tracking in the app. No analytics, no telemetry, no data collection. Audit the source yourself.
- Free and unlimited. On-device and self-hosted modes have no caps, no word limits, no expiry.
The Diction app streams audio over a WebSocket connection, so you need the Diction Gateway in front of whatever speech model you run. The gateway handles the WebSocket protocol, end-to-end encryption, optional LLM cleanup, and model routing.
Full walkthrough with screenshots: How to Set Up Diction - the self-hosted speech-to-text alternative to Wispr Flow
Requirements:
- Any machine that can run Docker: Mac, Linux box, NUC, home server, VPS. Apple Silicon works (via Rosetta).
- iPhone running iOS 17.0 or later.
Create a folder for the stack and save this as docker-compose.yml:
services:
whisper-small:
image: fedirz/faster-whisper-server:latest-cpu
container_name: diction-whisper-small
restart: unless-stopped
volumes:
- whisper-models:/root/.cache/huggingface
environment:
WHISPER__MODEL: Systran/faster-whisper-small
WHISPER__INFERENCE_DEVICE: cpu
gateway:
image: ghcr.io/omachala/diction-gateway:latest
platform: linux/amd64
container_name: diction-gateway
restart: unless-stopped
ports:
- "8080:8080"
depends_on:
- whisper-small
environment:
DEFAULT_MODEL: small
volumes:
whisper-models:The whisper-models volume persists the model weights (~500 MB for small) so they survive container rebuilds. DEFAULT_MODEL: small maps to the service named whisper-small - see Swap the Speech Model if you change the model.
docker compose up -dFirst run pulls the images and downloads model weights - give it 2–3 minutes.
docker compose logs -f # watch progress
docker compose ps # check statusExpected:
NAME STATUS
diction-gateway Up 30 seconds
diction-whisper-small Up 2 minutes (healthy)
| Error | Fix |
|---|---|
pull access denied on gateway image |
docker logout ghcr.io and retry |
exec format error on Apple Silicon |
Enable Rosetta in Docker Desktop → Settings → General |
health: starting for > 3 minutes |
Model still downloading - docker compose logs -f whisper-small |
| Gateway exits immediately | Whisper container failed - check its logs |
Generate a test audio file (macOS):
say -o test.aiff "Hello from my home server"Or record a voice memo on your phone and AirDrop it over.
curl -X POST http://localhost:8080/v1/audio/transcriptions \
-F "file=@test.aiff" \
-F "model=small"{"text":"Hello from my home server."}# Check timing headers
curl -sS -D - -o /dev/null \
-X POST http://localhost:8080/v1/audio/transcriptions \
-F "file=@test.aiff" -F "model=small" | grep -i dictionX-Diction-Whisper-Ms shows the speech model's inference latency.
| Response | Cause |
|---|---|
| Connection refused | Gateway not running - docker compose ps |
| 504 Gateway Timeout | Whisper still loading - wait 60s |
| 404 Not Found | URL typo - path must be exactly /v1/audio/transcriptions |
| OOM / container crash | Model too large for available RAM |
macOS:
ipconfig getifaddr en0
# or
ifconfig | grep 'inet ' | grep -v 127.0.0.1Linux:
hostname -I | awk '{print $1}'Windows:
ipconfig | findstr IPv4Pick the 192.168.x.x or 10.x.x.x address. Ignore anything starting with 100. - that's Tailscale.
Set a DHCP reservation in your router so the IP doesn't change on reboot. Or use Tailscale for a stable address that follows the machine anywhere.
Install Diction on your iPhone. On first launch:
- Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction
- Tap Diction in the list → enable Allow Full Access
- Grant microphone access when prompted
Point it at your server:
- Open Diction → Preferences → Mode → Self-Hosted
- Enter your endpoint:
http://192.168.1.42:8080(your IP from Step 4) - Tap Test connection - you should get a green check within a second
To dictate: open any app, tap a text field, long-press the globe icon (bottom-left of the iOS keyboard), pick Diction, tap the mic, speak, release.
Tailscale (recommended)
Tailscale creates a private WireGuard mesh between your devices. Install it on the server and iPhone, sign in to the same account, and use the 100.x.x.x Tailscale IP as your Diction endpoint. Works on cellular, café WiFi, anywhere. Free for personal use.
Cloudflare Tunnel (public URL, no port forwarding)
Add to your compose file:
cloudflared:
image: cloudflare/cloudflared:latest
container_name: diction-cloudflared
restart: unless-stopped
command: tunnel --no-autoupdate run
environment:
TUNNEL_TOKEN: "${CLOUDFLARE_TUNNEL_TOKEN}"Create a tunnel in the Cloudflare Zero Trust dashboard, grab the token, add it to .env, route the public hostname to http://gateway:8080. Free tier. Note: transcripts pass through Cloudflare's network (HTTPS-encrypted, but a third party is in the path).
ngrok (quick testing)
ngrok http 8080Free tier URLs change on restart - good for a demo, not daily use.
Change two lines in your compose file:
DEFAULT_MODEL |
Service name | WHISPER__MODEL |
RAM | Notes |
|---|---|---|---|---|
small |
whisper-small |
Systran/faster-whisper-small |
~850 MB | Best for CPU |
medium |
whisper-medium |
Systran/faster-whisper-medium |
~2.1 GB | More accurate, slower on CPU |
large-v3-turbo |
whisper-large-turbo |
deepdml/faster-whisper-large-v3-turbo-ct2 |
~2.3 GB | Best with NVIDIA GPU |
parakeet-v3 |
parakeet |
- (baked into image) | ~2 GB | NVIDIA GPU, 25 European languages |
Both DEFAULT_MODEL and the service name must match the table - the gateway resolves backends by Docker hostname. A mismatch returns 404 on every request.
docker compose up -d # recreates only the changed containerInstall the NVIDIA Container Toolkit on the host first.
Parakeet transcribes a 5-second clip in well under a second on a consumer GPU.
| Whisper Large-v3 | Parakeet TDT 0.6B v3 | |
|---|---|---|
| WER (English) | 7.4% | ~6.3% |
| Latency (GPU) | Under 2s | Sub-second |
| VRAM (INT8) | ~2.3 GB | ~2 GB |
| Languages | 99 | 25 European |
Supported languages: English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian.
For languages outside this list, use Option B.
services:
parakeet:
image: ghcr.io/achetronic/parakeet:latest-int8
container_name: diction-parakeet
restart: unless-stopped
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
gateway:
image: ghcr.io/omachala/diction-gateway:latest
platform: linux/amd64
container_name: diction-gateway
restart: unless-stopped
ports:
- "8080:8080"
depends_on:
- parakeet
environment:
DEFAULT_MODEL: parakeet-v3Model weights are baked into the image - no download on first start. Or use the profile from this repo:
docker compose --profile parakeet up -dservices:
whisper-large-turbo:
image: fedirz/faster-whisper-server:latest-cuda
container_name: diction-whisper-large-turbo
restart: unless-stopped
volumes:
- whisper-models:/root/.cache/huggingface
environment:
WHISPER__MODEL: deepdml/faster-whisper-large-v3-turbo-ct2
WHISPER__INFERENCE_DEVICE: cuda
WHISPER__COMPUTE_TYPE: float16
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
gateway:
image: ghcr.io/omachala/diction-gateway:latest
platform: linux/amd64
container_name: diction-gateway
restart: unless-stopped
ports:
- "8080:8080"
depends_on:
- whisper-large-turbo
environment:
DEFAULT_MODEL: large-v3-turbo
volumes:
whisper-models:First boot downloads ~1.6 GB of model weights into the volume. Subsequent starts are instant.
Keep it. Use CUSTOM_BACKEND_URL to put the Diction Gateway in front of your existing server for WebSocket streaming and end-to-end encryption:
services:
gateway:
image: ghcr.io/omachala/diction-gateway:latest
platform: linux/amd64
container_name: diction-gateway
restart: unless-stopped
ports:
- "8080:8080"
environment:
CUSTOM_BACKEND_URL: http://your-existing-server:8000
CUSTOM_BACKEND_MODEL: Systran/faster-whisper-small| Variable | Description |
|---|---|
CUSTOM_BACKEND_AUTH |
Authorization header forwarded to your backend, e.g. Bearer sk-xxx |
CUSTOM_BACKEND_NEEDS_WAV |
Set to "true" if your backend only accepts WAV - the gateway converts with ffmpeg |
CUSTOM_BACKEND_CANONICAL_ID |
HuggingFace-style ID advertised via /v1/models (default: CUSTOM_BACKEND_MODEL) |
The gateway passes transcripts through any OpenAI-compatible LLM before returning them. You say "so um basically the meeting went well and uh they agreed to the timeline." The LLM returns "The meeting went well. They agreed to the timeline."
Enable the AI Companion toggle in the app. The gateway forwards the transcript to {LLM_BASE_URL}/chat/completions with your prompt, then returns the cleaned text. If the LLM fails, the raw transcript is returned - dictation never breaks.
| Variable | Required | Description |
|---|---|---|
LLM_BASE_URL |
Yes | OpenAI-compatible endpoint, e.g. https://api.openai.com/v1 |
LLM_MODEL |
Yes | Model identifier, e.g. gpt-4o-mini |
LLM_API_KEY |
No | Bearer token. Not needed for local Ollama. |
LLM_PROMPT |
No | System prompt string, or a file path starting with / (mount via volume) |
Both LLM_BASE_URL and LLM_MODEL must be set or the feature stays off.
echo "OPENAI_API_KEY=sk-your-key-here" > .env gateway:
environment:
DEFAULT_MODEL: small
LLM_BASE_URL: "https://api.openai.com/v1"
LLM_API_KEY: "${OPENAI_API_KEY}"
LLM_MODEL: "gpt-4o-mini"
LLM_PROMPT: "Clean up this voice transcription. Remove filler words (um, uh, like). Fix punctuation and capitalization. Return only the cleaned text, nothing else."Docker Compose reads ${OPENAI_API_KEY} from .env automatically. Works with any OpenAI-compatible provider - Groq, Together, Fireworks, Mistral, OpenRouter - swap LLM_BASE_URL and LLM_MODEL.
ollama:
image: ollama/ollama:latest
container_name: diction-ollama
restart: unless-stopped
volumes:
- ollama-models:/root/.ollama
gateway:
environment:
DEFAULT_MODEL: small
LLM_BASE_URL: "http://ollama:11434/v1"
LLM_MODEL: "gemma2:9b"
LLM_PROMPT: "Clean up this voice transcription. Remove filler words. Fix punctuation and capitalization. Return only the cleaned text, nothing else."
volumes:
whisper-models:
ollama-models:docker compose up -d
docker exec diction-ollama ollama pull gemma2:9b| Model | Memory | Notes |
|---|---|---|
gemma2:9b |
~6 GB | Best cleanup quality at this size |
qwen2.5:7b |
~5 GB | Strong instruction following |
llama3.1:8b |
~5 GB | Most popular, well-tested |
gemma3:4b |
~3 GB | For tighter machines |
Models under 7B tend to answer questions about the transcript instead of cleaning it up. 7B or larger recommended.
curl -X POST "http://localhost:8080/v1/audio/transcriptions?enhance=true" \
-F "file=@test.aiff" \
-F "model=small"# Confirm LLM fired - look for X-Diction-LLM-Ms in the output
curl -sS -D - -o /dev/null \
-X POST "http://localhost:8080/v1/audio/transcriptions?enhance=true" \
-F "file=@test.aiff" -F "model=small" | grep -i dictionMount a file and point LLM_PROMPT at the path:
gateway:
volumes:
- ./cleanup-prompt.txt:/config/prompt.txt:ro
environment:
LLM_PROMPT: "/config/prompt.txt"If LLM_PROMPT starts with /, the gateway reads it as a file. Otherwise it uses the string directly.
The repo ships a flake with a hardened systemd module - no Docker needed.
nix run github:omachala/diction#diction-gatewayEnable as a service:
{
inputs.diction.url = "github:omachala/diction";
outputs = { nixpkgs, diction, ... }: {
nixosConfigurations.your-host = nixpkgs.lib.nixosSystem {
modules = [
diction.nixosModules.default
{
services.diction-gateway = {
enable = true;
openFirewall = true;
# customBackend.url = "http://127.0.0.1:8000";
# llm.baseUrl = "http://127.0.0.1:11434/v1";
# llm.model = "gemma2:9b";
# environmentFile = "/run/secrets/diction-gateway.env";
};
}
];
};
};
}The unit runs under DynamicUser with ProtectSystem=strict, NoNewPrivileges, and a narrow syscall filter. Use environmentFile for secrets - they don't end up in the world-readable Nix store. Full option list: nix/module.nix.
The gateway implements the OpenAI audio transcription API - any client that works against api.openai.com/v1/audio/transcriptions works against a Diction gateway.
from openai import OpenAI
client = OpenAI(
base_url="http://your-server:8080/v1",
api_key="anything", # not checked when AUTH_ENABLED=false
)
with open("audio.wav", "rb") as f:
result = client.audio.transcriptions.create(
file=f,
model="small", # or "Systran/faster-whisper-small"
response_format="text",
)
print(result)Works with the Node SDK, LangChain, Flowise, n8n, or any tool that expects OpenAI's speech API.
Supported:
POST /v1/audio/transcriptions-file,model,language,prompt,response_format=json|textGET /v1/models- returns an OpenAI-compatibledata[]array plus aproviders[]grouping consumed by the iOS app. Both HuggingFace IDs (Systran/faster-whisper-small,nvidia/parakeet-tdt-0.6b-v3) and short aliases (small,medium,large-v3-turbo,parakeet-v3) are accepted.- WebSocket
/v1/audio/stream- used by the Diction app for low-latency streaming
Not supported:
- TTS (
/v1/audio/speech) response_format=verbose_json|srt|vtt(no word-level timestamps)- SSE streaming on REST (use WebSocket
/v1/audio/streaminstead) - Model download/delete (
POST/DELETE /v1/models/{id}) - OpenAI Realtime API (
/v1/realtime)
Authentication is off by default (AUTH_ENABLED=false). Pass any non-empty string as the API key from the client - the gateway doesn't check it. To lock down a public-facing deployment, set AUTH_ENABLED=true and configure tokens in the gateway env.
Error shape: errors return {"error":"<message>"}, not OpenAI's nested {"error":{"message":"...","type":"..."}}. Most SDKs surface these as HTTPError rather than APIError.
- On-device: Everything stays on your phone. No network connection is made.
- Self-hosted: Audio goes to your server only. Neither the gateway nor
faster-whisper-serverpersists audio - it's transcribed and discarded. - AI cleanup enabled: The transcript (plain text, no audio) goes to your configured LLM. If you use Ollama locally, nothing leaves your machine.
- Diction One (cloud): Audio is transcribed and immediately discarded. Not stored, not used for training.
- Zero third-party SDKs in the app. No analytics, no tracking, no telemetry.
- Full Access is required by iOS for any keyboard that makes network requests. Diction has no QWERTY input - the only data that leaves the app is the audio recording, sent to the endpoint you configured.
Read the full Privacy Policy.
On-device and self-hosted are completely free with no word limits.
If you don't want to run a server, Diction One gives you a fine-tuned cloud model with advanced audio filtering - without the setup. Audio is sent to the Diction endpoint, transcribed, and immediately discarded. Pricing and trial details are in the app.
Contributions are welcome. See CONTRIBUTING.md.
MIT. See LICENSE.



