GitHub - omarelkhal/voxpost: Hear new Gmail as one short spoken line — fully on your machine. Your OAuth token, Ollama summarizer, and Supertonic TTS stay local. No mail archive, no cloud inference, no cloud voice. Private by design.

📖 Full documentation site — search, dark mode, all guides in one place.

Hear what arrived — without opening another tab.

Voxpost is a local, on-device desktop companion: when something important lands in Gmail, you get a short spoken line (who, and what it’s about), not a wall of text read aloud and not a cloud text-to-speech API.

Free for everyone — open source, no catch.
This is my way of giving back to the community, the same spirit as Aftertone.
If it helps you, you can optionally buy me a coffee — never required.

_{voxpost listen --speak · detect → summarize → speak · human-readable log on stderr}

The problem

Many of us watch the inbox while coding or deep in another app — tab back to Gmail or notifications just to see if something finished or if a message matters. That breaks focus. Existing “read aloud” tools often dump too much audio or send your text to hosted TTS.

Voxpost is the opposite: one actionable sentence, on your machine, when you choose to allow it.

The idea (v1)

Today	Later (not v1)
Gmail only	Other sources (RSS, calendar, system notifications, chat apps, …)

v1 scope: personal Gmail → one speakable line → local voice synthesis. Filter rules (VIP senders, keywords, quiet hours) ship with the desktop settings UI, not as a CLI-only block before that.

Not v1: WhatsApp/Instagram APIs, multi-OS polish, smart LLM summaries, or replacing your mail client.

Building blocks

Development proceeds in layers. Block 1 must work before anything else:

Block	Scope	Status
1 — Gmail events	OAuth, `watch`, Pub/Sub, `history.list` → ephemeral event	Done
1b — Attachments	Metadata only (filenames, count)	Done
3 — Speakable line	Local summarizer, one sentence, no storage	Done (CLI)
4 — Local TTS	Supertonic 3 on-device playback	Done (CLI)
5 — Desktop UI	Connect, listen, TTS + rule settings	Planned
2 — Rules	VIP, keywords, quiet hours	Deferred until UI (Block 5)

Filter rules are not a headless milestone — users configure them in the settings UI. Until then, every inbox message can flow through summarize → TTS.

See docs/BLOCK_1_GMAIL_EVENTS.md, docs/BLOCK_3_SUMMARIZE.md, docs/BLOCK_4_TTS.md, and docs/GMAIL_EVENTS_RESEARCH.md.

Design principles

Local speech — notification content used for TTS stays on the device; no subscription TTS vendor required for the core path.
Short by default — e.g. “Alex says the staging deploy failed,” not the full email body.
Silence is a feature — newsletters and noise should be filtered; digest mode is a later UX knob.
Gmail proves the loop — if spoken mail isn’t useful daily, more connectors won’t fix it.
Other services as plugins — each future source feeds the same “one line to speak” pipeline.
Event-driven, ephemeral — detect → process in memory → speak → discard; no mail archive.

Privacy-first (open source)

You bring your own credentials — nothing is bundled:

Your Google Cloud project and Pub/Sub (see docs/SETUP.md)
Your OAuth Desktop client JSON (path below)
Your Gmail account via voxpost connect (refresh token stays local)

Platform	Config directory
Linux	`~/.config/voxpost/`
macOS	`~/.config/voxpost/`
Windows	`%USERPROFILE%\.config\voxpost\` (same as `~/.config/voxpost/` in Python)

Files: client_secret.json, gcp.json, token.json, voxpost.toml — see docs/SETUP.md for GCP and OAuth setup (same in the browser on every OS).

Configuration by platform

Same everywhere: Google Cloud + OAuth (browser), voxpost.toml (Ollama + local TTS), voxpost connect, voxpost listen --speak.
Differs by OS: install commands, virtualenv activation, and audio dependencies.

Full GCP/OAuth walkthrough: docs/SETUP.md.

Shared `voxpost.toml`

Copy the example into your config dir on every platform, then edit if needed:

[summarize]
backend = "ollama"
model = "qwen3.5:4b"          # see “Recommended Ollama models” below
ollama_host = "http://localhost:11434"

[tts]
model = "supertonic-3"
device = "auto"       # linux: cuda/cpu · macOS: often cpu · windows: cuda/cpu
voice = "M1"
lang = "en"

[speech]
mode = "fixed"
target_lang = "en"

Summarizer models (any local model)

Voxpost is not locked to Qwen. Use any local model via Ollama or Hugging Face + transformers — Phi, Gemma, Mistral, SmolLM, Llama, Qwen, etc. Set [summarize].model to the exact tag from ollama list or a HF model id.

Maintainers mostly test with Qwen 3.5 on Ollama today; that is a starting point, not a requirement.

Quality rule of thumb: sub-4B models often hallucinate senders and intent on real mail. The model leaderboard collects community runs on 24 fixed email fixtures so small vs mid-size models can be compared objectively.

What	Where
Leaderboard (PASS / WEAK / FAIL)	docs/MODEL_LEADERBOARD.md
How to submit a new model	Leaderboard doc + chat review prompt
Add a new test email	PR to `speech_check_cases.py`
GitHub issue template	New issue → Model speech-check benchmark

Contributor flow (short):

Pick a model not on the leaderboard → ollama pull …
Run voxpost summarize speech-check --model YOUR_TAG → unique auto-named .md under docs/benchmarks/runs/
Paste the markdown report into MODEL_REVIEW_PROMPT in a judge chat → PASS/WEAK/FAIL + name the judge model
PR: leaderboard row + the graded run log ({model}__{backend}__{n}of24__{status}__{date}.md)

No Gmail, no full listen pipeline — just the fixture suite and your judgment (optionally chat-assisted).

Example families (not exhaustive)

Family	Example Ollama tags	Notes
Qwen 3.5	`qwen3.5:4b`, `qwen3.5:9b`, `qwen3.5:4b-mlx`	Maintainer default; many quants
Phi	`phi4-mini`, `phi3:mini`	Try on leaderboard
Gemma	`gemma3:4b`, `gemma2:2b`	Try on leaderboard
Mistral	`mistral:7b`, `ministral-3:8b`	Heavier; needs RAM
SmolLM	`smollm2:360m`, `smollm2:1.7b`	Expect weak scores — good baseline

Avoid *:cloud tags — not local.

ollama pull qwen3.5:4b    # example; use whatever you are benchmarking

[summarize]
backend = "ollama"
model = "qwen3.5:4b"     # or phi4-mini, gemma3:4b, …
ollama_host = "http://localhost:11434"

Browse Ollama: ollama.com/library. After changing models, run speech-check before daily listen --speak.

Configuration reference

Settings live in ~/.config/voxpost/voxpost.toml (or %USERPROFILE%\.config\voxpost\voxpost.toml on Windows).
Precedence: environment variable → TOML → built-in default. See .env.example.

Credential & runtime files (not in TOML)

File	Purpose
`client_secret.json`	OAuth Desktop client from Google Cloud Console
`token.json`	Gmail refresh token (created by `voxpost connect`)
`gcp.json`	GCP project ID, Pub/Sub topic and subscription names (`voxpost setup-gcp`)
`state.json`	Gmail `historyId` cursor and watch metadata (auto-updated)

GCP & daemon (`gcp.json` / environment)

Variable	Env override	Default / source	Meaning
GCP project ID	`VOXPOST_GCP_PROJECT`	`gcp.json` → `gcloud config`	Google Cloud project that hosts Pub/Sub
Pub/Sub topic	`VOXPOST_PUBSUB_TOPIC`	`voxpost-gmail`	Topic Gmail push notifications publish to
Pub/Sub subscription	`VOXPOST_PUBSUB_SUBSCRIPTION`	`voxpost-gmail-pull`	Pull subscription the daemon listens on
OAuth client JSON path	`VOXPOST_OAUTH_CLIENT_SECRETS`	`client_secret.json` in config dir	Path to Desktop OAuth credentials
Pub/Sub service account key	`VOXPOST_PUBSUB_CREDENTIALS`	ADC if unset	Optional JSON key; else `gcloud auth application-default login`
Config directory	`VOXPOST_CONFIG_DIR`	`~/.config/voxpost`	Override where all config files are stored
Post-notification fetch delay	`VOXPOST_FETCH_DELAY_SECONDS`	`2`	Seconds to wait after Pub/Sub before fetching mail (race guard)

`[summarize]` — local speakable line (Ollama or transformers)

Key	Env override	Default	Meaning
`backend`	`VOXPOST_SUMMARIZER_BACKEND`	`transformers`*	`ollama` (recommended) or `transformers` (Hugging Face + PyTorch)
`model`	`VOXPOST_SUMMARIZER_MODEL`	(see code)	Ollama tag (`phi4-mini`, `qwen3.5:4b`, …) or HF id for transformers — see leaderboard
`ollama_host`	`VOXPOST_OLLAMA_HOST`	`http://localhost:11434`	Ollama API base URL
`device`	`VOXPOST_SUMMARIZER_DEVICE`	`auto`	transformers only: `auto`, `cpu`, `cuda`, `mps`
`cpu_threads`	`VOXPOST_SUMMARIZER_CPU_THREADS`	`0`	transformers CPU: thread count (`0` = half of logical cores)
`idle_unload_minutes`	`VOXPOST_SUMMARIZER_IDLE_UNLOAD_MINUTES`	`10`	Unload transformers weights after idle minutes (`0` = never)
`chat_max_new_tokens`	`VOXPOST_SUMMARIZER_MAX_NEW_TOKENS`	`96`	Max tokens generated (raise to ~128 for long mail)
`load_dtype`	`VOXPOST_SUMMARIZER_LOAD_DTYPE`	`auto`	transformers: `auto`, `float16`, `float32`, `bfloat16`
`chat_input_format`	`VOXPOST_SUMMARIZER_INPUT`	`plain`	chat LMs: `plain` or `structured` (sender, forward flag, attachments)

*Ship backend = "ollama" in your voxpost.toml — the code default is legacy transformers when the file is missing.

`[tts]` — Supertonic on-device speech

Key	Env override	Default	Meaning
`model`	`VOXPOST_TTS_MODEL`	`supertonic-3`	`supertonic`, `supertonic-2`, or `supertonic-3`
`device`	`VOXPOST_TTS_DEVICE`	`auto`	`auto`, `cpu`, `cuda` / `gpu` (needs `onnxruntime-gpu`)
`voice`	`VOXPOST_TTS_VOICE`	`M1`	Voice id: `M1`–`M5`, `F1`–`F5`
`lang`	`VOXPOST_TTS_LANG`	`en`	TTS language code, or `na` for language-agnostic
`total_steps`	`VOXPOST_TTS_TOTAL_STEPS`	`8`	Denoising steps (higher = better quality, slower)
`speed`	`VOXPOST_TTS_SPEED`	`1.05`	Speech rate (`>1` faster, `<1` slower)
`playback`	`VOXPOST_TTS_PLAYBACK`	`auto`	`auto`, `sounddevice` (PortAudio), or `aplay` (Linux)
`auto_download`	`VOXPOST_TTS_AUTO_DOWNLOAD`	`true`	Download ONNX assets on first use
`chime_before_speak`	`VOXPOST_TTS_CHIME`	`true`	Play a short chime before each spoken line
`chime_pause_ms`	`VOXPOST_TTS_CHIME_PAUSE_MS`	`350`	Pause after chime (milliseconds)
`chime_file`	`VOXPOST_TTS_CHIME_FILE`	(none)	Optional path to custom chime WAV

`[speech]` — language of the speakable line (not the email)

Key	Env override	Default	Meaning
`mode`	`VOXPOST_SPEECH_LANG_MODE`	`fixed`	`fixed` → use `target_lang`; `auto` → use `[tts].lang`
`target_lang`	`VOXPOST_SPEECH_TARGET_LANG`	`en`	Language for briefings when `mode = "fixed"` (never inferred from email body)

Deeper tuning: docs/BLOCK_3_SUMMARIZE.md, docs/BLOCK_4_TTS.md.

Linux

Best tested. Debian/Ubuntu-style example; adapt package names for Fedora, Arch, etc.

Prerequisites

sudo apt update
sudo apt install -y python3 python3-venv python3-pip git libportaudio2
# gcloud: https://cloud.google.com/sdk/docs/install#linux
# Ollama: curl -fsSL https://ollama.com/install.sh | sh

Install Voxpost

git clone https://github.com/omarelkhal/voxpost.git
cd voxpost
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,tts]"

Config paths

mkdir -p ~/.config/voxpost
chmod 700 ~/.config/voxpost
# OAuth Desktop JSON from Google Cloud Console:
cp ~/Downloads/client_secret_*.json ~/.config/voxpost/client_secret.json
chmod 600 ~/.config/voxpost/client_secret.json
cp voxpost.toml.example ~/.config/voxpost/voxpost.toml

GCP, Ollama, TTS, run (after SETUP.md Steps 2–10)

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
voxpost setup-gcp
gcloud auth application-default login

ollama pull qwen3.5:4b
voxpost tts download
voxpost tts test "Linux audio check."

voxpost connect
voxpost listen --speak

Linux TTS notes: [tts] playback = "auto" uses PortAudio (sounddevice) or aplay. For NVIDIA GPU TTS, install CUDA + onnxruntime-gpu, then set [tts] device = "gpu".

macOS

Prerequisites (Homebrew)

brew install python@3.11 git portaudio ollama
brew install --cask google-cloud-sdk   # or: https://cloud.google.com/sdk/docs/install#mac

Install Voxpost

git clone https://github.com/omarelkhal/voxpost.git
cd voxpost
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,tts]"

Config paths

mkdir -p ~/.config/voxpost
chmod 700 ~/.config/voxpost
cp ~/Downloads/client_secret_*.json ~/.config/voxpost/client_secret.json
chmod 600 ~/.config/voxpost/client_secret.json
cp voxpost.toml.example ~/.config/voxpost/voxpost.toml

GCP, Ollama, TTS, run

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
voxpost setup-gcp
gcloud auth application-default login

ollama pull qwen3.5:4b
# Start Ollama if not running: open the Ollama app or `ollama serve`
voxpost tts download
voxpost tts test "macOS audio check."

voxpost connect
voxpost listen --speak

macOS notes: Summarizer runs via Ollama (Metal when Ollama uses it). Supertonic TTS is ONNX — [tts] device = "auto" usually picks CPU; that is normal. Allow microphone/speaker access if macOS prompts during tts test.

Windows

Prerequisites

Python 3.11+ — enable “Add python.exe to PATH”
Git for Windows
Google Cloud SDK
Ollama for Windows

Install Voxpost (PowerShell)

git clone https://github.com/omarelkhal/voxpost.git
cd voxpost
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e ".[dev,tts]"

Config paths (PowerShell)

New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.config\voxpost"
Copy-Item "$env:USERPROFILE\Downloads\client_secret_*.json" "$env:USERPROFILE\.config\voxpost\client_secret.json"
Copy-Item voxpost.toml.example "$env:USERPROFILE\.config\voxpost\voxpost.toml"

GCP, Ollama, TTS, run (Command Prompt or PowerShell)

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
voxpost setup-gcp
gcloud auth application-default login

ollama pull qwen3.5:4b
voxpost tts download
voxpost tts test "Windows audio check."

voxpost connect
voxpost listen --speak

Windows notes: Voxpost reads config from %USERPROFILE%\.config\voxpost\. OAuth opens your default browser; allow the localhost callback. For GPU TTS, install NVIDIA drivers + CUDA-compatible onnxruntime-gpu if you use [tts] device = "gpu". If audio fails, try [tts] playback = "sounddevice" in voxpost.toml. WSL2 (Ubuntu) is an alternative — follow the Linux steps inside WSL if you prefer a Unix-like environment.

Quick reference

Step	Command (any OS, venv active)
Prefetch TTS	`voxpost tts download`
Test audio	`voxpost tts test "Hello."`
Link Gmail	`voxpost connect`
Run pipeline	`voxpost listen --speak`

Runtime: Python 3.11+ (docs/RUNTIME.md).

Status

Blocks 1, 1b, 3, 4: CLI pipeline (connect, listen --summarize --speak). Block 5: desktop UI planned.

Name

Voxpost — your mail, spoken locally.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.github		.github
docs		docs
overrides/partials		overrides/partials
scripts		scripts
src/voxpost		src/voxpost
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.txt		.txt
AGENTS.md		AGENTS.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DESIGN.md		DESIGN.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock
voxpost.toml.example		voxpost.toml.example

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The problem

The idea (v1)

Building blocks

Design principles

Privacy-first (open source)

Configuration by platform

Shared `voxpost.toml`

Summarizer models (any local model)

Example families (not exhaustive)

Configuration reference

Credential & runtime files (not in TOML)

GCP & daemon (`gcp.json` / environment)

`[summarize]` — local speakable line (Ollama or transformers)

`[tts]` — Supertonic on-device speech

`[speech]` — language of the speakable line (not the email)

Linux

macOS

Windows

Quick reference

Status

Name

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

The problem

The idea (v1)

Building blocks

Design principles

Privacy-first (open source)

Configuration by platform

Shared voxpost.toml

Summarizer models (any local model)

Example families (not exhaustive)

Configuration reference

Credential & runtime files (not in TOML)

GCP & daemon (gcp.json / environment)

[summarize] — local speakable line (Ollama or transformers)

[tts] — Supertonic on-device speech

[speech] — language of the speakable line (not the email)

Linux

macOS

Windows

Quick reference

Status

Name

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Shared `voxpost.toml`

GCP & daemon (`gcp.json` / environment)

`[summarize]` — local speakable line (Ollama or transformers)

`[tts]` — Supertonic on-device speech

`[speech]` — language of the speakable line (not the email)

Packages