feat(voice): dual-provider TTS (Supertonic local + ElevenLabs cloud) with Linux support by Trei-D · Pull Request #1301 · danielmiessler/LifeOS

Trei-D · 2026-05-24T06:25:27Z

Problem

The v5.0.0 voice module is macOS-only (uses afplay + osascript) and ElevenLabs-only (requires API key + quota). This means:

Linux PAI users have no voice — afplay doesn't exist on Linux
Voice costs money — every notification burns ElevenLabs API credits
Voice requires internet — no offline/local option

Solution

Dual-provider TTS architecture with cross-platform audio playback.

New: Supertonic as local-first provider

Zero cost, no API key, and no internet connection needed after the first run. Supertonic runs TTS inference on the CPU using ONNX models that are downloaded automatically on first use.

Installation

cd ~/.claude/PAI/PULSE/VoiceServer

# Create Python venv and install Supertonic
python3 -m venv .venv
.venv/bin/pip install supertonic

# Verify installation
.venv/bin/python supertonic-tts.py --text "Hello from PAI" --voice M1 --output /tmp/test.wav

# Play the result (Linux)
paplay /tmp/test.wav

Requirements:

Python 3.10+ (tested with 3.12)
~158 MB disk for the venv
~386 MB disk for model cache (~/.cache/supertonic3/, downloaded on first run)
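For callers that want to drive the wrapper from a script rather than the shell, the verification command above can be sketched as follows. This is only an illustration: the `--text`, `--voice`, and `--output` flags and the install path are taken from the steps above, while the helper names `build_synthesis_command` and `synthesize` are hypothetical and not part of this PR.

```python
import subprocess
from pathlib import Path

# Install location taken from the installation steps above (assumed unchanged).
VOICE_SERVER_DIR = Path.home() / ".claude" / "PAI" / "PULSE" / "VoiceServer"


def build_synthesis_command(text: str, voice: str, output: str,
                            server_dir: Path = VOICE_SERVER_DIR) -> list[str]:
    """Return the argv list for one call to the supertonic-tts.py wrapper."""
    return [
        str(server_dir / ".venv" / "bin" / "python"),  # venv interpreter
        str(server_dir / "supertonic-tts.py"),         # wrapper added by this PR
        "--text", text,
        "--voice", voice,
        "--output", output,
    ]


def synthesize(text: str, voice: str = "M1", output: str = "/tmp/test.wav") -> None:
    """Run the wrapper; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_synthesis_command(text, voice, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.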

Available voices

Voice	Gender	Notes
M1–M5	Male	5 distinct male voices
F1–F5	Female	5 distinct female voices

Configure in settings.json:

{
  "daidentity": {
    "voices": {
      "provider": "supertonic",
      "main": {
        "supertonicVoice": "M1"
      }
    }
  }
}
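As a rough sketch of how a consumer might read and validate these keys: the key names come from the snippet above, but the function name `read_voice_settings` and the fallback defaults (`supertonic`, `M1`) are assumptions for illustration and may not match what `voice.ts` actually does.

```python
import json

VALID_PROVIDERS = {"supertonic", "elevenlabs"}
# M1..M5 and F1..F5, as listed in the voices table.
VALID_SUPERTONIC_VOICES = {f"{gender}{n}" for gender in "MF" for n in range(1, 6)}


def read_voice_settings(settings: dict,
                        default_provider: str = "supertonic",  # assumed default
                        default_voice: str = "M1") -> tuple[str, str]:  # assumed default
    """Return (provider, supertonic_voice) from a parsed settings.json dict."""
    voices = settings.get("daidentity", {}).get("voices", {})
    provider = voices.get("provider", default_provider)
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown voice provider: {provider!r}")
    voice = voices.get("main", {}).get("supertonicVoice", default_voice)
    if voice not in VALID_SUPERTONIC_VOICES:
        raise ValueError(f"unknown Supertonic voice: {voice!r}")
    return provider, voice


if __name__ == "__main__":
    example = '{"daidentity": {"voices": {"provider": "supertonic", "main": {"supertonicVoice": "M1"}}}}'
    print(read_voice_settings(json.loads(example)))
```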

Performance (CPU-only, no GPU required)

Benchmarked on a 2-core Intel Skylake VM (worst case — most desktops will be faster):

Message	Synthesis time	End-to-end (+ playback)
Short (3 words)	~1.6s	~3.5s
Medium (8 words)	~2.0s	~5.5s
Long (12 words)	~2.0s	~5.5s

First run: adds ~10–30s for model download (~386 MB), then cached permanently
CPU usage: synthesis uses all available cores (~8s of user CPU time on the 2-core test VM, i.e. both cores fully busy), then the process goes idle
Memory: ~200 MB RSS during synthesis

For comparison, ElevenLabs cloud TTS takes ~1–2s network round-trip but costs $0.30/1K characters.
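To put the per-character price in context, the arithmetic can be sketched like this. The $0.30 per 1K characters rate is the figure quoted above; the notification volume and message length in the usage note are made-up inputs, not measurements.

```python
# Rate quoted in this description; actual pricing depends on the ElevenLabs plan.
ELEVENLABS_USD_PER_1K_CHARS = 0.30


def elevenlabs_cost_usd(total_chars: int,
                        rate_per_1k: float = ELEVENLABS_USD_PER_1K_CHARS) -> float:
    """Cost of synthesizing total_chars characters at a flat per-1K rate."""
    return total_chars / 1000 * rate_per_1k


def monthly_cost_usd(notifications_per_day: int, avg_chars: int, days: int = 30) -> float:
    """Estimated cost for a month of voice notifications."""
    return elevenlabs_cost_usd(notifications_per_day * avg_chars * days)
```

Under these assumed numbers, `monthly_cost_usd(100, 60)` covers 180,000 characters, which comes to about $54 per month; with the local provider the marginal cost of the same traffic is zero.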

New: Cross-platform audio playback

Audio player discovery chain (first available wins):

Player	Platform	Package
`paplay`	Linux (PulseAudio/PipeWire)	`pulseaudio-utils` or `pipewire-pulse`
`ffplay`	Universal (FFmpeg)	`ffmpeg`
`afplay`	macOS	Built-in
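The "first available wins" lookup in the table can be illustrated with a short sketch. The real logic lives in `voice.ts`; this Python version only mirrors the order shown above, and the function names and the `ffplay` flags chosen here are assumptions rather than what the PR uses.

```python
import shutil
from typing import Callable, Optional

# Order taken from the discovery chain table above.
PLAYER_CHAIN = ("paplay", "ffplay", "afplay")


def find_audio_player(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first player in the chain found on PATH, or None."""
    for player in PLAYER_CHAIN:
        if which(player) is not None:
            return player
    return None


def playback_command(player: str, audio_path: str) -> list[str]:
    """Build the argv list for playing one file with the given player."""
    if player == "ffplay":
        # Without these flags ffplay opens a window and keeps running after playback.
        return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", audio_path]
    return [player, audio_path]
```

Because the chain is ordered, a macOS machine that also has FFmpeg installed would resolve to `ffplay` before `afplay` in this sketch. The `which` parameter is injectable so the lookup can be tested without touching the real PATH.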

Linux system dependencies:

# Ubuntu/Debian
sudo apt install pulseaudio-utils libnotify-bin

# Fedora
sudo dnf install pulseaudio-utils libnotify

# Arch
sudo pacman -S libpulse libnotify

New: Linux desktop notifications

notify-send on Linux (libnotify) — visual popup alongside audio
osascript on macOS (existing behavior preserved)
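A minimal sketch of the per-platform notification split, assuming the command is chosen from the platform name. `build_notification_command` is a hypothetical helper; the actual selection in `voice.ts` may be structured differently.

```python
import sys
from typing import Optional


def _applescript_string(value: str) -> str:
    """Quote a Python string as an AppleScript string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_notification_command(title: str, message: str,
                               platform: str = sys.platform) -> Optional[list[str]]:
    """Return the argv list for a desktop popup, or None on unsupported platforms."""
    if platform.startswith("linux"):
        # notify-send takes the summary and body as positional arguments.
        return ["notify-send", title, message]
    if platform == "darwin":
        script = (f"display notification {_applescript_string(message)} "
                  f"with title {_applescript_string(title)}")
        return ["osascript", "-e", script]
    return None
```

Escaping quotes and backslashes before building the AppleScript string keeps a message such as `Say "hi"` from breaking the script.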

Homeserver → Desktop audio routing

For users running PAI on a headless server (VM, NAS, homelab), voice audio can play on a remote desktop machine via PulseAudio/PipeWire network streaming:

On the desktop (audio sink):

# PulseAudio: allow network connections
pactl load-module module-native-protocol-tcp auth-anonymous=1

# PipeWire: add to ~/.config/pipewire/pipewire-pulse.conf.d/network.conf
# context.modules = [{ name = libpipewire-module-protocol-pulse
#   args = { server.address = ["unix:native", "tcp:4713"] } }]

On the server (PAI host):

# Add to ~/.claude/.env or shell profile
export PULSE_SERVER=tcp:<DESKTOP_IP>:4713

Audio from paplay/ffplay on the server routes to the desktop's speakers over the LAN. Works with both WAV (Supertonic) and MP3 (ElevenLabs).
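If the variable should apply only to the player process rather than the whole shell, one option is to set `PULSE_SERVER` per invocation. The sketch below assumes that approach; `remote_pulse_env` is a hypothetical helper and `192.0.2.10` is a placeholder documentation address.

```python
import os
import subprocess
from typing import Mapping, Optional


def remote_pulse_env(desktop_host: str, port: int = 4713,
                     base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and point PulseAudio clients at a remote sink."""
    env = dict(os.environ if base_env is None else base_env)
    env["PULSE_SERVER"] = f"tcp:{desktop_host}:{port}"
    return env


def play_on_desktop(audio_path: str, desktop_host: str) -> None:
    """Play a file through paplay with audio routed to the desktop machine."""
    subprocess.run(["paplay", audio_path], env=remote_pulse_env(desktop_host), check=True)
```

For example, `play_on_desktop("/tmp/test.wav", "192.0.2.10")` would leave the server's own environment untouched while routing that one playback over the LAN.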

Troubleshooting

Issue	Fix
`No audio player found`	Install `pulseaudio-utils` (Linux) or `ffmpeg`
`Supertonic TTS failed`	Check `.venv/bin/python` exists; re-run `pip install supertonic`
`Voice: Supertonic not installed — falling back to elevenlabs`	Normal if you haven't installed Supertonic; set `provider: "elevenlabs"` to suppress
No sound on remote server	Set `PULSE_SERVER=tcp:<DESKTOP_IP>:4713` in `.env`
`Connection refused` on PulseAudio TCP	Run `pactl load-module module-native-protocol-tcp auth-anonymous=1` on the desktop
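When debugging the last two rows, it can help to confirm from the server that the desktop is listening on the PulseAudio TCP port at all. A small check along these lines, using only the standard library, is one way to do that; the function name is hypothetical and the check does not verify PulseAudio authentication, only that the port accepts connections.

```python
import socket


def pulse_tcp_reachable(host: str, port: int = 4713, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

A `False` result points at the desktop side (module not loaded, or a firewall blocking the port) rather than at the player on the server.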

Backward compatibility

ElevenLabs users: set "provider": "elevenlabs" in settings.json — everything works exactly as before
macOS users: afplay + osascript still in the discovery chain — zero behavior change
No Supertonic installed: auto-fallback to ElevenLabs with a log warning
All existing HTTP endpoints (/notify, /notify/personality, /voice, /voice/health) unchanged
3-tier config resolution preserved (caller body → voice_id lookup → defaults)
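The auto-fallback behaviour can be sketched as below. This assumes "installed" means the venv interpreter and the wrapper script both exist, which is inferred from the troubleshooting table and not confirmed by the diff; the function names and the log wording are illustrative only.

```python
import logging
from pathlib import Path

log = logging.getLogger("voice")


def supertonic_installed(server_dir: Path) -> bool:
    """Assumed check: the venv interpreter and the wrapper script are both present."""
    return ((server_dir / ".venv" / "bin" / "python").is_file()
            and (server_dir / "supertonic-tts.py").is_file())


def select_provider(requested: str, server_dir: Path) -> str:
    """Return the provider to use, falling back to ElevenLabs when needed."""
    if requested == "supertonic" and not supertonic_installed(server_dir):
        # Illustrative wording; the exact message in voice.ts may differ.
        log.warning("Supertonic not installed, falling back to ElevenLabs")
        return "elevenlabs"
    return requested
```

Setting `"provider": "elevenlabs"` explicitly skips the check, which is why it also suppresses the warning.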

Files changed

File	Change
`VoiceServer/voice.ts`	Dual-provider architecture, Linux audio/notification support
`VoiceServer/supertonic-tts.py`	NEW — Python wrapper for Supertonic TTS synthesis

Testing

Verified on:

Ubuntu 24.04 (paplay + notify-send) with Supertonic provider
Homeserver → desktop audio routing via PulseAudio TCP (VM → desktop over LAN)
ElevenLabs fallback when Supertonic not installed
macOS compatibility preserved (afplay + osascript in discovery chain)

…with Linux support

- Add Supertonic as local CPU-based TTS provider (zero cost, no API key needed)
- Add Linux audio playback: paplay (PulseAudio) → ffplay (FFmpeg) → afplay (macOS)
- Add Linux desktop notifications via notify-send
- Add VoiceProvider type for provider selection in settings.json
- Add per-voice Supertonic voice mapping (M1-M5, F1-F5)
- Add supertonic-tts.py wrapper script
- Preserve full backward compatibility with ElevenLabs-only setups
- Auto-fallback: if Supertonic not installed, falls back to ElevenLabs

danielmiessler · 2026-06-21T04:05:07Z

Hey @Trei-D, thanks for raising this, and sorry it sat for a while.

We're changing how LifeOS ships. Instead of cloning a full ~/.claude directory and running it as a complete system, LifeOS is becoming a skill you install through an agentic installer. The installer hands integration to your own AI, which reads your actual machine (your OS, your paths, your harness) and wires the hooks and system prompt in where they belong.

That's aimed right at what you hit here. The old "one directory, one layout, hope it matches your setup" approach is exactly what broke for so many people, and the new model should handle it far better because your AI does the integration per machine instead of us guessing.

So we're closing this in prep for that release. If it still bites you once the skill-based version is out, reopen or file a fresh one and we'll jump on it. Appreciate you taking the time.

danielmiessler closed this Jun 21, 2026

Trei-D wants to merge 1 commit into danielmiessler:main from Trei-D:feat/dual-provider-voice-linux
