feat(voice): dual-provider TTS (Supertonic local + ElevenLabs cloud) with Linux support #1301
Trei-D wants to merge 1 commit
…with Linux support

- Add Supertonic as local CPU-based TTS provider (zero cost, no API key needed)
- Add Linux audio playback: paplay (PulseAudio) → ffplay (FFmpeg) → afplay (macOS)
- Add Linux desktop notifications via notify-send
- Add VoiceProvider type for provider selection in settings.json
- Add per-voice Supertonic voice mapping (M1-M5, F1-F5)
- Add supertonic-tts.py wrapper script
- Preserve full backward compatibility with ElevenLabs-only setups
- Auto-fallback: if Supertonic not installed, falls back to ElevenLabs
Hey @Trei-D, thanks for raising this, and sorry it sat for a while. We're changing how LifeOS ships. Instead of cloning a full That's aimed right at what you hit here. The old "one directory, one layout, hope it matches your setup" approach is exactly what broke for so many people, and the new model should handle it far better because your AI does the integration per machine instead of us guessing.

So we're closing this in prep for that release. If it still bites you once the skill-based version is out, reopen or file a fresh one and we'll jump on it. Appreciate you taking the time.
Problem
The v5.0.0 voice module is macOS-only (uses `afplay` + `osascript`) and ElevenLabs-only (requires API key + quota). This means:
- `afplay` doesn't exist on Linux

Solution
Dual-provider TTS architecture with cross-platform audio playback.
New: Supertonic as local-first provider
Zero cost, zero internet, zero API key. Supertonic runs TTS inference on CPU using ONNX models that auto-download on first use.
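The provider choice and the auto-fallback listed in the commit message can be pictured with a small Python sketch. It is not the logic in `voice.ts`: the function names, the default of `"elevenlabs"` when no provider is set, and the install check (importing a module named `supertonic`) are all assumptions made for illustration.

```python
import importlib.util
from typing import Callable

VALID_PROVIDERS = ("supertonic", "elevenlabs")


def supertonic_installed() -> bool:
    # Assumption: the pip package exposes an importable module named "supertonic".
    return importlib.util.find_spec("supertonic") is not None


def resolve_provider(
    settings: dict,
    is_installed: Callable[[], bool] = supertonic_installed,
) -> str:
    """Return the TTS provider to use for the given settings.json contents."""
    voices = settings.get("daidentity", {}).get("voices", {})
    # Assumption: a missing or unknown provider behaves like an ElevenLabs-only setup.
    provider = voices.get("provider", "elevenlabs")
    if provider not in VALID_PROVIDERS:
        return "elevenlabs"
    # Auto-fallback: Supertonic requested but not available locally.
    if provider == "supertonic" and not is_installed():
        return "elevenlabs"
    return provider
```

Passing `is_installed` as a parameter keeps the check swappable, for example for a test or for a check that looks at a virtualenv path instead of an import.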
Installation
Requirements:
~/.cache/supertonic3/, downloaded on first run)

Available voices
Configure in `settings.json`:

    {
      "daidentity": {
        "voices": {
          "provider": "supertonic",
          "main": { "supertonicVoice": "M1" }
        }
      }
    }

Performance (CPU-only, no GPU required)
Benchmarked on a 2-core Intel Skylake VM (worst case — most desktops will be faster):
For comparison, ElevenLabs cloud TTS takes ~1–2s network round-trip but costs $0.30/1K characters.
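To make the cost side of that comparison concrete, here is a tiny Python sketch that applies the $0.30 per 1K characters figure quoted above. The function name is hypothetical, and real billing may round or tier differently.

```python
ELEVENLABS_USD_PER_1K_CHARS = 0.30  # rate quoted in this PR description


def elevenlabs_cost_usd(text: str) -> float:
    """Estimated cloud TTS cost for one utterance, billed per character."""
    return len(text) / 1000 * ELEVENLABS_USD_PER_1K_CHARS
```

At that rate a 200-character notification comes to roughly six cents, while the same text through the local provider costs nothing beyond CPU time.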
New: Cross-platform audio playback
Audio player discovery chain (first available wins):
1. `paplay` (`pulseaudio-utils` or `pipewire-pulse`)
2. `ffplay` (`ffmpeg`)
3. `afplay` (macOS)

Linux system dependencies:
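As a rough illustration of how the discovery chain and the packages behind it fit together, the Python sketch below walks the players in the order given and returns the first one found on `PATH`. The names `PLAYER_CHAIN` and `find_audio_player` are hypothetical, and the PR's actual implementation lives in `voice.ts`.

```python
import shutil
from typing import Callable, Optional

# Order and package hints as stated in this PR; afplay ships with macOS.
PLAYER_CHAIN = [
    ("paplay", "pulseaudio-utils or pipewire-pulse"),
    ("ffplay", "ffmpeg"),
    ("afplay", "built into macOS"),
]


def find_audio_player(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first available player, or None if none is installed."""
    for name, _package in PLAYER_CHAIN:
        path = which(name)
        if path:
            return path
    return None
```

A caller that gets `None` back could print the package hints from `PLAYER_CHAIN` so the user knows what to install.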
New: Linux desktop notifications
- `notify-send` on Linux (libnotify): visual popup alongside audio
- `osascript` on macOS (existing behavior preserved)

Homeserver → Desktop audio routing
For users running PAI on a headless server (VM, NAS, homelab), voice audio can play on a remote desktop machine via PulseAudio/PipeWire network streaming:
On the desktop (audio sink):
On the server (PAI host):
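One hedged way to picture the server side, written in Python rather than shell: point the PulseAudio client at the desktop through the `PULSE_SERVER` environment variable, using the `tcp:<desktop-ip>:4713` form mentioned under Troubleshooting. The helper names and the example address are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional


def remote_pulse_env(
    desktop_ip: str,
    port: int = 4713,
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Copy the environment and point PulseAudio clients at the desktop."""
    env = dict(os.environ if base is None else base)
    env["PULSE_SERVER"] = f"tcp:{desktop_ip}:{port}"
    return env


def play_on_desktop(audio_path: str, desktop_ip: str) -> None:
    # Requires paplay on the server and the TCP module loaded on the desktop.
    subprocess.run(["paplay", audio_path], env=remote_pulse_env(desktop_ip), check=True)
```

For example, `play_on_desktop("/tmp/hello.wav", "192.168.1.50")` would try to play the file through the desktop at that (made-up) address.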
Audio from `paplay`/`ffplay` on the server routes to the desktop's speakers over the LAN. Works with both WAV (Supertonic) and MP3 (ElevenLabs).

Troubleshooting
- `No audio player found`: install `pulseaudio-utils` (Linux) or `ffmpeg`
- `Supertonic TTS failed`: check that `.venv/bin/python` exists; re-run `pip install supertonic`
- `Voice: Supertonic not installed — falling back to elevenlabs`: set `provider: "elevenlabs"` to suppress
- Set `PULSE_SERVER=tcp:<desktop-ip>:4713` in `.env`
- `Connection refused` on PulseAudio TCP: run `pactl load-module module-native-protocol-tcp` on the desktop

Backward compatibility

- `"provider": "elevenlabs"` in settings.json: everything works exactly as before
- `afplay` + `osascript` still in the discovery chain: zero behavior change
- Endpoints (`/notify`, `/notify/personality`, `/voice`, `/voice/health`) unchanged

Files changed
- `VoiceServer/voice.ts`
- `VoiceServer/supertonic-tts.py`

Testing
Verified on: