A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
A ComfyUI custom node integration for local, multi-engine, multilingual Text-to-Speech and Voice Conversion. Supports RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2, and VibeVoice, with unlimited text length, SRT timing, character support, and many audio tools.
ComfyUI custom node for VibeVoice TTS: expressive, long-form, multi-speaker conversational audio.
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. Based on Microsoft's VibeVoice (AR + diffusion architecture)
A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.
Audiobook creation tool supporting too many TTS models (Qwen3-TTS, OmniVoice, VibeVoice, etc.), focused on high-quality output. Plus an audio-synced reader app and a standalone server component.
Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source code for the open-source TTS models, including the removed 7B version.
ONNX speech pipeline library for ASR, diarization, VAD, and denoising
Create multi-voice podcasts with AI text-to-speech
A Gradio-based demo for end-to-end vision-to-speech inference: Extract text or descriptions from images using Qwen2.5-VL-7B-Instruct, then convert to natural speech audio via Microsoft VibeVoice-Realtime-0.5B.
HOW TO RUN MICROSOFT VIBEVOICE LOCALLY
A ready-to-use Google Colab notebook for running the open-source VibeVoice TTS model from Microsoft, using the quantized Large Q8 variant (~12 GB VRAM) for multi-speaker long-form audio generation
🐟 Enhance communication with Fish Speech, a powerful multilingual Text-to-Speech system featuring speaker management, auto-transcription, and emotion control.
Simplified scripts for fine-tuning VibeVoice speech synthesis models with LoRA. Painless fine-tuning with reasonable defaults, supporting both local GPU and Google Colab workflows.
A suckless, high-performance CLI tool for audio transcription using Microsoft VibeVoice-ASR.
🎙️ Enhance voice synthesis with ComfyUI-Qwen3-TTS, featuring advanced voice cloning, emotion-aware ASR, and unlimited multi-role dubbing.