A modern desktop application that converts EPUB ebooks into high-quality audiobooks using Kokoro TTS with GPU acceleration. Features a beautiful Electron-based UI with real-time progress tracking.
- GPU Accelerated - Utilizes NVIDIA GPUs for fast processing
- Modern Desktop UI - Beautiful Electron app with React frontend
- EPUB Support - Direct upload and conversion from EPUB files
- Text Editing - Review and edit extracted text before conversion
- Real-Time Progress - Track conversion progress with live updates
- Smart Chunking - Handles large books by splitting into manageable chunks
- MP3 Output - Creates compressed audiobook files
- File Management - Download audio files or open file location
- Auto Cleanup - Removes temporary files after processing
- Backend: FastAPI (Python) - Handles EPUB extraction and TTS conversion
- Frontend: Electron + React - Modern desktop application
- TTS Engine: Kokoro TTS with PyTorch
- Audio Processing: FFmpeg for audio combination
- Windows (tested), Linux, or macOS
- NVIDIA GPU (recommended) with CUDA support
- FFmpeg installed and in PATH
- Python 3.8+
- Node.js 16+ (for Electron frontend)
- GPU: NVIDIA RTX 3060 or better
- RAM: 8GB+ (16GB+ recommended for large books)
- Storage: 2-3GB free space per book
git clone <repository-url>
cd audiobook_generator
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
# For CUDA 12.1 (check your CUDA version with: nvidia-smi)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Windows:
- Download from ffmpeg.org or use `winget install ffmpeg`
- Extract and add to PATH
- Test: `ffmpeg -version`
Linux:
sudo apt update
sudo apt install ffmpeg
macOS:
brew install ffmpeg
cd frontend
npm install
# Activate virtual environment if not already active
source venv/bin/activate # On Windows: venv\Scripts\activate
# Start FastAPI server
uvicorn main:app --reload --port 8000
The backend will be available at http://localhost:8000
cd frontend
npm start
- Click "Choose EPUB File" to select your EPUB book
- Click "Extract Text" to extract and process the text content
- The extracted text will be saved to the `output/` folder
- Review the extracted text in the text editor
- Make any edits or corrections as needed
- Optionally download the `.txt` file
- Click "Continue to Convert" when ready
- Watch the real-time progress bar as your audiobook is generated
- Progress updates show:
  - Current chunk being processed
  - Overall completion percentage
  - Status messages
- When complete, you can:
  - Download Audio: Download the MP3 file directly
  - Open File Location: Open the file in your system file manager
  - Convert Another File: Start a new conversion
audiobook_generator/
├── main.py                  # FastAPI backend server
├── requirements.txt         # Python dependencies
├── frontend/                # Electron frontend
│   ├── src/
│   │   ├── main.js          # Electron main process
│   │   ├── preload.js       # Preload script (IPC)
│   │   ├── renderer.jsx     # React entry point
│   │   ├── pages/
│   │   │   └── Home.jsx     # Main application page
│   │   ├── components/      # React components
│   │   └── css/             # Stylesheets
│   └── package.json         # Node.js dependencies
├── uploads/                 # Temporary EPUB storage
└── output/                  # Generated files (txt, mp3)
Upload and extract text from an EPUB file.
Request:
file: EPUB file (multipart/form-data)
Response:
{
  "message": "EPUB extracted successfully",
  "output": "book_name.txt",
  "text": "Full extracted text content..."
}
Convert text to audiobook.
Request:
{
  "text": "Text content to convert...",
  "filename": "book_name.epub"
}
Response:
{
  "message": "Conversion started",
  "task_id": "uuid-here"
}
Get conversion progress.
Response:
{
  "status": "processing",
  "progress": 45,
  "current_chunk": 10,
  "total_chunks": 25,
  "message": "Processing chunk 10 of 25...",
  "output_file": null
}
Download the generated audio file.
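The conversion and progress responses above could be backed by a small in-memory task store. The following is only a sketch under assumptions, not the code in main.py: the names `TASKS`, `create_task`, and `update_task` are hypothetical, and progress is computed purely from chunk counts, which the real backend may weight differently.

```python
import uuid
from typing import Any, Dict

# Hypothetical in-memory store: task_id -> progress record.
TASKS: Dict[str, Dict[str, Any]] = {}


def create_task(total_chunks: int) -> str:
    """Register a new conversion task and return its unique ID."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {
        "status": "processing",
        "progress": 0,
        "current_chunk": 0,
        "total_chunks": total_chunks,
        "message": "Conversion started",
        "output_file": None,
    }
    return task_id


def update_task(task_id: str, current_chunk: int) -> Dict[str, Any]:
    """Record that current_chunk chunks are done and refresh the percentage."""
    task = TASKS[task_id]
    total = task["total_chunks"]
    task["current_chunk"] = current_chunk
    # Assumption: percentage is derived from chunk counts alone.
    task["progress"] = int(100 * current_chunk / total) if total else 100
    task["message"] = f"Processing chunk {current_chunk} of {total}..."
    return task
```

A progress endpoint would then only need to look up the record by `task_id` and return it as JSON.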
- The application automatically detects and uses your GPU
- Larger chunks use more GPU memory but process faster
- RTX 4070 users: Can process chunk sizes up to 150,000 characters
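To illustrate the chunk-size trade-off, here is a minimal sketch of character-based chunking. It is not the implementation in main.py: the function name `split_into_chunks` and the choice to break on sentence boundaries are assumptions, and only the idea of a character limit per chunk comes from this README.

```python
import re
from typing import List


def split_into_chunks(text: str, chunk_size: int = 100_000) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: prefers sentence boundaries and hard-splits any
    single sentence that is longer than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A sentence that cannot fit in one chunk is cut into fixed-size pieces.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

A smaller `chunk_size` yields more chunks, each needing less GPU memory; a larger one yields fewer, heavier chunks.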
| Hardware | Processing Speed | 2.4M Character Book |
|---|---|---|
| RTX 4070 | ~400+ chars/sec | ~19 minutes |
| RTX 3070 | ~300+ chars/sec | ~25 minutes |
| CPU only | ~50-100 chars/sec | 2-4 hours |
Edit main.py to customize:
# Voice selection (line ~213)
voice = 'af_sarah' # Options: 'af_sarah', 'af_heart', etc.
# Speech speed (line ~214)
speed = 1.0 # 0.8 = slower, 1.2 = faster
# Chunk size (line ~216)
chunk_size = 100000 # Larger = fewer files, more GPU memory usage
Edit frontend/src/pages/Home.jsx to change the backend URL:
const CONST_BASE_URL = "http://localhost:8000";
"CUDA not available"
# Check CUDA installation
python -c "import torch; print(torch.cuda.is_available())"
# Reinstall PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu121
"FFmpeg not found"
- Ensure FFmpeg is installed and in your system PATH
- Test: `ffmpeg -version`
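The same check can be run from Python, which is useful when the backend's environment has a different PATH from your shell. This is a small sketch using only the standard library; the function names are illustrative and not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(ffmpeg_path: str) -> str:
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [ffmpeg_path, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0]
```

If `find_ffmpeg()` returns `None` inside the activated virtual environment, FFmpeg is not visible to the backend even if it works in another terminal.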
Backend connection errors
- Ensure the FastAPI server is running on port 8000
- Check that CORS is properly configured
- Verify the backend URL in the frontend matches your server
Progress bar not updating
- Check browser console for errors
- Verify the task_id is being received from the convert endpoint
- Ensure the polling interval is working (check Network tab)
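To rule out the UI, the progress data can also be polled from a small script. The sketch below assumes nothing about the endpoint path: `fetch_status` is a hypothetical callable you supply that returns the progress JSON as a dict. The terminal status names `"completed"` and `"error"` are assumptions, since this README only shows `"processing"`.

```python
import time
from typing import Any, Callable, Dict, Tuple


def wait_for_conversion(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 3600.0,
    terminal_statuses: Tuple[str, ...] = ("completed", "error"),  # assumed names
) -> Dict[str, Any]:
    """Poll fetch_status until a terminal status is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        print(f"{status.get('progress', 0)}% - {status.get('message', '')}")
        if status.get("status") in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not finish in time")
        time.sleep(interval)
```

In practice `fetch_status` would wrap an HTTP GET against the progress endpoint for the `task_id` returned by the convert call.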
File download not working
- Check that the file exists in the `output/` folder
- Verify file permissions
- Check Electron IPC handlers are properly set up
- Slow processing: Ensure GPU is being used (check backend console output)
- High memory usage: Reduce chunk_size in main.py
- Frontend lag: Close DevTools if open, reduce polling frequency
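A quick way to confirm which device PyTorch would use is sketched below. The helper name `pick_device` is hypothetical, and how main.py actually selects its device is not shown in this README.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment, so only CPU is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build most likely lacks CUDA support.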
- Text Extraction:
  - User uploads EPUB file via the frontend
  - Backend parses EPUB and extracts clean text
  - Text is cleaned and formatted
  - Extracted text is returned to frontend and saved to the `output/` folder
- Text Review:
  - User can review and edit the extracted text
  - Edits are stored in memory (not saved to file)
  - User can download the original extracted text
- TTS Conversion:
  - User submits text for conversion
  - Backend creates a unique task ID
  - Text is split into GPU-manageable chunks
  - Each chunk is processed with Kokoro TTS
  - Progress is tracked and updated in real-time
- Audio Combination:
  - Individual audio chunks are combined using FFmpeg
  - Final MP3 file is created in the `output/` folder
  - Temporary files are cleaned up
- File Access:
  - User can download the MP3 file directly
  - Or open the file location in system file manager (Electron only)
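The audio combination step can be sketched with FFmpeg's concat demuxer. This is one possible approach and not necessarily what main.py does: the helper names, the WAV chunk format, and the MP3 quality setting are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Sequence


def write_concat_list(chunk_paths: Sequence[Path], list_path: Path) -> None:
    """Write the file list expected by FFmpeg's concat demuxer."""
    lines = []
    for chunk in chunk_paths:
        # Inside single quotes, a literal quote must be written as '\''.
        escaped = chunk.resolve().as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_concat_command(list_path: Path, output_path: Path) -> List[str]:
    """Build the FFmpeg command that joins the listed chunks into one MP3."""
    return [
        "ffmpeg", "-y",              # overwrite the output if it exists
        "-f", "concat", "-safe", "0",  # read inputs from the list file
        "-i", str(list_path),
        "-c:a", "libmp3lame",        # encode the result as MP3
        "-q:a", "2",                 # assumed VBR quality level
        str(output_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; the chunk files and the list file can be deleted once the command succeeds.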
Edit main.py around line 213:
# Voice options (keep exactly one uncommented)
voice = 'af_sarah' # Default female voice
# voice = 'af_heart' # Alternative female voice
# Speed control (keep exactly one uncommented)
# speed = 0.8 # Slower, more deliberate
speed = 1.0 # Normal speed (default)
# speed = 1.2 # Faster narration
Edit main.py around line 216:
# Chunk size in characters per processing chunk (keep exactly one uncommented)
# chunk_size = 50000 # Conservative (4GB+ GPU)
chunk_size = 100000 # Balanced (8GB+ GPU, like RTX 4070)
# chunk_size = 150000 # Maximum (12GB+ GPU, like RTX 4080/4090)
Edit the clean_text() function in main.py:
import re

def clean_text(text):
    # Basic cleanup: trim the ends and collapse whitespace runs into single spaces
    text = re.sub(r'\s+', ' ', text.strip())
    # Handle censored words
    text = re.sub(r'F\s*\*\s*ck', 'Fuck', text)
    # Custom replacements: expand abbreviations so they are read aloud in full
    text = re.sub(r'Dr\.', 'Doctor', text)
    text = re.sub(r'Mr\.', 'Mister', text)
    return text
This project uses:
- Kokoro TTS: Apache 2.0 License
- FastAPI: MIT License
- Electron: MIT License
- React: MIT License
- Other dependencies: Various open-source licenses
- Fork the repository
- Create a feature branch
- Make your improvements
- Submit a pull request
For issues:
- Check the troubleshooting section
- Ensure all dependencies are installed
- Verify GPU/CUDA setup
- Check FFmpeg installation
- Review browser/Electron console for errors
- Kokoro TTS team for the excellent TTS model
- PyTorch for GPU acceleration framework
- FastAPI for the modern Python web framework
- Electron for cross-platform desktop app framework
- FFmpeg for audio processing
- ebooklib for EPUB parsing
Happy audiobook generation!