Web2Speech 🎵

A beautiful, mobile-friendly Progressive Web App that transforms web content and documents into natural speech with an elegant reader experience.

✨ Features

🔄 Flexible Input Methods

URL Input: Paste any website URL to extract and read content
File Upload: Upload PDF files or text documents
Easy Toggle: Switch between input methods with a single click
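One way to model the two input methods is a discriminated union, where each input carries only the data it needs. The sketch below is hypothetical: `InputSource`, `parseUrl`, and `validateInput` are not taken from the project. It also assumes that URLs must use http(s) and that uploads must be PDFs or plain-text files.

```typescript
// Hypothetical model of the URL / file toggle; names are illustrative only.
type InputSource =
  | { kind: "url"; url: string }
  | { kind: "file"; name: string; mimeType: string };

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Returns null instead of throwing when the string is not a parseable URL.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw.trim());
  } catch {
    return null;
  }
}

function validateInput(source: InputSource): ValidationResult {
  if (source.kind === "url") {
    const parsed = parseUrl(source.url);
    if (!parsed) return { ok: false, reason: "Not a valid URL" };
    // Only web pages can be fetched and extracted, so reject other schemes.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return { ok: false, reason: "Only http(s) URLs are supported" };
    }
    return { ok: true };
  }
  // File branch: accept PDFs and plain-text documents by MIME type or extension.
  const name = source.name.toLowerCase();
  const isPdf = source.mimeType === "application/pdf" || name.endsWith(".pdf");
  const isText = source.mimeType.startsWith("text/") || name.endsWith(".txt");
  return isPdf || isText ? { ok: true } : { ok: false, reason: "Unsupported file type" };
}
```

For example, `validateInput({ kind: "url", url: "ftp://example.com" })` is rejected. A `.pdf` upload is accepted even if the browser reports an empty MIME type.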

🎯 Processing Modes

Generate Mode: Create downloadable MP3 audio files for offline listening
Stream Mode: Beautiful reader view with synchronized audio playback

📖 Beautiful Reader Experience

Word Highlighting: Real-time word-by-word highlighting during playback
Progress Tracking: Visual progress bar and completion percentage
Playback Controls: Play, pause, skip forward/backward controls
Customizable Settings: Adjust speech rate and pitch
Reading Statistics: Word count and estimated reading time
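Word highlighting and progress tracking can be driven by the Web Speech API, whose `SpeechSynthesisUtterance` fires `boundary` events carrying a `charIndex` into the spoken text. The following is a minimal sketch that assumes a whitespace tokenizer and an average reading speed of 200 words per minute. Both are assumptions, as are the helper names `indexWords`, `wordAt`, `progressPercent`, `readingStats`, and `speakWithHighlight`.

```typescript
// Character span of one word in the source text.
interface WordSpan { start: number; end: number; }

// Record where each whitespace-delimited word starts and ends.
function indexWords(text: string): WordSpan[] {
  const spans: WordSpan[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    spans.push({ start: m.index, end: m.index + m[0].length });
  }
  return spans;
}

// Binary search for the last word starting at or before charIndex (-1 if none).
function wordAt(spans: WordSpan[], charIndex: number): number {
  let lo = 0, hi = spans.length - 1, found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (spans[mid].start <= charIndex) { found = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return found;
}

// Completion percentage based on how far into the text speech has reached.
function progressPercent(charIndex: number, textLength: number): number {
  if (textLength === 0) return 0;
  return Math.min(100, Math.round((charIndex / textLength) * 100));
}

// Word count and estimated reading time (assumed 200 words per minute).
function readingStats(text: string, wordsPerMinute = 200) {
  const words = indexWords(text).length;
  return { words, minutes: Math.ceil(words / wordsPerMinute) };
}

// Browser-only wiring: report the current word index and progress on each word boundary.
function speakWithHighlight(
  text: string,
  onWord: (wordIndex: number, percent: number) => void,
): SpeechSynthesisUtterance {
  const spans = indexWords(text);
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event) => {
    if (event.name !== "word") return;
    onWord(wordAt(spans, event.charIndex), progressPercent(event.charIndex, text.length));
  };
  window.speechSynthesis.speak(utterance);
  return utterance;
}
```

Some voices, notably network-backed ones, may not fire word boundary events. A reader built this way may therefore need a timing-based fallback for those voices.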

🎙️ Advanced Voice Features

Multiple TTS Engines: Choose between Web Speech API and Hugging Face Kokoro-82M
High-Quality AI Voices: Premium neural TTS with natural-sounding speech
Multiple Voices: Choose from all available system TTS voices plus AI voices
Language Support: Automatic language detection with flag indicators
Voice Testing: Test voices before starting playback
Local/Cloud Voices: Labels showing whether each voice runs locally or in the cloud
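Flag indicators and local/cloud labels can be derived from fields that `SpeechSynthesisVoice` already exposes: `lang` (a BCP 47 tag such as `en-US`) and `localService`. The sketch below assumes that the flag comes from the region subtag, mapped to Unicode regional indicator symbols. The helper names are hypothetical.

```typescript
// The subset of SpeechSynthesisVoice fields this sketch reads.
interface VoiceInfo { name: string; lang: string; localService: boolean; }

// "en-US" -> 🇺🇸: each region letter maps to REGIONAL INDICATOR SYMBOL LETTER A + offset.
function flagFromLang(lang: string): string {
  const region = lang
    .split(/[-_]/)
    .slice(1)
    .find((part) => /^[A-Za-z]{2}$/.test(part)); // skip script subtags like "Hans"
  if (!region) return "🌐"; // no region: fall back to a globe
  const A = 0x1f1e6;
  return String.fromCodePoint(...[...region.toUpperCase()].map((c) => A + c.charCodeAt(0) - 65));
}

// Display label combining flag, voice name, and local/cloud indicator.
function voiceLabel(v: VoiceInfo): string {
  return `${flagFromLang(v.lang)} ${v.name} (${v.localService ? "Local" : "Cloud"})`;
}

// Browser-only: the voice list can load asynchronously, so re-read it on "voiceschanged".
function loadVoiceLabels(onReady: (labels: string[]) => void): void {
  const read = () => onReady(window.speechSynthesis.getVoices().map(voiceLabel));
  read();
  window.speechSynthesis.addEventListener("voiceschanged", read);
}
```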

📱 Mobile-First Design

Responsive Layout: Optimized for all screen sizes
PWA Ready: Install on home screen for app-like experience
Touch Friendly: Large touch targets and smooth interactions
Offline Capable: Service worker for offline functionality

🚀 Technology Stack

Frontend Framework

React 18 with TypeScript for type safety
Vite for lightning-fast development and building
Zustand for lightweight state management

Styling & UI

Tailwind CSS for utility-first styling
Headless UI for accessible components
Lucide React for beautiful icons
Custom gradients for modern visual appeal

PWA & Performance

Vite PWA Plugin for service worker generation
Workbox for advanced caching strategies
Web App Manifest for installation support
Responsive images with SVG icons

File Processing

React Dropzone for drag-and-drop file uploads
PDF.js integration ready for PDF text extraction
Readability.js ready for web content extraction
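Uploaded files could be routed by type: plain text is read directly, and PDFs are handed to a PDF.js-based extractor. In this sketch the extractor is injected as a function. With `pdfjs-dist` it would typically load the document via `getDocument({ data }).promise`, call `getPage(n)` and `getTextContent()` for each page, and pass `content.items` to the joiner below. Items without a `str` field are marked-content entries and can be filtered out first. All names here (`PdfTextItem`, `pdfItemsToText`, `extractText`) are hypothetical, and URL extraction with Readability.js is not covered.

```typescript
// Minimal shape of a PDF.js text item: the text run and an end-of-line flag.
interface PdfTextItem { str: string; hasEOL?: boolean; }

// Join text runs, turning hasEOL into line breaks and collapsing stray spaces.
function pdfItemsToText(items: PdfTextItem[]): string {
  return items
    .map((it) => it.str + (it.hasEOL ? "\n" : " "))
    .join("")
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .trim();
}

// Hypothetical hook: receives the raw PDF bytes and returns extracted text.
type PdfExtractor = (data: ArrayBuffer) => Promise<string>;

// Dispatch on MIME type or extension; anything that is not a PDF is read as text.
async function extractText(file: Blob, fileName: string, extractPdf: PdfExtractor): Promise<string> {
  const isPdf = file.type === "application/pdf" || fileName.toLowerCase().endsWith(".pdf");
  if (isPdf) return extractPdf(await file.arrayBuffer());
  return (await file.text()).trim();
}
```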

Speech Technology

Web Speech API for native browser TTS
Hugging Face Kokoro-82M for high-quality AI-powered text-to-speech
Real-time word tracking during playback
Extensible architecture for external TTS services
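An extensible design for multiple TTS engines usually comes down to a common interface plus a registry. The sketch below is a hypothetical shape for that idea, and the project's real abstraction may differ. It includes a browser engine backed by the Web Speech API, which applies the rate and pitch settings from the reader.

```typescript
// Hypothetical engine contract shared by browser and cloud back ends.
interface TtsEngine {
  id: string;
  label: string;
  requiresApiKey: boolean;
  speak(text: string, options: { rate: number; pitch: number }): Promise<void>;
  stop(): void;
}

// Keeps engines by id so the UI can list them and look up the selected one.
class EngineRegistry {
  private engines = new Map<string, TtsEngine>();

  register(engine: TtsEngine): void {
    if (this.engines.has(engine.id)) throw new Error(`Duplicate engine id: ${engine.id}`);
    this.engines.set(engine.id, engine);
  }

  get(id: string): TtsEngine {
    const engine = this.engines.get(id);
    if (!engine) throw new Error(`Unknown engine: ${id}`);
    return engine;
  }

  list(): TtsEngine[] {
    return [...this.engines.values()];
  }
}

// Browser engine: resolves when the utterance finishes, rejects on a synthesis error.
const browserEngine: TtsEngine = {
  id: "web-speech",
  label: "Browser TTS",
  requiresApiKey: false,
  speak: (text, { rate, pitch }) =>
    new Promise<void>((resolve, reject) => {
      const u = new SpeechSynthesisUtterance(text);
      u.rate = rate;
      u.pitch = pitch;
      u.onend = () => resolve();
      u.onerror = (e) => reject(new Error(e.error));
      window.speechSynthesis.speak(u);
    }),
  stop: () => window.speechSynthesis.cancel(),
};
```

With this design, adding a service such as one from the Planned Features list means implementing `TtsEngine` once and calling `register`, without touching the reader UI.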

🔧 Configuration

Hugging Face TTS Setup

To use the premium Hugging Face Kokoro-82M TTS engine:

Get API Key: Visit your Hugging Face account settings (Access Tokens) to create an API token
Configure in App: Select "Hugging Face Kokoro-82M" from the TTS Engine dropdown
Enter API Key: Enter your API key when prompted
Enjoy Premium Audio: Experience high-quality neural text-to-speech

The API key is stored in your browser's local storage (unencrypted) and is only used to communicate with Hugging Face's Inference API.
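A request to Hugging Face could look like the sketch below. Several details are assumptions rather than facts from this README: the endpoint URL and the `hexgrad/Kokoro-82M` model id, the `{ inputs: text }` body (the Inference API's text-to-speech payload shape), and the localStorage key. Check them against the app's source and Hugging Face's current Inference API documentation.

```typescript
// ASSUMPTIONS: endpoint, model id, and storage key are illustrative, not confirmed.
const HF_ENDPOINT = "https://api-inference.huggingface.co/models/hexgrad/Kokoro-82M";
const STORAGE_KEY = "web2speech.hfApiKey"; // hypothetical localStorage key

// Build the POST request separately so it can be inspected or tested without network access.
function buildTtsRequest(text: string, apiKey: string): { url: string; init: RequestInit } {
  return {
    url: HF_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: text }),
    },
  };
}

// Browser-only: read the stored key, call the API, and return the audio bytes.
async function synthesize(text: string): Promise<Blob> {
  const apiKey = window.localStorage.getItem(STORAGE_KEY);
  if (!apiKey) throw new Error("No Hugging Face API key configured");
  const { url, init } = buildTtsRequest(text, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Hugging Face request failed: ${res.status}`);
  return res.blob();
}
```

The returned `Blob` can be played with `new Audio(URL.createObjectURL(blob)).play()`.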

Voice Selection

Browser TTS: Uses your system's built-in voices (free; local voices work offline)
Kokoro-82M: Premium AI voices with natural intonation (requires API key)

🛠️ Development Setup

Prerequisites

Node.js 18+ and npm
Modern browser with Web Speech API support

Installation

# Clone the repository
git clone <repository-url>
cd web2speech

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

Available Scripts

npm run dev        # Start development server
npm run build      # Build for production
npm run preview    # Preview production build
npm run lint       # Run ESLint

🎨 Design System

Color Palette

Primary: Blue gradient (#3b82f6 to #2563eb)
Background: Gradient from blue to purple tones
Glass Effect: Semi-transparent white overlays
Interactive States: Smooth transitions and hover effects

Typography

Font: Inter (Google Fonts)
Responsive Sizes: Mobile-optimized text scaling
Reading Experience: Optimized for extended reading

Components

Modular Architecture: Reusable React components
Accessibility First: ARIA labels and keyboard navigation
Loading States: Beautiful loading indicators
Error Handling: User-friendly error messages

🔧 Build Configuration

PWA Configuration

The app is configured as a Progressive Web App with:

Service worker for offline functionality
App manifest for installation
Caching strategies for optimal performance
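A setup along these lines can be expressed with `vite-plugin-pwa` in `vite.config.ts`. This is a sketch, not the project's actual file. The `@vitejs/plugin-react` import, manifest values, icon path, and caching rules are assumptions, with the theme color borrowed from the palette in the Design System section.

```typescript
// vite.config.ts: illustrative PWA setup, not the repository's real configuration.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { VitePWA } from "vite-plugin-pwa";

export default defineConfig({
  plugins: [
    react(),
    VitePWA({
      registerType: "autoUpdate", // activate new service worker versions automatically
      manifest: {
        name: "Web2Speech",
        short_name: "Web2Speech",
        theme_color: "#2563eb",
        display: "standalone",
        icons: [{ src: "icon.svg", sizes: "any", type: "image/svg+xml" }], // hypothetical icon path
      },
      workbox: {
        // Precache the built app shell so the UI loads offline.
        globPatterns: ["**/*.{js,css,html,svg}"],
        // Cache the Inter font from Google Fonts after first use.
        runtimeCaching: [
          {
            urlPattern: /^https:\/\/fonts\.(googleapis|gstatic)\.com\/.*/i,
            handler: "CacheFirst",
            options: { cacheName: "google-fonts" },
          },
        ],
      },
    }),
  ],
});
```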

Tailwind Configuration

Custom Tailwind setup with:

Extended color palette
Typography plugin
Responsive breakpoints
Custom utilities
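The items above map onto a Tailwind config roughly as follows. The repository ships `tailwind.config.js`. This TypeScript variant (supported by Tailwind CSS 3.3 and later) is only an illustration, and the same object works in the `.js` file without the type import. The colors and font come from the Design System section. The `xs` breakpoint and the `.glass` utility are hypothetical.

```typescript
// tailwind.config.ts: illustrative only.
import type { Config } from "tailwindcss";
import typography from "@tailwindcss/typography";
import plugin from "tailwindcss/plugin";

export default {
  content: ["./index.html", "./src/**/*.{ts,tsx}"],
  theme: {
    extend: {
      // Primary gradient endpoints from the Color Palette section.
      colors: { primary: { 500: "#3b82f6", 600: "#2563eb" } },
      fontFamily: { sans: ["Inter", "system-ui", "sans-serif"] },
      screens: { xs: "475px" }, // hypothetical extra breakpoint
    },
  },
  plugins: [
    typography,
    // Hypothetical utility for the semi-transparent "glass" overlays.
    plugin(({ addUtilities }) => {
      addUtilities({
        ".glass": {
          "background-color": "rgb(255 255 255 / 0.1)",
          "backdrop-filter": "blur(12px)",
        },
      });
    }),
  ],
} satisfies Config;
```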

🚀 Deployment

Build Process

npm run build

The build creates:

Optimized React bundle
Service worker for PWA functionality
Web app manifest
Compressed assets with gzip

Hosting Options

Vercel: Zero-config deployment
Netlify: Static site hosting with PWA support
GitHub Pages: Free hosting for open source
Any static hosting: Compatible with any CDN
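GitHub Pages project sites are served from a `/<repository>/` subpath, so Vite's `base` option must match that path, or asset URLs will break. The following is a minimal excerpt to merge into the existing `defineConfig` call. It assumes the repository is named `web2speech` and that the deploy workflow sets a hypothetical `GITHUB_PAGES` environment variable.

```typescript
// vite.config.ts excerpt: use a subpath only when building for GitHub Pages.
import { defineConfig } from "vite";

export default defineConfig({
  // Hypothetical env flag; "/web2speech/" assumes the repository name.
  base: process.env.GITHUB_PAGES === "true" ? "/web2speech/" : "/",
});
```

Vercel and Netlify serve from the domain root, so they can keep the default `base` of `/`.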

📱 PWA Installation

Users can install the app on their devices:

Chrome/Edge: Click the install icon in the address bar
Safari: Share → Add to Home Screen
Mobile: Choose Add to Home Screen from the browser menu

🔄 Future Enhancements

Planned Features

External TTS Services: ElevenLabs, Azure, Google Cloud integration
Real PDF Processing: Advanced PDF text extraction
Web Scraping: Live content extraction from URLs
Voice Cloning: Custom voice training capabilities
Bookmarks: Save and organize favorite content
Themes: Light/dark mode and custom themes

Technical Improvements

Better Word Tracking: More accurate speech synchronization
Offline Content: Cache extracted content for offline reading
Performance: Code splitting and lazy loading
Accessibility: Enhanced screen reader support

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

React Team for the amazing framework
Tailwind CSS for the utility-first approach
Headless UI for accessible components
Web Speech API for native browser TTS
All contributors who make this project better

Built with ❤️ for accessibility and beautiful reading experiences

Powered by modern web technologies: React, TypeScript, Tailwind CSS, Web Speech API, Hugging Face Kokoro-82M
