Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated May 26, 2026 - Python
Large end-to-end speech recognition model: 31 languages, plus dialects, accents, lyrics, hotwords, timestamps, and speaker diarization. Trained on tens of millions of hours.
High-performance Google Colab Notebook for fast & accurate audio transcription/translation using OpenAI Whisper. Accelerated on TPUs with PyTorch/XLA. Features an interactive UI for model selection, multi-language support, and long-form audio processing.
Point of Interest Error Rate (PIER) Metric for Code-Switching ASR: A specialized evaluation metric designed to focus on critical points in multilingual speech recognition, providing a more accurate analysis of code-switched utterances.
Real-time transformer-based ASR supporting 100+ languages - Google Cloud integration with noise cancellation & low-latency optimization
AISRT - local AI subtitle generator that converts video/audio to SRT, with multilingual ASR, timestamp alignment, GUI/CLI batch processing, and local SRT translation.