Give it a URL or a file, get back a transcript with timestamps.
Built on whisper.cpp via whisper-rs. Handles the whole pipeline — downloading, audio decoding, resampling, transcription, output formatting — so you don't have to glue it together yourself.
graph LR
A["URL or File"] --> B["yt-dlp
1000+ sites"]
A --> C
B --> C["ffmpeg
decode any format"]
C --> D["compand + loudnorm
compress & normalize"]
D --> E["whisper.cpp
CPU or GPU"]
E --> F["Text"]
E --> G["SRT"]
E --> H["WebVTT"]
E --> I["JSON"]
style A fill:#2563eb,color:#fff,stroke:none
style D fill:#7c3aed,color:#fff,stroke:none
style E fill:#047857,color:#fff,stroke:none
style F fill:#b45309,color:#fff,stroke:none
style G fill:#b45309,color:#fff,stroke:none
style H fill:#b45309,color:#fff,stroke:none
style I fill:#b45309,color:#fff,stroke:none
Without transcriber you'd wire up yt-dlp, ffmpeg, sample rate conversion, and whisper yourself — different tools, different formats, lots of glue code. With transcriber it's one function call:
let transcript = transcriber::transcribe(url).await?;let transcript = transcriber::transcribe("https://youtube.com/watch?v=dQw4w9WgXcQ").await?;
println!("{}", transcript.text());Or from a local file:
let transcript = transcriber::transcribe_file("meeting.mp3").await?;
// Get subtitles
println!("{}", transcript.to_srt());
println!("{}", transcript.to_vtt());
// Or structured data
let json = transcript.to_json_pretty()?;Models are downloaded automatically from HuggingFace on first use and cached locally.
[dependencies]
transcriber = "0.1"For GPU acceleration:
[dependencies]
transcriber = { version = "0.1", features = ["cuda"] }
# or
transcriber = { version = "0.1", features = ["vulkan"] }cargo install transcriber-cli
# Transcribe a YouTube video
transcriber-cli https://youtube.com/watch?v=... --format srt --output subtitles.srt
# Transcribe a local file
transcriber-cli recording.mp3 --model small --language de
# List available models
transcriber-cli --list-modelsuse transcriber::{TranscribeOptions, Model};
let opts = TranscribeOptions::new()
.model(Model::LargeV3Turbo)
.language("de")?
.word_timestamps(true)
.translate(true) // translate to English
.gpu(true)
.beam_size(5)?;
let transcript = transcriber::transcribe_with_options(url, &opts).await?;
for segment in &transcript.segments {
println!("[{:.1}s - {:.1}s] {}", segment.start, segment.end, segment.text);
}All audio is automatically conditioned before transcription via ffmpeg:
- Dynamic range compression (
compand) — boosts quiet speech (distant speakers, soft voices) while limiting loud peaks. Tuned for speech dynamics with 0.3s attack / 0.8s decay. - Loudness normalization (
loudnorm) — EBU R128 normalization targeting -16 LUFS, the optimal input level for whisper.
This is critical for real-world recordings (meetings, lectures, interviews) where speakers are at different distances from the microphone.
Whisper has a known failure mode where the decoder enters a repetition loop, generating the same phrase endlessly — especially on long recordings with quiet passages. transcriber prevents this at two levels:
- Decoder isolation: each 30-second window starts with a clean decoder slate (
n_max_text_ctx=0), preventing hallucinated text from poisoning subsequent windows. - Post-processing filter: a rolling-window detector catches repeated segments (exact repeats, alternating A/B patterns, short cycles) and removes them.
Audio formats: anything ffmpeg can handle — mp3, wav, ogg, opus, flac, aac, m4a, webm, and more.
URL downloading: YouTube and 1000+ other sites via yt-dlp.
Output formats: plain text, SRT, WebVTT, JSON.
Models: tiny through large-v3-turbo, plus custom GGML files.
Languages: 100 languages with auto-detection.
- ffmpeg — for audio decoding (
apt install ffmpeg/brew install ffmpeg) - yt-dlp — for URL downloads (
pip install yt-dlp). Not needed if you only use local files — build without thedownloadfeature to drop this dependency.
| Flag | Default | What it does |
|---|---|---|
download |
yes | URL downloading via yt-dlp |
cuda |
no | NVIDIA GPU acceleration |
vulkan |
no | Vulkan GPU acceleration |
Licensed under either of Apache License, Version 2.0 or MIT license at your option.