transcriber

Give it a URL or a file, get back a transcript with timestamps.

Built on whisper.cpp via whisper-rs. Handles the whole pipeline — downloading, audio decoding, resampling, transcription, output formatting — so you don't have to glue it together yourself.

How it works

graph LR
    A["URL or File"] --> B["yt-dlp
    1000+ sites"]
    A --> C

    B --> C["ffmpeg
    decode any format"]

    C --> D["compand + loudnorm
    compress & normalize"]

    D --> E["whisper.cpp
    CPU or GPU"]

    E --> F["Text"]
    E --> G["SRT"]
    E --> H["WebVTT"]
    E --> I["JSON"]

    style A fill:#2563eb,color:#fff,stroke:none
    style D fill:#7c3aed,color:#fff,stroke:none
    style E fill:#047857,color:#fff,stroke:none
    style F fill:#b45309,color:#fff,stroke:none
    style G fill:#b45309,color:#fff,stroke:none
    style H fill:#b45309,color:#fff,stroke:none
    style I fill:#b45309,color:#fff,stroke:none

Without transcriber you'd wire up yt-dlp, ffmpeg, sample rate conversion, and whisper yourself — different tools, different formats, lots of glue code. With transcriber it's one function call:

let transcript = transcriber::transcribe(url).await?;

Quick start

let transcript = transcriber::transcribe("https://youtube.com/watch?v=dQw4w9WgXcQ").await?;
println!("{}", transcript.text());

Or from a local file:

let transcript = transcriber::transcribe_file("meeting.mp3").await?;

// Get subtitles
println!("{}", transcript.to_srt());
println!("{}", transcript.to_vtt());

// Or structured data
let json = transcript.to_json_pretty()?;

Models are downloaded automatically from HuggingFace on first use and cached locally.

Install

[dependencies]
transcriber = "0.1"

For GPU acceleration:

[dependencies]
transcriber = { version = "0.1", features = ["cuda"] }
# or
transcriber = { version = "0.1", features = ["vulkan"] }

CLI

cargo install transcriber-cli

# Transcribe a YouTube video
transcriber-cli https://youtube.com/watch?v=... --format srt --output subtitles.srt

# Transcribe a local file
transcriber-cli recording.mp3 --model small --language de

# List available models
transcriber-cli --list-models

Options

use transcriber::{TranscribeOptions, Model};

let opts = TranscribeOptions::new()
    .model(Model::LargeV3Turbo)
    .language("de")?
    .word_timestamps(true)
    .translate(true)           // translate to English
    .gpu(true)
    .beam_size(5)?;

let transcript = transcriber::transcribe_with_options(url, &opts).await?;

for segment in &transcript.segments {
    println!("[{:.1}s - {:.1}s] {}", segment.start, segment.end, segment.text);
}

Audio conditioning

All audio is automatically conditioned before transcription via ffmpeg:

Dynamic range compression (compand) — boosts quiet speech (distant speakers, soft voices) while limiting loud peaks. Tuned for speech dynamics with 0.3s attack / 0.8s decay.
Loudness normalization (loudnorm) — EBU R128 normalization targeting -16 LUFS, the optimal input level for whisper.

This is critical for real-world recordings (meetings, lectures, interviews) where speakers are at different distances from the microphone.

Hallucination prevention

Whisper has a known failure mode where the decoder enters a repetition loop, generating the same phrase endlessly — especially on long recordings with quiet passages. transcriber prevents this at two levels:

Decoder isolation: each 30-second window starts with a clean decoder slate (n_max_text_ctx=0), preventing hallucinated text from poisoning subsequent windows.
Post-processing filter: a rolling-window detector catches repeated segments (exact repeats, alternating A/B patterns, short cycles) and removes them.

What it supports

Audio formats: anything ffmpeg can handle — mp3, wav, ogg, opus, flac, aac, m4a, webm, and more.

URL downloading: YouTube and 1000+ other sites via yt-dlp.

Output formats: plain text, SRT, WebVTT, JSON.

Models: tiny through large-v3-turbo, plus custom GGML files.

Languages: 100 languages with auto-detection.

Requirements

ffmpeg — for audio decoding (apt install ffmpeg / brew install ffmpeg)
yt-dlp — for URL downloads (pip install yt-dlp). Not needed if you only use local files — build without the download feature to drop this dependency.

Feature flags

Flag	Default	What it does
`download`	yes	URL downloading via yt-dlp
`cuda`	no	NVIDIA GPU acceleration
`vulkan`	no	Vulkan GPU acceleration

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
transcriber-cli		transcriber-cli
transcriber		transcriber
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

transcriber

How it works

Quick start

Install

CLI

Options

Audio conditioning

Hallucination prevention

What it supports

Requirements

Feature flags

License

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

transcriber

How it works

Quick start

Install

CLI

Options

Audio conditioning

Hallucination prevention

What it supports

Requirements

Feature flags

License

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages