claymore666/transcriber

transcriber


Give it a URL or a file, get back a transcript with timestamps.

Built on whisper.cpp via whisper-rs. Handles the whole pipeline — downloading, audio decoding, resampling, transcription, output formatting — so you don't have to glue it together yourself.

How it works

graph LR
    A["URL or File"] --> B["yt-dlp
    1000+ sites"]
    A --> C

    B --> C["ffmpeg
    decode any format"]

    C --> D["compand + loudnorm
    compress & normalize"]

    D --> E["whisper.cpp
    CPU or GPU"]

    E --> F["Text"]
    E --> G["SRT"]
    E --> H["WebVTT"]
    E --> I["JSON"]

    style A fill:#2563eb,color:#fff,stroke:none
    style D fill:#7c3aed,color:#fff,stroke:none
    style E fill:#047857,color:#fff,stroke:none
    style F fill:#b45309,color:#fff,stroke:none
    style G fill:#b45309,color:#fff,stroke:none
    style H fill:#b45309,color:#fff,stroke:none
    style I fill:#b45309,color:#fff,stroke:none

Without transcriber you'd wire up yt-dlp, ffmpeg, sample rate conversion, and whisper yourself — different tools, different formats, lots of glue code. With transcriber it's one function call:

let transcript = transcriber::transcribe(url).await?;

Quick start

let transcript = transcriber::transcribe("https://youtube.com/watch?v=dQw4w9WgXcQ").await?;
println!("{}", transcript.text());

Or from a local file:

let transcript = transcriber::transcribe_file("meeting.mp3").await?;

// Get subtitles
println!("{}", transcript.to_srt());
println!("{}", transcript.to_vtt());

// Or structured data
let json = transcript.to_json_pretty()?;

Models are downloaded automatically from HuggingFace on first use and cached locally.
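
The download-then-cache behaviour can be pictured roughly as follows. This is a sketch, not transcriber's actual code: the function names and the cache directory are hypothetical, and it assumes models come from the `ggerganov/whisper.cpp` repository on HuggingFace, where whisper.cpp publishes its GGML files.

```rust
use std::path::{Path, PathBuf};

/// URL of a GGML model in the whisper.cpp HuggingFace repository.
/// Assumption: transcriber may use a different repository or mirror.
fn model_url(name: &str) -> String {
    format!("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-{name}.bin")
}

/// Where a model would live inside a (hypothetical) cache directory.
fn cached_model_path(cache_dir: &Path, name: &str) -> PathBuf {
    cache_dir.join(format!("ggml-{name}.bin"))
}

/// A download is only needed when the cached file is missing.
fn needs_download(cache_dir: &Path, name: &str) -> bool {
    !cached_model_path(cache_dir, name).is_file()
}
```

On first use `needs_download` would return true and the file at `model_url(name)` would be fetched into the cache; later runs find the file and skip the network entirely.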

Install

[dependencies]
transcriber = "0.1"

For GPU acceleration:

[dependencies]
# Pick one backend; a Cargo.toml may list the crate only once.
transcriber = { version = "0.1", features = ["cuda"] }
# or
# transcriber = { version = "0.1", features = ["vulkan"] }

CLI

cargo install transcriber-cli

# Transcribe a YouTube video
transcriber-cli https://youtube.com/watch?v=... --format srt --output subtitles.srt

# Transcribe a local file
transcriber-cli recording.mp3 --model small --language de

# List available models
transcriber-cli --list-models

Options

use transcriber::{TranscribeOptions, Model};

let opts = TranscribeOptions::new()
    .model(Model::LargeV3Turbo)
    .language("de")?
    .word_timestamps(true)
    .translate(true)           // translate to English
    .gpu(true)
    .beam_size(5)?;

let transcript = transcriber::transcribe_with_options(url, &opts).await?;

for segment in &transcript.segments {
    println!("[{:.1}s - {:.1}s] {}", segment.start, segment.end, segment.text);
}
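
The `?` after `.language(...)` and `.beam_size(...)` in the snippet above suggests that those setters validate their input and return a `Result`, while the others return the builder directly. A minimal sketch of that pattern, using a stand-in `Options` type and validation rules that are purely illustrative (the real `TranscribeOptions` may check different things):

```rust
/// Illustrative options type; NOT the real `TranscribeOptions`.
#[derive(Debug, Clone, PartialEq)]
struct Options {
    language: Option<String>,
    beam_size: u32,
    translate: bool,
}

impl Options {
    fn new() -> Self {
        Options { language: None, beam_size: 1, translate: false }
    }

    /// Fallible setter: rejects anything that is not a two-letter lowercase code.
    /// (Hypothetical rule, chosen only to show why `?` is needed.)
    fn language(mut self, code: &str) -> Result<Self, String> {
        if code.len() == 2 && code.bytes().all(|b| b.is_ascii_lowercase()) {
            self.language = Some(code.to_string());
            Ok(self)
        } else {
            Err(format!("invalid language code: {code:?}"))
        }
    }

    /// Infallible setter: any bool is valid, so it returns `Self` directly.
    fn translate(mut self, on: bool) -> Self {
        self.translate = on;
        self
    }

    /// Fallible setter: a beam needs at least one candidate.
    fn beam_size(mut self, n: u32) -> Result<Self, String> {
        if n == 0 {
            return Err("beam_size must be at least 1".to_string());
        }
        self.beam_size = n;
        Ok(self)
    }
}

/// Chains the setters the same way the snippet above does.
fn build() -> Result<Options, String> {
    Options::new().language("de")?.translate(true).beam_size(5)
}
```

Validating in the setter means a bad value is reported where it is written, instead of surfacing later in the middle of a transcription.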

Audio conditioning

All audio is automatically conditioned before transcription via ffmpeg:

  1. Dynamic range compression (compand) — boosts quiet speech (distant speakers, soft voices) while limiting loud peaks. Tuned for speech dynamics with 0.3s attack / 0.8s decay.
  2. Loudness normalization (loudnorm) — EBU R128 normalization targeting -16 LUFS, the optimal input level for whisper.

This is critical for real-world recordings (meetings, lectures, interviews) where speakers are at different distances from the microphone.
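
As a rough illustration, the two conditioning steps map onto an ffmpeg audio filter chain like the one built below. The attack and decay times and the -16 LUFS target are taken from the description above; the compand transfer-function points, the function names, and the choice to write a 16 kHz mono WAV file are assumptions made for the sketch, not the crate's actual settings.

```rust
/// Builds the `-af` filter chain: compand first, then loudnorm.
/// The attack/decay times and the -16 LUFS target come from the text above;
/// the transfer-function points are placeholders, not the crate's tuning.
fn conditioning_filter() -> String {
    let compand = "compand=attacks=0.3:decays=0.8:points=-70/-70|-40/-20|0/-6";
    let loudnorm = "loudnorm=I=-16";
    format!("{compand},{loudnorm}")
}

/// Argument list for an ffmpeg run that decodes `input`, conditions it,
/// and writes 16 kHz mono audio (the sample rate whisper.cpp expects).
fn ffmpeg_args(input: &str, output: &str) -> Vec<String> {
    vec![
        "-i".into(), input.into(),
        "-af".into(), conditioning_filter(),
        "-ar".into(), "16000".into(),
        "-ac".into(), "1".into(),
        output.into(),
    ]
}
```

The list can be handed to `std::process::Command::new("ffmpeg").args(ffmpeg_args("meeting.mp3", "meeting.wav"))`. Because the arguments bypass the shell, the `|` characters in the compand points need no quoting.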

Hallucination prevention

Whisper has a known failure mode where the decoder enters a repetition loop, generating the same phrase endlessly — especially on long recordings with quiet passages. transcriber prevents this at two levels:

  • Decoder isolation: each 30-second window starts with a clean decoder slate (n_max_text_ctx=0), preventing hallucinated text from poisoning subsequent windows.
  • Post-processing filter: a rolling-window detector catches repeated segments (exact repeats, alternating A/B patterns, short cycles) and removes them.
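
The post-processing idea can be sketched as follows. This is a simplified illustration, not the crate's detector: the function name, the normalization, and the rule of collapsing any immediate repetition are assumptions. A production filter would probably require several repeats before dropping text, so that a speaker who really says "Yes." twice is left alone.

```rust
/// Normalize a segment for comparison: trim whitespace and lowercase.
fn norm(s: &str) -> String {
    s.trim().to_lowercase()
}

/// Collapses immediately repeated cycles of up to `max_cycle` segments.
/// A cycle of length 1 is an exact repeat (A A A), length 2 is an
/// alternating pattern (A B A B), and longer lengths are short loops.
fn collapse_repeats(segments: &[&str], max_cycle: usize) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for seg in segments {
        out.push(seg.to_string());
        for k in 1..=max_cycle {
            let n = out.len();
            if n < 2 * k {
                break; // not enough history for a cycle this long
            }
            // Compare the last k segments with the k segments before them.
            let tail = &out[n - k..];
            let prev = &out[n - 2 * k..n - k];
            let same = tail.iter().zip(prev).all(|(a, b)| norm(a) == norm(b));
            if same {
                out.truncate(n - k); // drop the repeated cycle
                break;
            }
        }
    }
    out
}
```

With `max_cycle = 3`, the input `["a", "b", "a", "b", "a", "b"]` collapses to `["a", "b"]`, while a sequence with no repetition passes through unchanged.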

What it supports

Audio formats: anything ffmpeg can handle — mp3, wav, ogg, opus, flac, aac, m4a, webm, and more.

URL downloading: YouTube and 1000+ other sites via yt-dlp.

Output formats: plain text, SRT, WebVTT, JSON.

Models: tiny through large-v3-turbo, plus custom GGML files.

Languages: 100 languages with auto-detection.
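
To make the SRT and WebVTT output formats listed above concrete, here is a small sketch of how timestamped segments turn into subtitle cues. The `Segment` struct is a stand-in that assumes start and end times are seconds stored as `f64`; the crate's own type and its `to_srt()` implementation may differ.

```rust
/// Minimal stand-in for a transcript segment (times in seconds).
struct Segment {
    start: f64,
    end: f64,
    text: String,
}

/// Formats seconds as `HH:MM:SS<sep>mmm`; SRT uses ',' and WebVTT uses '.'.
fn timestamp(secs: f64, sep: char) -> String {
    let total_ms = (secs.max(0.0) * 1000.0).round() as u64;
    let (h, m, s, ms) = (
        total_ms / 3_600_000,
        total_ms / 60_000 % 60,
        total_ms / 1000 % 60,
        total_ms % 1000,
    );
    format!("{h:02}:{m:02}:{s:02}{sep}{ms:03}")
}

/// Renders segments as SRT: 1-based index, time range, text, blank line.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            timestamp(seg.start, ','),
            timestamp(seg.end, ','),
            seg.text.trim()
        ));
    }
    out
}
```

WebVTT differs mainly in starting the file with a `WEBVTT` header line and using `.` instead of `,` before the milliseconds, which is why `timestamp` takes the separator as a parameter.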

Requirements

  • ffmpeg: for audio decoding (apt install ffmpeg / brew install ffmpeg)
  • yt-dlp: for URL downloads (pip install yt-dlp). Not needed if you only use local files. Because download is a default feature, setting default-features = false on the dependency in Cargo.toml builds without it and drops this requirement.

Feature flags

| Flag     | Default | What it does               |
|----------|---------|----------------------------|
| download | yes     | URL downloading via yt-dlp |
| cuda     | no      | NVIDIA GPU acceleration    |
| vulkan   | no      | Vulkan GPU acceleration    |

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
