PAVED is a Dockerized, offline-first command-line toolkit that does two hard things well: it repairs broken or unplayable video containers that other tools give up on, and it transcribes speech with four swappable local engines, all without ever touching the cloud.
No API keys. No uploads. No telemetry. Your media never leaves your machine.
```bash
docker compose run --rm app repair /data/broken.mp4
docker compose run --rm app transcribe /data/talk.mp4 --llm summary
```

Most "video repair" tools are either expensive black-box GUIs or shell one-liners that re-encode (and degrade) your footage on the first try. Most transcription tools ship your audio to someone else's servers. PAVED was built to do neither.
It was born from a real disaster: a Clipchamp export that wrote a valid index but left the `mdat` box header zeroed and the opening media as an unflushed hole, producing a file every player refused to open. PAVED reconstructs exactly that kind of damage, losslessly when possible, and tells you the truth when it can't.
| Principle | What it means |
|---|---|
| Never destroys your source | Every fix runs on a copy. The original is read-only, always. |
| Honest about loss | Lossless fixes are tried first. A lossy salvage reports exactly what was lost; it never passes itself off as a clean recovery. |
| Verifies before it claims success | Every repaired file must pass a full ffmpeg decode before PAVED calls it fixed. |
| Four swappable transcription engines | faster-whisper, whisper.cpp, Vosk, PocketSphinx: auto-selected, or pick your own. |
| Optional LLM polish | Post-process transcripts through any of 8 providers: Ollama (local), Claude (API or CLI subscription), Gemini, OpenAI, DeepSeek, Qwen, or any OpenAI-compatible endpoint. Always fail-soft. |
| One-command Docker | ffmpeg + every engine baked into the image. Zero host setup. |
| Truly offline | No accounts, no keys, no network calls in the hot path. |
| Scriptable | `--json` output on `probe` and `repair`, structured `.json` transcripts, and script-friendly exit codes for automation and CI pipelines. |
- Quick Start
- Repair · Transcribe · Local-LLM Polish
- Running Without Docker
- CLI Reference
- Supported Formats
- How It Works
- Configuration
- Project Layout
- Testing
- Roadmap
- FAQ
- Contributing · License
Everything runs in Docker with ffmpeg bundled; no host setup is needed beyond Docker itself.

```bash
git clone https://github.com/williamblair333/paved.git
cd paved
docker compose build   # builds the image (ffmpeg + all engines)
mkdir -p data          # drop your video files here; mounted at /data
```

Then point PAVED at any file or folder under `./data`:

```bash
docker compose run --rm app probe /data/broken.mp4
docker compose run --rm app repair /data/broken.mp4
docker compose run --rm app transcribe /data/talk.mp4
docker compose run --rm app engines
```

Diagnose and salvage broken, truncated, or unplayable video containers.
```bash
# Diagnose only; write nothing:
docker compose run --rm app repair /data/broken.mp4 --dry-run

# Repair one file (output written alongside as <name>.repaired.mp4):
docker compose run --rm app repair /data/broken.mp4

# Repair an entire folder (e.g. a mounted USB recovery copy):
docker compose run --rm app repair /data --recursive
```

The pipeline: probe → copy → apply strategies (on the copy) → decode-verify → report.
The source file is never modified. Every fix runs on a copy, and the result must pass a full ffmpeg decode before success is claimed. When a salvage is lossy (e.g. an unrecoverable damaged head region), the report says exactly what was lost. It never claims a lossy salvage is lossless.
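The copy-then-verify rule above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not PAVED's code: `work_copy`, `decode_verify_argv`, and `decode_verify` are hypothetical names, and the sketch assumes an `ffmpeg` binary on `PATH`. The `ffmpeg -v error -i FILE -f null -` invocation is the usual way to force a full decode while writing no output.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def work_copy(source: Path, workdir: Path) -> Path:
    """Copy the source into workdir; strategies then edit only this copy."""
    dest = workdir / (source.stem + ".work" + source.suffix)
    shutil.copy2(source, dest)  # the original is only ever read
    return dest


def decode_verify_argv(path: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that decodes every frame and discards the result."""
    # -v error: print only real errors; -f null -: decode fully, write nothing.
    return [ffmpeg, "-v", "error", "-i", str(path), "-f", "null", "-"]


def decode_verify(path: Path, ffmpeg: str = "ffmpeg") -> bool:
    """Count the file as fixed only if ffmpeg exits 0 and reports no errors."""
    proc = subprocess.run(decode_verify_argv(path, ffmpeg), capture_output=True, text=True)
    return proc.returncode == 0 and not proc.stderr.strip()
```

A repair loop would call `work_copy` once, let strategies edit only the returned path, and report success only when `decode_verify` returns `True`. Treating any stderr output at `-v error` as a failure is a deliberately strict choice for this sketch.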
Fault strategies, tried cheapest-first:

| Strategy | Fixes | Lossy? |
|---|---|---|
| `reconstruct_mdat_header` | missing / zeroed `mdat` box header | No (lossless) |
| `remux_faststart` | index / streaming / faststart quirks | No (lossless) |
| `salvage_playable_span` | damaged / unflushed head region | Yes |
| `transcode_rescue` | otherwise-undecodable streams | Yes (re-encodes) |
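To make the first row concrete: an ISO-BMFF box header is a 32-bit big-endian size (which counts the header itself) followed by a four-byte type. The sketch below assumes the simplest version of the Clipchamp failure, where `mdat` is the file's last box and its 8-byte header was zeroed, so the size can be recomputed from the file length. `mdat_header` and `patch_mdat_header` are hypothetical helpers; the real `reconstruct_mdat_header` may locate and size the box differently.

```python
import struct


def mdat_header(offset: int, file_size: int) -> bytes:
    """Rebuild an 8-byte mdat header for a box assumed to run from offset to EOF."""
    size = file_size - offset  # a box's size includes its own header
    if size < 8:
        raise ValueError("not enough room for an mdat box")
    if size > 0xFFFFFFFF:
        # Too big for a 32-bit size: ISO-BMFF would need size=1 plus a 64-bit
        # largesize (a 16-byte header), which does not fit an 8-byte slot.
        raise ValueError("box needs a 64-bit header")
    return struct.pack(">I4s", size, b"mdat")


def patch_mdat_header(data: bytearray, offset: int) -> None:
    """Overwrite a zeroed 8-byte header in place (on the working copy only)."""
    if any(data[offset:offset + 8]):
        raise ValueError("header bytes are not zeroed; refusing to overwrite")
    data[offset:offset + 8] = mdat_header(offset, len(data))
```

ISO-BMFF also permits a size of `0` ("box extends to end of file") for the last box; this sketch writes the explicit size instead, and it refuses boxes larger than 2^32 − 1 bytes because those need a 16-byte header that would not fit the zeroed 8-byte slot.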
The taxonomy is extensible: add a function to `src/paved/repair/strategies.py` and register it in `FAULT_STRATEGIES`.
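A registry like `FAULT_STRATEGIES` can be as simple as an ordered list filled by a decorator. The sketch below assumes that shape (the real registry may instead be keyed by fault name) and pairs it with a cheapest-first loop that stops at the first strategy whose output passes verification. `Strategy`, `register`, and `run_strategies` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

# Assumed contract: a strategy edits the working copy in place and returns
# True if it changed something worth verifying.
StrategyFn = Callable[[Path], bool]


@dataclass(frozen=True)
class Strategy:
    name: str
    lossy: bool
    fn: StrategyFn


FAULT_STRATEGIES: List[Strategy] = []  # registration order = cost, cheapest first


def register(name: str, lossy: bool = False) -> Callable[[StrategyFn], StrategyFn]:
    """Decorator that appends a strategy to the registry in definition order."""
    def wrap(fn: StrategyFn) -> StrategyFn:
        FAULT_STRATEGIES.append(Strategy(name, lossy, fn))
        return fn
    return wrap


def run_strategies(copy_path: Path, verify: Callable[[Path], bool]) -> dict:
    """Try strategies cheapest-first; stop at the first whose output verifies."""
    for strategy in FAULT_STRATEGIES:
        if strategy.fn(copy_path) and verify(copy_path):
            return {"fixed": True, "strategy": strategy.name, "lossy": strategy.lossy}
    return {"fixed": False, "strategy": None, "lossy": False}
```

Because each strategy here edits the same working copy, a later strategy builds on an earlier one's changes; an orchestrator could instead restart each attempt from a fresh copy. Carrying the `lossy` flag in the registry is what lets the final report state honestly whether data was lost.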
Offline speech-to-text from video or audio files, with four swappable engines.
```bash
# Best available engine (defaults to faster-whisper); writes <name>.txt + <name>.json:
docker compose run --rm app transcribe /data/talk.mp4

# Pick an engine and/or model:
docker compose run --rm app transcribe /data/talk.mp4 --engine vosk
docker compose run --rm app transcribe /data/talk.mp4 --engine faster-whisper --model small

# Transcribe a whole folder:
docker compose run --rm app transcribe /data --recursive

# See which engines are installed and which is the default:
docker compose run --rm app engines
```

| Engine | Priority | Accuracy | Notes |
|---|---|---|---|
| faster-whisper | 10 (default) | Excellent | CTranslate2 Whisper, CPU int8: fast and accurate on plain CPUs |
| whisper.cpp | 20 | Excellent | Pure-C++ Whisper via pywhispercpp |
| Vosk | 30 | Good | Lightweight Kaldi models, low memory |
| PocketSphinx | 90 | Basic | Legacy fallback, kept by request |
PAVED auto-selects the highest-priority installed engine (the lowest priority number wins), or transcribes with whichever engine you pass to `--engine`. Every run emits both a plain-text `.txt` and a structured `.json` (with per-segment timestamps where the engine provides them).
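Priority-based auto-selection with lazy detection might look like this. It is a sketch under assumptions: `EngineSpec`, `select_engine`, the `--engine` names used for whisper.cpp and PocketSphinx, and the import names probed for each backend are illustrative, not taken from PAVED. `importlib.util.find_spec` reports whether a top-level module is installed without importing it, so a missing optional package simply drops that engine from the candidate list.

```python
import importlib.util
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EngineSpec:
    name: str      # value passed to --engine (assumed for whisper.cpp / PocketSphinx)
    module: str    # import name probed to decide whether the backend is installed
    priority: int  # lower number = preferred, matching the table above


ENGINES = [
    EngineSpec("faster-whisper", "faster_whisper", 10),
    EngineSpec("whisper.cpp", "pywhispercpp", 20),
    EngineSpec("vosk", "vosk", 30),
    EngineSpec("pocketsphinx", "pocketsphinx", 90),
]


def is_installed(spec: EngineSpec) -> bool:
    """Probe for the backend without importing (and initializing) it."""
    return importlib.util.find_spec(spec.module) is not None


def select_engine(
    requested: Optional[str] = None,
    installed: Callable[[EngineSpec], bool] = is_installed,
) -> EngineSpec:
    """Honor --engine if given; otherwise pick the installed engine with the lowest number."""
    candidates = [e for e in ENGINES if installed(e)]
    if requested is not None:
        for e in candidates:
            if e.name == requested:
                return e
        raise LookupError(f"engine {requested!r} is not installed")
    if not candidates:
        raise LookupError("no transcription engine is installed")
    return min(candidates, key=lambda e: e.priority)
```

With only Vosk and PocketSphinx installed, `select_engine()` returns the Vosk entry, since priority 30 beats 90.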
Optionally post-process a raw transcript through any of 8 LLM providers for cleanup or summarization:
```bash
# Local (default, no keys needed)
docker compose run --rm app transcribe /data/talk.mp4 --llm clean

# Cloud providers
ANTHROPIC_API_KEY=sk-... paved transcribe talk.mp4 --llm clean --llm-provider anthropic
GOOGLE_API_KEY=... paved transcribe talk.mp4 --llm summary --llm-provider google
DEEPSEEK_API_KEY=... paved transcribe talk.mp4 --llm clean --llm-provider deepseek

# Your own Claude subscription (no API key; uses the local claude CLI session)
paved transcribe talk.mp4 --llm clean --llm-provider claude-cli
```

| Provider | `--llm-provider` | Auth | Default model |
|---|---|---|---|
| Ollama (local) | `ollama` (default) | none | `llama3.2:3b` |
| Anthropic Claude | `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` |
| Claude CLI (subscription) | `claude-cli` | OAuth session | CLI default |
| Google Gemini | `google` | `GOOGLE_API_KEY` | `gemini-2.0-flash` |
| OpenAI | `openai` | `OPENAI_API_KEY` | `gpt-4o-mini` |
| DeepSeek | `deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
| Qwen / Alibaba | `qwen` | `QWEN_API_KEY` | `qwen-plus` |
| Any OpenAI-compatible | `openai-compat` | `PAVED_LLM_API_KEY` + `PAVED_LLM_BASE_URL` | set `PAVED_LLM_MODEL` |
The LLM step is always fail-soft: if the provider is unreachable, the key is missing, or the call fails, transcription still succeeds and emits the raw transcript with a warning.
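The fail-soft contract can be captured in one wrapper. In this minimal sketch, `call_llm` stands in for whichever provider is configured, and `PolishResult` and `polish_fail_soft` are hypothetical names (PAVED's own result type is `LLMResult`, whose fields aren't documented here). Any exception from the provider, or an empty reply, yields the raw transcript plus a warning.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("paved.llm.sketch")


@dataclass
class PolishResult:
    text: str
    polished: bool    # False means the raw transcript was kept
    warning: str = ""


def polish_fail_soft(raw: str, call_llm: Callable[[str], str]) -> PolishResult:
    """Run the LLM post-step; on any failure, fall back to the raw transcript."""
    try:
        out = call_llm(raw)
    except Exception as exc:  # unreachable host, missing key, HTTP error, ...
        msg = f"LLM step skipped: {exc}"
        log.warning(msg)
        return PolishResult(raw, False, msg)
    if not out.strip():
        # An empty reply is treated as a failure too, so no text is ever lost.
        return PolishResult(raw, False, "LLM step returned nothing")
    return PolishResult(out, True)
```

Catching the broad `Exception` is intentional here: the transcript is the product, and the LLM step is only allowed to improve it, never to fail the run.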
Set the default provider via PAVED_LLM_PROVIDER to avoid typing --llm-provider every time.
All providers fall back to PAVED_LLM_API_KEY if a provider-specific key isn't set.
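The precedence rules in these two sentences, plus the provider table's defaults, reduce to a pair of lookups. This sketch assumes `--llm-provider` overrides `PAVED_LLM_PROVIDER`, which overrides the local `ollama` default; `resolve_provider`, `resolve_api_key`, and `KEY_VARS` are hypothetical names. Taking the environment as a parameter keeps it testable without touching `os.environ`.

```python
import os
from typing import Mapping, Optional

# Provider-specific key variables, from the provider table.
KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "qwen": "QWEN_API_KEY",
}


def resolve_provider(cli_flag: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """--llm-provider wins, then PAVED_LLM_PROVIDER, then the local default."""
    return cli_flag or env.get("PAVED_LLM_PROVIDER") or "ollama"


def resolve_api_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Provider-specific key first, then the shared PAVED_LLM_API_KEY fallback."""
    specific = KEY_VARS.get(provider)
    if specific and env.get(specific):
        return env[specific]
    return env.get("PAVED_LLM_API_KEY")
```

Providers without a dedicated variable (such as `openai-compat`) fall straight through to `PAVED_LLM_API_KEY`; `ollama` and `claude-cli` need no key at all, so a caller would simply ignore the result for them.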
```bash
pip install -e ".[all,ffmpeg]"   # or pick specific extras (see below)
paved repair /path/to/video.mp4
paved transcribe /path/to/video.mp4 --engine faster-whisper
```

Pick only what you need via optional extras:
| Extra | Installs |
|---|---|
| `faster-whisper` | faster-whisper engine |
| `whispercpp` | whisper.cpp engine |
| `vosk` | Vosk engine |
| `sphinx` | PocketSphinx engine |
| `ffmpeg` | a bundled static ffmpeg (via imageio-ffmpeg) |
| `all` | every transcription engine |
| `dev` | the test suite (pytest) |
Repair-only? Install the bare package: it has zero required dependencies beyond ffmpeg, so a lightweight install never fails on an engine build you don't need.
```
paved probe PATH [--json]
paved repair PATH [--out DIR] [--dry-run] [--recursive] [--json]
paved transcribe PATH [--engine E] [--model M] [--llm off|clean|summary]
                      [--llm-provider PROVIDER] [--llm-model MODEL]
                      [--out DIR] [--recursive]
paved engines
paved --version
```
PATH may be a single file or a directory (use --recursive to descend).
Exit codes are script-friendly: 0 success, 1 a real failure, 2 nothing found / fault present.
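For CI use, the exit codes are enough to branch on. A minimal sketch, assuming `paved` is on `PATH` and that `probe --json` prints a single JSON document to stdout; the report's schema isn't documented here, so it is returned as parsed without assuming any keys. `describe_exit` and `probe_json` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Tuple

# Meanings as documented in the CLI reference.
EXIT_MEANING = {0: "success", 1: "failure", 2: "nothing found / fault present"}


def describe_exit(code: int) -> str:
    """Map a paved exit code to its documented meaning."""
    return EXIT_MEANING.get(code, f"unexpected exit code {code}")


def probe_json(path: str, paved: str = "paved") -> Tuple[str, Any]:
    """Run `paved probe PATH --json`; return the exit meaning and the parsed report."""
    proc = subprocess.run([paved, "probe", path, "--json"], capture_output=True, text=True)
    report = json.loads(proc.stdout) if proc.stdout.strip() else None
    return describe_exit(proc.returncode), report
```

A pipeline could, for example, run `paved repair` only when `probe_json` reports exit code 2 (fault present) and fail the job on exit code 1.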
| Use | Extensions |
|---|---|
| Video (repair + transcribe) | `.mp4` `.mov` `.m4v` `.mkv` `.webm` `.avi` |
| Audio (transcribe) | `.mp3` `.wav` `.m4a` `.aac` `.flac` `.ogg` |
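Resolving `PATH` into the files PAVED would act on can be sketched from this table. Assumptions: matching is case-insensitive on the extension, and directory walking mirrors `--recursive` (top level only without it). `media_kind` and `iter_media` are hypothetical names, not PAVED's API.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple

VIDEO_EXTS = {".mp4", ".mov", ".m4v", ".mkv", ".webm", ".avi"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg"}


def media_kind(path: Path) -> Optional[str]:
    """Return "video", "audio", or None based on the (lower-cased) extension."""
    suffix = path.suffix.lower()
    if suffix in VIDEO_EXTS:
        return "video"
    if suffix in AUDIO_EXTS:
        return "audio"
    return None


def iter_media(
    root: Path, recursive: bool = False, kinds: Tuple[str, ...] = ("video", "audio")
) -> Iterator[Path]:
    """Yield supported files for PATH: the file itself, or a directory's contents."""
    if root.is_file():
        if media_kind(root) in kinds:
            yield root
        return
    candidates = root.rglob("*") if recursive else root.glob("*")
    for p in sorted(candidates):
        if p.is_file() and media_kind(p) in kinds:
            yield p
```

A repair run would pass `kinds=("video",)`, since only video containers are repaired.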
PAVED is a small, dependency-light Python package with a clean separation of concerns:
- `mp4box`: a pure-Python ISO-BMFF (MP4/MOV) box walker. No native libs.
- `probe`: a fault classifier that names what's wrong with a container (and scans for `moov` via `mmap`, so it won't OOM on multi-GB files).
- `repair`: orchestrates the copy → strategy → decode-verify → report loop.
- `transcribe`: an engine registry that lazy-imports each backend, so one missing optional package never breaks the others.
- `llm`: a fail-soft multi-provider LLM post-processor (8 providers, stdlib HTTP only).
- `ffmpeg`: a thin, configurable wrapper around the system (or bundled) ffmpeg binary.
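The `mp4box` and `probe` ideas fit in a short sketch: walk ISO-BMFF box headers (handling the 64-bit `largesize` form, signalled by a size of 1, and the size-0 "to end of file" form), and run that walk over an `mmap` so a multi-GB file is never read into memory. `Box`, `walk_boxes`, and `find_moov` are hypothetical names; PAVED's walker may differ.

```python
import mmap
import struct
from typing import Iterator, NamedTuple, Optional


class Box(NamedTuple):
    kind: str    # four-character box type, e.g. "moov"
    offset: int  # file offset of the box header
    size: int    # total size, header included
    header: int  # 8, or 16 when a 64-bit largesize is present


def walk_boxes(buf, start: int = 0, end: Optional[int] = None) -> Iterator[Box]:
    """Yield sibling ISO-BMFF boxes in buf[start:end], reading only their headers."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:  # the real size is a 64-bit "largesize" after the type
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # corrupt or truncated header: stop rather than guess
        yield Box(raw_type.decode("latin-1"), pos, size, header)
        pos += size


def find_moov(path: str) -> Optional[Box]:
    """Locate the top-level moov box via mmap; the OS pages in only what is read."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for box in walk_boxes(mm):
            if box.kind == "moov":
                return box
    return None
```

`mmap` refuses to map an empty file, so a real probe would check the size first. The walker stops at the first header that is impossible or runs past the end, which is exactly where a truncated or zeroed region becomes visible to a fault classifier.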
Every transcription engine runs fully offline, and the repair path makes no network calls at all.
| Environment variable | Purpose | Default |
|---|---|---|
| `PAVED_LLM_PROVIDER` | Default LLM provider | `ollama` |
| `PAVED_LLM_MODEL` | Model override for the chosen provider | provider default |
| `PAVED_LLM_API_KEY` | Fallback API key (all cloud providers) | unset |
| `ANTHROPIC_API_KEY` | Anthropic-specific key | unset |
| `GOOGLE_API_KEY` | Google Gemini key | unset |
| `OPENAI_API_KEY` | OpenAI key | unset |
| `DEEPSEEK_API_KEY` | DeepSeek key | unset |
| `QWEN_API_KEY` | Qwen / Alibaba key | unset |
| `PAVED_LLM_BASE_URL` | Base URL for the `openai-compat` provider | unset |
| `OLLAMA_HOST` | Ollama endpoint | `http://host.docker.internal:11434` |
| `PAVED_FFMPEG` | Path to a specific ffmpeg binary | auto-detected / bundled |
The Compose service mounts `./data` at `/data` and wires `host.docker.internal` so the container can reach an Ollama instance running on your host.
```
paved/
├── src/paved/
│   ├── cli.py              # argparse entry point (probe/repair/transcribe/engines)
│   ├── mp4box.py           # pure-Python ISO-BMFF box walker
│   ├── probe.py            # container fault classifier
│   ├── ffmpeg.py           # ffmpeg wrapper (decode-verify, remux, transcode)
│   ├── report.py           # human + JSON reporting
│   ├── repair/
│   │   └── strategies.py   # fault → fix strategies (FAULT_STRATEGIES registry)
│   ├── transcribe/
│   │   ├── base.py         # Engine ABC, audio extraction, Transcript model
│   │   └── engines.py      # faster-whisper / whisper.cpp / Vosk / PocketSphinx
│   └── llm/                # fail-soft multi-provider LLM post-step
│       ├── _base.py        # LLMResult, PROMPTS, LLMProvider ABC
│       └── _providers.py   # 8 providers (Ollama, Anthropic, claude-cli, Gemini, OpenAI, DeepSeek, Qwen, openai-compat)
├── tests/                  # 46 unit tests; no ffmpeg/models/network needed
├── docs/                   # design spec
├── Dockerfile              # native deps + ffmpeg FIRST, then pip
├── docker-compose.yml
└── pyproject.toml
```
```bash
pip install -e ".[dev]"
pytest
```

The suite (46/46 passing) covers the box walker, fault classification, `mdat` reconstruction, the engine registry/selection logic, all 8 LLM providers (mocked), and CLI parsing. It needs no ffmpeg, no models, and no network, so it runs anywhere in seconds.
v1.0 is complete and merged. Candidate work for future releases:

- `yt-dlp` URL download mode
- First-class audio extract / convert mode
- Watch-folder daemon for hands-off batch processing
- Dedicated test for the 64-bit extended-size `moov` scan path
Does anything get uploaded to the cloud? The repair path and all transcription engines make zero network calls. The optional LLM post-step defaults to a local Ollama instance; cloud providers are opt-in and require you to supply your own API key.
Will repair re-encode and degrade my video? Only as a last resort, and only when nothing lossless works; the report tells you when that happens. Lossless strategies are always tried first.
Can it hurt my original file? No. The source is opened read-only and every fix is applied to a copy.
Do I need a GPU?
No. The target is CPU-only; faster-whisper uses int8 quantization and runs comfortably on a
plain CPU.
What if I only want repair? Install the bare package β it pulls in no transcription engines and stays light.
Extending PAVED is intentionally easy:

- New repair strategy: add a function to `src/paved/repair/strategies.py` and register it in `FAULT_STRATEGIES`.
- New transcription engine: subclass `Engine` in `src/paved/transcribe/engines.py` and append it to `ALL_ENGINES`.
- New LLM provider: subclass `LLMProvider` in `src/paved/llm/_providers.py` and register it in the `_PROVIDERS` dict in `src/paved/llm/__init__.py`.
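A new engine then reduces to a small subclass. The `Engine` class below is a stand-in written for this sketch, since the real ABC's abstract members in `src/paved/transcribe/base.py` aren't listed here; `available`, `transcribe`, `EchoEngine`, and the shape of `ALL_ENGINES` are all assumptions to adapt to the actual base class.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Type


class Engine(ABC):
    """Hypothetical stand-in for paved.transcribe.base.Engine; real members may differ."""
    name: str = ""
    priority: int = 100  # lower = preferred during auto-selection

    @classmethod
    @abstractmethod
    def available(cls) -> bool:
        """Return True if the backend's optional package is importable."""

    @abstractmethod
    def transcribe(self, wav: Path) -> str:
        """Turn an extracted audio file into text."""


class EchoEngine(Engine):
    """Toy engine for tests: 'transcribes' by reporting the file name."""
    name = "echo"
    priority = 95

    @classmethod
    def available(cls) -> bool:
        return True  # no optional dependency to probe

    def transcribe(self, wav: Path) -> str:
        return f"[echo] {wav.name}"


ALL_ENGINES: List[Type[Engine]] = []
ALL_ENGINES.append(EchoEngine)  # registration is just appending the class
```

Keeping `available` a classmethod lets a registry check installation without constructing the engine, which matters when construction would load a model.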
Run pytest before opening a PR. See the full design spec in
docs/superpowers/specs/2026-06-18-paved-toolkit-design.md.
AGPL-3.0 © William Blair