yt-vision-pro

YouTube visual analysis pipeline — chapter-aware frame extraction, OCR-first slide preservation, quality filtering, and LLM-ready manifest synthesis.

The engine still lives in the yt_vision_v2 package during migration, but the primary tool name is now yt-vision-pro. The legacy ytv2 console alias is still available.

What it does

Downloads a YouTube video + captions via yt-dlp
Parses chapters (or generates synthetic 15-min chunks for unchaptered videos)
Detects scene boundaries with ContentDetector or AdaptiveDetector
Extracts scene-start frames plus optional within-scene samples
Runs OCR on each frame before deduplication
Filters black/blurry frames and deduplicates with slide-aware pHash thresholds
Aligns captions to frames (YouTube VTT or whisper fallback)
Generates chunked Markdown manifests with density metadata for LLM synthesis
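
The synthetic chaptering step for unchaptered videos can be illustrated with a short sketch. This is not the project's code: the `Chapter` dataclass, the `synthetic_chapters` function, and the "Part N" titles are hypothetical names chosen for illustration. Only the 15-minute chunk length comes from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    """Hypothetical chapter record; times are in seconds."""
    index: int
    title: str
    start: float
    end: float


def synthetic_chapters(duration_s: float, chunk_s: float = 15 * 60) -> list[Chapter]:
    """Split [0, duration_s) into fixed-length chunks; the last may be shorter."""
    chapters: list[Chapter] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        index = len(chapters)
        chapters.append(Chapter(index, f"Part {index + 1}", start, end))
        start = end
    return chapters


if __name__ == "__main__":
    # A 40-minute video yields two full chunks and one 10-minute remainder.
    for chapter in synthetic_chapters(40 * 60):
        print(chapter)
```

Clamping the final chunk to the video duration means short videos still produce a single chapter rather than none.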

Example Images:

Video from: https://www.youtube.com/watch?v=24t04HzoIXY (2 h 20 min video). Rough estimate: ~100-120K tokens for the full deep-research run.

Video from: https://youtu.be/KZPo15M2DbM (6 min video). Rough estimate: ~50-70K tokens for the full deep-research run.

Prerequisites

Python 3.10+
ffmpeg on PATH (winget install ffmpeg on Windows)

Install

pip install -e ".[dev]"

# Optional: whisper fallback for videos without captions
pip install -e ".[whisper]"

Usage

# Basic — process a YouTube video
yt-vision-pro <youtube-url>

# Custom cache directory
yt-vision-pro <youtube-url> --cache-dir ./my-cache

# Skip OCR (faster)
yt-vision-pro <youtube-url> --no-ocr

# Skip quality filters
yt-vision-pro <youtube-url> --no-filter

# Use the adaptive detector instead of content-based detection
yt-vision-pro <youtube-url> --detector adaptive

# Sample the first hour densely for lecture-heavy videos
yt-vision-pro <youtube-url> --dense-until 01:00:00

# Force specific chapters to high density
yt-vision-pro <youtube-url> --dense-chapters 0,1,2

# Re-run from scratch
yt-vision-pro <youtube-url> --force

# Resume from a specific stage (fetch, extract, ocr, dedup-with-ocr-context, align, manifest)
yt-vision-pro <youtube-url> --from-stage dedup-with-ocr-context

# Legacy alias still works
ytv2 <youtube-url>

Density model

high: 3s within-scene sampling, loose near-duplicate removal, strongest slide preservation
normal: 5s within-scene sampling, balanced deduplication
low: 15s within-scene sampling, aggressive deduplication for conversational videos

Use --density to set the default tier, --dense-chapters to promote specific chapter indices, and --dense-until to promote everything before a time cutoff.
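
As a rough illustration of how the tiers and the two promotion flags could interact, here is a minimal sketch. The function names and the rule that a chapter is promoted when it starts before the cutoff are assumptions, not the tool's documented behavior. Only the 3s/5s/15s intervals and the flag semantics described above come from this README.

```python
# Within-scene sampling interval per tier, in seconds (values from the list above).
SAMPLE_INTERVAL_S = {"high": 3.0, "normal": 5.0, "low": 15.0}


def parse_timestamp(text: str) -> float:
    """Convert HH:MM:SS, MM:SS, or SS into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def resolve_density(chapter_index, chapter_start_s, default="normal",
                    dense_chapters=frozenset(), dense_until_s=None):
    """Pick a tier for one chapter (hypothetical promotion rule)."""
    if chapter_index in dense_chapters:
        return "high"
    # Assumption: promotion is decided by the chapter's start time.
    if dense_until_s is not None and chapter_start_s < dense_until_s:
        return "high"
    return default


def within_scene_samples(scene_start_s, scene_end_s, tier):
    """Timestamps sampled inside a scene, excluding the scene-start frame."""
    step = SAMPLE_INTERVAL_S[tier]
    times = []
    k = 1
    while scene_start_s + k * step < scene_end_s:
        times.append(scene_start_s + k * step)
        k += 1
    return times


if __name__ == "__main__":
    cutoff = parse_timestamp("01:00:00")
    tier = resolve_density(4, 1800.0, default="low", dense_until_s=cutoff)
    print(tier, within_scene_samples(0.0, 10.0, tier))
```

In this sketch a scene shorter than the tier's interval contributes only its scene-start frame, which is why conversational videos at the low tier produce far fewer frames.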

Pipeline stages

| Stage | Description | Sentinel |
| --- | --- | --- |
| fetch | Download video, captions, info.json via yt-dlp | `.stages/fetch.done` |
| extract | Scene detection + raw frame extraction | `.stages/extract.done` |
| ocr | RapidOCR on each frame | `.stages/ocr.done` |
| dedup-with-ocr-context | Quality filtering + slide-aware dedup | `.stages/dedup-with-ocr-context.done` |
| align | Parse captions (YouTube VTT or whisper fallback) | `.stages/align.done` |
| manifest | Generate chunked Markdown manifests | `.stages/manifest.done` |

Each stage writes a sentinel file. On re-run, completed stages are skipped. Use --force to clear all sentinels or --from-stage <name> to re-run from a specific point.

Output

Single-chapter videos: cache/manifest.md
Multi-chapter videos: cache/manifests/manifest-00-intro.md, etc.
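
The naming pattern above could be produced by something like the sketch below. The `slugify` and `manifest_paths` helpers are hypothetical, and the exact slug rules (lowercase, non-alphanumeric runs collapsed to hyphens) are an assumption inferred from the `manifest-00-intro.md` example.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def manifest_paths(cache_dir: Path, chapter_titles: list[str]) -> list[Path]:
    """One manifest.md for single-chapter videos, else one file per chapter."""
    if len(chapter_titles) <= 1:
        return [cache_dir / "manifest.md"]
    return [
        cache_dir / "manifests" / f"manifest-{index:02d}-{slugify(title)}.md"
        for index, title in enumerate(chapter_titles)
    ]


if __name__ == "__main__":
    for path in manifest_paths(Path("cache"), ["Intro", "Deep Dive: OCR!"]):
        print(path)
```

The zero-padded index keeps the files in chapter order when sorted by name.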

Feed the manifest(s) to an LLM tool (for example Copilot Chat or Claude Code) for synthesis into research notes.

Tests

pytest tests/ -v
