Turn long-form videos into viral-ready vertical shorts, entirely on your local machine. Free, no API keys, no cloud.
Saw a bunch of paid tools doing this — why pay when you can run it locally? Skate uses faster-whisper for transcription, Ollama for AI ranking, OpenCV for face tracking, and FFmpeg for rendering. Everything runs on your machine.
Input (URL or file)
→ Download (yt-dlp)
→ Transcribe (faster-whisper)
→ Chunk transcript into segments
→ Score heuristically (no AI needed)
→ Rank with local LLM (Ollama)
→ Select best clips
→ Track faces for smart vertical crop
→ Render clips with subtitles burned in
→ Output organized shorts
| Tool | Purpose | Check |
|---|---|---|
| Bun | Runtime & package manager | bun --version |
| FFmpeg | Video cutting & processing | ffmpeg -version |
| yt-dlp | YouTube downloading | yt-dlp --version |
| Ollama | Local LLM for AI ranking | ollama --version |
| Python 3 | Whisper & OpenCV scripts | python3 --version |
| faster-whisper | Local transcription | installed via setup-python |
| OpenCV | Face detection | installed via setup-python |
ollama pull llama3.2:3bgit clone https://github.com/yourusername/skate.git
cd skate
bun installbun run setup-pythonThis creates a virtual environment at ~/.skate/venv and installs:
faster-whisper— speech-to-text with word-level timestampsopencv-contrib-python— face detection via Haar cascadesnumpy— numerical processing
bun linkThen you can run skate from anywhere.
bun start -- clip video.mp4bun start -- youtube https://youtube.com/watch?v=abc123bun start -- https://youtube.com/watch?v=abc123
bun start -- video.mp4bun start -- analyze video.mp4bun start -- render video.mp4bun start -- watch ./videosbun start -- clip video.mp4 --no-crop
bun start -- youtube https://youtube.com/watch?v=abc123 --no-cropBy default, Skate tracks faces for smart vertical framing. Pass --no-crop to use a static center crop instead.
bun start -- doctorConfig is stored at ~/.skate/config.json and auto-created on first run.
{
"model": "llama3.2:3b",
"clips": 10,
"minLength": 20,
"maxLength": 90,
"subtitleStyle": "minimal",
"outputDir": "./output",
"cacheDir": "~/.skate/cache",
"ollamaUrl": "http://localhost:11434"
}| Field | Default | Description |
|---|---|---|
model |
llama3.2:3b |
Ollama model for ranking |
clips |
10 |
Number of clips to produce |
minLength |
20 |
Minimum clip length (seconds) |
maxLength |
90 |
Maximum clip length (seconds) |
subtitleStyle |
minimal |
Subtitle style (minimal, tiktok, mrbeast) |
outputDir |
./output |
Output directory |
cacheDir |
~/.skate/cache |
Cache directory |
ollamaUrl |
http://localhost:11434 |
Ollama API URL |
output/
└── <video-name>/
├── clips/
│ ├── clip-01.mp4
│ ├── clip-02.mp4
│ └── clip-03.mp4
├── captions/
│ ├── clip-01.srt
│ └── clip-02.srt
└── metadata.json
skate/
├── scripts/
│ ├── face_detect.py # OpenCV face detection
│ ├── whisper_transcribe.py # faster-whisper transcription
│ └── requirements.txt # Python dependencies
├── src/
│ ├── commands/
│ │ ├── clip.ts # Process local video
│ │ ├── analyze.ts # Analysis only pipeline
│ │ ├── render.ts # Render from cached analysis
│ │ ├── watch.ts # Watch directory mode
│ │ └── doctor.ts # Dependency checker
│ ├── core/
│ │ ├── pipeline.ts # Main pipeline orchestrator
│ │ ├── downloader.ts # yt-dlp integration
│ │ ├── transcriber.ts # Whisper bridge
│ │ ├── chunker.ts # Transcript chunking
│ │ ├── scorer.ts # Heuristic scoring
│ │ ├── ranker.ts # AI ranking bridge
│ │ ├── tracker.ts # Face tracking
│ │ ├── renderer.ts # FFmpeg rendering
│ │ └── subtitles.ts # SRT/ASS generation
│ ├── ai/
│ │ ├── prompts.ts # LLM prompt templates
│ │ ├── ollama.ts # Ollama API client
│ │ └── ranking.ts # AI ranking logic
│ ├── vision/
│ │ ├── face.ts # Face detection
│ │ ├── scene.ts # Scene detection
│ │ └── crop.ts # Smart crop path
│ ├── ui/
│ │ └── tui.ts # Terminal spinner UI
│ ├── config.ts # Configuration loader
│ ├── types.ts # TypeScript types
│ └── index.tsx # CLI entry point
├── output/ # Rendered clips
├── cache/ # Cached downloads
├── models/ # Local models
├── temp/ # Working files
├── package.json
├── tsconfig.json
└── README.md
| Step | Description |
|---|---|
| Download | Pulls video from YouTube via yt-dlp (or uses local file) |
| Transcribe | Runs faster-whisper for speech-to-text with word-level timestamps |
| Chunk | Splits transcript into 30–90 second natural segments |
| Score | Heuristic scoring — speaking rate, emotion, story structure, hooks |
| Rank | Sends top candidates to Ollama for virality scoring |
| Select | Picks best clips based on combined heuristic + AI scores |
| Track Faces | Detects faces per frame via OpenCV for smart vertical crop |
| Render | Cuts clips, applies crop, burns in subtitles |
| Script | Command |
|---|---|
bun start |
Run Skate |
bun run dev |
Run with watch mode (auto-restart on changes) |
bun run typecheck |
TypeScript type checking |
bun run setup-python |
Create venv and install Python deps |
Skate caches aggressively at ~/.skate/cache:
- Downloaded video/audio files
- Transcripts
- Face tracking data
- Analysis results
Re-running is fast — only changed steps are re-executed.
Every "AI shorts" tool out there charges $20–$50/month or requires API keys that bill per minute. Skate is:
- 100% local — nothing leaves your machine
- Free — no subscriptions, no API costs
- Private — your videos never hit a third-party server
- Customizable — swap models, tweak prompts, adjust scoring