A modular, AI-native content-to-video pipeline for structured storytelling
StorySynth Engine is not a video automation tool. It is a deterministic, provider-based, multimodal generation system that bridges structured knowledge extraction, narrative construction, and audiovisual rendering. Built as a foundation for AI-native content systems, it transforms structured knowledge sources into coherent, reproducible video narratives through a validated, extensible pipeline.
The engine operates on a clear architectural principle: knowledge → facts → scenes → assets → timeline → video. Each stage is deterministic, validated, and provider-agnostic, enabling composition of production-grade storytelling pipelines without vendor lock-in or mock dependencies.
Modern content generation systems face a fundamental challenge: bridging structured knowledge sources with narrative coherence and deterministic media output. Most solutions either rely on brittle, single-vendor pipelines or sacrifice determinism for flexibility.
StorySynth Engine addresses this by providing:
- Structured knowledge extraction with validation and sanitization layers
- Deterministic narrative construction that preserves factual accuracy while enabling storytelling flow
- Provider-based extensibility that allows swapping components without architectural changes
- Reproducible outputs through strict validation gates and caching strategies
The architecture is designed for composability: each provider is a discrete, testable unit. The pipeline enforces validation at every stage, ensuring that failures are caught early and outputs are consistent. This makes it suitable for production systems where reliability and extensibility matter more than quick demos.
```
Knowledge Source
  ↓
[Extraction & Sanitization]
  ↓
Clean Facts
  ↓
[Segmentation & Structuring]
  ↓
Structured Scenes
  ↓
[Asset Generation]
  ↓
Images + Audio Assets
  ↓
[Timeline Assembly]
  ↓
Storyboard
  ↓
[Media Rendering]
  ↓
Video Output
```
Each stage is independently cacheable, testable, and replaceable. The pipeline enforces strict validation before advancing, ensuring that downstream stages receive only validated inputs.
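As a rough illustration of the Timeline Assembly stage, the sketch below lays scenes end to end and derives each scene's start and end time from its narration audio length. It is a minimal sketch under stated assumptions: `SceneAssets`, `StoryboardEntry`, `assembleTimeline`, the `paddingSec` gap, and the premise that audio durations are known before assembly are all hypothetical, not the engine's actual storyboard schema.

```typescript
// Hypothetical shapes; the real storyboard.json schema may differ.
interface SceneAssets {
  id: string;
  narration: string;
  imagePath: string;
  audioPath: string;
  audioDurationSec: number; // assumed to be known once TTS audio exists
}

interface StoryboardEntry {
  sceneId: string;
  start: number; // seconds from the start of the video
  end: number;
  image: string;
  audio: string;
  narration: string;
}

// Lays scenes end to end: each lasts as long as its audio plus a fixed
// padding, so identical inputs always produce an identical timeline.
function assembleTimeline(scenes: SceneAssets[], paddingSec = 0.5): StoryboardEntry[] {
  const entries: StoryboardEntry[] = [];
  let cursor = 0;
  for (const scene of scenes) {
    // Reject missing, zero, or NaN durations before they reach the renderer.
    if (!(scene.audioDurationSec > 0)) {
      throw new Error(`Scene ${scene.id} has no usable audio duration`);
    }
    const end = cursor + scene.audioDurationSec + paddingSec;
    entries.push({
      sceneId: scene.id,
      start: cursor,
      end,
      image: scene.imagePath,
      audio: scene.audioPath,
      narration: scene.narration,
    });
    cursor = end;
  }
  return entries;
}
```

Because the timeline is a pure function of scene order and durations, re-running it on cached assets yields the same storyboard.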
```
┌─────────────┐
│     CLI     │  Commander + @inquirer/prompts
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Pipeline   │  Step orchestration with caching & validation
└──────┬──────┘
       │
       ├──► Wikipedia Provider    ──► Cache
       ├──► Sanitize & Segment    ──► Validation
       ├──► Keywords (RAKE)       ──► Cache
       ├──► Images (Unsplash)     ──► Download & Cache
       ├──► TTS (ElevenLabs)      ──► Generate & Cache
       ├──► Storyboard Generator  ──► Validation
       └──► Renderer (FFmpeg)     ──► Video Output
```
The pipeline orchestrates discrete providers through a step-based execution model. Each step can be cached, validated, and independently tested. Providers are registered through a plugin system, enabling runtime composition of different content sources, processing algorithms, and output formats.
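A minimal sketch of the step model described above, assuming a hypothetical `Step` interface (run plus validate), an in-memory `StepCache` keyed by a SHA-256 hash of the step input, and a `ProviderRegistry` keyed by category and name. The real engine caches to disk and its plugin API may look different; every name here is illustrative only.

```typescript
import { createHash } from "node:crypto";

// Hypothetical step contract: how to run a stage and how to check its output.
interface Step<I, O> {
  name: string;
  run(input: I): Promise<O>;
  validate(output: O): string[]; // list of problems; empty means valid
}

// In-memory stand-in for the disk cache, keyed by step name + input hash.
class StepCache {
  private store = new Map<string, unknown>();
  key(step: string, input: unknown): string {
    const digest = createHash("sha256").update(JSON.stringify(input)).digest("hex");
    return `${step}:${digest}`;
  }
  get<T>(key: string): T | undefined {
    return this.store.get(key) as T | undefined;
  }
  set(key: string, value: unknown): void {
    this.store.set(key, value);
  }
}

// Runs one step: cache hit returns early; fresh output must pass validation
// before it is cached or handed to the next stage.
async function runStep<I, O>(step: Step<I, O>, input: I, cache: StepCache): Promise<O> {
  const key = cache.key(step.name, input);
  const cached = cache.get<O>(key);
  if (cached !== undefined) return cached;
  const output = await step.run(input);
  const problems = step.validate(output);
  if (problems.length > 0) {
    throw new Error(`Step "${step.name}" failed validation: ${problems.join("; ")}`);
  }
  cache.set(key, output);
  return output;
}

// Hypothetical registry: swapping a provider means registering a different
// implementation under the same category, not editing the pipeline.
type ProviderCategory = "source" | "processing" | "images" | "tts" | "renderer";

class ProviderRegistry {
  private providers = new Map<string, unknown>();
  register<T>(category: ProviderCategory, name: string, provider: T): void {
    this.providers.set(`${category}/${name}`, provider);
  }
  get<T>(category: ProviderCategory, name: string): T {
    const provider = this.providers.get(`${category}/${name}`);
    if (provider === undefined) throw new Error(`No ${category} provider named "${name}"`);
    return provider as T;
  }
}
```

Keying the cache on the hashed input rather than on a timestamp is what lets an unchanged step be skipped on reruns.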
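- Deterministic pipeline: Given the same inputs and configuration, the rule-based path produces identical scenes and storyboards. Results from external services (such as image search and TTS) are cached, so reruns reuse earlier outputs instead of regenerating them.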
- Provider-based extensibility: New content sources, processing algorithms, or renderers can be added without modifying core pipeline logic.
- Strict validation: Every stage validates its inputs and outputs. Invalid data is rejected before advancing to the next stage.
- Reproducible outputs: Caching strategies and deterministic algorithms ensure that runs are consistent and debuggable.
- Node.js 20+
- FFmpeg (required for video rendering with `--render mp4`)

Install FFmpeg based on your platform:

- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt-get install ffmpeg`
- Arch Linux: `sudo pacman -S ffmpeg`
- Windows: `choco install ffmpeg`, or download from ffmpeg.org

Verify installation:

```
ffmpeg -version
```
Install dependencies and build:

```
npm install
npm run build
```

Copy `.env.example` to `.env` and add your API keys:

- Unsplash: Get a free API key at unsplash.com/developers
- ElevenLabs: Get an API key at elevenlabs.io
- OpenAI (optional): Get an API key at platform.openai.com for enhanced script generation

```
cp .env.example .env
# Edit .env with your API keys
```

Storyboard only (no video):

```
npm run build
./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render storyboard
```

Full video generation (with MP4 output):

```
./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render mp4
```

This generates a complete video at `out/<jobId>/render/video.mp4`.

Interactive mode (prompts for missing options):

```
./dist/cli/index.js run
```

Each run creates a job directory with all artifacts:
```
out/
└── <jobId>/
    ├── config.resolved.json   # Resolved configuration
    ├── content.json           # Processed content (sentences, keywords)
    ├── storyboard.json        # Timeline with asset references
    ├── assets/
    │   ├── images/
    │   │   ├── <imageId1>.jpg
    │   │   └── <imageId2>.jpg
    │   └── audio/
    │       ├── <sentenceId1>.mp3
    │       └── <sentenceId2>.mp3
    └── render/                # Only if --render mp4
        ├── segments/
        │   ├── 0000.mp4       # Individual video segments
        │   ├── 0001.mp4
        │   └── concat.txt     # FFmpeg concat file
        └── video.mp4          # Final rendered video
```
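The `render/segments/` layout suggests a two-pass render: one MP4 per scene, then a join through FFmpeg's concat demuxer using `concat.txt`. The sketch below only builds argument arrays and the list file content; the helper names are hypothetical, and the encoder flags (`libx264`, `-tune stillimage`, `aac`) are a common choice for still-image videos, not the engine's confirmed settings.

```typescript
// Hypothetical helpers: they build FFmpeg arguments and the concat list,
// leaving the actual process spawning to the caller.
interface SegmentInput {
  image: string; // still image for the scene
  audio: string; // narration audio for the scene
  output: string; // e.g. render/segments/0000.mp4
}

// One still image looped for the length of its narration track.
function segmentArgs(seg: SegmentInput): string[] {
  return [
    "-y",
    "-loop", "1", "-i", seg.image, // repeat the image as a video stream
    "-i", seg.audio,
    "-c:v", "libx264", "-tune", "stillimage",
    "-c:a", "aac",
    "-pix_fmt", "yuv420p", // widely compatible pixel format
    "-shortest", // stop when the audio ends
    seg.output,
  ];
}

// Zero-padded names matching the 0000.mp4, 0001.mp4 layout.
function segmentName(index: number): string {
  return `${String(index).padStart(4, "0")}.mp4`;
}

// concat.txt content for the concat demuxer; a single quote inside a path
// is written as '\'' following FFmpeg's quoting rules.
function concatList(segmentFiles: string[]): string {
  return segmentFiles.map((f) => `file '${f.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Joins segments without re-encoding; assumes all segments share codecs.
function concatArgs(listFile: string, output: string): string[] {
  return ["-y", "-f", "concat", "-safe", "0", "-i", listFile, "-c", "copy", output];
}
```

The arrays can be passed to `child_process.spawn("ffmpeg", args)`. Using `-c copy` in the final join only works because every segment is encoded with identical settings.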
Run the video generation pipeline:

```
videomaker run [options]

Options:
  --source <source>     Content source (default: wikipedia)
  --query <query>       Search query
  --lang <lang>         Language code (en, pt, es, fr)
  --sentences <number>  Maximum sentences (1-50)
  --out <dir>           Output directory (default: ./out)
  --render <mode>       Render mode: storyboard or mp4 (default: storyboard)
```

Inspect a storyboard file:

```
videomaker inspect <storyboard.json>
```

Clear the cache directory:

```
videomaker clean
```

List available providers:

```
videomaker providers
```

| Category | Provider | Description | API Key Required |
|---|---|---|---|
| Source | wikipedia | Wikipedia REST API | No |
| Processing | keywords | RAKE algorithm (local) | No |
| Images | unsplash | Unsplash API | Yes |
| TTS | elevenlabs | ElevenLabs API | Yes |
| Renderer | ffmpeg | FFmpeg video rendering | No |
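The keywords provider is listed as RAKE (Rapid Automatic Keyword Extraction). The sketch below implements standard RAKE scoring: candidate phrases are split at stopwords and punctuation, each word scores degree/frequency, and a phrase scores the sum of its word scores. The tiny stopword list and the alphabetical tie-break are assumptions made to keep the example short and deterministic.

```typescript
// Tiny illustrative stopword list; a real implementation would use a full one.
const STOPWORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "he", "his",
  "in", "is", "it", "of", "on", "or", "she", "that", "the", "to", "was", "were", "which", "with",
]);

// Candidate phrases: runs of non-stopwords between stopwords or punctuation.
function candidatePhrases(text: string): string[][] {
  const phrases: string[][] = [];
  for (const fragment of text.toLowerCase().split(/[.,;:!?()"\n]+/)) {
    let current: string[] = [];
    for (const word of fragment.split(/\s+/).filter((w) => w.length > 0)) {
      if (STOPWORDS.has(word)) {
        if (current.length > 0) phrases.push(current);
        current = [];
      } else {
        current.push(word);
      }
    }
    if (current.length > 0) phrases.push(current);
  }
  return phrases;
}

// Scores phrases with RAKE's degree/frequency word metric; returns the top N.
function rakeKeywords(text: string, topN = 5): string[] {
  const phrases = candidatePhrases(text);
  const freq = new Map<string, number>();
  const degree = new Map<string, number>();
  for (const phrase of phrases) {
    for (const word of phrase) {
      freq.set(word, (freq.get(word) ?? 0) + 1);
      // Degree counts co-occurrence within the phrase, including the word itself.
      degree.set(word, (degree.get(word) ?? 0) + phrase.length);
    }
  }
  const scored = new Map<string, number>();
  for (const phrase of phrases) {
    const score = phrase.reduce((sum, w) => sum + degree.get(w)! / freq.get(w)!, 0);
    scored.set(phrase.join(" "), score);
  }
  return [...scored.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // tie-break keeps order stable
    .slice(0, topN)
    .map(([phrase]) => phrase);
}

console.log(rakeKeywords("Alan Turing was a pioneer of theoretical computer science and artificial intelligence.", 3));
// → [ 'theoretical computer science', 'alan turing', 'artificial intelligence' ]
```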
The engine transforms raw knowledge sources into structured, validated narratives through a multi-stage pipeline with explicit guardrails and deterministic processing.
- Uses the MediaWiki API with `prop=extracts` for clean plaintext (no HTML parsing)
- Fetches the summary separately from the REST API for the introduction/hook
- Filters out infobox content and structured data dumps
- Never parses HTML; only uses official API endpoints
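Both requests can be built without any HTML handling. This sketch assumes the TextExtracts `explaintext` parameter for plain text and the REST `page/summary` route; the helper names and the response shape (trimmed to the fields used, with `formatversion=2`) are assumptions rather than the engine's code.

```typescript
// MediaWiki Action API request for a plain-text extract (no HTML to parse).
function extractsUrl(title: string, lang = "en"): string {
  const params = new URLSearchParams({
    action: "query",
    prop: "extracts",
    explaintext: "1",
    redirects: "1",
    titles: title,
    format: "json",
    formatversion: "2",
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params.toString()}`;
}

// REST summary endpoint, used here for the intro/hook text.
function summaryUrl(title: string, lang = "en"): string {
  const slug = encodeURIComponent(title.replace(/ /g, "_"));
  return `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${slug}`;
}

// Only the fields this sketch reads from a formatversion=2 response.
interface ExtractsResponse {
  query?: { pages?: Array<{ title: string; extract?: string; missing?: boolean }> };
}

// Returns the extract, or throws so invalid content stops the pipeline early.
function readExtract(body: ExtractsResponse): string {
  const page = body.query?.pages?.[0];
  if (!page || page.missing || !page.extract) {
    throw new Error("Article not found or has no extract");
  }
  return page.extract;
}
```

In Node.js 20 these URLs can be fetched with the built-in `fetch`; Wikimedia asks API clients to send a descriptive `User-Agent` header.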
Content passes through multiple sanitization layers before segmentation:
- Citation removal: Removes `[1]`, `[note 1]`, and `[citation needed]` markers
- HTML fragment removal: Strips Wikipedia UI elements (`mw-`, `srcset`, `upload.wikimedia.org`, etc.)
- Section filtering: Removes "See also", "References", "External links", and "Further reading" sections
- Short line removal: Filters out lines shorter than 30 characters (unless ending with punctuation)
- Whitespace normalization: Collapses multiple spaces, removes excessive parentheses
Invalid content is rejected at this stage, preventing downstream errors.
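The layers above can be sketched as a single line-oriented pass. This is an illustration only: the regular expressions, the assumption that plain-text extracts mark sections as `== Heading ==`, the choice to stop at the first dropped section, and reading "excessive parentheses" as empty `()` pairs are interpretations, not the engine's code.

```typescript
const DROPPED_SECTIONS = ["See also", "References", "External links", "Further reading"];
const UI_MARKERS = ["mw-", "srcset", "upload.wikimedia.org"];

function sanitize(raw: string): string {
  const kept: string[] = [];
  for (const line of raw.split("\n")) {
    const heading = line.match(/^=+\s*(.*?)\s*=+$/);
    if (heading) {
      // Section filtering: assumes dropped sections sit at the end of the article.
      if (DROPPED_SECTIONS.includes(heading[1])) break;
      continue; // other headings are not narration
    }
    // HTML fragment removal: skip lines carrying Wikipedia UI debris.
    if (UI_MARKERS.some((m) => line.includes(m))) continue;
    const text = line
      .replace(/\[(?:\d+|note \d+|citation needed)\]/gi, "") // citation removal
      .replace(/\(\s*\)/g, "") // empty parentheses left behind by removals
      .replace(/\s+/g, " ") // collapse runs of whitespace
      .replace(/\s+([.,;:!?])/g, "$1") // no space before punctuation
      .trim();
    // Short line removal: under 30 chars is dropped unless it ends with punctuation.
    if (text.length < 30 && !/[.!?]$/.test(text)) continue;
    kept.push(text);
  }
  return kept.join("\n");
}
```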
Content is split by paragraphs first, then into sentences with strict constraints:
- Minimum length: 40 characters
- Maximum length: 220 characters
- Filters bad sentences:
  - Too many numbers (>30% digits)
  - Too many uppercase characters (>30% uppercase)
  - Suspicious tokens (`mw-`, `src=`, `id=`, etc.)
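A sketch of the paragraph-then-sentence segmentation with the thresholds listed above. The digit and uppercase ratios are computed over all characters (an assumption), `isGoodSentence` and `segment` are hypothetical names, and the sentence split is a naive punctuation regex, so abbreviations such as "Dr." will be cut incorrectly.

```typescript
const SUSPICIOUS_TOKENS = ["mw-", "src=", "id="];

// Share of characters matching `pattern` among all characters (assumed metric).
function ratio(text: string, pattern: RegExp): number {
  const matches = text.match(pattern);
  return text.length === 0 ? 0 : (matches?.length ?? 0) / text.length;
}

function isGoodSentence(s: string): boolean {
  if (s.length < 40 || s.length > 220) return false; // length bounds
  if (ratio(s, /[0-9]/g) > 0.3) return false; // too many numbers
  if (ratio(s, /[A-Z]/g) > 0.3) return false; // too much uppercase
  return !SUSPICIOUS_TOKENS.some((t) => s.includes(t));
}

// Paragraphs (blank-line separated) first, then sentences within each.
function segment(text: string, maxSentences = 50): string[] {
  const sentences: string[] = [];
  for (const paragraph of text.split(/\n\s*\n/)) {
    const parts = paragraph.replace(/\s+/g, " ").trim().split(/(?<=[.!?])\s+/);
    for (const part of parts) {
      if (isGoodSentence(part)) sentences.push(part);
      if (sentences.length >= maxSentences) return sentences;
    }
  }
  return sentences;
}
```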
The system generates 6-8 structured scenes using one of two approaches:
Option A: LLM-Enhanced (Optional)
- Uses GPT-4o-mini or GPT-3.5-turbo to create engaging scenes
- Each scene includes:
  - Narration: 1-2 short sentences (max 300 chars)
  - Caption: Brief caption (max 100 chars)
  - Visual Keywords: 3-5 keywords for image search
- Scene structure:
  - Scene 1: Hook (impact statement)
  - Scene 2: Background (who + why important)
  - Scenes 3-5: Main contributions
  - Scenes 6+: Impact/legacy/modern relevance
Option B: Rule-based (Default)
- Deterministic storytelling that operates without LLM dependencies
- Uses summary for hook, top sentences for contributions
- Produces structured scenes with narration, captions, and keywords
- Guarantees consistent output regardless of API availability
LLM usage is an optional enhancement layer, not a dependency. The system is designed to function deterministically without external AI services.
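The rule-based path can be sketched as a pure function of the summary and the ranked sentences, which is what makes it deterministic. `buildScenes`, `clip`, the `keywordsFor` callback (standing in for the RAKE step), and using each scene's first sentence as its caption are hypothetical choices for illustration; this sketch does not model the hook/background/contribution roles explicitly.

```typescript
interface Scene {
  narration: string; // at most 300 characters
  caption: string; // at most 100 characters
  visualKeywords: string[];
}

// Cuts at a word boundary and appends an ellipsis so limits always hold.
function clip(text: string, max: number): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max - 1);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "…";
}

// Summary becomes the hook, followed by one scene per ranked sentence,
// kept within the documented 6-8 scene range.
function buildScenes(
  summary: string,
  rankedSentences: string[],
  keywordsFor: (text: string) => string[],
  minScenes = 6,
  maxScenes = 8,
): Scene[] {
  const bodies = [summary, ...rankedSentences].slice(0, maxScenes);
  if (bodies.length < minScenes) {
    throw new Error(`Need at least ${minScenes} scenes, got ${bodies.length}`);
  }
  return bodies.map((text) => {
    const narration = clip(text, 300);
    // Caption: the narration's first sentence, clipped to 100 characters.
    const firstSentence = narration.split(/(?<=[.!?])\s+/)[0] ?? narration;
    return {
      narration,
      caption: clip(firstSentence, 100),
      visualKeywords: keywordsFor(text).slice(0, 5),
    };
  });
}
```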
Before rendering, all scenes are validated against strict criteria:
- No "click here" phrases
- No Wikipedia UI elements
- No HTML tags
- Narration length ≤ 300 characters
- No excessive whitespace
- Caption length ≤ 100 characters
- Visual keywords present
Scenes that fail validation are rejected, ensuring only high-quality content reaches the rendering stage.
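A sketch of the checklist as a validator that returns a list of problems instead of a boolean, so rejected scenes can be logged with reasons. The `Scene` shape, the `messy` helper, and the exact patterns used for "Wikipedia UI elements" and HTML tags are assumptions.

```typescript
interface Scene {
  narration: string;
  caption: string;
  visualKeywords: string[];
}

// Double spaces, or leading/trailing whitespace, count as "excessive".
const messy = (s: string): boolean => /\s{2,}/.test(s) || s !== s.trim();

// Returns the reasons a scene fails; an empty array means it may be rendered.
function validateScene(scene: Scene): string[] {
  const problems: string[] = [];
  const text = `${scene.narration} ${scene.caption}`;
  if (/click here/i.test(text)) problems.push('contains "click here"');
  if (/mw-|srcset|upload\.wikimedia\.org/i.test(text)) problems.push("contains Wikipedia UI elements");
  if (/<\/?[a-z][^>]*>/i.test(text)) problems.push("contains HTML tags");
  if (scene.narration.length > 300) problems.push("narration exceeds 300 characters");
  if (messy(scene.narration) || messy(scene.caption)) problems.push("excessive whitespace");
  if (scene.caption.length > 100) problems.push("caption exceeds 100 characters");
  if (!scene.visualKeywords.some((k) => k.trim().length > 0)) problems.push("no visual keywords");
  return problems;
}
```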
OpenAI Settings (optional, for enhanced script generation):
- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_MODEL`: Model to use (default: `gpt-4o-mini`)
- `SCRIPT_TEMPERATURE`: Creativity level 0-2 (default: `0.7`)
- `USE_LLM_FOR_SCRIPT`: Enable/disable LLM (default: `true`)
If OpenAI is not configured, the system automatically falls back to rule-based scene generation.
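The documented defaults and fallback can be expressed as a small resolver over the environment. `resolveScriptMode` and its parsing rules (any value other than `false` enables the LLM; temperatures outside 0-2 are rejected) are assumptions for illustration.

```typescript
type ScriptMode =
  | { kind: "llm"; apiKey: string; model: string; temperature: number }
  | { kind: "rule-based" };

// Missing key or USE_LLM_FOR_SCRIPT=false falls back to rule-based scenes.
function resolveScriptMode(env: Record<string, string | undefined>): ScriptMode {
  const apiKey = env.OPENAI_API_KEY?.trim();
  const useLlm = (env.USE_LLM_FOR_SCRIPT ?? "true").toLowerCase() !== "false";
  if (!apiKey || !useLlm) return { kind: "rule-based" };

  const temperature = Number(env.SCRIPT_TEMPERATURE ?? "0.7");
  if (!Number.isFinite(temperature) || temperature < 0 || temperature > 2) {
    throw new Error(`SCRIPT_TEMPERATURE must be between 0 and 2, got "${env.SCRIPT_TEMPERATURE}"`);
  }
  return { kind: "llm", apiKey, model: env.OPENAI_MODEL ?? "gpt-4o-mini", temperature };
}
```

In the CLI this would be called as `resolveScriptMode(process.env)` once, before any scene generation starts.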
- ✅ Real Integrations: No mocks - all providers use real APIs
- ✅ Clean Narration: Multi-layer sanitization removes citations, Wikipedia UI fragments, and boilerplate sections
- ✅ Structured Scenes: 6-8 engaging scenes with proper storytelling flow
- ✅ LLM Support: Optional OpenAI integration for enhanced script quality
- ✅ Caching: Disk-based caching for expensive operations
- ✅ Type Safety: Full TypeScript with strict mode
- ✅ Modular: Provider-based architecture for easy extension
- ✅ CLI: Modern CLI with flags and interactive prompts
- ✅ Testing: Comprehensive test suite with Vitest
- ✅ CI/CD: GitHub Actions workflow
- ✅ Documentation: Architecture docs and examples
```
# Install dependencies
npm install

# Development mode (watch)
npm run dev

# Build
npm run build

# Test
npm test

# Lint
npm run lint

# Format
npm run format

# Type check
npm run typecheck
```

If you see an `ffmpeg not found` error:
- Verify FFmpeg is installed: Run `ffmpeg -version` in your terminal
- Check PATH: Ensure FFmpeg is in your system PATH
- Reinstall if needed: Follow platform-specific installation instructions above
- Restart terminal: After installation, restart your terminal session
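The first check can also be automated before a render starts. This sketch uses Node's `spawnSync` to run `ffmpeg -version`; the function name `isBinaryAvailable` is hypothetical, and treating a non-zero exit status as "unavailable" is an assumption.

```typescript
import { spawnSync } from "node:child_process";

// True if `<binary> -version` starts and exits with status 0. A missing
// executable is reported through `result.error` (ENOENT), not thrown.
function isBinaryAvailable(binary = "ffmpeg"): boolean {
  const result = spawnSync(binary, ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

if (!isBinaryAvailable()) {
  console.error("ffmpeg not found: install it and make sure it is on your PATH.");
}
```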
If video rendering fails with an error:
- Check asset files: Ensure all image and audio files exist in `out/<jobId>/assets/`
- Check FFmpeg version: Some older versions may have compatibility issues
- Check disk space: Video rendering requires sufficient disk space
- Review error message: FFmpeg error output is included in the error message for debugging
If the storyboard is empty or missing assets:
- Check API keys: Ensure `UNSPLASH_ACCESS_KEY` and `ELEVENLABS_API_KEY` are set in `.env`
- Check API quotas: Verify you haven't exceeded API rate limits
- Review logs: Check the console output for specific error messages
- Additional image providers (Pexels, Pixabay)
- Additional TTS providers (Google Cloud TTS, AWS Polly)
- Video templates and transitions
- YouTube upload integration
- Batch processing mode
- Web UI
MIT