StorySynth Engine

A modular, AI-native content-to-video pipeline for structured storytelling

StorySynth Engine is not a video automation tool. It is a deterministic, provider-based, multimodal generation system that bridges structured knowledge extraction, narrative construction, and audiovisual rendering. Built as a foundation for AI-native content systems, it transforms structured knowledge sources into coherent, reproducible video narratives through a validated, extensible pipeline.

The engine operates on a clear architectural principle: knowledge → facts → scenes → assets → timeline → video. Each stage is deterministic, validated, and provider-agnostic, enabling composition of production-grade storytelling pipelines without vendor lock-in or mock dependencies.

Why

Modern content generation systems face a fundamental challenge: bridging structured knowledge sources with narrative coherence and deterministic media output. Most solutions either rely on brittle, single-vendor pipelines or sacrifice determinism for flexibility.

StorySynth Engine addresses this by providing:

Structured knowledge extraction with validation and sanitization layers
Deterministic narrative construction that preserves factual accuracy while enabling storytelling flow
Provider-based extensibility that allows swapping components without architectural changes
Reproducible outputs through strict validation gates and caching strategies

The architecture is designed for composability: each provider is a discrete, testable unit. The pipeline enforces validation at every stage, ensuring that failures are caught early and outputs are consistent. This makes it suitable for production systems where reliability and extensibility matter more than quick demos.

Conceptual Flow

Knowledge Source
    ↓
[Extraction & Sanitization]
    ↓
Clean Facts
    ↓
[Segmentation & Structuring]
    ↓
Structured Scenes
    ↓
[Asset Generation]
    ↓
Images + Audio Assets
    ↓
[Timeline Assembly]
    ↓
Storyboard
    ↓
[Media Rendering]
    ↓
Video Output

Each stage is independently cacheable, testable, and replaceable. The pipeline enforces strict validation before advancing, ensuring that downstream stages receive only validated inputs.
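The validation gate between stages can be sketched as follows. This is a minimal illustration, not the engine's actual API: the `Stage` shape, `runStage`, and the two toy stages are hypothetical names introduced here.

```typescript
// Hypothetical stage shape; the names are illustrative, not the engine's API.
interface Stage<In, Out> {
  name: string;
  run: (input: In) => Out;
  // Returns a list of problems; an empty list means the output is valid.
  validate: (output: Out) => string[];
}

// Runs a stage and refuses to hand its output downstream if validation fails.
function runStage<In, Out>(stage: Stage<In, Out>, input: In): Out {
  const output = stage.run(input);
  const problems = stage.validate(output);
  if (problems.length > 0) {
    throw new Error(`Stage "${stage.name}" failed validation: ${problems.join("; ")}`);
  }
  return output;
}

// Toy extraction stage: one fact per non-empty line.
const extract: Stage<string, string[]> = {
  name: "extract",
  run: (text) =>
    text
      .split(/\n+/)
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  validate: (facts) => (facts.length === 0 ? ["no facts extracted"] : []),
};

// Toy structuring stage: one scene per fact, with a narration length check.
const structure: Stage<string[], { index: number; narration: string }[]> = {
  name: "structure",
  run: (facts) => facts.map((narration, index) => ({ index, narration })),
  validate: (scenes) =>
    scenes
      .filter((scene) => scene.narration.length > 300)
      .map((scene) => `scene ${scene.index} narration too long`),
};

const facts = runStage(extract, "Alan Turing was a mathematician.\n\nHe worked at Bletchley Park.");
const scenes = runStage(structure, facts);
```

Because each stage throws before returning invalid output, a downstream stage never sees data that failed its upstream checks.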

Architecture

┌─────────────┐
│   CLI       │  Commander + @inquirer/prompts
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Pipeline   │  Step orchestration with caching & validation
└──────┬──────┘
       │
       ├──► Wikipedia Provider ──► Cache
       ├──► Sanitize & Segment ──► Validation
       ├──► Keywords (RAKE) ──► Cache
       ├──► Images (Unsplash) ──► Download & Cache
       ├──► TTS (ElevenLabs) ──► Generate & Cache
       ├──► Storyboard Generator ──► Validation
       └──► Renderer (FFmpeg) ──► Video Output

The pipeline orchestrates discrete providers through a step-based execution model. Each step can be cached, validated, and independently tested. Providers are registered through a plugin system, enabling runtime composition of different content sources, processing algorithms, and output formats.
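A provider registry along these lines is one way such a plugin system could look. The `Provider` fields, the category names, and the `ProviderRegistry` class are assumptions made for illustration; the real registration API may differ.

```typescript
// Hypothetical provider metadata, loosely modelled on the Providers table.
type ProviderCategory = "source" | "processing" | "images" | "tts" | "renderer";

interface Provider {
  id: string;
  category: ProviderCategory;
  requiresApiKey: boolean;
}

class ProviderRegistry {
  private readonly providers = new Map<string, Provider>();

  // Registers a provider under "category:id"; duplicates are rejected.
  register(provider: Provider): void {
    const key = `${provider.category}:${provider.id}`;
    if (this.providers.has(key)) {
      throw new Error(`Provider already registered: ${key}`);
    }
    this.providers.set(key, provider);
  }

  // Looks up a provider and fails loudly when it is unknown.
  get(category: ProviderCategory, id: string): Provider {
    const provider = this.providers.get(`${category}:${id}`);
    if (!provider) {
      throw new Error(`Unknown provider: ${category}:${id}`);
    }
    return provider;
  }

  // Lists all providers, optionally restricted to one category.
  list(category?: ProviderCategory): Provider[] {
    const all = Array.from(this.providers.values());
    return category ? all.filter((p) => p.category === category) : all;
  }
}

const registry = new ProviderRegistry();
registry.register({ id: "wikipedia", category: "source", requiresApiKey: false });
registry.register({ id: "unsplash", category: "images", requiresApiKey: true });
registry.register({ id: "ffmpeg", category: "renderer", requiresApiKey: false });
```

Keying by category and id lets a second image or TTS provider be added without touching the lookup code in the pipeline.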

Engineering Principles

Deterministic pipeline: Given the same inputs and configuration, the pipeline produces identical outputs. Cached provider responses keep repeated runs reproducible.
Provider-based extensibility: New content sources, processing algorithms, or renderers can be added without modifying core pipeline logic.
Strict validation: Every stage validates its inputs and outputs. Invalid data is rejected before advancing to the next stage.
Reproducible outputs: Caching strategies and deterministic algorithms ensure that runs are consistent and debuggable.
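Reproducible caching depends on deriving the same cache key from the same logical input. The sketch below shows one common approach, hashing a key-sorted serialization of the step input. `stableStringify` and `cacheKey` are hypothetical helpers, and the engine's actual key scheme is not documented here.

```typescript
import { createHash } from "node:crypto";

// Serializes a value with object keys sorted, so logically equal inputs
// produce the same string regardless of property order.
function stableStringify(value: unknown): string {
  if (value === undefined) {
    return "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Derives a hex SHA-256 cache key from the step name and its input.
function cacheKey(step: string, input: unknown): string {
  return createHash("sha256").update(`${step}:${stableStringify(input)}`).digest("hex");
}

// Property order does not affect the key.
const keyA = cacheKey("tts", { text: "Hello", lang: "en" });
const keyB = cacheKey("tts", { lang: "en", text: "Hello" });
```

Including the step name in the hash keeps two steps that receive the same input from colliding in a shared cache directory.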

Quick Start

Prerequisites

Node.js 20+
FFmpeg (required for video rendering with --render mp4)

Install FFmpeg based on your platform:
- macOS: brew install ffmpeg
- Ubuntu/Debian: sudo apt-get install ffmpeg
- Arch Linux: sudo pacman -S ffmpeg
- Windows: choco install ffmpeg or download from ffmpeg.org
Verify installation: ffmpeg -version

Installation

npm install
npm run build

Configuration

Copy .env.example to .env
Add your API keys:
- Unsplash: Get a free API key at unsplash.com/developers
- ElevenLabs: Get an API key at elevenlabs.io
- OpenAI (optional): Get an API key at platform.openai.com for enhanced script generation

cp .env.example .env
# Edit .env with your API keys

First Run

Storyboard only (no video):

npm run build
./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render storyboard

Full video generation (with MP4 output):

./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render mp4

This will generate a complete video at out/<jobId>/render/video.mp4.

Interactive mode (prompts for missing options):

./dist/cli/index.js run

Example Output

Each run creates a job directory with all artifacts:

out/
└── <jobId>/
    ├── config.resolved.json    # Resolved configuration
    ├── content.json            # Processed content (sentences, keywords)
    ├── storyboard.json         # Timeline with asset references
    ├── assets/
    │   ├── images/
    │   │   ├── <imageId1>.jpg
    │   │   └── <imageId2>.jpg
    │   └── audio/
    │       ├── <sentenceId1>.mp3
    │       └── <sentenceId2>.mp3
    └── render/                  # Only if --render mp4
        ├── segments/
        │   ├── 0000.mp4        # Individual video segments
        │   ├── 0001.mp4
        │   └── concat.txt      # FFmpeg concat file
        └── video.mp4           # Final rendered video
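The segment names and `concat.txt` in this layout suggest FFmpeg's concat demuxer, which reads a list of `file '<path>'` lines. A minimal sketch of generating such a list, assuming four-digit zero-padded segment names as shown in the tree (`segmentFileName` and `buildConcatList` are hypothetical helpers):

```typescript
// Zero-pads a segment index to four digits, matching names like 0000.mp4.
function segmentFileName(index: number): string {
  return `${String(index).padStart(4, "0")}.mp4`;
}

// Builds the body of a concat list in the format read by FFmpeg's concat
// demuxer: one "file '<path>'" line per segment, in playback order.
function buildConcatList(segmentCount: number): string {
  const lines: string[] = [];
  for (let i = 0; i < segmentCount; i++) {
    lines.push(`file '${segmentFileName(i)}'`);
  }
  return lines.join("\n") + "\n";
}

const concatBody = buildConcatList(2);
```

A list like this is typically consumed with `ffmpeg -f concat -i concat.txt -c copy video.mp4`. Whether the renderer uses stream copy or re-encodes the segments is not stated in this README.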

CLI Commands

`run`

Run the video generation pipeline.

videomaker run [options]

Options:
  --source <source>      Content source (default: wikipedia)
  --query <query>        Search query
  --lang <lang>          Language code (en, pt, es, fr)
  --sentences <number>   Maximum sentences (1-50)
  --out <dir>            Output directory (default: ./out)
  --render <mode>        Render mode: storyboard or mp4 (default: storyboard)

`inspect`

Inspect a storyboard file.

videomaker inspect <storyboard.json>

`clean`

Clear the cache directory.

videomaker clean

`providers`

List available providers.

videomaker providers

Providers

| Category | Provider | Description | API Key Required |
|---|---|---|---|
| Source | `wikipedia` | Wikipedia REST API | No |
| Processing | `keywords` | RAKE algorithm (local) | No |
| Images | `unsplash` | Unsplash API | Yes |
| TTS | `elevenlabs` | ElevenLabs API | Yes |
| Renderer | `ffmpeg` | FFmpeg video rendering | No |

Narrative Generation Pipeline

The engine transforms raw knowledge sources into structured, validated narratives through a multi-stage pipeline with explicit guardrails and deterministic processing.

1. Knowledge Extraction

Uses MediaWiki API with prop=extracts for clean plaintext (no HTML parsing)
Fetches summary separately from REST API for introduction/hook
Filters out infobox content and structured data dumps
Never parses HTML - only uses official API endpoints
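The two requests described above can be sketched as URL builders. Only `prop=extracts` comes from this README. The `explaintext` and `redirects` parameters and the helper names are assumptions, chosen because they are standard options for plain-text extracts.

```typescript
// Builds a MediaWiki Action API URL that requests plain-text extracts.
// explaintext and redirects are assumed options, not confirmed by the README.
function buildExtractUrl(title: string, lang: string): string {
  const params = new URLSearchParams({
    action: "query",
    prop: "extracts",
    explaintext: "1",
    redirects: "1",
    format: "json",
    titles: title,
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params.toString()}`;
}

// Builds the REST API summary URL used for the introduction/hook.
function buildSummaryUrl(title: string, lang: string): string {
  const slug = encodeURIComponent(title.replace(/ /g, "_"));
  return `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${slug}`;
}

const extractUrl = buildExtractUrl("Alan Turing", "en");
const summaryUrl = buildSummaryUrl("Alan Turing", "en");
```

Both endpoints return JSON, so no HTML parsing is involved at any point.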

2. Sanitization & Validation

Content passes through multiple sanitization layers before segmentation:

Citation removal: Removes [1], [note 1], [citation needed] markers
HTML fragment removal: Strips Wikipedia UI elements (mw-, srcset, upload.wikimedia.org, etc.)
Section filtering: Removes "See also", "References", "External links", "Further reading" sections
Short line removal: Filters out lines shorter than 30 characters (unless ending with punctuation)
Whitespace normalization: Collapses multiple spaces, removes excessive parentheses

Invalid content is rejected at this stage, preventing downstream errors.
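Several of the layers above can be sketched as small pure functions. This is an illustrative approximation under two stated simplifications: everything after the first filtered section heading is dropped, and spaces left before punctuation by citation removal are tidied. The function names are hypothetical.

```typescript
const SECTION_STOP_HEADINGS = ["See also", "References", "External links", "Further reading"];

// Removes markers such as [1], [note 1], and [citation needed].
function removeCitations(text: string): string {
  return text.replace(/\[(?:\d+|note \d+|citation needed)\]/gi, "");
}

// Collapses runs of spaces and tabs, and tidies spaces left before
// punctuation (the latter is an extra step, not listed in the README).
function normalizeWhitespace(text: string): string {
  return text
    .replace(/[ \t]+/g, " ")
    .replace(/ +([.,;:!?])/g, "$1")
    .trim();
}

// Lines under 30 characters are dropped unless they end with punctuation.
function keepLine(line: string): boolean {
  return line.length >= 30 || /[.!?]$/.test(line);
}

function sanitize(raw: string): string {
  const kept: string[] = [];
  for (const rawLine of raw.split("\n")) {
    const line = normalizeWhitespace(removeCitations(rawLine));
    // Plain-text extracts mark headings as "== Title =="; strip the markers.
    const heading = line.replace(/^=+\s*|\s*=+$/g, "");
    if (SECTION_STOP_HEADINGS.includes(heading)) {
      break; // Simplification: these sections are assumed to close the article.
    }
    if (line.length > 0 && keepLine(line)) {
      kept.push(line);
    }
  }
  return kept.join("\n");
}
```

Keeping each layer as its own function makes it straightforward to unit test one rule at a time.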

3. Segmentation

Content is split by paragraphs first, then into sentences with strict constraints:

Minimum length: 40 characters
Maximum length: 220 characters
Filters bad sentences:
- Too many numbers (>30% digits)
- Too many uppercase characters (>30% uppercase)
- Suspicious tokens (mw-, src=, id=, etc.)
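The sentence constraints above translate directly into a predicate. In this sketch the ratios are computed over all characters in the sentence, and uppercase detection covers ASCII letters only. Both are assumptions, as is the helper naming.

```typescript
const SUSPICIOUS_TOKENS = ["mw-", "src=", "id="];

// Fraction of characters in the sentence that match the (global) pattern.
function ratio(sentence: string, pattern: RegExp): number {
  if (sentence.length === 0) {
    return 0;
  }
  const matches = sentence.match(pattern);
  return (matches ? matches.length : 0) / sentence.length;
}

function isGoodSentence(sentence: string): boolean {
  // Length bounds: 40 to 220 characters inclusive.
  if (sentence.length < 40 || sentence.length > 220) return false;
  // Reject sentences where more than 30% of characters are digits.
  if (ratio(sentence, /\d/g) > 0.3) return false;
  // Reject sentences where more than 30% of characters are uppercase (ASCII only).
  if (ratio(sentence, /[A-Z]/g) > 0.3) return false;
  // Reject sentences that still carry markup residue.
  return !SUSPICIOUS_TOKENS.some((token) => sentence.includes(token));
}
```

For the non-English languages the CLI accepts, a Unicode-aware uppercase check would be more accurate than the ASCII range used here.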

4. Scene Construction

The system generates 6-8 structured scenes using one of two approaches:

Option A: LLM-Enhanced (Optional)

Uses GPT-4o-mini or GPT-3.5-turbo to create engaging scenes
Each scene includes:
- Narration: 1-2 short sentences (max 300 chars)
- Caption: Brief caption (max 100 chars)
- Visual Keywords: 3-5 keywords for image search
Scene structure:
1. Hook (impact statement)
2. Background (who + why important)
3-5. Main contributions
6+. Impact/legacy/modern relevance

Option B: Rule-based (default when OpenAI is not configured)

Deterministic storytelling that operates without LLM dependencies
Uses summary for hook, top sentences for contributions
Produces structured scenes with narration, captions, and keywords
Guarantees consistent output regardless of API availability

LLM usage is an optional enhancement layer, not a dependency. The system is designed to function deterministically without external AI services.
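A rule-based builder might look like the sketch below. It assumes the input sentences are already ranked and reuses the hook/background/contributions/legacy ordering from Option A. The `Scene` shape and all function names are hypothetical, and the sketch does not enforce the 6-scene minimum.

```typescript
interface Scene {
  role: "hook" | "background" | "contribution" | "legacy";
  narration: string;
  caption: string;
  keywords: string[];
}

// Shortens text to at most `max` characters, ending with an ellipsis if cut.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : `${text.slice(0, max - 1).trimEnd()}…`;
}

// Returns the first sentence of a text, or the whole text if none is found.
function firstSentence(text: string): string {
  const match = text.match(/^.*?[.!?](?=\s|$)/);
  return match ? match[0] : text;
}

// Pure function of its inputs: the same summary, sentences, and keywords
// always yield the same scenes.
function buildScenes(summary: string, sentences: string[], keywords: string[]): Scene[] {
  const picked = sentences.slice(0, 7); // hook plus up to 7 sentences = at most 8 scenes
  const narrations = [firstSentence(summary), ...picked];
  return narrations.map((narration, index) => {
    const role: Scene["role"] =
      index === 0 ? "hook" : index === 1 ? "background" : index <= 4 ? "contribution" : "legacy";
    return {
      role,
      narration: truncate(narration, 300),
      caption: truncate(narration, 100),
      keywords: keywords.slice(0, 5),
    };
  });
}
```

Because the builder has no network calls or randomness, its output is unaffected by API availability.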

5. Quality Validation

Before rendering, all scenes are validated against strict criteria:

No "click here" phrases
No Wikipedia UI elements
No HTML tags
Narration length ≤ 300 characters
No excessive whitespace
Caption length ≤ 100 characters
Visual keywords present

Scenes that fail validation are rejected, ensuring only high-quality content reaches the rendering stage.
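The criteria above can be expressed as a validator that returns a list of problems. This is a sketch with assumed details: the `SceneDraft` shape is hypothetical, "excessive whitespace" is taken to mean two or more consecutive whitespace characters, and the UI-residue patterns are borrowed from the sanitization section.

```typescript
interface SceneDraft {
  narration: string;
  caption: string;
  visualKeywords: string[];
}

// Returns every rule the scene violates; an empty list means it passes.
function validateScene(scene: SceneDraft): string[] {
  const problems: string[] = [];
  const text = `${scene.narration} ${scene.caption}`;
  if (/click here/i.test(text)) problems.push('contains "click here"');
  if (/mw-|srcset|upload\.wikimedia\.org/i.test(text)) problems.push("contains Wikipedia UI residue");
  if (/<\/?[a-z][^>]*>/i.test(text)) problems.push("contains HTML tags");
  if (scene.narration.length > 300) problems.push("narration exceeds 300 characters");
  if (scene.caption.length > 100) problems.push("caption exceeds 100 characters");
  if (/\s{2,}/.test(scene.narration) || /\s{2,}/.test(scene.caption)) {
    problems.push("contains excessive whitespace");
  }
  if (scene.visualKeywords.length === 0) problems.push("no visual keywords");
  return problems;
}

const validProblems = validateScene({
  narration: "Alan Turing laid the foundations of computer science.",
  caption: "The father of computing",
  visualKeywords: ["computer", "mathematics", "history"],
});
```

Returning all problems rather than stopping at the first makes a rejected scene easier to debug.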

Configuration

OpenAI Settings (optional, for enhanced script generation):

OPENAI_API_KEY: Your OpenAI API key
OPENAI_MODEL: Model to use (default: gpt-4o-mini)
SCRIPT_TEMPERATURE: Creativity level 0-2 (default: 0.7)
USE_LLM_FOR_SCRIPT: Enable/disable LLM (default: true)

If OpenAI is not configured, the system automatically falls back to rule-based scene generation.
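The fallback behaviour can be sketched as a small resolver over the environment variables listed above. It assumes that "not configured" means a missing or empty `OPENAI_API_KEY` and that out-of-range temperatures are clamped to 0-2. `resolveScriptConfig` is a hypothetical name.

```typescript
interface ScriptConfig {
  useLlm: boolean;
  model: string;
  temperature: number;
}

function resolveScriptConfig(env: Record<string, string | undefined>): ScriptConfig {
  const apiKey = (env["OPENAI_API_KEY"] ?? "").trim();
  // Enabled by default; only an explicit "false" switches the LLM off.
  const llmEnabled = (env["USE_LLM_FOR_SCRIPT"] ?? "true").trim().toLowerCase() !== "false";

  const rawTemperature = env["SCRIPT_TEMPERATURE"];
  const parsed =
    rawTemperature === undefined || rawTemperature.trim() === "" ? NaN : Number(rawTemperature);
  // Fall back to 0.7 on unparsable input; clamp to the documented 0-2 range.
  const temperature = Number.isFinite(parsed) ? Math.min(2, Math.max(0, parsed)) : 0.7;

  return {
    // Without an API key the rule-based path is used, whatever the flag says.
    useLlm: llmEnabled && apiKey.length > 0,
    model: env["OPENAI_MODEL"] ?? "gpt-4o-mini",
    temperature,
  };
}
```

In a Node.js process this would be called as `resolveScriptConfig(process.env)` after the `.env` file has been loaded.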

Features

✅ Real Integrations: No mocks - all providers use real APIs
✅ Clean Narration: Multi-layer sanitization removes citation markers, Wikipedia UI fragments, and non-narrative sections
✅ Structured Scenes: 6-8 engaging scenes with proper storytelling flow
✅ LLM Support: Optional OpenAI integration for enhanced script quality
✅ Caching: Disk-based caching for expensive operations
✅ Type Safety: Full TypeScript with strict mode
✅ Modular: Provider-based architecture for easy extension
✅ CLI: Modern CLI with flags and interactive prompts
✅ Testing: Comprehensive test suite with Vitest
✅ CI/CD: GitHub Actions workflow
✅ Documentation: Architecture docs and examples

Development

# Install dependencies
npm install

# Development mode (watch)
npm run dev

# Build
npm run build

# Test
npm test

# Lint
npm run lint

# Format
npm run format

# Type check
npm run typecheck

Troubleshooting

FFmpeg Not Found

If you see an `ffmpeg not found` error:

Verify FFmpeg is installed: Run ffmpeg -version in your terminal
Check PATH: Ensure FFmpeg is in your system PATH
Reinstall if needed: Follow platform-specific installation instructions above
Restart terminal: After installation, restart your terminal session

Video Rendering Fails

If video rendering fails with an error:

Check asset files: Ensure all image and audio files exist in out/<jobId>/assets/
Check FFmpeg version: Some older versions may have compatibility issues
Check disk space: Video rendering requires sufficient disk space
Review error message: FFmpeg error output is included in the error message for debugging

Missing Assets

If the storyboard is empty or is missing assets:

Check API keys: Ensure UNSPLASH_ACCESS_KEY and ELEVENLABS_API_KEY are set in .env
Check API quotas: Verify you haven't exceeded API rate limits
Review logs: Check the console output for specific error messages

Roadmap

Additional image providers (Pexels, Pixabay)
Additional TTS providers (Google Cloud TTS, AWS Polly)
Video templates and transitions
YouTube upload integration
Batch processing mode
Web UI

License

MIT
