
procopio420/StorySynth


StorySynth Engine

A modular, AI-native content-to-video pipeline for structured storytelling

StorySynth Engine is not a video automation tool. It is a deterministic, provider-based, multimodal generation system that bridges structured knowledge extraction, narrative construction, and audiovisual rendering. Built as a foundation for AI-native content systems, it transforms structured knowledge sources into coherent, reproducible video narratives through a validated, extensible pipeline.

The engine operates on a clear architectural principle: knowledge → facts → scenes → assets → timeline → video. Each stage is deterministic, validated, and provider-agnostic, enabling composition of production-grade storytelling pipelines without vendor lock-in or mock dependencies.

Why

Modern content generation systems face a fundamental challenge: bridging structured knowledge sources with narrative coherence and deterministic media output. Most solutions either rely on brittle, single-vendor pipelines or sacrifice determinism for flexibility.

StorySynth Engine addresses this by providing:

  • Structured knowledge extraction with validation and sanitization layers
  • Deterministic narrative construction that preserves factual accuracy while enabling storytelling flow
  • Provider-based extensibility that allows swapping components without architectural changes
  • Reproducible outputs through strict validation gates and caching strategies

The architecture is designed for composability: each provider is a discrete, testable unit. The pipeline enforces validation at every stage, ensuring that failures are caught early and outputs are consistent. This makes it suitable for production systems where reliability and extensibility matter more than quick demos.

Conceptual Flow

Knowledge Source
    ↓
[Extraction & Sanitization]
    ↓
Clean Facts
    ↓
[Segmentation & Structuring]
    ↓
Structured Scenes
    ↓
[Asset Generation]
    ↓
Images + Audio Assets
    ↓
[Timeline Assembly]
    ↓
Storyboard
    ↓
[Media Rendering]
    ↓
Video Output

Each stage is independently cacheable, testable, and replaceable. The pipeline enforces strict validation before advancing, ensuring that downstream stages receive only validated inputs.

Architecture

┌─────────────┐
│   CLI       │  Commander + @inquirer/prompts
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Pipeline   │  Step orchestration with caching & validation
└──────┬──────┘
       │
       ├──► Wikipedia Provider ──► Cache
       ├──► Sanitize & Segment ──► Validation
       ├──► Keywords (RAKE) ──► Cache
       ├──► Images (Unsplash) ──► Download & Cache
       ├──► TTS (ElevenLabs) ──► Generate & Cache
       ├──► Storyboard Generator ──► Validation
       └──► Renderer (FFmpeg) ──► Video Output

The pipeline orchestrates discrete providers through a step-based execution model. Each step can be cached, validated, and independently tested. Providers are registered through a plugin system, enabling runtime composition of different content sources, processing algorithms, and output formats.
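The step model described above can be sketched minimally in TypeScript. The sketch assumes a synchronous step and uses an in-memory Map in place of the disk cache. Step, runStep, and splitSentences are hypothetical names for illustration, not the project's API.

```typescript
// Hypothetical step contract: a transform plus a validator for its output.
interface Step<I, O> {
  name: string;
  run: (input: I) => O;
  // Returns a list of problems; an empty list means the output is valid.
  validate: (output: O) => string[];
}

// In-memory stand-in for the cache (the project describes a disk cache).
type Cache = Map<string, unknown>;

function runStep<I, O>(step: Step<I, O>, input: I, cache: Cache): O {
  const key = `${step.name}:${JSON.stringify(input)}`;
  if (cache.has(key)) {
    return cache.get(key) as O; // cache hit: skip recomputation
  }
  const output = step.run(input);
  const problems = step.validate(output);
  if (problems.length > 0) {
    // Fail fast so invalid data never reaches the next step.
    throw new Error(`Step "${step.name}" failed validation: ${problems.join("; ")}`);
  }
  cache.set(key, output); // only validated outputs are cached
  return output;
}

// Example step: naive sentence splitting with a non-empty check.
const splitSentences: Step<string, string[]> = {
  name: "segment",
  run: (text) => text.split(/\.\s+/).filter((s) => s.length > 0),
  validate: (out) => (out.length === 0 ? ["no sentences produced"] : []),
};
```

Validation runs before the cache write, so every cached entry has already passed its step's checks. That ordering is what allows a rerun to skip straight to the first uncached step.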

Engineering Principles

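  • Deterministic pipeline: Given the same inputs and configuration, the local processing stages produce identical outputs, and caching of provider results makes repeated runs reproducible. When the optional LLM scene generation is enabled, its output can vary from run to run because it samples at SCRIPT_TEMPERATURE (default 0.7).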
  • Provider-based extensibility: New content sources, processing algorithms, or renderers can be added without modifying core pipeline logic.
  • Strict validation: Every stage validates its inputs and outputs. Invalid data is rejected before advancing to the next stage.
  • Reproducible outputs: Caching strategies and deterministic algorithms ensure that runs are consistent and debuggable.
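Reproducible caching depends on cache keys that stay the same when an identical configuration is serialized with its keys in a different order. The sketch below works under that assumption: stableStringify sorts object keys recursively, and fnv1a32 hashes the result with FNV-1a over UTF-16 code units. All names are hypothetical. A production cache would more likely use a cryptographic hash such as SHA-256, which makes collisions negligible.

```typescript
// JSON-compatible values that can be part of a cache key.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted so {a,b} and {b,a} give the same string.
function stableStringify(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const body = keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a, returned as 8 lowercase hex digits.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical key format: step name plus a hash of input and config.
function cacheKey(step: string, input: Json, config: Json): string {
  return `${step}-${fnv1a32(stableStringify({ input, config }))}`;
}
```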

Quick Start

Prerequisites

  • Node.js 20+

  • FFmpeg (required for video rendering with --render mp4)

    Install FFmpeg based on your platform:

    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt-get install ffmpeg
    • Arch Linux: sudo pacman -S ffmpeg
    • Windows: choco install ffmpeg or download from ffmpeg.org

    Verify installation: ffmpeg -version

Installation

npm install
npm run build

Configuration

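  1. Copy .env.example to .env:

cp .env.example .env

  2. Edit .env and add your API keys. UNSPLASH_ACCESS_KEY and ELEVENLABS_API_KEY are required for image and audio assets. OPENAI_API_KEY is optional and enables LLM-enhanced scene generation.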

First Run

Storyboard only (no video):

npm run build
./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render storyboard

Full video generation (with MP4 output):

./dist/cli/index.js run --query "Alan Turing" --lang en --sentences 8 --render mp4

This will generate a complete video at out/<jobId>/render/video.mp4.

Interactive mode (prompts for missing options):

./dist/cli/index.js run

Example Output

Each run creates a job directory with all artifacts:

out/
└── <jobId>/
    ├── config.resolved.json    # Resolved configuration
    ├── content.json            # Processed content (sentences, keywords)
    ├── storyboard.json         # Timeline with asset references
    ├── assets/
    │   ├── images/
    │   │   ├── <imageId1>.jpg
    │   │   └── <imageId2>.jpg
    │   └── audio/
    │       ├── <sentenceId1>.mp3
    │       └── <sentenceId2>.mp3
    └── render/                  # Only if --render mp4
        ├── segments/
        │   ├── 0000.mp4        # Individual video segments
        │   ├── 0001.mp4
        │   └── concat.txt      # FFmpeg concat file
        └── video.mp4           # Final rendered video
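The segments/ and concat.txt layout follows FFmpeg's concat demuxer, which reads one file '<path>' line per input and resolves relative paths against the directory of the list file. The helpers below are a hypothetical sketch of building those arguments. The flags are standard FFmpeg options, but the project's actual encoder settings are not documented here.

```typescript
// Hypothetical: arguments to turn one still image + one narration clip into a segment.
function segmentArgs(image: string, audio: string, output: string): string[] {
  return [
    "-y", // overwrite output if it exists
    "-loop", "1", "-i", image, // repeat the still image as a video stream
    "-i", audio,
    "-c:v", "libx264", "-tune", "stillimage",
    "-c:a", "aac",
    "-pix_fmt", "yuv420p", // widely compatible pixel format
    "-shortest", // end the segment when the audio ends
    output,
  ];
}

// concat demuxer list: one `file '<path>'` line per segment.
// A literal single quote inside a quoted path is written as '\''.
function concatFile(segments: string[]): string {
  return segments
    .map((s) => `file '${s.replace(/'/g, "'\\''")}'`)
    .join("\n") + "\n";
}

// -safe 0 accepts any path in the list; -c copy joins without re-encoding.
function concatArgs(listFile: string, output: string): string[] {
  return ["-y", "-f", "concat", "-safe", "0", "-i", listFile, "-c", "copy", output];
}
```

In this scheme, you run one ffmpeg process per segment inside render/segments/, write concatFile(...) to concat.txt there, and finish with concatArgs("concat.txt", "../video.mp4"). Stream copy with -c copy only works because every segment is encoded with the same codecs and parameters.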

CLI Commands

run

Run the video generation pipeline.

videomaker run [options]

Options:
  --source <source>      Content source (default: wikipedia)
  --query <query>        Search query
  --lang <lang>          Language code (en, pt, es, fr)
  --sentences <number>   Maximum sentences (1-50)
  --out <dir>            Output directory (default: ./out)
  --render <mode>        Render mode: storyboard or mp4 (default: storyboard)
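The option ranges above can be validated up front, before any provider is called. This sketch assumes options arrive as raw strings, which is how Commander delivers them when no custom parser is given. It rejects missing values instead of prompting for them as interactive mode does. parseRunOptions and RunOptions are hypothetical names.

```typescript
const LANGS = ["en", "pt", "es", "fr"] as const;
const RENDER_MODES = ["storyboard", "mp4"] as const;

interface RunOptions {
  source: string;
  query: string;
  lang: (typeof LANGS)[number];
  sentences: number;
  out: string;
  render: (typeof RENDER_MODES)[number];
}

function parseRunOptions(raw: Record<string, string | undefined>): RunOptions {
  const query = (raw.query ?? "").trim();
  if (query === "") throw new Error("--query is required");

  // No default language is documented, so a missing --lang is rejected here.
  const lang = LANGS.find((l) => l === raw.lang);
  if (lang === undefined) throw new Error(`--lang must be one of: ${LANGS.join(", ")}`);

  const sentences = Number(raw.sentences);
  if (!Number.isInteger(sentences) || sentences < 1 || sentences > 50) {
    throw new Error("--sentences must be an integer from 1 to 50");
  }

  const requested = raw.render ?? "storyboard"; // documented default
  const render = RENDER_MODES.find((m) => m === requested);
  if (render === undefined) throw new Error("--render must be storyboard or mp4");

  return {
    source: raw.source ?? "wikipedia", // documented default
    query,
    lang,
    sentences,
    out: raw.out ?? "./out", // documented default
    render,
  };
}
```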

inspect

Inspect a storyboard file.

videomaker inspect <storyboard.json>

clean

Clear the cache directory.

videomaker clean

providers

List available providers.

videomaker providers

Providers

Category     Provider     Description               API Key Required
Source       wikipedia    Wikipedia REST API        No
Processing   keywords     RAKE algorithm (local)    No
Images       unsplash     Unsplash API              Yes
TTS          elevenlabs   ElevenLabs API            Yes
Renderer     ffmpeg       FFmpeg video rendering    No
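RAKE (Rapid Automatic Keyword Extraction) breaks text into candidate phrases at stopwords and punctuation. It scores each word as degree / frequency, then ranks each phrase by the sum of its word scores. The sketch below implements that scoring with a deliberately tiny English stopword list. The project's keyword provider may use a different list, tokenizer, or library.

```typescript
// Tiny illustrative stopword list; real RAKE setups use a much longer one.
const STOPWORDS = new Set([
  "a", "an", "and", "as", "at", "by", "for", "from", "he", "his", "in",
  "is", "it", "of", "on", "or", "she", "the", "to", "was", "were", "with",
]);

// Candidate phrases: runs of words between punctuation marks and stopwords.
function candidatePhrases(text: string): string[][] {
  const phrases: string[][] = [];
  for (const fragment of text.toLowerCase().split(/[.,;:!?()"]+/)) {
    let current: string[] = [];
    for (const word of fragment.split(/\s+/)) {
      if (word === "") continue;
      if (STOPWORDS.has(word)) {
        if (current.length > 0) phrases.push(current);
        current = [];
      } else {
        current.push(word);
      }
    }
    if (current.length > 0) phrases.push(current);
  }
  return phrases;
}

function rake(text: string, topN = 5): string[] {
  const phrases = candidatePhrases(text);
  const freq = new Map<string, number>();
  const degree = new Map<string, number>();
  for (const phrase of phrases) {
    for (const word of phrase) {
      freq.set(word, (freq.get(word) ?? 0) + 1);
      // Degree: co-occurrences inside phrases, counting the word itself.
      degree.set(word, (degree.get(word) ?? 0) + phrase.length);
    }
  }
  // Phrase score = sum of word scores, where word score = degree / frequency.
  const scores = new Map<string, number>();
  for (const phrase of phrases) {
    const score = phrase.reduce((sum, w) => sum + (degree.get(w) ?? 0) / (freq.get(w) ?? 1), 0);
    scores.set(phrase.join(" "), score);
  }
  return Array.from(scores.entries())
    // Highest score first; ties broken alphabetically so output is deterministic.
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, topN)
    .map((entry) => entry[0]);
}
```

For example, rake("Alan Turing was a pioneer of theoretical computer science and artificial intelligence.", 3) yields theoretical computer science first, then alan turing and artificial intelligence. The last two have tied scores and are ordered alphabetically.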

Narrative Generation Pipeline

The engine transforms raw knowledge sources into structured, validated narratives through a multi-stage pipeline with explicit guardrails and deterministic processing.

1. Knowledge Extraction

  • Uses MediaWiki API with prop=extracts for clean plaintext (no HTML parsing)
  • Fetches summary separately from REST API for introduction/hook
  • Filters out infobox content and structured data dumps
  • Never parses HTML - only uses official API endpoints
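The two endpoints described above can be sketched as URL builders plus a response reader. The query parameters are the public MediaWiki Action API ones for the TextExtracts extension: explaintext requests plain text, and formatversion=2 returns pages as an array. The redirects=1 parameter is an added assumption, and the function names are hypothetical.

```typescript
// Full plaintext extract via the MediaWiki Action API.
function extractsUrl(title: string, lang: string): string {
  const params: Record<string, string> = {
    action: "query",
    prop: "extracts",
    explaintext: "1", // plain text instead of HTML
    redirects: "1", // assumption: follow redirects to the canonical article
    titles: title,
    format: "json",
    formatversion: "2", // pages returned as an array
  };
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
  return `https://${lang}.wikipedia.org/w/api.php?${query}`;
}

// Short summary (used for the hook) via the REST API; titles use underscores.
function summaryUrl(title: string, lang: string): string {
  const slug = encodeURIComponent(title.replace(/ /g, "_"));
  return `https://${lang}.wikipedia.org/api/rest_v1/page/summary/${slug}`;
}

interface ExtractsResponse {
  query?: { pages?: Array<{ title: string; extract?: string; missing?: boolean }> };
}

// Reject missing or empty pages before anything reaches sanitization.
function readExtract(body: ExtractsResponse): string {
  const page = body.query?.pages?.[0];
  if (!page || page.missing || !page.extract) {
    throw new Error("Article not found or has no extract");
  }
  return page.extract;
}
```

Fetching is left out so the sketch stays pure. Wikimedia asks API clients to send a descriptive User-Agent header.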

2. Sanitization & Validation

Content passes through multiple sanitization layers before segmentation:

  • Citation removal: Removes [1], [note 1], [citation needed] markers
  • HTML fragment removal: Strips Wikipedia UI elements (mw-, srcset, upload.wikimedia.org, etc.)
  • Section filtering: Removes "See also", "References", "External links", "Further reading" sections
  • Short line removal: Filters out lines shorter than 30 characters (unless ending with punctuation)
  • Whitespace normalization: Collapses multiple spaces, removes excessive parentheses

Invalid content is rejected at this stage, preventing downstream errors.
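A minimal sanitizer can follow the layers above. It assumes the plaintext extract marks headings as "== Title ==", which is the TextExtracts default section format, and it treats each line as one paragraph. When a section is dropped, its subsections are dropped with it. The regular expressions are illustrative, not the project's exact rules.

```typescript
const DROP_SECTIONS = ["See also", "References", "External links", "Further reading"];

function sanitize(text: string): string {
  const kept: string[] = [];
  let skipLevel: number | null = null; // heading level of a dropped section
  for (const rawLine of text.split("\n")) {
    // Headings look like "== Title =="; more "=" signs means a deeper level.
    const heading = rawLine.match(/^(=+)\s*(.*?)\s*=+\s*$/);
    if (heading) {
      const level = heading[1].length;
      if (skipLevel !== null && level > skipLevel) continue; // subsection of a dropped section
      skipLevel = DROP_SECTIONS.indexOf(heading[2]) !== -1 ? level : null;
      continue; // headings themselves are never narrated
    }
    if (skipLevel !== null) continue;

    const line = rawLine
      .replace(/\[(?:\d+|note \d+|citation needed)\]/gi, "") // citation markers
      .replace(/\(\s*\)/g, "") // empty parentheses left after removals
      .replace(/\s+/g, " ") // collapse runs of whitespace
      .trim();

    if (/mw-|srcset|upload\.wikimedia\.org/.test(line)) continue; // HTML/UI residue
    // Drop short fragments unless they end like a sentence.
    if (line.length < 30 && !/[.!?]$/.test(line)) continue;
    kept.push(line);
  }
  return kept.join("\n"); // one paragraph per line
}
```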

3. Segmentation

Content is split by paragraphs first, then into sentences with strict constraints:

  • Minimum length: 40 characters
  • Maximum length: 220 characters
  • Filters bad sentences:
    • Too many numbers (>30% digits)
    • Too many uppercase characters (>30% uppercase)
    • Suspicious tokens (mw-, src=, id=, etc.)
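The segmentation constraints translate into a small filter. This sketch makes two assumptions. First, the digit share is measured against non-space characters and the uppercase share against letters only. Second, sentences are split with a naive regular expression, which will also break on abbreviations such as "Dr.".

```typescript
const SUSPICIOUS = ["mw-", "src=", "id="];

// Digit share of non-space characters and uppercase share of letters.
function ratios(s: string): { digit: number; upper: number } {
  let nonSpace = 0;
  let digits = 0;
  let letters = 0;
  let uppers = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charAt(i);
    if (/\s/.test(c)) continue;
    nonSpace++;
    if (c >= "0" && c <= "9") digits++;
    // Treat a character as a letter if it has distinct upper/lower forms.
    if (c.toLowerCase() !== c.toUpperCase()) {
      letters++;
      if (c === c.toUpperCase()) uppers++;
    }
  }
  return {
    digit: nonSpace === 0 ? 0 : digits / nonSpace,
    upper: letters === 0 ? 0 : uppers / letters,
  };
}

function isGoodSentence(s: string): boolean {
  if (s.length < 40 || s.length > 220) return false;
  const r = ratios(s);
  if (r.digit > 0.3 || r.upper > 0.3) return false;
  return !SUSPICIOUS.some((token) => s.indexOf(token) !== -1);
}

function segment(text: string): string[] {
  const result: string[] = [];
  for (const paragraph of text.split(/\n+/)) { // paragraphs first
    // Naive splitter: a run of text ending in . ! or ? (trips on "Dr." or "3.14").
    const sentences: string[] = paragraph.match(/[^.!?]+[.!?]+/g) ?? [];
    for (const raw of sentences) {
      const sentence = raw.trim();
      if (isGoodSentence(sentence)) result.push(sentence);
    }
  }
  return result;
}
```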

4. Scene Construction

The system generates 6-8 structured scenes using one of two approaches:

Option A: LLM-Enhanced (Optional)

  • Uses GPT-4o-mini or GPT-3.5-turbo to create engaging scenes
  • Each scene includes:
    • Narration: 1-2 short sentences (max 300 chars)
    • Caption: Brief caption (max 100 chars)
    • Visual Keywords: 3-5 keywords for image search
  • Scene structure:
    1. Hook (impact statement)
    2. Background (who + why important)
    3-5. Main contributions
    6+. Impact/legacy/modern relevance
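If the LLM is asked to return its scenes as a JSON array, the structure above can be enforced when its reply is parsed. The reply format (an array of objects with narration, caption, and visualKeywords) is an assumption for illustration, as are all names below. Roles are assigned by position using 0-based indices.

```typescript
type SceneRole = "hook" | "background" | "contribution" | "legacy";

interface Scene {
  role: SceneRole;
  narration: string; // 1-2 short sentences, at most 300 chars
  caption: string; // at most 100 chars
  visualKeywords: string[]; // 3-5 image-search keywords
}

// 0-based position -> role: 0 hook, 1 background, 2-4 contributions, 5+ legacy.
function roleForIndex(index: number): SceneRole {
  if (index === 0) return "hook";
  if (index === 1) return "background";
  if (index <= 4) return "contribution";
  return "legacy";
}

function parseLlmScenes(reply: string): Scene[] {
  const data: unknown = JSON.parse(reply);
  if (!Array.isArray(data) || data.length < 6 || data.length > 8) {
    throw new Error("expected a JSON array of 6-8 scenes");
  }
  return data.map((item, i): Scene => {
    const { narration, caption, visualKeywords } = (item ?? {}) as Partial<Scene>;
    if (typeof narration !== "string" || typeof caption !== "string" || !Array.isArray(visualKeywords)) {
      throw new Error(`scene ${i} is missing narration, caption, or visualKeywords`);
    }
    if (visualKeywords.length < 3 || visualKeywords.length > 5) {
      throw new Error(`scene ${i} needs 3-5 visual keywords`);
    }
    return { role: roleForIndex(i), narration, caption, visualKeywords };
  });
}
```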

Option B: Rule-based (default when OpenAI is not configured)

  • Deterministic storytelling that operates without LLM dependencies
  • Uses summary for hook, top sentences for contributions
  • Produces structured scenes with narration, captions, and keywords
  • Guarantees consistent output regardless of API availability

LLM usage is an optional enhancement layer, not a dependency. The system is designed to function deterministically without external AI services.
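A rule-based builder needs no network access, so identical inputs always yield identical scenes. The sketch below is one possible reading of the rules above. The summary becomes the hook, and up to seven body sentences follow in source order, because the actual ranking of "top sentences" is not documented. Captions are clipped narrations. Keywords come from a placeholder heuristic (the query plus the first long ASCII words) that stands in for the RAKE provider. All names are hypothetical.

```typescript
interface Scene {
  narration: string;
  caption: string;
  visualKeywords: string[];
}

// Shorten to at most `max` chars, cutting at a word boundary and adding "...".
function clip(text: string, max: number): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max - 3);
  const space = cut.lastIndexOf(" ");
  return (space > 0 ? cut.slice(0, space) : cut) + "...";
}

// Placeholder keywords: the query plus up to four distinct 6+ letter words.
function keywordsFor(text: string, query: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z]{6,}/g) ?? [];
  const unique = words.filter((w, i) => words.indexOf(w) === i);
  return [query].concat(unique.slice(0, 4));
}

function buildScenes(query: string, summary: string, sentences: string[]): Scene[] {
  // Hook from the summary, then up to 7 body sentences (8 scenes max).
  const narrations = [summary].concat(sentences.slice(0, 7));
  if (narrations.length < 6) {
    throw new Error("not enough sentences for 6 scenes");
  }
  return narrations.map((text): Scene => {
    const narration = clip(text, 300);
    return {
      narration,
      caption: clip(narration, 100),
      visualKeywords: keywordsFor(narration, query),
    };
  });
}
```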

5. Quality Validation

Before rendering, all scenes are validated against strict criteria:

  • No "click here" phrases
  • No Wikipedia UI elements
  • No HTML tags
  • Narration length ≤ 300 characters
  • No excessive whitespace
  • Caption length ≤ 100 characters
  • Visual keywords present

Scenes that fail validation are rejected, ensuring only high-quality content reaches the rendering stage.
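The quality gate can be expressed as a function that returns a list of problems per scene. A scene passes only when that list is empty. The patterns for Wikipedia UI residue and HTML tags are assumptions. The length limits come from the criteria above.

```typescript
interface Scene {
  narration: string;
  caption: string;
  visualKeywords: string[];
}

function validateScene(scene: Scene): string[] {
  const problems: string[] = [];
  const text = `${scene.narration} ${scene.caption}`;
  if (/click here/i.test(text)) problems.push('contains "click here"');
  if (/mw-|srcset|upload\.wikimedia\.org|\[edit\]/i.test(text)) problems.push("contains Wikipedia UI residue");
  if (/<\/?[a-z][^>]*>/i.test(text)) problems.push("contains HTML tags");
  if (scene.narration.length > 300) problems.push("narration exceeds 300 characters");
  for (const field of [scene.narration, scene.caption]) {
    // Leading/trailing spaces or runs of whitespace count as excessive.
    if (field !== field.trim() || /\s{2,}/.test(field)) {
      problems.push("contains excessive whitespace");
      break;
    }
  }
  if (scene.caption.length > 100) problems.push("caption exceeds 100 characters");
  if (scene.visualKeywords.length === 0) problems.push("no visual keywords");
  return problems;
}

// Rejected scenes are dropped; only clean scenes reach rendering.
function keepValidScenes(scenes: Scene[]): Scene[] {
  return scenes.filter((s) => validateScene(s).length === 0);
}
```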

Configuration

OpenAI Settings (optional, for enhanced script generation):

  • OPENAI_API_KEY: Your OpenAI API key
  • OPENAI_MODEL: Model to use (default: gpt-4o-mini)
  • SCRIPT_TEMPERATURE: Creativity level 0-2 (default: 0.7)
  • USE_LLM_FOR_SCRIPT: Enable/disable LLM (default: true)

If OpenAI is not configured, the system automatically falls back to rule-based scene generation.
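Resolving these settings, including the automatic fallback, fits in a small pure function over an environment-like record; in Node, process.env can be passed directly. This sketch makes two assumptions: any USE_LLM_FOR_SCRIPT value other than "false" counts as enabled, and SCRIPT_TEMPERATURE is clamped into the 0-2 range. The names are hypothetical as well.

```typescript
type ScriptMode =
  | { kind: "llm"; apiKey: string; model: string; temperature: number }
  | { kind: "rule-based" };

function resolveScriptMode(env: Record<string, string | undefined>): ScriptMode {
  const apiKey = (env.OPENAI_API_KEY ?? "").trim();
  // Documented default is true; only an explicit "false" disables the LLM here.
  const enabled = (env.USE_LLM_FOR_SCRIPT ?? "true").trim().toLowerCase() !== "false";
  if (!enabled || apiKey === "") {
    return { kind: "rule-based" }; // automatic fallback, no network needed
  }
  const parsed = Number(env.SCRIPT_TEMPERATURE || "0.7"); // documented default 0.7
  // Clamp to OpenAI's 0-2 range; fall back to the default on non-numeric input.
  const temperature = Number.isFinite(parsed) ? Math.min(2, Math.max(0, parsed)) : 0.7;
  return { kind: "llm", apiKey, model: env.OPENAI_MODEL || "gpt-4o-mini", temperature };
}
```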

Features

  • Real Integrations: No mocks; every provider calls a real API or runs a real local tool (RAKE, FFmpeg)
  • Clean Narration: Multi-layer sanitization strips citation markers, Wikipedia UI residue, and reference sections
  • Structured Scenes: 6-8 engaging scenes with proper storytelling flow
  • LLM Support: Optional OpenAI integration for enhanced script quality
  • Caching: Disk-based caching for expensive operations
  • Type Safety: Full TypeScript with strict mode
  • Modular: Provider-based architecture for easy extension
  • CLI: Modern CLI with flags and interactive prompts
  • Testing: Comprehensive test suite with Vitest
  • CI/CD: GitHub Actions workflow
  • Documentation: Architecture docs and examples

Development

# Install dependencies
npm install

# Development mode (watch)
npm run dev

# Build
npm run build

# Test
npm test

# Lint
npm run lint

# Format
npm run format

# Type check
npm run typecheck

Troubleshooting

FFmpeg Not Found

If you see an "ffmpeg not found" error:

  1. Verify FFmpeg is installed: Run ffmpeg -version in your terminal
  2. Check PATH: Ensure FFmpeg is in your system PATH
  3. Reinstall if needed: Follow platform-specific installation instructions above
  4. Restart terminal: After installation, restart your terminal session

Video Rendering Fails

If video rendering fails with an error:

  1. Check asset files: Ensure all image and audio files exist in out/<jobId>/assets/
  2. Check FFmpeg version: Some older versions may have compatibility issues
  3. Check disk space: Video rendering requires sufficient disk space
  4. Review error message: FFmpeg error output is included in the error message for debugging

Missing Assets

If the storyboard is empty or is missing assets:

  1. Check API keys: Ensure UNSPLASH_ACCESS_KEY and ELEVENLABS_API_KEY are set in .env
  2. Check API quotas: Verify you haven't exceeded API rate limits
  3. Review logs: Check the console output for specific error messages

Roadmap

  • Additional image providers (Pexels, Pixabay)
  • Additional TTS providers (Google Cloud TTS, AWS Polly)
  • Video templates and transitions
  • YouTube upload integration
  • Batch processing mode
  • Web UI

License

MIT

About

Modular multimodal content pipeline for transforming structured knowledge into narrated cinematic short-form video.
