A comprehensive pipeline that converts text prompts into engaging short-form videos with voiceovers and captions.
- Script Generation: Create engaging scripts using AI models (DDC or Gemini)
- Voiceover Synthesis: Generate natural-sounding voiceovers
- Image Generation: Create images from script segments
- Video Creation: Assemble images into a video with animations
- Caption Generation: Add automatically generated captions in different styles
- Progress Tracking: Monitor the generation process with a visual progress indicator
python-dotenv==1.0.1
requests==2.32.3
openai==1.0.0
google-generativeai==0.8.4
whisperx==3.3.1
moviepy==1.0.3
ffmpeg-python==0.2.0
python-json-logger>=2.0.7
tqdm>=4.66.1
pillow>=10.0.0
numpy>=1.24.0
torch>=2.0.0
edge-tts>=6.1.9
- Clone the repository
- Create and activate a virtual environment:
# On Windows python -m venv venv .\venv\Scripts\activate # On macOS/Linux python -m venv venv source venv/bin/activate - Install dependencies:
pip install -r requirements.txt - Set up your API keys in
.env:DDC_API_KEY=your_ddc_api_key GEMINI_API_KEY=your_gemini_api_key - Run the generator:
python main.py --topic "Your topic here"
--topic,-t: Topic for video generation--output,-o: Output filename (default: "output_video.mp4")--no-captions: Skip caption generation--debug: Enable debug logging--skip-cleanup: Skip cleanup of temporary files
├── main.py # Main entry point
├── config.py # Configuration settings
├── progress_tracker.py # Progress tracking utilities
├── .env # Environment variables (API keys)
├── requirements.txt # Project dependencies
├── Models/ # Model implementations
│ ├── Script/ # Script generation models
│ ├── Voiceover/ # Voiceover generation models
│ ├── Image/ # Image generation models
│ ├── Video/ # Video creation models
│ ├── Captions/ # Caption generation models
│ └── Animations/ # Animation effect models
└── Data/ # Data storage
└── Temp/ # Temporary files during generation
You can customize the generation pipeline by modifying config.py:
SCRIPT_MODEL: Model for script generation (e.g., "ddc")IMG_MODEL: Model for image generation (e.g., "pixelmuse")AUDIO_MODEL: Model for voiceover generation (e.g., "openfm")AUDIO_MODEL_VOICE: Voice to use for voiceover (e.g., "shimmer")ANIMATION: Animation style (e.g., "zoom_fade_mix")VIDEO_MODEL: Video creation model (e.g., "moviepy")CAPTION_MODEL: Caption generation model (e.g., "whisperx")CAPTION_STYLE: Caption style (e.g., "comic_style")
- Script Generation: Create an engaging script based on the input topic
- Voiceover Generation: Convert the script to audio using text-to-speech
- Image Preparation: Format the script for image generation
- Image Generation: Create visuals for each script segment
- Video Creation: Assemble images with animations and audio
- Caption Generation: Add captions to the video
- Cleanup: Remove temporary files
You can extend the project by:
- Adding new script generation models in
Models/Script/Models/ - Adding new voiceover models in
Models/Voiceover/Models/ - Adding new image generation models in
Models/Image/Models/ - Adding new video creation models in
Models/Video/Models/ - Adding new caption models in
Models/Captions/Models/ - Adding new animation styles in
Models/Animations/Models/
This project is open source and available for personal and commercial use.
Contributions are welcome! Feel free to submit pull requests or open issues to improve the project.
Developed with 💘 by Krish.