This project is an application that turns images into audio stories. It captions an uploaded image with a pre-trained model, writes a short story based on that caption, and converts the story to speech, so a visual input becomes something you can listen to.
- Image Captioning: Leverages a pre-trained model to describe images.
- Story Generation: Creates short, engaging narratives based on image descriptions.
- Text-to-Speech Conversion: Transforms generated stories into audio format using Hugging Face's API.
- Streamlit Web App: Provides an interactive interface for users to upload images and receive audio stories.
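
The three stages listed above chain together: the caption feeds the story generator, and the story feeds the speech synthesizer. Below is a minimal sketch of that flow, not the project's actual code. The names `AudioStory`, `image_to_audio_story`, and the `fake_*` stages are hypothetical; in the real app each stage would presumably be backed by a Hugging Face model or API call rather than the stand-ins used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioStory:
    """Outputs of the three pipeline stages (hypothetical container)."""
    caption: str
    story: str
    audio: bytes


def image_to_audio_story(
    image_bytes: bytes,
    captioner: Callable[[bytes], str],
    storyteller: Callable[[str], str],
    synthesizer: Callable[[str], bytes],
) -> AudioStory:
    """Run the stages in order: image -> caption -> story -> speech."""
    caption = captioner(image_bytes)
    story = storyteller(caption)
    audio = synthesizer(story)
    return AudioStory(caption=caption, story=story, audio=audio)


# Stand-in stages so the sketch runs without network access or model downloads.
def fake_captioner(image_bytes: bytes) -> str:
    return "a dog running on a beach"


def fake_storyteller(caption: str) -> str:
    return f"Once upon a time, there was {caption}."


def fake_synthesizer(text: str) -> bytes:
    # Placeholder: a real synthesizer would return encoded audio, not text bytes.
    return text.encode("utf-8")


result = image_to_audio_story(
    b"raw image bytes", fake_captioner, fake_storyteller, fake_synthesizer
)
print(result.caption)
print(result.story)
```

Passing the stages in as callables keeps the orchestration separate from any particular model, which makes it easy to swap a model or to test the flow offline.
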
- Python
- Streamlit
- Hugging Face Transformers
- LangChain
- dotenv for environment management
- Install dependencies:
  pip install -r requirements.txt
- Rename .env.example to .env and update it with your API keys.
- Run the Streamlit app:
  streamlit run describe_image.py
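
If the app starts without a key in `.env`, the failure may only surface later as an opaque API error. The sketch below shows one way to check for the key up front. It reads from `os.environ`, which is where python-dotenv's `load_dotenv()` places values from `.env`. The helper `require_env` is hypothetical, and the variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption; check `.env.example` for the name this project actually uses.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing {name}: copy .env.example to .env and set it there."
        )
    return value


# Assumed variable name; the real one is whatever .env.example defines.
TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"

try:
    token = require_env(TOKEN_VAR)
    # Report only the length so the secret itself is never printed.
    print(f"{TOKEN_VAR} is set ({len(token)} characters)")
except RuntimeError as err:
    print(err)
```
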
See the project in action: the demonstration video (Kanye.mp4) walks through the Image to Audio Story Converter end to end, from uploading an image to hearing the generated audio story.
Contributions are welcome! Feel free to fork the repo and submit pull requests.
This project is open-sourced under the MIT license.