MediaMCP is a powerful Model Context Protocol (MCP) server designed for high-performance, local AI media generation. It provides a seamless interface for LLMs to generate and edit images, as well as compose and cover music using local backend services.
- 🖼️ Image Generation: Generate high-fidelity images using the Flux model via stable-diffusion.cpp.
- 🎨 Image Editing: Perform image-to-image transformations and edits with text guidance.
- 🎵 Music Composition: Create complete songs with vocals using ACE Step.
- 🎤 Cover Generation: Transform existing audio into new styles or voices (voice conversion).
- 🚀 FastMCP Powered: Built on the modern FastMCP framework for low-latency, scalable tool execution.
- 🔌 Standardized Interface: Exposes a clean API for any MCP-compliant client (like Claude Desktop or custom agents).
- Python 3.11 or higher
- Access to local media generation backends:
  - stable-diffusion.cpp (or compatible API) for images.
  - ACE Step CPP (or compatible API) for music.
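The Python requirement above can be checked before installing. This is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of MediaMCP.

```python
import sys

MINIMUM = (3, 11)  # taken from the prerequisites above


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is >= minimum."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ is required")
    print("Python version OK")
```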
1. Clone the repository: `git clone https://github.com/haervwe/media-mcp.git`, then `cd media-mcp`.
2. Create and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`).
3. Install dependencies: `pip install -e .`
4. Configure environment variables: copy the example environment file with `cp .env.example .env`, then edit `.env` to fill in your API keys and endpoints so they match your local setup.
Start the MediaMCP server with `python -m media_mcp.server`.

The server will start on the host and port specified in your `.env` (default: 0.0.0.0:8080) using the streamable-http transport.
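To confirm the server is accepting connections after startup, a plain TCP probe is often enough. The sketch below assumes the default port 8080 and connects through the loopback address, since 0.0.0.0 is a bind address rather than a destination. The function name `is_listening` is hypothetical, and the probe only checks that the port is open, not that the MCP handshake works.

```python
import socket


def is_listening(host="127.0.0.1", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("MediaMCP reachable:", is_listening())
```

If you changed the host or port in `.env`, pass the same values to the function.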
| Tool | Description | Key Parameters |
|---|---|---|
| `generate_image` | Generate a new image from text. | `prompt`, `format` |
| `edit_image` | Edit an existing image. | `image` (path/b64), `prompt`, `format` |
| `generate_song` | Create a full song with vocals. | `prompt`, `lyrics`, `language`, `key` |
| `generate_cover` | Create a cover of an audio file. | `audio` (path/b64), `style_prompt`, `strength` |
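MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message shape for two of the tools listed above, using the standard library. The argument values (the `format` value, the `strength` number, the placeholder audio bytes) are illustrative assumptions, not documented defaults, and the helper names `tool_call` and `to_b64` are hypothetical.

```python
import base64
import json


def tool_call(name, arguments, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def to_b64(data: bytes) -> str:
    """Encode raw file bytes for parameters that accept base64 input."""
    return base64.b64encode(data).decode("ascii")


# Argument values are illustrative assumptions, not documented defaults.
image_request = tool_call(
    "generate_image",
    {"prompt": "a lighthouse at dusk, oil painting", "format": "path"},
)
cover_request = tool_call(
    "generate_cover",
    {
        "audio": to_b64(b"placeholder audio bytes"),
        "style_prompt": "lo-fi jazz",
        "strength": 0.6,
    },
    request_id=2,
)

if __name__ == "__main__":
    print(json.dumps(image_request, indent=2))
```

In practice an MCP client library performs the initialization handshake and sends these messages for you; the payloads are shown here only to make the parameter column concrete.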
MediaMCP is highly configurable via environment variables in the .env file:
- `IMAGE_API_BASE_URL`: Endpoint for your image generation service.
- `MUSIC_API_BASE_URL`: Endpoint for your music generation service.
- `ASSETS_DIR`: Local directory where generated media files are stored.
- `RESPONSE_FORMAT`: Choose between `path` (file system path) or `base64` (inline content).
- `REQUEST_TIMEOUT`: Timeout for long-running generation tasks (default: 300s).
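As a rough illustration of how these variables fit together, the sketch below reads them into a small settings object and validates `RESPONSE_FORMAT`. It is not MediaMCP's actual loader: the `Settings` class, the `load_settings` function, the fallback values for the URLs and `ASSETS_DIR`, and the assumption that `REQUEST_TIMEOUT` is a plain number of seconds are all hypothetical. Only the 300-second timeout default comes from the list above.

```python
import os
from dataclasses import dataclass

VALID_FORMATS = {"path", "base64"}


@dataclass(frozen=True)
class Settings:
    image_api_base_url: str
    music_api_base_url: str
    assets_dir: str
    response_format: str
    request_timeout: float


def load_settings(env=None):
    """Read the documented variables from `env` (default: os.environ)."""
    env = os.environ if env is None else env
    response_format = env.get("RESPONSE_FORMAT", "path")  # fallback is an assumption
    if response_format not in VALID_FORMATS:
        raise ValueError(f"RESPONSE_FORMAT must be one of {sorted(VALID_FORMATS)}")
    return Settings(
        image_api_base_url=env.get("IMAGE_API_BASE_URL", ""),  # empty means unset
        music_api_base_url=env.get("MUSIC_API_BASE_URL", ""),
        assets_dir=env.get("ASSETS_DIR", "assets"),  # placeholder fallback (assumption)
        response_format=response_format,
        request_timeout=float(env.get("REQUEST_TIMEOUT", "300")),  # documented default
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.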
This project is licensed under the MIT License - see the LICENSE file for details.
Developed with ❤️ for the AI community.