A multi-agent system for generating MIDI backing tracks.
AI musician and engineer agents collaborate to produce complete, genre-aware songs with drums, bass, guitar, and keys.
- 🥁 Drummer Agent – genre-aware patterns for 22 genres + all 40 PAS Standard Drum Rudiments
- 🎸 Bassist Agent – root-note bass lines with 8 genre-specific bass patterns (walking, slap, tumbao, …)
- 🎸 Guitarist Agent – rhythm/lead guitar parts and power chords
- 🎹 Keyboardist Agent – chord voicings, pads, and arpeggios
- 🎻 Strings Agent – legato lines, pizzicato, and tremolo for strings/violin
- 🎺 Brass Agent – stabs, long tones, and fall-offs for brass/trumpet/saxophone
- 🎛️ Synth Agent – sustained pads and arpeggio patterns for synthesizers
- 🪘 Percussion Agent – Latin/Afro-Cuban hand drum patterns (conga, bongo, djembe)
- 🎸 Lead Guitar Agent – pentatonic licks and scale fills
- 🎚️ Mixer & Mastering Agents – per-track volume, pan, and loudness metadata
- 🤖 LLM MIDI generation – `LLMMidiProvider` converts LLM JSON output to MIDI; falls back to algorithmic generation on parse failure
- 🤖 LLM-guided generation – plug in OpenAI, Anthropic, or any LangChain provider
- 🎛️ DAW integration – FluidSynth, TiMidity, GarageBand, Logic Pro, and raw MIDI/WAV export
- 🌐 REST API – FastAPI server for programmatic session management
- 🔌 Multi-provider system – pluggable `AudioProvider` backends with capability-based routing (`ProviderRegistry`)
- 🤖 Google Gemini integration – full-length music generation via Lyria 3, audio analysis, and text-to-speech
- 🛠️ MCP Server – expose backing-track generation as MCP tools for GitHub Copilot, Claude Code, and other AI coding assistants
- 🖥️ Web UI – lightweight browser-based interface served alongside the REST API
```bash
# Core install (MIDI generation only)
pip install -e "."

# With development tools
pip install -e ".[dev]"

# With REST API server
pip install -e ".[api]"

# With LLM providers (OpenAI / Anthropic)
pip install -e ".[llm]"

# With Google Gemini (Lyria 3 music generation, audio analysis, TTS)
pip install -e ".[gemini]"

# With audio processing (pydub, WAV/MP3 manipulation)
pip install -e ".[audio]"

# Everything
pip install -e ".[all]"
```

Requirements: Python 3.11+
```bash
# Generate a classic rock backing track in E minor at 120 BPM
python scripts/generate_demo.py --genre classic_rock --key E --mode minor --tempo 120 -v

# Pop song in C major with keyboard included
python scripts/generate_demo.py --genre pop --key C --mode major --with-keys

# Blues in A minor rendered to WAV via FluidSynth
python scripts/generate_demo.py --genre blues --key A --mode minor --render-audio --backend fluidsynth

# Jazz in D dorian with full band (strings, brass, synth)
python scripts/generate_demo.py --genre jazz --key D --mode dorian --with-keys -v

# Funk groove in E minor at 105 BPM
python scripts/generate_demo.py --genre funk --key E --mode minor --tempo 105 -v

# Metal in E minor at 200 BPM
python scripts/generate_demo.py --genre metal --key E --mode minor --tempo 200 -v

# Thrash metal rhythm section at 200 BPM (stress-test script)
python scripts/generate_thrash.py

# Download and configure a SoundFont for audio rendering
python scripts/setup_soundfont.py

# Start the REST API server
pip install -e ".[api]"
python scripts/run_dev.py
```

Output files land in `./output/` by default, named `<session-id>_<instrument>.mid` plus a combined `<session-id>_full.mid`.
Tracks are generated sequentially (drums first, then bass, guitar, lead guitar, keys, strings, brass, synth, and percussion) so each agent can react to what came before. The orchestrator manages chord progressions per section and coordinates file export.
```
SessionOrchestrator
├── DrummerAgent     → drum pattern generation (22 genres, 40 rudiments)
├── BassistAgent     → bass line locked to drums + chords
├── GuitaristAgent   → rhythm/lead guitar parts
├── LeadGuitarAgent  → pentatonic licks and scale fills
├── KeyboardistAgent → chord voicings and pads
├── StringsAgent     → legato strings / pizzicato / tremolo
├── BrassAgent       → brass stabs, long tones, fall-offs
├── SynthAgent       → sustained pads, arpeggio patterns
├── PercussionAgent  → Latin/Afro-Cuban hand drum patterns
├── MixerAgent       → volume, pan, EQ per track
└── MasteringAgent   → final loudness and metadata
```
See the full architecture docs for Mermaid diagrams, data flow, and DAW integration tiers.
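The sequential, context-accumulating flow can be sketched in a few lines. This is a conceptual illustration only; the names and signatures below are hypothetical, not the actual `SessionOrchestrator` API.

```python
# Conceptual sketch: agents run in a fixed order, and each one receives the
# context containing everything generated so far. Names are illustrative.
from typing import Callable

Context = dict[str, list]  # instrument name -> generated events

def drummer(ctx: Context) -> list:
    return ["kick", "snare"]

def bassist(ctx: Context) -> list:
    # The bassist can inspect the drum track already present in the context.
    return ["root notes locked to " + ctx["drums"][0]]

def run_session(agents: list[tuple[str, Callable[[Context], list]]]) -> Context:
    ctx: Context = {}
    for name, agent in agents:  # strict order: drums first, then bass, ...
        ctx[name] = agent(ctx)
    return ctx

ctx = run_session([("drums", drummer), ("bass", bassist)])
print(ctx["bass"])
```

The design point is simply that later agents consume earlier agents' output, which is why the generation order is part of the contract.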
| Agent | Role | Output |
|---|---|---|
| `DrummerAgent` | Kick, snare, hi-hat patterns (22 genres, 40 rudiments) | `MidiTrackData` |
| `BassistAgent` | Root-note bass lines following chord changes | `MidiTrackData` |
| `GuitaristAgent` | Rhythm guitar / power chords | `MidiTrackData` |
| `LeadGuitarAgent` | Pentatonic licks, scale runs, string bends | `MidiTrackData` |
| `KeyboardistAgent` | Chord pads and voicings | `MidiTrackData` |
| `StringsAgent` | Legato strings, pizzicato, tremolo | `MidiTrackData` |
| `BrassAgent` | Stabs, long tones, fall-offs (trumpet/sax/brass) | `MidiTrackData` |
| `SynthAgent` | Sustained pads and arpeggio patterns | `MidiTrackData` |
| `PercussionAgent` | Latin/Afro-Cuban hand drum patterns (conga/bongo/djembe) | `MidiTrackData` |
| `MixerAgent` | Per-track volume, pan, EQ | `MixConfig` |
| `MasteringAgent` | Final processing metadata | `dict` |
All agents accept an optional `llm` parameter for LLM-guided generation.
```
python scripts/generate_demo.py [OPTIONS]

Options:
  --genre          Genre preset (classic_rock, blues, pop, folk, country, punk, hard_rock,
                   jazz, funk, reggae, soul, rnb, metal, hip_hop, latin,
                   bossa_nova, electronic, house, ambient, gospel, swing, bebop)
  --key            Root note (C, C#, D, D#, E, F, F#, G, G#, A, A#, B)
  --mode           Scale mode (major, minor, dorian, mixolydian, phrygian, lydian, locrian,
                   harmonic_minor, melodic_minor, whole_tone, diminished, bebop_dominant)
  --tempo          BPM, 40 to 300 (default: 120)
  --output         Output directory (default: ./output)
  --sections       Number of song sections (default: 4)
  --render-audio   Render WAV via audio backend
  --backend        Audio backend: export | fluidsynth | timidity (default: export)
  --with-keys      Include keyboard in the band
  -v, --verbose    Enable debug logging
```
The MCP server exposes backing-track generation as tools for AI coding assistants (GitHub Copilot, Claude Code, etc.).
```bash
# Install with MCP support (included in core)
pip install -e "."

# Run the MCP server (stdio transport)
audio-engineer-mcp
# or: python -m audio_engineer.mcp_server
```

| Tool | Description |
|---|---|
| `generate_track` | Generate a full MIDI backing track (genre, key, tempo, instruments, sections) |
| `generate_game_music` | Quick game music via mood preset (battle, exploration, town, boss, …) |
| `generate_audio_track` | Route a track request through the provider registry (MIDI or Gemini Lyria) |
| `list_genres` | List all supported genre presets |
| `list_game_moods` | List all game music mood presets with descriptions |
| `list_providers` | List registered audio providers with capabilities and availability |
Set `AUDIO_ENGINEER_OUTPUT` to control where the MCP server writes files.
See the MCP Server guide for full details.
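With most MCP clients, registering the server is a small config entry. A sketch assuming the common `mcpServers` shape used by Claude Desktop-style clients (the output path is a placeholder; consult the MCP Server guide for your client's exact format):

```json
{
  "mcpServers": {
    "audio-engineer": {
      "command": "audio-engineer-mcp",
      "env": { "AUDIO_ENGINEER_OUTPUT": "/path/to/output" }
    }
  }
}
```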
The `ProviderRegistry` routes generation requests to the best available backend:

- `LLMMidiProvider` – LLM-driven MIDI generation (highest priority when an LLM callable is configured; falls back to `MidiProvider` on parse failure)
- `MidiProvider` – algorithmic MIDI generation (always available, zero dependencies)
- `GeminiLyriaProvider` – full-length audio via Google Lyria 3 (requires `pip install -e ".[gemini]"` and `AUDIO_ENGINEER_GEMINI_API_KEY`)
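The routing idea is easy to demonstrate in isolation. This is a minimal, self-contained sketch of priority-plus-availability selection, not the actual `ProviderRegistry` implementation; all names below are illustrative.

```python
# Minimal sketch of "best available backend" routing: each provider declares a
# priority and an availability check (e.g. API keys, optional dependencies).
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Provider:
    name: str
    priority: int                      # higher wins among available providers
    is_available: Callable[[], bool]

@dataclass
class Registry:
    providers: list[Provider] = field(default_factory=list)

    def register(self, provider: Provider) -> None:
        self.providers.append(provider)

    def best(self) -> Provider:
        candidates = [p for p in self.providers if p.is_available()]
        if not candidates:
            raise RuntimeError("no available provider")
        return max(candidates, key=lambda p: p.priority)

registry = Registry()
registry.register(Provider("midi_engine", priority=0, is_available=lambda: True))
registry.register(Provider("gemini_lyria", priority=5, is_available=lambda: False))
print(registry.best().name)  # midi_engine, since gemini_lyria reports unavailable
```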
Custom providers can be registered at runtime:
```python
from audio_engineer.providers import ProviderRegistry, AudioProvider, ProviderCapability

registry = ProviderRegistry()
registry.register(my_custom_provider)
```

Pass any callable `(str) -> str` as the `llm` parameter of `LLMMidiProvider` or `SessionOrchestrator`:
```python
from audio_engineer.providers.llm_midi_provider import LLMMidiProvider

provider = LLMMidiProvider(llm=lambda prompt: openai_client.complete(prompt))
result = provider.generate_track(request)  # falls back to MidiProvider if JSON unparseable
```

See the Providers guide for details.
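The parse-failure fallback is a simple pattern worth seeing on its own. The sketch below is self-contained and uses stub names of my own; it illustrates the behavior, not the library's internal code.

```python
import json

def algorithmic_fallback(prompt: str) -> list[dict]:
    # Stand-in for deterministic algorithmic pattern generation.
    return [{"note": 36, "start": 0, "duration": 1}]

def generate_events(llm, prompt: str) -> list[dict]:
    """Ask the LLM for JSON note events; fall back on any parse failure."""
    try:
        events = json.loads(llm(prompt))
        if not isinstance(events, list):
            raise ValueError("expected a JSON list of events")
        return events
    except (json.JSONDecodeError, ValueError):
        return algorithmic_fallback(prompt)

good_llm = lambda p: '[{"note": 40, "start": 0, "duration": 0.5}]'
bad_llm = lambda p: "Sure! Here are some notes: ..."  # not valid JSON

print(generate_events(good_llm, "rock beat"))  # uses the LLM's events
print(generate_events(bad_llm, "rock beat"))   # falls back to the algorithm
```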
The `audio_engineer.gemini` package wraps the Google GenAI SDK for three capabilities:
| Agent | Class | What it does |
|---|---|---|
| Music generation | `MusicGenerationAgent` | Full songs or 30 s clips via Lyria 3 (clip or pro model) |
| Audio analysis | `AudioAnalysisAgent` | Transcription, genre/mood detection, audio Q&A |
| Text-to-speech | `TTSAgent` | Narration and vocal scratch tracks |
```bash
pip install -e ".[gemini]"
export AUDIO_ENGINEER_GEMINI_API_KEY=your_key_here
```

See the Gemini guide for usage examples.
Start the dev server with `python scripts/run_dev.py` (requires `pip install -e ".[api]"`).
| Method | Path | Description |
|---|---|---|
| `POST` | `/sessions` | Create a new session |
| `POST` | `/sessions/{id}/run` | Run the session (trigger generation) |
| `GET` | `/sessions/{id}` | Get session status and output file paths |
| `GET` | `/sessions/{id}/tracks` | List generated tracks |
| `GET` | `/sessions/{id}/export` | Download the combined MIDI file |
Interactive API docs: http://localhost:8000/docs
All settings use the `AUDIO_ENGINEER_` environment variable prefix (or a `.env` file in the project root).
| Variable | Description | Default |
|---|---|---|
| `AUDIO_ENGINEER_OUTPUT_DIR` | Directory for generated files | `./output` |
| `AUDIO_ENGINEER_LOG_LEVEL` | Logging verbosity | `INFO` |
| `AUDIO_ENGINEER_OPENAI_API_KEY` | OpenAI key for LLM agents | (unset) |
| `AUDIO_ENGINEER_ANTHROPIC_API_KEY` | Anthropic key for LLM agents | (unset) |
| `AUDIO_ENGINEER_GEMINI_API_KEY` | Google Gemini API key (Lyria 3 / audio analysis) | (unset) |
| `AUDIO_ENGINEER_LLM_PROVIDER` | Active LLM backend: `openai`, `anthropic`, `gemini` | `openai` |
| `AUDIO_ENGINEER_DEFAULT_AUDIO_PROVIDER` | Default audio generation provider | `midi_engine` |
| `AUDIO_ENGINEER_SOUNDFONT_PATH` | Path to a `.sf2` SoundFont for FluidSynth rendering | (unset) |
| `AUDIO_ENGINEER_HOST` | FastAPI server host | `0.0.0.0` |
| `AUDIO_ENGINEER_PORT` | FastAPI server port | `8000` |
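For local development, a `.env` file in the project root using the variables above might look like this (values are illustrative, not recommendations):

```ini
# .env (picked up from the project root; values are illustrative)
AUDIO_ENGINEER_OUTPUT_DIR=./output
AUDIO_ENGINEER_LOG_LEVEL=DEBUG
AUDIO_ENGINEER_LLM_PROVIDER=anthropic
AUDIO_ENGINEER_ANTHROPIC_API_KEY=sk-...
AUDIO_ENGINEER_PORT=8000
```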
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest -v

# Lint
ruff check src/ tests/ scripts/

# Run a quick smoke test
python scripts/generate_demo.py --genre blues --key A --mode minor -v
```

```
src/audio_engineer/
├── agents/
│   ├── base.py              # BaseMusician, BaseEngineer, SessionContext
│   ├── orchestrator.py      # SessionOrchestrator
│   ├── musician/            # DrummerAgent, BassistAgent, GuitaristAgent, KeyboardistAgent,
│   │                        #   StringsAgent, BrassAgent, SynthAgent, PercussionAgent, LeadGuitarAgent
│   └── engineer/            # MixerAgent, MasteringAgent
├── core/
│   ├── models.py            # Pydantic models (Session, MidiTrackData, Genre, Instrument, Mode, …)
│   ├── music_theory.py      # Scales, chords, progressions (ProgressionFactory with 15+ methods)
│   ├── midi_engine.py       # MIDI file construction
│   ├── patterns.py          # Pattern library (DrumRudiment, BassPattern, MelodicPattern, 40+ drum patterns)
│   ├── rhythm.py            # Rhythmic utilities
│   ├── audio_track.py       # AudioTrack model for provider results
│   ├── track_composer.py    # Higher-level track composition helpers
│   ├── llm_prompts.py       # LLM prompt builder, JSON parser, event validator
│   └── constants.py         # TICKS_PER_BEAT, all 128 GM programs, all 47 GM drum sounds, scales, chords
├── providers/               # Multi-provider audio generation system
│   ├── base.py              # AudioProvider ABC, TrackRequest/Result, ProviderCapability
│   ├── registry.py          # ProviderRegistry with capability-based routing
│   ├── midi_provider.py     # MidiProvider (algorithmic, zero-dependency)
│   ├── llm_midi_provider.py # LLMMidiProvider (LLM-driven MIDI generation with fallback)
│   └── gemini_provider.py   # GeminiLyriaProvider (Lyria 3 audio generation)
├── gemini/                  # Google Gemini AI integration
│   ├── client.py            # GeminiClient singleton wrapper
│   ├── music_gen.py         # MusicGenerationAgent (Lyria 3 clip & pro)
│   ├── audio_analysis.py    # AudioAnalysisAgent
│   └── tts.py               # TTSAgent (text-to-speech)
├── daw/                     # Audio backends (FluidSynth, TiMidity, GarageBand, Logic Pro)
├── api/                     # FastAPI application and route handlers
├── ui/                      # Static web interface (served by the API)
├── mcp_server.py            # MCP server entry point (audio-engineer-mcp)
└── config/                  # Settings and logging configuration
```
Full documentation is published at https://ianlintner.github.io/audio_engineer/.
Topics include:
- Installation & Setup
- Quick Start Guide
- CLI Reference
- REST API Reference
- MCP Server
- Multi-Provider System
- Gemini Integration
- Architecture
- Agent Guide
- Music Theory Internals
- DAW Integration
- Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Key rules:
- Keep orchestration order stable: drums → bass → guitar → keys
- Prefer deterministic defaults; isolate randomness behind explicit seeds
- Add or update tests for non-trivial behavior changes
- Keep PRs focused; no unrelated refactors