ianlintner/audio_engineer
🎵 AI Music Studio

A multi-agent system for generating MIDI backing tracks.
AI musician and engineer agents collaborate to produce complete, genre-aware songs with drums, bass, guitar, and keys.



✨ Features

  • πŸ₯ Drummer Agent β€” genre-aware patterns for 22 genres + all 40 PAS Standard Drum Rudiments
  • 🎸 Bassist Agent β€” root-note bass lines with 8 genre-specific bass patterns (walking, slap, tumbao, …)
  • 🎸 Guitarist Agent β€” rhythm/lead guitar parts and power chords
  • 🎹 Keyboardist Agent β€” chord voicings, pads, and arpeggios
  • 🎻 Strings Agent β€” legato lines, pizzicato, and tremolo for strings/violin
  • 🎺 Brass Agent β€” stabs, long tones, and fall-offs for brass/trumpet/saxophone
  • πŸŽ›οΈ Synth Agent β€” sustained pads and arpeggio patterns for synthesizers
  • πŸͺ˜ Percussion Agent β€” Latin/Afro-Cuban hand drum patterns (conga, bongo, djembe)
  • 🎸 Lead Guitar Agent β€” pentatonic licks and scale fills
  • 🎚️ Mixer & Mastering Agents β€” per-track volume, pan, and loudness metadata
  • πŸ€– LLM MIDI generation β€” LLMMidiProvider converts LLM JSON output to MIDI; falls back to algorithmic generation on parse failure
  • πŸ€– LLM-guided generation β€” plug in OpenAI, Anthropic, or any LangChain provider
  • πŸŽ›οΈ DAW integration β€” FluidSynth, TiMidity, GarageBand, Logic Pro, and raw MIDI/WAV export
  • 🌐 REST API β€” FastAPI server for programmatic session management
  • πŸ”Œ Multi-provider system β€” pluggable AudioProvider backends with capability-based routing (ProviderRegistry)
  • πŸ€– Google Gemini integration β€” full-length music generation via Lyria 3, audio analysis, and text-to-speech
  • πŸ› οΈ MCP Server β€” expose backing-track generation as MCP tools for GitHub Copilot, Claude Code, and other AI coding assistants
  • πŸ–₯️ Web UI β€” lightweight browser-based interface served alongside the REST API

📦 Installation

# Core install (MIDI generation only)
pip install -e "."

# With development tools
pip install -e ".[dev]"

# With REST API server
pip install -e ".[api]"

# With LLM providers (OpenAI / Anthropic)
pip install -e ".[llm]"

# With Google Gemini (Lyria 3 music generation, audio analysis, TTS)
pip install -e ".[gemini]"

# With audio processing (pydub — WAV/MP3 manipulation)
pip install -e ".[audio]"

# Everything
pip install -e ".[all]"

Requirements: Python 3.11+


🚀 Quick Start

# Generate a classic rock backing track in E minor at 120 BPM
python scripts/generate_demo.py --genre classic_rock --key E --mode minor --tempo 120 -v

# Pop song in C major with keyboard included
python scripts/generate_demo.py --genre pop --key C --mode major --with-keys

# Blues in A minor rendered to WAV via FluidSynth
python scripts/generate_demo.py --genre blues --key A --mode minor --render-audio --backend fluidsynth

# Jazz in D dorian with full band (strings, brass, synth)
python scripts/generate_demo.py --genre jazz --key D --mode dorian --with-keys -v

# Funk groove in E minor at 105 BPM
python scripts/generate_demo.py --genre funk --key E --mode minor --tempo 105 -v

# Metal in E minor at 200 BPM
python scripts/generate_demo.py --genre metal --key E --mode minor --tempo 200 -v

# Thrash metal rhythm section at 200 BPM (stress-test script)
python scripts/generate_thrash.py

# Download and configure a SoundFont for audio rendering
python scripts/setup_soundfont.py

# Start the REST API server
pip install -e ".[api]"
python scripts/run_dev.py

Output files land in ./output/ by default, named <session-id>_<instrument>.mid plus a combined <session-id>_full.mid.


πŸ—οΈ Architecture

Tracks are generated sequentially — drums first, then bass, guitar, lead guitar, keys, strings, brass, synth, and percussion — so each agent can react to what came before. The orchestrator manages chord progressions per section and coordinates file export.

SessionOrchestrator
├── DrummerAgent      → drum pattern generation (22 genres, 40 rudiments)
├── BassistAgent      → bass line locked to drums + chords
├── GuitaristAgent    → rhythm/lead guitar parts
├── LeadGuitarAgent   → pentatonic licks and scale fills
├── KeyboardistAgent  → chord voicings and pads
├── StringsAgent      → legato strings / pizzicato / tremolo
├── BrassAgent        → brass stabs, long tones, fall-offs
├── SynthAgent        → sustained pads, arpeggio patterns
├── PercussionAgent   → Latin/Afro-Cuban hand drum patterns
├── MixerAgent        → volume, pan, EQ per track
└── MasteringAgent    → final loudness and metadata

See the full architecture docs for Mermaid diagrams, data flow, and DAW integration tiers.
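The sequential hand-off described above can be sketched as a loop over agents that share one mutable session context. This is an illustrative toy, not the project's actual API — the real `SessionOrchestrator` and `SessionContext` live in `agents/orchestrator.py` and `agents/base.py`, and their interfaces may differ:

```python
from dataclasses import dataclass, field

# Toy sketch of sequential orchestration: each agent sees the tracks
# produced by the agents that ran before it. Names are illustrative.
@dataclass
class SessionContext:
    tracks: dict = field(default_factory=dict)  # instrument name -> events

class Drummer:
    name = "drums"
    def generate(self, context: SessionContext) -> list:
        return ["kick", "snare"]  # placeholder pattern

class Bassist:
    name = "bass"
    def generate(self, context: SessionContext) -> list:
        # React to the drum track generated earlier in the same session.
        drum_hits = context.tracks["drums"]
        return [f"root note under {hit}" for hit in drum_hits]

def run_session(agents) -> SessionContext:
    context = SessionContext()
    for agent in agents:  # fixed order: drums first, then bass, guitar, ...
        context.tracks[agent.name] = agent.generate(context)
    return context

ctx = run_session([Drummer(), Bassist()])
```

The key design point is that ordering matters: swapping `Bassist` before `Drummer` in this sketch would fail, which is why the contributing guidelines ask that the orchestration order stay stable.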


🤖 Agents

| Agent | Role | Output |
|---|---|---|
| DrummerAgent | Kick, snare, hi-hat patterns (22 genres, 40 rudiments) | MidiTrackData |
| BassistAgent | Root-note bass lines following chord changes | MidiTrackData |
| GuitaristAgent | Rhythm guitar / power chords | MidiTrackData |
| LeadGuitarAgent | Pentatonic licks, scale runs, string bends | MidiTrackData |
| KeyboardistAgent | Chord pads and voicings | MidiTrackData |
| StringsAgent | Legato strings, pizzicato, tremolo | MidiTrackData |
| BrassAgent | Stabs, long tones, fall-offs (trumpet/sax/brass) | MidiTrackData |
| SynthAgent | Sustained pads and arpeggio patterns | MidiTrackData |
| PercussionAgent | Latin/Afro-Cuban hand drum patterns (conga/bongo/djembe) | MidiTrackData |
| MixerAgent | Per-track volume, pan, EQ | MixConfig |
| MasteringAgent | Final processing | metadata dict |

All agents accept an optional llm parameter for LLM-guided generation.


🖥️ CLI Reference

python scripts/generate_demo.py [OPTIONS]

Options:
  --genre         Genre preset (classic_rock, blues, pop, folk, country, punk, hard_rock,
                               jazz, funk, reggae, soul, rnb, metal, hip_hop, latin,
                               bossa_nova, electronic, house, ambient, gospel, swing, bebop)
  --key           Root note (C, C#, D, D#, E, F, F#, G, G#, A, A#, B)
  --mode          Scale mode (major, minor, dorian, mixolydian, phrygian, lydian, locrian,
                              harmonic_minor, melodic_minor, whole_tone, diminished, bebop_dominant)
  --tempo         BPM — 40 to 300 (default: 120)
  --output        Output directory (default: ./output)
  --sections      Number of song sections (default: 4)
  --render-audio  Render WAV via audio backend
  --backend       Audio backend: export | fluidsynth | timidity (default: export)
  --with-keys     Include keyboard in the band
  -v, --verbose   Enable debug logging
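The `--key`/`--mode` pair determines the pitch set the agents draw from. As background (this is standard music theory, not the project's `music_theory.py` implementation), each listed diatonic mode can be derived by rotating the major scale's whole/half-step pattern:

```python
# Derive the pitch classes of a key/mode pair by rotating the major-scale
# step pattern. Standard music theory; not this project's actual code.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half steps of the major scale
MODE_ROTATION = {"major": 0, "dorian": 1, "phrygian": 2, "lydian": 3,
                 "mixolydian": 4, "minor": 5, "locrian": 6}

def scale(key: str, mode: str) -> list[str]:
    offset = MODE_ROTATION[mode]
    steps = MAJOR_STEPS[offset:] + MAJOR_STEPS[:offset]
    pitch = NOTES.index(key)
    notes = [key]
    for step in steps[:-1]:          # last step returns to the octave
        pitch = (pitch + step) % 12
        notes.append(NOTES[pitch])
    return notes

print(scale("D", "dorian"))  # ['D', 'E', 'F', 'G', 'A', 'B', 'C']
```

The non-diatonic modes the CLI also accepts (harmonic_minor, whole_tone, diminished, bebop_dominant, …) use their own interval patterns rather than rotations of the major scale.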

πŸ› οΈ MCP Server

The MCP server exposes backing-track generation as tools for AI coding assistants (GitHub Copilot, Claude Code, etc.).

# Install with MCP support (included in core)
pip install -e "."

# Run the MCP server (stdio transport)
audio-engineer-mcp
# or: python -m audio_engineer.mcp_server

Available MCP Tools

| Tool | Description |
|---|---|
| generate_track | Generate a full MIDI backing track (genre, key, tempo, instruments, sections) |
| generate_game_music | Quick game music via mood preset (battle, exploration, town, boss, …) |
| generate_audio_track | Route a track request through the provider registry (MIDI or Gemini Lyria) |
| list_genres | List all supported genre presets |
| list_game_moods | List all game music mood presets with descriptions |
| list_providers | List registered audio providers with capabilities and availability |

Set AUDIO_ENGINEER_OUTPUT to control where the MCP server writes files.
See the MCP Server guide for full details.
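Wiring the server into an MCP client usually means pointing the client at the `audio-engineer-mcp` command over stdio. The exact file location and schema depend on the client; the snippet below uses a Claude Desktop-style `mcpServers` layout as an assumption (only the command name and the `AUDIO_ENGINEER_OUTPUT` variable come from this README):

```json
{
  "mcpServers": {
    "audio-engineer": {
      "command": "audio-engineer-mcp",
      "env": {
        "AUDIO_ENGINEER_OUTPUT": "/path/to/output"
      }
    }
  }
}
```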


🔌 Multi-Provider System

The ProviderRegistry routes generation requests to the best available backend:

  1. LLMMidiProvider — LLM-driven MIDI generation (highest priority when an LLM callable is configured; falls back to MidiProvider on parse failure)
  2. MidiProvider — algorithmic MIDI generation (always available, zero dependencies)
  3. GeminiLyriaProvider — full-length audio via Google Lyria 3 (requires pip install -e ".[gemini]" and AUDIO_ENGINEER_GEMINI_API_KEY)

Custom providers can be registered at runtime:

from audio_engineer.providers import ProviderRegistry, AudioProvider, ProviderCapability

registry = ProviderRegistry()
registry.register(my_custom_provider)
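Capability-based routing of this kind typically means "of the registered providers that advertise the requested capability and are currently available, pick the highest-priority one". The sketch below shows that selection logic with simplified stand-in classes; the real ProviderRegistry / AudioProvider / ProviderCapability interfaces may differ:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Simplified stand-ins for ProviderCapability / AudioProvider / ProviderRegistry.
class Capability(Enum):
    MIDI = auto()
    AUDIO = auto()

@dataclass
class Provider:
    name: str
    capabilities: set
    priority: int            # higher wins when several providers qualify
    available: bool = True   # e.g. False when a required API key is unset

class Registry:
    def __init__(self):
        self._providers = []

    def register(self, provider: Provider) -> None:
        self._providers.append(provider)

    def resolve(self, capability: Capability) -> Provider:
        candidates = [p for p in self._providers
                      if capability in p.capabilities and p.available]
        if not candidates:
            raise LookupError(f"no available provider for {capability}")
        return max(candidates, key=lambda p: p.priority)

registry = Registry()
registry.register(Provider("midi_engine", {Capability.MIDI}, priority=10))
registry.register(Provider("llm_midi", {Capability.MIDI}, priority=20))
registry.register(Provider("gemini_lyria", {Capability.AUDIO}, priority=10,
                           available=False))  # no API key configured
```

With this setup, a MIDI request resolves to `llm_midi` (highest available priority), while an audio request raises because the only audio-capable provider is unavailable.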

LLM MIDI generation

Pass any callable (str) -> str as the llm parameter of LLMMidiProvider or SessionOrchestrator:

from audio_engineer.providers.llm_midi_provider import LLMMidiProvider

provider = LLMMidiProvider(llm=lambda prompt: openai_client.complete(prompt))
result = provider.generate_track(request)   # falls back to MidiProvider if JSON unparseable
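The fallback behavior amounts to "try to parse the LLM reply as JSON; on failure, hand the request to the algorithmic generator instead". A minimal sketch of that pattern, assuming a hypothetical event schema (the real prompt format and validation live in `core/llm_prompts.py`):

```python
import json

def algorithmic_fallback(request):
    # Stand-in for the always-available algorithmic MidiProvider.
    return [{"note": 40, "start": 0.0, "duration": 1.0}]

def generate_track(llm, request):
    """Ask the LLM for MIDI events as JSON; fall back to the algorithmic
    generator if the reply is unparseable or malformed."""
    reply = llm(f"Emit MIDI note events as a JSON list for: {request}")
    try:
        events = json.loads(reply)  # raises ValueError on bad JSON
        if not isinstance(events, list):
            raise ValueError("expected a JSON list of events")
        return events
    except ValueError:
        return algorithmic_fallback(request)

# A well-formed reply passes through; a chatty refusal triggers the fallback.
good = generate_track(lambda p: '[{"note": 64, "start": 0.0, "duration": 0.5}]',
                      "funk bass, E minor")
bad = generate_track(lambda p: "Sorry, I can't produce JSON right now.",
                     "funk bass, E minor")
```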

See the Providers guide for details.


🤖 Google Gemini Integration

The audio_engineer.gemini package wraps the Google GenAI SDK for three capabilities:

| Capability | Agent class | What it does |
|---|---|---|
| Music generation | MusicGenerationAgent | Full songs or 30 s clips via Lyria 3 (clip or pro model) |
| Audio analysis | AudioAnalysisAgent | Transcription, genre/mood detection, audio Q&A |
| Text-to-speech | TTSAgent | Narration and vocal scratch tracks |

pip install -e ".[gemini]"
export AUDIO_ENGINEER_GEMINI_API_KEY=your_key_here

See the Gemini guide for usage examples.


🌐 REST API

Start the dev server with python scripts/run_dev.py (requires pip install -e ".[api]").

| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| POST | /sessions/{id}/run | Run the session (trigger generation) |
| GET | /sessions/{id} | Get session status and output file paths |
| GET | /sessions/{id}/tracks | List generated tracks |
| GET | /sessions/{id}/export | Download the combined MIDI file |

Interactive API docs: http://localhost:8000/docs


βš™οΈ Configuration

All settings use the AUDIO_ENGINEER_ environment variable prefix (or a .env file in the project root).

| Variable | Description | Default |
|---|---|---|
| AUDIO_ENGINEER_OUTPUT_DIR | Directory for generated files | ./output |
| AUDIO_ENGINEER_LOG_LEVEL | Logging verbosity | INFO |
| AUDIO_ENGINEER_OPENAI_API_KEY | OpenAI key for LLM agents | (unset) |
| AUDIO_ENGINEER_ANTHROPIC_API_KEY | Anthropic key for LLM agents | (unset) |
| AUDIO_ENGINEER_GEMINI_API_KEY | Google Gemini API key (Lyria 3 / audio analysis) | (unset) |
| AUDIO_ENGINEER_LLM_PROVIDER | Active LLM backend: openai, anthropic, gemini | openai |
| AUDIO_ENGINEER_DEFAULT_AUDIO_PROVIDER | Default audio generation provider | midi_engine |
| AUDIO_ENGINEER_SOUNDFONT_PATH | Path to a .sf2 SoundFont for FluidSynth rendering | (unset) |
| AUDIO_ENGINEER_HOST | FastAPI server host | 0.0.0.0 |
| AUDIO_ENGINEER_PORT | FastAPI server port | 8000 |
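Prefix-based settings like these are conventionally loaded by overlaying matching environment variables onto defaults. A simplified stand-in for the project's settings loader (the real one lives in `config/` and likely uses a richer mechanism):

```python
import os

# Simplified sketch of prefix-based settings loading; the project's actual
# config module may behave differently (e.g. typed fields, .env support).
PREFIX = "AUDIO_ENGINEER_"

def load_settings(defaults: dict) -> dict:
    """Overlay AUDIO_ENGINEER_* environment variables onto defaults."""
    settings = dict(defaults)
    for key in settings:
        value = os.environ.get(PREFIX + key.upper())
        if value is not None:
            settings[key] = value
    return settings

os.environ[PREFIX + "PORT"] = "9000"   # override one setting for this demo
settings = load_settings({"output_dir": "./output",
                          "log_level": "INFO",
                          "port": "8000"})
```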

πŸ› οΈ Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest -v

# Lint
ruff check src/ tests/ scripts/

# Run a quick smoke test
python scripts/generate_demo.py --genre blues --key A --mode minor -v

πŸ“ Project Structure

src/audio_engineer/
├── agents/
│   ├── base.py              # BaseMusician, BaseEngineer, SessionContext
│   ├── orchestrator.py      # SessionOrchestrator
│   ├── musician/            # DrummerAgent, BassistAgent, GuitaristAgent, KeyboardistAgent,
│   │                        # StringsAgent, BrassAgent, SynthAgent, PercussionAgent, LeadGuitarAgent
│   └── engineer/            # MixerAgent, MasteringAgent
├── core/
│   ├── models.py            # Pydantic models (Session, MidiTrackData, Genre, Instrument, Mode, …)
│   ├── music_theory.py      # Scales, chords, progressions (ProgressionFactory with 15+ methods)
│   ├── midi_engine.py       # MIDI file construction
│   ├── patterns.py          # Pattern library (DrumRudiment, BassPattern, MelodicPattern, 40+ drum patterns)
│   ├── rhythm.py            # Rhythmic utilities
│   ├── audio_track.py       # AudioTrack model for provider results
│   ├── track_composer.py    # Higher-level track composition helpers
│   ├── llm_prompts.py       # LLM prompt builder, JSON parser, event validator
│   └── constants.py         # TICKS_PER_BEAT, all 128 GM programs, all 47 GM drum sounds, scales, chords
├── providers/               # Multi-provider audio generation system
│   ├── base.py              # AudioProvider ABC, TrackRequest/Result, ProviderCapability
│   ├── registry.py          # ProviderRegistry with capability-based routing
│   ├── midi_provider.py     # MidiProvider (algorithmic, zero-dependency)
│   ├── llm_midi_provider.py # LLMMidiProvider (LLM-driven MIDI generation with fallback)
│   └── gemini_provider.py   # GeminiLyriaProvider (Lyria 3 audio generation)
├── gemini/                  # Google Gemini AI integration
│   ├── client.py            # GeminiClient singleton wrapper
│   ├── music_gen.py         # MusicGenerationAgent (Lyria 3 clip & pro)
│   ├── audio_analysis.py    # AudioAnalysisAgent
│   └── tts.py               # TTSAgent (text-to-speech)
├── daw/                     # Audio backends (FluidSynth, TiMidity, GarageBand, Logic Pro)
├── api/                     # FastAPI application and route handlers
├── ui/                      # Static web interface (served by the API)
├── mcp_server.py            # MCP server entry point (audio-engineer-mcp)
└── config/                  # Settings and logging configuration
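As background on the `TICKS_PER_BEAT` constant mentioned above: in a Standard MIDI File, event timing is expressed in ticks per quarter note (PPQ), so musical durations and tempo translate to ticks like this. The value 480 below is a common PPQ choice used for illustration, not necessarily what `constants.py` defines:

```python
# Standard MIDI timing arithmetic. 480 PPQ is a common convention; the
# actual TICKS_PER_BEAT value in constants.py may differ.
TICKS_PER_BEAT = 480

def beats_to_ticks(beats: float) -> int:
    """Convert a musical duration in beats (quarter notes) to MIDI ticks."""
    return round(beats * TICKS_PER_BEAT)

def seconds_per_tick(tempo_bpm: float) -> float:
    """One beat lasts 60/BPM seconds, split across TICKS_PER_BEAT ticks."""
    return 60.0 / tempo_bpm / TICKS_PER_BEAT

print(beats_to_ticks(1.5))  # dotted quarter note -> 720
```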

📖 Documentation

Full documentation is published at https://ianlintner.github.io/audio_engineer/.



🤝 Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Key rules:

  • Keep orchestration order stable: drums → bass → guitar → keys
  • Prefer deterministic defaults; isolate randomness behind explicit seeds
  • Add or update tests for non-trivial behavior changes
  • Keep PRs focused β€” no unrelated refactors

📄 License

MIT
