The src/ directory is structured following Clean Architecture and Hexagonal Architecture principles. This design ensures that the core domain logic remains independent of external frameworks, heavy machine learning adapters, and UI details, enabling high maintainability and testability.
The domain layer is the core of the application. It defines the business entities and the ports (interfaces).
- `entities.py`: Defines core data structures like `GenerationProfile`, `GenerationResult`, `Intent`, and `MidiBytes`.
- `interfaces.py`: Declares abstract ports (e.g., `MidiGenerator`, `BatchMidiGenerator`, `Evaluator`) that adapters must implement.
- `remi_vocab.py`: Definitions for MIDI tokenization vocabularies.
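To make the idea of a port concrete, here is a minimal sketch of how an entity and two ports might be declared. The field names and method signatures (`midi_bytes`, `score`, `generate(prompt)`) are assumptions for illustration only, and `FakeGenerator` is a hypothetical test double, not part of the project.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationResult:
    # Hypothetical fields; the real entity may carry more metadata.
    midi_bytes: bytes
    score: float


class MidiGenerator(ABC):
    """Port: produces one MIDI candidate for a prompt (assumed signature)."""

    @abstractmethod
    def generate(self, prompt: str) -> bytes:
        ...


class Evaluator(ABC):
    """Port: scores a candidate against the prompt, higher is better (assumed)."""

    @abstractmethod
    def score(self, prompt: str, midi: bytes) -> float:
        ...


class FakeGenerator(MidiGenerator):
    """Hypothetical test double: satisfies the port without any ML framework."""

    def generate(self, prompt: str) -> bytes:
        # "MThd" is the magic prefix of a Standard MIDI File header chunk.
        return b"MThd" + prompt.encode("utf-8")
```

Because use cases would depend only on the abstract classes, a double like `FakeGenerator` can stand in for a heavy model in unit tests.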
The use case layer contains the application-specific business rules and orchestration logic.
- `progressive_search.py`: Orchestrates the Text2Midi generation strategy using a step-by-step progressive evaluation.
- `best_of_n_search.py`: Orchestrates the MidiLLM generation strategy using batch generation and Best-of-N scoring.
- `token_heuristics.py`: Defines heuristic-based evaluations for fast sequence scoring.
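The Best-of-N idea can be sketched in a few lines: generate N candidates in one batch, score each, and keep the highest-scoring one. This is a generic illustration, not the code in `best_of_n_search.py`. The callables `generate_batch` and `score` and the toy stand-ins are hypothetical.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate_batch: Callable[[str, int], List[bytes]],
    score: Callable[[str, bytes], float],
    n: int = 4,
) -> Tuple[bytes, float]:
    """Generate n candidates in one batch and return (best_candidate, best_score)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = generate_batch(prompt, n)
    if not candidates:
        raise ValueError("generator returned no candidates")
    scored = [(candidate, score(prompt, candidate)) for candidate in candidates]
    # max() returns the first maximal element, so ties go to the earliest candidate.
    return max(scored, key=lambda pair: pair[1])


def toy_batch(prompt: str, n: int) -> List[bytes]:
    # Hypothetical generator: candidate i is the byte i repeated i + 1 times.
    return [bytes([i]) * (i + 1) for i in range(n)]


def toy_score(prompt: str, midi: bytes) -> float:
    # Hypothetical heuristic: prefer candidates whose length is closest to 3.
    return -abs(len(midi) - 3)


best, best_score = best_of_n("calm piano", toy_batch, toy_score, n=5)
```

Passing the generator and scorer in as callables keeps the search logic independent of any particular model or evaluator.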
The adapter layer contains the concrete implementations of the interfaces defined in the domain layer, bridging the gap to external systems and libraries.
- `generators/`: Specific integrations with generation models (`text2midi_generator.py` and `midillm_generator.py`).
- `evaluators/`: Implementations for sequence scoring (`clap_evaluator.py`, `composite.py`).
- `translators/`: NLP translation implementations to convert natural language to technical prompts (`google_ai_translator.py`, `pass_through_translator.py`).
- `audio/`: Audio rendering engines to transform symbolic MIDI into waveforms for evaluation (`fluidsynth_memory.py`).
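As a rough illustration of what two of the lighter adapters could look like, the sketch below shows a pass-through translator and a composite evaluator that combines several scorers. The weighted-average rule, the `translate` and `score` method names, and the class shapes are assumptions; the actual `composite.py` may combine scores differently.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed scorer shape: (prompt, midi_bytes) -> score, higher is better.
Scorer = Callable[[str, bytes], float]


class PassThroughTranslator:
    """Returns the user's text unchanged (useful when no translation is wanted)."""

    def translate(self, text: str) -> str:
        return text


class CompositeEvaluator:
    """Combines several scorers into one score via a normalized weighted average."""

    def __init__(self, weighted_scorers: Sequence[Tuple[Scorer, float]]):
        if not weighted_scorers:
            raise ValueError("at least one scorer is required")
        total = sum(weight for _, weight in weighted_scorers)
        if total <= 0:
            raise ValueError("weights must sum to a positive number")
        # Normalize so the weights sum to 1.
        self._scorers: List[Tuple[Scorer, float]] = [
            (scorer, weight / total) for scorer, weight in weighted_scorers
        ]

    def score(self, prompt: str, midi: bytes) -> float:
        return sum(weight * scorer(prompt, midi) for scorer, weight in self._scorers)


# Hypothetical usage: a slow ML-style scorer weighted 3:1 against a fast heuristic.
evaluator = CompositeEvaluator([
    (lambda prompt, midi: 1.0, 3.0),
    (lambda prompt, midi: 0.0, 1.0),
])
combined = evaluator.score("calm piano", b"MThd")
```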
- `profiles.py`: Defines predefined generation profiles (`BALANCED`, `MIDILLM_FAST`, etc.) to easily switch between strategies.
- `transformer_model.py`: Contains the definition of the underlying neural network architectures (like the Transformer Encoder-Decoder).
- `pipeline.py`: The main entry point acting as a Dependency Injection Container. It instantiates heavy adapters once, prevents GPU memory leaks, and selects the correct search strategy based on the profile.
- `cli.py`: The Command Line Interface exposing the pipeline features to end-users.
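The "instantiate once, select by profile" behavior can be sketched as a small container that builds each heavy adapter lazily and caches it. Everything here is illustrative: the `strategy` field, the factory keys (`"text2midi"`, `"midillm"`), and the assumption that `MIDILLM_FAST` maps to the Best-of-N strategy are not confirmed by the project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class GenerationProfile:
    name: str
    strategy: str  # hypothetical field: "progressive" or "best_of_n"


# Assumed mapping for illustration only.
MIDILLM_FAST = GenerationProfile(name="MIDILLM_FAST", strategy="best_of_n")


class Pipeline:
    """Builds each heavy adapter at most once and reuses it across requests."""

    # Assumed pairing of strategy to generator adapter.
    _GENERATOR_FOR = {"progressive": "text2midi", "best_of_n": "midillm"}

    def __init__(self, factories: Dict[str, Callable[[], object]]):
        self._factories = factories
        self._instances: Dict[str, object] = {}

    def _adapter(self, key: str) -> object:
        # Lazy construction: a model is only loaded the first time it is needed.
        if key not in self._instances:
            self._instances[key] = self._factories[key]()
        return self._instances[key]

    def select(self, profile: GenerationProfile) -> Tuple[str, object]:
        """Return (strategy_name, generator_adapter) for the given profile."""
        try:
            key = self._GENERATOR_FOR[profile.strategy]
        except KeyError:
            raise ValueError(f"unknown strategy: {profile.strategy!r}") from None
        return profile.strategy, self._adapter(key)
```

Holding a single cached instance per adapter is one common way to avoid loading the same model onto the GPU repeatedly.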
- Input: The user requests a generation through `cli.py`.
- Setup: `pipeline.py` initializes the heavy models/adapters and injects them into the chosen use case strategy.
- Execution: Based on the `GenerationProfile`, either `ProgressiveSearch` or `BestOfNSearch` takes control.
- Processing:
  - Translators transform natural language into structured technical prompts.
  - Generators create raw token sequences representing MIDI events.
  - Evaluators compute scores using heuristics or ML models (e.g., CLAP) to select the best output.
- Output: The optimal `GenerationResult` (containing the final MIDI bytes) is returned and saved to disk.
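The processing and output steps above can be sketched end to end as a single function. This is a simplified illustration under stated assumptions: `run_flow` and its callable parameters are hypothetical, and the real pipeline delegates these steps to the injected strategy instead of one function.

```python
from pathlib import Path
from typing import Callable, List


def run_flow(
    text: str,
    translate: Callable[[str], str],
    generate_batch: Callable[[str, int], List[bytes]],
    score: Callable[[str, bytes], float],
    out_path: str,
    n: int = 3,
) -> bytes:
    """Translate, generate n candidates, pick the best one, and write it to disk."""
    prompt = translate(text)                      # Processing: translator
    candidates = generate_batch(prompt, n)        # Processing: generator
    if not candidates:
        raise ValueError("generator returned no candidates")
    best = max(candidates, key=lambda midi: score(prompt, midi))  # Processing: evaluator
    Path(out_path).write_bytes(best)              # Output: save the winning MIDI bytes
    return best
```

In a test, each callable can be replaced by a trivial stand-in, for example `str.upper` as the translator and `len` wrapped in a lambda as the scorer.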