KylinDemons/AAMC

Adaptive Agentic Meta-Controller (AAMC)

A reinforcement learning framework for intelligent SLM/LLM orchestration that dynamically routes queries to the most appropriate model, balancing performance, cost, and latency.

Overview

The AAMC framework addresses the challenge of efficiently orchestrating heterogeneous language models with three components:

  1. Task Complexity Estimator (TCE): A multi-task encoder that analyzes incoming queries to predict complexity, category, and tool-use requirements
  2. Reinforcement Learning Router (RLR): A preference-conditioned PPO agent that learns dynamic routing policies optimizing for multiple objectives
  3. Simulation Environment: A high-fidelity gymnasium environment modeling realistic model pools with cost, latency, and performance characteristics

Key Results: AAMC achieves a task success rate above 90% (comparable to the LLM-only baseline) while reducing operational costs by over 70% and significantly reducing inference latency.

Repository Structure

AAMC-project/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── environment.yml           # Conda environment
├── Makefile                  # Common commands
├── Dockerfile               # Docker build configuration
├── configs/                 # Configuration files
│   ├── tce.yaml            # TCE training config
│   ├── rlr.yaml            # RLR training config
│   └── sim.yaml            # Simulation environment config
├── data/                    # Data directory
│   ├── generate_tce_dataset.py  # Synthetic data generator
│   ├── dtce_train.csv      # Training data (generated)
│   ├── dtce_val.csv        # Validation data
│   └── dtce_test.csv       # Test data
├── aamc_env/               # Simulation environment
│   ├── __init__.py
│   └── env.py              # Gymnasium environment
├── tce/                    # Task Complexity Estimator
│   ├── __init__.py
│   ├── model.py            # TCE model architecture
│   ├── train_tce.py        # Training script
│   └── eval_tce.py         # Evaluation script
├── rler/                   # Reinforcement Learning Router
│   ├── __init__.py
│   ├── model.py            # Actor-critic networks
│   ├── ppo.py              # PPO algorithm
│   ├── train_rlr.py        # Training script
│   └── eval_rlr.py         # Evaluation script
├── baselines/              # Baseline strategies
│   ├── __init__.py
│   └── strategies.py       # LLM-only, SLM-only, rule-based, supervised
├── inference/              # Inference module
│   ├── __init__.py
│   ├── aamc_inference.py   # Single-query routing
│   └── service_stub.py     # FastAPI service (optional)
├── scripts/                # Experiment scripts
│   ├── evaluate_all.py     # Comprehensive evaluation
│   └── reproduce_fig5.sh   # Reproduce paper figures
├── experiments_results/    # Results directory
├── checkpoints/            # Model checkpoints
│   ├── tce/
│   └── rlr/
├── logs/                   # Training logs
│   ├── tce/
│   └── rlr/
└── tests/                  # Unit tests
    ├── __init__.py
    ├── test_env.py
    └── test_tce.py

Installation

Option 1: pip (Recommended)

# Clone repository
git clone <repository-url>
cd AAMC-project

# Create virtual environment
python3.10 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Option 2: Conda

# Create conda environment
conda env create -f environment.yml
conda activate aamc

# Verify installation
python -c "import torch; print(torch.__version__)"

Option 3: Docker

# Build Docker image
docker build -t aamc:latest .

# Run container
docker run -it --gpus all -v "$(pwd)":/workspace aamc:latest bash

Quick Start

1. Generate Training Data

make data
# Or manually:
python data/generate_tce_dataset.py --n_train 10000 --n_val 2000 --n_test 2000

This generates synthetic queries with complexity scores, categories, and tool-use labels.

2. Train Task Complexity Estimator (TCE)

make tce_train
# Or manually:
python tce/train_tce.py --config configs/tce.yaml --data_dir data

Expected training time: 1-2 hours on GPU, 4-6 hours on CPU

Output: Trained TCE checkpoint at checkpoints/tce/best_model.pt

3. Train Reinforcement Learning Router (RLR)

make rlr_train
# Or manually:
python rler/train_rlr.py --config configs/rlr.yaml --sim_config configs/sim.yaml --tce_checkpoint checkpoints/tce/best_model.pt

Expected training time: 4-8 hours on GPU, 16-24 hours on CPU

Output: Trained RLR checkpoint at checkpoints/rlr/final_model.pt

For quick testing (reduced timesteps):

make rlr_train_quick

4. Evaluate All Strategies

make evaluate
# Or manually:
python scripts/evaluate_all.py \
    --sim_config configs/sim.yaml \
    --tce_checkpoint checkpoints/tce/best_model.pt \
    --rlr_checkpoint checkpoints/rlr/final_model.pt \
    --n_episodes 10

Output: Comparison table and metrics in experiments_results/

5. Inference (Single Query)

python inference/aamc_inference.py \
    --query "Write a Python function to sort a list" \
    --tce_checkpoint checkpoints/tce/best_model.pt \
    --rlr_checkpoint checkpoints/rlr/final_model.pt \
    --preference 0.7,0.2,0.1

Output: Selected model with explanation
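
The `--preference` flag takes three comma-separated weights. The following is a minimal sketch of how such a string could be parsed and normalized. The helper name `parse_preference` is hypothetical and not part of this repository, and the ordering of the objectives (for example performance, cost, latency) is an assumption that should be checked against `inference/aamc_inference.py`.

```python
def parse_preference(text: str, n_objectives: int = 3) -> list[float]:
    """Parse 'w1,w2,w3' into non-negative weights that sum to 1.

    Hypothetical helper: the objective ordering is an assumption,
    not something this README specifies.
    """
    weights = [float(part) for part in text.split(",")]
    if len(weights) != n_objectives:
        raise ValueError(f"expected {n_objectives} weights, got {len(weights)}")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so that the weights form a convex combination.
    return [w / total for w in weights]


if __name__ == "__main__":
    print(parse_preference("0.7,0.2,0.1"))
```

Normalizing means that `2,1,1` and `0.5,0.25,0.25` describe the same trade-off.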

Configuration

TCE Configuration (configs/tce.yaml)

Key parameters:

  • encoder: Pre-trained transformer (default: distilbert-base-uncased)
  • learning_rate: 2e-5
  • num_epochs: 10
  • lambda_c, lambda_cat, lambda_t: Loss weights for multi-task learning
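
To illustrate how the three loss weights might combine, here is a sketch of a weighted multi-task loss for a single example in plain Python. It assumes squared error for complexity, cross-entropy for category, and binary cross-entropy for tool use; the actual loss terms in `tce/train_tce.py` may differ, and the default weights of 1.0 are placeholders rather than values from `configs/tce.yaml`.

```python
import math


def multitask_loss(pred_complexity, true_complexity,
                   category_logits, true_category,
                   tool_logit, true_tool,
                   lambda_c=1.0, lambda_cat=1.0, lambda_t=1.0):
    """Weighted sum of three per-example losses (assumed loss choices)."""
    # Complexity head: squared error on a scalar score.
    loss_c = (pred_complexity - true_complexity) ** 2

    # Category head: cross-entropy, computed with a stable log-sum-exp.
    m = max(category_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in category_logits))
    loss_cat = log_z - category_logits[true_category]

    # Tool-use head: binary cross-entropy on a logit (stable form).
    loss_t = (max(tool_logit, 0.0) - true_tool * tool_logit
              + math.log1p(math.exp(-abs(tool_logit))))

    return lambda_c * loss_c + lambda_cat * loss_cat + lambda_t * loss_t
```

Raising one weight relative to the others shifts the shared encoder toward that task.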

RLR Configuration (configs/rlr.yaml)

Key parameters:

  • gamma: Discount factor (0.99)
  • gae_lambda: GAE lambda (0.95)
  • clip_epsilon: PPO clipping (0.2)
  • n_steps: Steps per rollout (2048)
  • total_timesteps: Total training steps (1,000,000)
  • preference_sampling: Strategy for sampling preference vectors
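
The roles of `gamma`, `gae_lambda`, and `clip_epsilon` can be sketched with the standard generalized advantage estimation (GAE) recursion and the PPO clipped surrogate. This is a generic illustration in plain Python, not the code in `rler/ppo.py`; the function names are hypothetical.

```python
def compute_gae(rewards, values, last_value, dones,
                gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimates for one rollout.

    values[t] is the critic's estimate at step t, and last_value is the
    estimate for the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_epsilon=0.2):
    """PPO clipped objective for one sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)
```

With `clip_epsilon = 0.2`, a probability ratio of 1.5 on a positive advantage contributes as if the ratio were 1.2, which limits how far a single update can move the policy.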

Simulation Configuration (configs/sim.yaml)

Defines:

  • Model pool specifications (cost, latency, performance)
  • Task categories and distributions
  • Performance matrix (success probability per model-task pair)
  • Reward normalization ranges
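
As a rough picture of how a performance matrix and normalization ranges might be used, the sketch below samples a task outcome from a success probability and rescales a raw cost into [0, 1]. The category names and all numbers are invented for illustration and are not taken from `configs/sim.yaml`; the function names are hypothetical.

```python
import random

# Hypothetical performance matrix: success probability per model-task pair.
PERFORMANCE = {
    "SLM-Medium": {"qa": 0.85, "coding": 0.60},
    "LLM-XLarge": {"qa": 0.95, "coding": 0.92},
}


def simulate_outcome(matrix, model, category, rng):
    """Bernoulli draw: True with the matrix's success probability."""
    return rng.random() < matrix[model][category]


def normalize(value, low, high):
    """Min-max rescale into [0, 1], clipping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_outcome(PERFORMANCE, "SLM-Medium", "coding", rng))
    print(normalize(0.02, 0.0, 0.10))
```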

Evaluation Metrics

The evaluation script computes:

  1. Task Success Rate: Percentage of successfully completed tasks
  2. Average Cost per Task: Mean operational cost in USD
  3. Average Latency: Mean end-to-end latency in milliseconds
  4. Overall Efficiency Score: success_rate / (α·cost + β·latency)
  5. Per-Category Success Rates: Fairness analysis across task types
  6. Model Distribution: Frequency of model selections
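
The aggregate metrics above could be computed from per-task records roughly as follows. This is a sketch, not the logic of `scripts/evaluate_all.py`: the record keys and the `summarize` function are hypothetical, the success rate is returned as a fraction rather than a percentage, and the values of α and β are not given in this README, so they are left as parameters.

```python
from collections import Counter


def summarize(records, alpha=1.0, beta=1.0):
    """Aggregate per-task records into the metrics listed above.

    Each record is a dict with assumed keys:
    'success' (bool), 'cost' (USD), 'latency_ms' (float), 'model' (str).
    """
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    success_rate = sum(1 for r in records if r["success"]) / n
    avg_cost = sum(r["cost"] for r in records) / n
    avg_latency = sum(r["latency_ms"] for r in records) / n
    # Efficiency score: success_rate / (alpha * cost + beta * latency).
    denom = alpha * avg_cost + beta * avg_latency
    efficiency = success_rate / denom if denom > 0 else float("inf")
    counts = Counter(r["model"] for r in records)
    return {
        "success_rate": success_rate,
        "avg_cost": avg_cost,
        "avg_latency_ms": avg_latency,
        "efficiency": efficiency,
        "model_distribution": {m: c / n for m, c in counts.items()},
    }
```

Because cost (USD) and latency (milliseconds) live on very different scales, the choice of α and β largely determines which term dominates the score.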

Baselines

The implementation includes four baseline strategies:

  1. LLM-Only: Always route to LLM-XLarge (maximum performance, maximum cost)
  2. SLM-Only: Always route to SLM-Medium (balanced performance/cost)
  3. Rule-Based: Threshold-based routing using TCE complexity score
  4. Supervised Router: Trained classifier for model selection

Experiments

Reproduce Paper Results

# Generate full dataset
python data/generate_tce_dataset.py --n_train 50000 --n_val 10000 --n_test 10000

# Train TCE
make tce_train

# Train RLR
make rlr_train

# Evaluate all strategies
make evaluate

# Generate comparison plots
python scripts/plot_results.py --results_dir experiments_results

Robustness Experiments

Test robustness to noisy TCE predictions:

python scripts/evaluate_robustness.py \
    --noise_levels 0.0,0.1,0.2,0.3 \
    --n_episodes 20
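
One plausible reading of the noise levels is the standard deviation of Gaussian noise added to the TCE complexity score. The sketch below shows that interpretation only; it is an assumption, and `scripts/evaluate_robustness.py` may perturb predictions differently. The function name is hypothetical.

```python
import random


def perturb_complexity(score, noise_level, rng):
    """Add zero-mean Gaussian noise and clip back into [0, 1].

    Assumes noise_level is a standard deviation and that the
    complexity score lies in [0, 1].
    """
    noisy = score + rng.gauss(0.0, noise_level)
    return min(1.0, max(0.0, noisy))


if __name__ == "__main__":
    rng = random.Random(42)
    for level in (0.0, 0.1, 0.2, 0.3):
        print(level, perturb_complexity(0.5, level, rng))
```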

Scalability Analysis

Measure decision time vs. number of models:

python scripts/evaluate_scalability.py \
    --model_counts 3,5,10,20 \
    --n_queries 1000

Compute Requirements

Minimum (Sanity Checks)

  • CPU: 4 cores
  • RAM: 16 GB
  • Storage: 20 GB
  • Time: ~30 minutes

Recommended (Full Experiments)

  • GPU: NVIDIA GPU with 16+ GB VRAM (V100, A100, RTX 4090)
  • CPU: 16+ cores
  • RAM: 64 GB
  • Storage: 100 GB
  • Time: ~8-12 GPU-hours

Cloud Options

  • AWS: p3.2xlarge (V100) - ~$3/hour
  • Google Cloud: n1-standard-8 + T4 GPU - ~$2-4/hour
  • Azure: NC6s_v3 (V100) - ~$3-6/hour

Testing

Run unit tests:

make test
# Or manually:
pytest tests/ -v --cov=. --cov-report=html

View coverage report:

open htmlcov/index.html  # macOS; on Linux use xdg-open

Design Decisions & Deviations

TCE Encoder Choice

Decision: Use distilbert-base-uncased as default
Rationale: Balances performance and efficiency (66M parameters, ~40ms inference on CPU)
Alternatives: bert-tiny (4M params) for ultra-fast inference, MobileBERT for mobile deployment

Queueing Model

Decision: Simple M/M/1 queue per model
Rationale: Captures essential dynamics while remaining computationally efficient
Deviation: The paper may use a more complex queueing model; M/M/1 is a reasonable approximation
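
For reference, the textbook M/M/1 results give a mean time in system of 1/(μ − λ) and a mean wait in queue of ρ/(μ − λ), where λ is the arrival rate, μ the service rate, and ρ = λ/μ the utilization. The sketch below encodes those formulas; how `aamc_env/env.py` actually turns queue state into latency is not shown here, and the function names are hypothetical.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time a request spends waiting plus being served."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service rate positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when arrival_rate >= service_rate")
    return 1.0 / (service_rate - arrival_rate)


def mm1_wait_in_queue(arrival_rate, service_rate):
    """Mean time a request waits before service starts."""
    utilization = arrival_rate / service_rate
    return utilization * mm1_time_in_system(arrival_rate, service_rate)


if __name__ == "__main__":
    # 5 requests/s arriving at a model that serves 10 requests/s.
    print(mm1_time_in_system(5.0, 10.0))
    print(mm1_wait_in_queue(5.0, 10.0))
```

Latency grows sharply as the arrival rate approaches the service rate, which is what gives the router a reason to spread load across models.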

Synthetic Data

Decision: Generate synthetic queries with templates
Rationale: Enables end-to-end pipeline without external dependencies
Note: Real datasets can be plugged in via documented interface
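
Template-based generation can be pictured as filling slots in a few query patterns and attaching labels. Everything in the sketch below (templates, categories, base complexity values, field names) is invented for illustration; the real generator in `data/generate_tce_dataset.py` and its CSV schema may differ.

```python
import random

# Hypothetical templates: category -> (pattern, slot fillers).
TEMPLATES = {
    "coding": ("Write a Python function to {task}",
               ["sort a list", "parse a CSV file"]),
    "qa": ("What is {task}?",
           ["the capital of France", "a binary search tree"]),
}
# Assumed base complexity per category, in [0, 1].
BASE_COMPLEXITY = {"coding": 0.6, "qa": 0.2}


def generate_example(rng):
    """Build one labeled synthetic query."""
    category = rng.choice(sorted(TEMPLATES))
    pattern, fillers = TEMPLATES[category]
    query = pattern.format(task=rng.choice(fillers))
    # Jitter the base score so examples in a category are not identical.
    complexity = BASE_COMPLEXITY[category] + rng.uniform(-0.1, 0.1)
    complexity = min(1.0, max(0.0, complexity))
    return {
        "query": query,
        "category": category,
        "complexity": round(complexity, 3),
        "tool_use": int(category == "coding"),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate_example(rng))
```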

PPO Implementation

Decision: Implement from scratch rather than using stable-baselines3
Rationale: Fine-grained control over preference conditioning and vector rewards
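
Preference conditioning with vector rewards is commonly done by appending the preference vector to the observation and scalarizing the reward with the same weights. The sketch below shows that common pattern under stated assumptions: a three-component reward such as (success, negative normalized cost, negative normalized latency) and preferences drawn uniformly from the simplex. It is not the implementation in `rler/`, and all function names are hypothetical.

```python
import random


def sample_preference(rng, n_objectives=3):
    """Draw weights uniformly from the probability simplex.

    Normalizing independent Exponential(1) draws yields a flat Dirichlet.
    """
    draws = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def condition_observation(observation, preference):
    """Append the preference vector so the policy can see the trade-off."""
    return list(observation) + list(preference)


def scalarize(reward_vector, preference):
    """Weighted sum of per-objective rewards."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward and preference must have the same length")
    return sum(w * r for w, r in zip(preference, reward_vector))


if __name__ == "__main__":
    rng = random.Random(0)
    pref = sample_preference(rng)
    print(condition_observation([0.3, 1.0], pref))
    print(scalarize([1.0, -0.5, -0.2], pref))
```

Sampling a fresh preference per episode is one way to train a single policy that can be steered at inference time.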

Troubleshooting

CUDA Out of Memory

  • Reduce batch size in configs
  • Use gradient accumulation
  • Try mixed precision training

TCE Training Slow

  • Use smaller encoder (e.g., bert-tiny)
  • Reduce dataset size
  • Enable mixed precision

RLR Not Converging

  • Increase training timesteps
  • Adjust PPO hyperparameters (clip_epsilon, learning rate)
  • Check reward normalization ranges

Citation

If you use this code, please cite:

@article{rjoub2025aamc,
  title={Adaptive Agentic Meta-Controller (AAMC): A Reinforcement Learning Framework for Intelligent SLM/LLM Orchestration},
  author={Rjoub, Gaith and Bentahar, Jamal and Almolydeen, Shahed and Irjoob, Ahmad},
  journal={Neurocomputing},
  year={2025}
}

License

This implementation is provided for research and educational purposes.

Contact

For questions or issues:

Acknowledgments

This implementation is based on the paper "Adaptive Agentic Meta-Controller (AAMC): A Reinforcement Learning Framework for Intelligent SLM/LLM Orchestration" and uses:

  • PyTorch for deep learning
  • Hugging Face Transformers for pre-trained encoders
  • Gymnasium for RL environments
  • Stable-Baselines3 for reference implementations

Version: 1.0
Last Updated: 2025-10-09
