
shantanold/LLMRouter


LLM Router: Complete Setup Guide

A step-by-step guide to train and evaluate RL agents for LLM routing.


⚠️ IMPORTANT: Dataset Download

The run.sh script will automatically download the dataset if it's missing.

However, if you prefer to download manually or the automatic download fails:

  1. Download from Hugging Face:

  2. Move to data folder:

    # Move the downloaded file to the data directory
    mv ~/Downloads/routerbench_raw.pkl data/
    # Or if downloaded to current directory:
    mv routerbench_raw.pkl data/

  3. Verify the file is in place:

    ls -lh data/routerbench_raw.pkl

The file should be approximately 1.2 GB in size.
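
The same check can be scripted. The following is a minimal sketch, not part of the repository: the helper name `check_dataset` and the 1 GB lower bound are assumptions chosen to match the approximate size stated above.

```python
from pathlib import Path

# Assumed threshold: the guide says the file is roughly 1.2 GB, so anything
# under about 1 GB is treated as a truncated download. Adjust as needed.
MIN_EXPECTED_BYTES = 1_000_000_000


def check_dataset(path="data/routerbench_raw.pkl", min_bytes=MIN_EXPECTED_BYTES):
    """Return (ok, message) describing whether the dataset file looks complete."""
    file = Path(path)
    if not file.is_file():
        return False, f"missing: {file}"
    size = file.stat().st_size
    if size < min_bytes:
        return False, f"too small: {size} bytes (expected at least {min_bytes})"
    return True, f"ok: {size / 1e9:.2f} GB"


if __name__ == "__main__":
    ok, message = check_dataset()
    print(message)
```

A "too small" result usually means the download was interrupted and should be repeated.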


Quick Start (Automated)

For a fully automated setup that handles everything from environment setup to running the Streamlit dashboard:

# Make the script executable (first time only)
chmod +x run.sh

# Run everything automatically
./run.sh

This script will:

  1. ✅ Check Python installation
  2. ✅ Create and activate virtual environment
  3. ✅ Install all dependencies
  4. ✅ Download the dataset automatically if missing (from Hugging Face)
  5. ✅ Precompute embeddings for embedding-based experiments
  6. ✅ Train all agents (DQN, PPO, LinUCB, PickLLM, Greedy)
  7. ✅ Run evaluation on all agents
  8. ✅ Run comparison experiments (baseline and embedding comparisons)
  9. ✅ Launch Streamlit dashboard

Options:

# Skip steps (comma-separated list)
./run.sh --skip=download
./run.sh --skip=embedding
./run.sh --skip=training
./run.sh --skip=evaluation
./run.sh --skip=download,embedding
./run.sh --skip=download,embedding,training
./run.sh --skip=download,embedding,training,evaluation

# Customize episode counts
./run.sh --episodes 1000 --eval-episodes 50

# Combine options
./run.sh --skip=download,embedding,training --episodes 1000

# Get help
./run.sh --help

Note: The script will automatically download the dataset (~1.2 GB) from Hugging Face if it's not found. This requires an internet connection and may take several minutes.


Step-by-Step Guide

For manual setup and more control over each step:

Step 1: Setup Environment

# Enter the cloned repository directory
cd LLMRouter

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or: .venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Step 2: Prepare Data

Pre-compute Embeddings (Recommended)

For faster training with embeddings:

python scripts/precompute_embeddings.py --model all-MiniLM-L6-v2

This creates data/prompt_embeddings_all-MiniLM-L6-v2.pkl (about 100 MB; takes roughly 5 minutes).
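
To illustrate why pre-computing helps, here is a minimal sketch of an embedding cache: each unique prompt is encoded once and the result is pickled for reuse during training. The function names and the `{prompt: vector}` layout are assumptions for illustration; the file written by scripts/precompute_embeddings.py may be organised differently.

```python
import pickle
from pathlib import Path


def precompute_embeddings(prompts, encode, out_path):
    """Encode each unique prompt once and pickle a {prompt: vector} dict.

    `encode` is any callable mapping a list of strings to a list of vectors
    (hypothetical interface, chosen so the encoder can be swapped out).
    """
    unique_prompts = list(dict.fromkeys(prompts))  # de-duplicate, keep order
    vectors = encode(unique_prompts)
    cache = {p: [float(x) for x in v] for p, v in zip(unique_prompts, vectors)}
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as fh:
        pickle.dump(cache, fh)
    return cache


def load_embeddings(path):
    """Load a cache written by precompute_embeddings."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

With the sentence-transformers package installed, `encode` could be `SentenceTransformer("all-MiniLM-L6-v2").encode`, which returns one 384-dimensional vector per prompt.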


Step 3: Train Agents

Train DQN (Feature-based)

python training/train.py --algo dqn \
    --num_episodes 1000

Train DQN (with Embeddings)

python training/train.py --algo dqn \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl \
    --embedding_model all-MiniLM-L6-v2 \
    --output_dir results/dqn_embeddings

Train PPO (Feature-based)

python training/train.py --algo ppo \
    --num_episodes 1000

Train PPO (with Embeddings)

python training/train.py --algo ppo \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl \
    --embedding_model all-MiniLM-L6-v2 \
    --output_dir results/ppo_embeddings

Train LinUCB

python training/train.py --algo linucb \
    --num_episodes 1000 \
    --linucb_alpha 1.0

Train PickLLM

python training/train.py --algo pickllm \
    --num_episodes 1000 \
    --pickllm_lr 0.1 \
    --pickllm_gamma 0.0 \
    --pickllm_epsilon_start 0.1 \
    --pickllm_epsilon_end 0.01 \
    --pickllm_epsilon_decay 0.995

Training takes roughly 10-30 minutes per agent, depending on the number of episodes and your hardware.


Step 4: Evaluate Agents

Quick Comparison (All Agents)

Edit evaluation/compare_embedding_experiments.py to point to your trained models, then run:

python evaluation/compare_embedding_experiments.py --num_episodes 100

Output:

======================================================================
SUMMARY
======================================================================
Experiment                            Reward         Std        Cost         AIQ
-------------------------------------------------------------------------------
Greedy                                1.7015      0.2269    0.001152      1.4138
DQN (Features)                        1.4828      0.3365    0.007123      1.1497
PPO (Embeddings)                      1.4389      0.3075    0.007950      1.8301

Evaluate Single Agent

python evaluation/evaluate.py \
    --agent_type dqn \
    --dqn_agent_path results/dqn_embeddings/models/dqn_agent_final.pt \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

Evaluate All Agents

python evaluation/evaluate.py \
    --agent_type all

Step 5: View Results

Streamlit Dashboard

streamlit run evaluation/streamlit_app.py

Opens at http://localhost:8501 with:

  • Sidebar: File selection and upload for JSON results
  • Summary Metrics: Best agent, best reward, lowest cost, best efficiency
  • All Metrics Table: Comprehensive metrics for all agents (reward, performance, cost, efficiency, AIQ, budget utilization, completion rate)
  • Visualizations (4 tabs):
    • Reward: Average reward and performance bar charts
    • Cost & Efficiency: Cost and efficiency comparisons
    • AIQ: AIQ scores and cost vs performance scatter plots
    • Episode Distribution: Box plots and histograms of episode rewards
  • Agent Comparison: Interactive radar chart for comparing selected agents

Result Files

results/
├── models/                          # All trained models
│   ├── dqn_agent_final.pt          # DQN model
│   ├── dqn_agent_episode_*.pt      # DQN checkpoints
│   ├── ppo_agent_final.pt          # PPO model
│   ├── ppo_agent_episode_*.pt      # PPO checkpoints
│   ├── linucb_agent_final.npz     # LinUCB model
│   └── pickllm_agent_final.npz    # PickLLM model
├── dqn_training_stats.json          # DQN training statistics
├── ppo_training_stats.json          # PPO training statistics
├── linucb_training_stats.json       # LinUCB training statistics
├── pickllm_training_stats.json     # PickLLM training statistics
├── greedy_stats.json                # Greedy baseline statistics
├── evaluation_results.json          # Evaluation results (from evaluate.py)
├── baseline_comparison.json         # Baseline comparison results
├── comparison/                      # Comparison experiments
│   └── comparison.json
└── embedding_comparison/            # Embedding comparison results
    └── embedding_comparison.json
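
For a quick look at these files outside the dashboard, the sketch below walks the results directory and loads every JSON file it finds. The helper name `load_results` is hypothetical, and the sketch makes no assumption about the keys inside each file.

```python
import json
from pathlib import Path


def load_results(results_dir="results"):
    """Return {relative_path: parsed_json} for every .json file under results_dir."""
    root = Path(results_dir)
    loaded = {}
    if not root.is_dir():
        return loaded
    for path in sorted(root.rglob("*.json")):
        with path.open() as fh:
            loaded[path.relative_to(root).as_posix()] = json.load(fh)
    return loaded


if __name__ == "__main__":
    for name, payload in load_results().items():
        kind = type(payload).__name__
        size = len(payload) if isinstance(payload, (dict, list)) else 1
        print(f"{name}: {kind} with {size} top-level entries")
```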

Full Example Workflow

# 1. Setup
source .venv/bin/activate
pip install -r requirements.txt

# 2. Pre-compute embeddings
python scripts/precompute_embeddings.py --model all-MiniLM-L6-v2

# 3. Train DQN with embeddings
python training/train.py --algo dqn \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

# 4. Train PPO with embeddings
python training/train.py --algo ppo \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

# 5. Evaluate all agents
python evaluation/compare_embedding_experiments.py --num_episodes 100

# 6. View dashboard
streamlit run evaluation/streamlit_app.py

Hyperparameter Tuning

DQN Options

# --lr: learning rate
# --gamma: discount factor
# --epsilon_decay: exploration decay (0.997 gives a slower decay)
# --reward_lambda: cost penalty weight
# --latency_lambda: latency penalty weight
# --hidden_dims: hidden layer sizes (256,256 is a larger network)
python training/train.py --algo dqn \
    --lr 0.0005 \
    --gamma 0.99 \
    --epsilon_decay 0.997 \
    --reward_lambda 0.2 \
    --latency_lambda 0.1 \
    --batch_size 128 \
    --hidden_dims 256,256
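
To build intuition for the two penalty weights, the sketch below assumes the reward is a linear trade-off: response quality minus weighted cost and latency. That form is an assumption for illustration and is not taken from the training code. The two candidate models and their numbers are hypothetical.

```python
def shaped_reward(quality, cost, latency, reward_lambda=0.2, latency_lambda=0.1):
    """Assumed reward shape: quality minus weighted cost and latency penalties."""
    return quality - reward_lambda * cost - latency_lambda * latency


def best_model(candidates, reward_lambda=0.2, latency_lambda=0.1):
    """Return the name of the candidate with the highest shaped reward."""
    return max(
        candidates,
        key=lambda name: shaped_reward(
            *candidates[name],
            reward_lambda=reward_lambda,
            latency_lambda=latency_lambda,
        ),
    )


# Hypothetical (quality, cost, latency) triples in arbitrary units.
CANDIDATES = {
    "small": (0.7, 0.1, 0.2),
    "large": (0.9, 1.5, 1.0),
}

if __name__ == "__main__":
    print(best_model(CANDIDATES))                  # penalties applied
    print(best_model(CANDIDATES, 0.0, 0.0))        # quality only
```

Under this assumed form, raising either weight pushes the agent toward cheaper or faster models, while setting both to zero makes it chase quality alone.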

PPO Options

python training/train.py --algo ppo \
    --ppo_lr 1e-4 \
    --ppo_epochs 15 \
    --ppo_clip_epsilon 0.1
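
For context on `--ppo_clip_epsilon`: standard PPO limits how far a single update can move the policy by clipping the probability ratio between the new and old policy. The sketch below shows that textbook clipped objective for one sample; whether training/train.py implements it in exactly this way is an assumption.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_epsilon=0.1):
    """Textbook PPO clipped objective for a single (state, action) sample."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

A smaller clip value such as 0.1 makes each update more conservative, which tends to pair well with a higher number of epochs per batch.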

LinUCB Options

# --linucb_alpha: exploration parameter (higher = more exploration)
# --reward_lambda: cost penalty weight
python training/train.py --algo linucb \
    --linucb_alpha 1.0 \
    --reward_lambda 0.2 \
    --num_episodes 1000
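
The effect of the exploration parameter can be seen in a small sketch of disjoint LinUCB, where each arm (here, each candidate LLM) scores a context as its estimated reward plus alpha times an uncertainty width. This is a pure-Python illustration of the general algorithm with hypothetical class and function names, not the repository's implementation.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, tracking A^-1 and b for ridge regression."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^-1 is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)  # ridge estimate A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after A += x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x, alpha=1.0):
    """Pick the arm with the highest upper confidence bound (first one on ties)."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

After one arm has been rewarded for a context, a low alpha keeps choosing it, while a high alpha lets the larger uncertainty of untried arms win.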

PickLLM Options

# --pickllm_lr: learning rate
# --pickllm_gamma: discount factor (0.0 for immediate rewards)
# --pickllm_epsilon_start: initial exploration rate
# --pickllm_epsilon_end: final exploration rate
# --pickllm_epsilon_decay: exploration decay rate
# --reward_lambda: cost penalty weight
python training/train.py --algo pickllm \
    --pickllm_lr 0.1 \
    --pickllm_gamma 0.0 \
    --pickllm_epsilon_start 0.1 \
    --pickllm_epsilon_end 0.01 \
    --pickllm_epsilon_decay 0.995 \
    --reward_lambda 0.2 \
    --num_episodes 1000
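
To show how these options interact, the sketch below pairs an epsilon-greedy choice with a tabular value update. It assumes epsilon decays multiplicatively once per episode and that values follow the usual Q-learning rule; both are assumptions about the agent, and the function names are hypothetical.

```python
import random


def epsilon_at(episode, start=0.1, end=0.01, decay=0.995):
    """Exploration rate after `episode` multiplicative decays, floored at `end`."""
    return max(end, start * decay ** episode)


def q_update(q_values, action, reward, lr=0.1, gamma=0.0, next_max=0.0):
    """Move Q[action] toward reward + gamma * next_max by a step of size lr."""
    target = reward + gamma * next_max
    q_values[action] += lr * (target - q_values[action])
    return q_values[action]


def choose_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With gamma set to 0.0 the target is just the immediate reward, so each value becomes a running average of the rewards seen for that model. Under the per-episode assumption, the default schedule reaches its 0.01 floor after roughly 460 episodes.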

Expected Results

Agent               Avg Reward    Avg Cost    Cost Efficiency
-------------------------------------------------------------
Greedy              ~1.70         ~0.0011     ~1800
LinUCB              ~1.27         ~0.0004     ~3700
PickLLM             ~1.23         ~0.0004     ~3500
DQN (Features)      ~1.48         ~0.0071     ~1250
DQN (Embeddings)    ~1.41         ~0.0054     ~850
PPO (Embeddings)    ~1.44         ~0.0080     ~1800

Troubleshooting

"ModuleNotFoundError": Run pip install -r requirements.txt

"FileNotFoundError: routerbench_raw.pkl": Place the dataset in the data/ folder (see Dataset Download above)

"State dimension mismatch": Evaluate with the same embeddings settings (--use_embeddings, --embeddings_path) that the model was trained with

Slow training: Use pre-computed embeddings instead of live encoding
