LLM Router: Complete Setup Guide

A step-by-step guide to training and evaluating RL agents for LLM routing.

⚠️ IMPORTANT: Dataset Download

The run.sh script will automatically download the dataset if it's missing.

However, if you prefer to download manually or the automatic download fails:

Download from Hugging Face:
- Visit: https://huggingface.co/datasets/withmartian/routerbench/blob/main/routerbench_raw.pkl
- Click the "Download" button to download routerbench_raw.pkl (~1.2 GB)

Move to data folder:

# Move the downloaded file to the data directory
mv ~/Downloads/routerbench_raw.pkl data/
# Or if downloaded to current directory:
mv routerbench_raw.pkl data/

Verify the file is in place:
```
ls -lh data/routerbench_raw.pkl
```

The file should be approximately 1.2 GB in size.
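
A quick programmatic check can catch a truncated download before training starts. The sketch below is illustrative only: `check_dataset` and the 1 GB lower bound are assumptions chosen to sit a little under the ~1.2 GB stated above, and neither is part of the repository.

```python
from pathlib import Path

# Hypothetical lower bound; the guide only states "approximately 1.2 GB".
MIN_EXPECTED_BYTES = 1_000_000_000


def check_dataset(path="data/routerbench_raw.pkl", min_bytes=MIN_EXPECTED_BYTES):
    """Return (ok, message) describing whether the dataset file looks complete."""
    file = Path(path)
    if not file.is_file():
        return False, f"{file} not found"
    size = file.stat().st_size
    if size < min_bytes:
        return False, f"{file} is only {size / 1e9:.2f} GB; the download may be incomplete"
    return True, f"{file} looks fine ({size / 1e9:.2f} GB)"


if __name__ == "__main__":
    ok, message = check_dataset()
    print(message)
```

If the check reports a small file, deleting it and re-running the download is usually simpler than trying to resume.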

Quick Start (Automated)

For a fully automated setup that handles everything from environment setup to running the Streamlit dashboard:

# Make the script executable (first time only)
chmod +x run.sh

# Run everything automatically
./run.sh

This script will:

✅ Check Python installation
✅ Create and activate virtual environment
✅ Install all dependencies
✅ Automatically download dataset if missing (from Hugging Face)
✅ Precompute embeddings for embedding-based experiments
✅ Train all agents (DQN, PPO, LinUCB, PickLLM, Greedy)
✅ Run evaluation on all agents
✅ Run comparison experiments (baseline and embedding comparisons)
✅ Launch Streamlit dashboard

Options:

# Skip steps (comma-separated list)
./run.sh --skip=download
./run.sh --skip=embedding
./run.sh --skip=training
./run.sh --skip=evaluation
./run.sh --skip=download,embedding
./run.sh --skip=download,embedding,training
./run.sh --skip=download,embedding,training,evaluation

# Customize episode counts
./run.sh --episodes 1000 --eval-episodes 50

# Combine options
./run.sh --skip=download,embedding,training --episodes 1000

# Get help
./run.sh --help

Note: The script will automatically download the dataset (~1.2 GB) from Hugging Face if it's not found. This requires an internet connection and may take several minutes.

Step-by-Step Guide

For manual setup and more control over each step:

Step 1: Setup Environment

# Enter the cloned repository directory
cd LLMRouter

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or: .venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
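
To confirm that the virtual environment is actually active before installing packages or training, a small check like the following can help. It uses only the standard library; `in_virtualenv` is a name made up for this sketch.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Virtual environment active:", in_virtualenv())
```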

Step 2: Prepare Data

Pre-compute Embeddings (Recommended)

For faster training with embeddings:

python scripts/precompute_embeddings.py --model all-MiniLM-L6-v2

This creates data/prompt_embeddings_all-MiniLM-L6-v2.pkl (about 100 MB) and takes roughly 5 minutes.
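
The layout of the embeddings file is not described in this guide, so the sketch below makes no assumption about it and only reports the top-level type and size. `describe_pickle` is a hypothetical helper, not a function from the repository. Only unpickle files you generated yourself, since `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle and return a short description of its top-level object."""
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)
    info = {"type": type(obj).__name__}
    # Report a length when the object has one (dicts, lists, arrays, ...).
    if hasattr(obj, "__len__"):
        info["length"] = len(obj)
    # For dicts, peek at one key so the keying scheme is visible.
    if isinstance(obj, dict) and obj:
        info["example_key"] = repr(next(iter(obj)))[:80]
    return info


if __name__ == "__main__":
    target = Path("data/prompt_embeddings_all-MiniLM-L6-v2.pkl")
    if target.is_file():
        print(describe_pickle(target))
    else:
        print(f"{target} not found; run the precompute step first")
```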

Step 3: Train Agents

Train DQN (Feature-based)

python training/train.py --algo dqn \
    --num_episodes 1000

Train DQN (with Embeddings)

python training/train.py --algo dqn \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl \
    --embedding_model all-MiniLM-L6-v2 \
    --output_dir results/dqn_embeddings

Train PPO (Feature-based)

python training/train.py --algo ppo \
    --num_episodes 1000

Train PPO (with Embeddings)

python training/train.py --algo ppo \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl \
    --embedding_model all-MiniLM-L6-v2 \
    --output_dir results/ppo_embeddings

Train LinUCB

python training/train.py --algo linucb \
    --num_episodes 1000 \
    --linucb_alpha 1.0

Train PickLLM

python training/train.py --algo pickllm \
    --num_episodes 1000 \
    --pickllm_lr 0.1 \
    --pickllm_gamma 0.0 \
    --pickllm_epsilon_start 0.1 \
    --pickllm_epsilon_end 0.01 \
    --pickllm_epsilon_decay 0.995

Training takes ~10-30 minutes, depending on the number of episodes and your hardware.

Step 4: Evaluate Agents

Quick Comparison (All Agents)

Edit evaluation/compare_embedding_experiments.py so that it points to your trained models, then run:

python evaluation/compare_embedding_experiments.py --num_episodes 100

Output:

======================================================================
SUMMARY
======================================================================
Experiment                            Reward         Std        Cost         AIQ
-------------------------------------------------------------------------------
Greedy                                1.7015      0.2269    0.001152      1.4138
DQN (Features)                        1.4828      0.3365    0.007123      1.1497
PPO (Embeddings)                      1.4389      0.3075    0.007950      1.8301

Evaluate Single Agent

python evaluation/evaluate.py \
    --agent_type dqn \
    --dqn_agent_path results/dqn_embeddings/models/dqn_agent_final.pt \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

Evaluate All Agents

python evaluation/evaluate.py \
    --agent_type all

Step 5: View Results

Streamlit Dashboard

streamlit run evaluation/streamlit_app.py

Opens at http://localhost:8501 with:

- Sidebar: File selection and upload for JSON results
- Summary Metrics: Best agent, best reward, lowest cost, best efficiency
- All Metrics Table: Comprehensive metrics for all agents (reward, performance, cost, efficiency, AIQ, budget utilization, completion rate)
- Visualizations (4 tabs):
  - Reward: Average reward and performance bar charts
  - Cost & Efficiency: Cost and efficiency comparisons
  - AIQ: AIQ scores and cost vs performance scatter plots
  - Episode Distribution: Box plots and histograms of episode rewards
- Agent Comparison: Interactive radar chart for comparing selected agents

Result Files

results/
├── models/                          # All trained models
│   ├── dqn_agent_final.pt          # DQN model
│   ├── dqn_agent_episode_*.pt      # DQN checkpoints
│   ├── ppo_agent_final.pt          # PPO model
│   ├── ppo_agent_episode_*.pt      # PPO checkpoints
│   ├── linucb_agent_final.npz      # LinUCB model
│   └── pickllm_agent_final.npz     # PickLLM model
├── dqn_training_stats.json          # DQN training statistics
├── ppo_training_stats.json          # PPO training statistics
├── linucb_training_stats.json       # LinUCB training statistics
├── pickllm_training_stats.json      # PickLLM training statistics
├── greedy_stats.json                # Greedy baseline statistics
├── evaluation_results.json          # Evaluation results (from evaluate.py)
├── baseline_comparison.json         # Baseline comparison results
├── comparison/                      # Comparison experiments
│   └── comparison.json
└── embedding_comparison/            # Embedding comparison results
    └── embedding_comparison.json
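
The schema of these JSON files is not documented here, so a first look is easiest with a script that assumes nothing beyond "valid JSON". The sketch below walks the results directory and lists each file's top-level keys; `summarize_results` is a hypothetical helper written for this guide.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each JSON file under results_dir to its top-level keys (or its type)."""
    summary = {}
    for path in sorted(Path(results_dir).rglob("*.json")):
        try:
            with path.open() as handle:
                data = json.load(handle)
        except (OSError, json.JSONDecodeError) as exc:
            summary[str(path)] = f"unreadable: {exc}"
            continue
        if isinstance(data, dict):
            summary[str(path)] = sorted(data.keys())
        else:
            summary[str(path)] = type(data).__name__
    return summary


if __name__ == "__main__":
    for name, keys in summarize_results().items():
        print(name, "->", keys)
```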

Full Example Workflow

# 1. Setup
source .venv/bin/activate
pip install -r requirements.txt

# 2. Pre-compute embeddings
python scripts/precompute_embeddings.py --model all-MiniLM-L6-v2

# 3. Train DQN with embeddings
python training/train.py --algo dqn \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

# 4. Train PPO with embeddings
python training/train.py --algo ppo \
    --num_episodes 1000 \
    --use_embeddings \
    --embeddings_path data/prompt_embeddings_all-MiniLM-L6-v2.pkl

# 5. Evaluate all agents
python evaluation/compare_embedding_experiments.py --num_episodes 100

# 6. View dashboard
streamlit run evaluation/streamlit_app.py

Hyperparameter Tuning

DQN Options

# --lr              learning rate
# --gamma           discount factor
# --epsilon_decay   0.997 gives slower exploration decay
# --reward_lambda   cost penalty weight
# --latency_lambda  latency penalty weight
# --hidden_dims     256,256 gives a larger network
python training/train.py --algo dqn \
    --lr 0.0005 \
    --gamma 0.99 \
    --epsilon_decay 0.997 \
    --reward_lambda 0.2 \
    --latency_lambda 0.1 \
    --batch_size 128 \
    --hidden_dims 256,256
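
To build intuition for the two penalty weights, the sketch below shows one common way such weights enter a reward: a linear penalty on cost and latency. Whether training/train.py uses exactly this form is an assumption; `penalized_reward`, its arguments, and the toy numbers are invented for illustration.

```python
def penalized_reward(quality, cost, latency, reward_lambda=0.2, latency_lambda=0.1):
    """Assumed linear form: quality minus weighted cost and latency penalties."""
    return quality - reward_lambda * cost - latency_lambda * latency


if __name__ == "__main__":
    # Toy numbers, not taken from RouterBench.
    cheap = penalized_reward(quality=0.7, cost=0.1, latency=0.5)
    strong = penalized_reward(quality=0.9, cost=2.0, latency=1.0)
    print(f"cheap model:  {cheap:.3f}")
    print(f"strong model: {strong:.3f}")
```

Under this assumed form, raising either weight pushes the agent toward cheaper or faster models, and setting both to zero reduces the reward to quality alone.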

PPO Options

python training/train.py --algo ppo \
    --ppo_lr 1e-4 \
    --ppo_epochs 15 \
    --ppo_clip_epsilon 0.1
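
The `--ppo_clip_epsilon` flag presumably sets the clip range of PPO's clipped surrogate objective. The sketch below is the textbook per-sample form of that objective, not code from this repository, whose implementation may differ in detail. A smaller epsilon (0.1 here) keeps each policy update closer to the previous policy.

```python
def clipped_surrogate(ratio, advantage, clip_epsilon=0.1):
    """PPO clipped objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # ratio = new_policy_prob / old_policy_prob for the action that was taken.
    for ratio in (0.5, 1.0, 1.5):
        for advantage in (1.0, -1.0):
            value = clipped_surrogate(ratio, advantage)
            print(f"ratio={ratio:.1f} advantage={advantage:+.1f} objective={value:+.2f}")
```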

LinUCB Options

# --linucb_alpha   exploration parameter (higher = more exploration)
# --reward_lambda  cost penalty weight
python training/train.py --algo linucb \
    --linucb_alpha 1.0 \
    --reward_lambda 0.2 \
    --num_episodes 1000
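
To see why a higher alpha means more exploration, here is a minimal sketch of a single arm in standard disjoint LinUCB, where the score is the estimated reward plus alpha times an uncertainty term. It is a generic illustration in pure Python, assuming an identity-initialized design matrix; `LinUCBArm` is a made-up class and the repository's agent may be structured differently.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, keeping A^-1 and b for a d-dimensional context."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^-1 is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def score(self, x, alpha):
        """Upper confidence bound: theta . x + alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        variance = sum(xi * vi for xi, vi in zip(x, self._a_inv_times(x)))
        return mean + alpha * math.sqrt(max(variance, 0.0))

    def update(self, x, reward):
        """Rank-one update A += x x^T via Sherman-Morrison, and b += reward * x."""
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


if __name__ == "__main__":
    arm = LinUCBArm(dim=2)
    context = [1.0, 0.0]  # toy context vector
    for alpha in (0.5, 1.0, 2.0):
        print(f"alpha={alpha}: untrained score={arm.score(context, alpha):.3f}")
    arm.update(context, reward=1.0)
    print(f"after one update, alpha=1.0: {arm.score(context, 1.0):.3f}")
```

The exploration bonus shrinks as an arm accumulates observations for similar contexts, so alpha mainly affects how long rarely chosen models keep being tried.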

PickLLM Options

# --pickllm_lr             learning rate
# --pickllm_gamma          discount factor (0.0 for immediate rewards)
# --pickllm_epsilon_start  initial exploration rate
# --pickllm_epsilon_end    final exploration rate
# --pickllm_epsilon_decay  exploration decay rate
# --reward_lambda          cost penalty weight
python training/train.py --algo pickllm \
    --pickllm_lr 0.1 \
    --pickllm_gamma 0.0 \
    --pickllm_epsilon_start 0.1 \
    --pickllm_epsilon_end 0.01 \
    --pickllm_epsilon_decay 0.995 \
    --reward_lambda 0.2 \
    --num_episodes 1000
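
The flags above read like the parameters of an epsilon-greedy, Q-learning-style agent. The sketch below shows how such parameters typically interact, assuming a multiplicative per-episode epsilon decay with a floor and a tabular Q update. Both are assumptions about the agent's internals, and the function names are hypothetical.

```python
import random


def epsilon_at(episode, start=0.1, end=0.01, decay=0.995):
    """Assumed schedule: multiply by `decay` each episode, never going below `end`."""
    return max(end, start * decay ** episode)


def q_update(q_value, reward, next_max_q, lr=0.1, gamma=0.0):
    """One tabular Q-learning step; with gamma=0 the target is just the reward."""
    target = reward + gamma * next_max_q
    return q_value + lr * (target - q_value)


def choose_model(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a dict of model name -> Q-value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


if __name__ == "__main__":
    for episode in (0, 100, 500, 1000):
        print(episode, round(epsilon_at(episode), 4))
    # With gamma=0.0 the next-state value is ignored entirely.
    print(q_update(q_value=0.0, reward=1.0, next_max_q=5.0))
```

Under this assumed schedule, epsilon reaches its 0.01 floor after roughly 460 episodes, so a 1000-episode run would spend more than half its time at the minimum exploration rate.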

Expected Results

| Agent | Avg Reward | Avg Cost | Cost Efficiency |
|---|---|---|---|
| Greedy | ~1.70 | ~0.0011 | ~1800 |
| LinUCB | ~1.27 | ~0.0004 | ~3700 |
| PickLLM | ~1.23 | ~0.0004 | ~3500 |
| DQN (Features) | ~1.48 | ~0.0071 | ~1250 |
| DQN (Embeddings) | ~1.41 | ~0.0054 | ~850 |
| PPO (Embeddings) | ~1.44 | ~0.0080 | ~1800 |

Troubleshooting

"ModuleNotFoundError": Run pip install -r requirements.txt

"FileNotFoundError: routerbench_raw.pkl": Place dataset in data/ folder

"State dimension mismatch": Ensure embeddings match the trained model's configuration

Slow training: Use pre-computed embeddings instead of live encoding
