LLM Hyperparameter Tuning

A comprehensive, beginner-friendly framework for optimizing machine learning model hyperparameters using Large Language Models. This system automatically generates, tests, and analyzes hyperparameter configurations to improve your model's performance.

Overview

This framework uses a three-component pipeline to intelligently optimize hyperparameters:

Hyperparameter Generator → Model Training Script → Results Analyzer

The system works by having an LLM analyze your model's training results and suggest better hyperparameter values for the next iteration. It supports both deep learning (PyTorch, TensorFlow) and classic ML (scikit-learn) models.

How It Works at a High Level

1. Start with your model: create a training script that follows our simple interface
2. Configure search space: define which hyperparameters to optimize and their ranges
3. Run optimization: the system automatically generates hyperparameters, trains your model, and analyzes results
4. Get better performance: the LLM learns from each iteration to suggest improvements

The optimizer communicates with your model through simple JSON files, making it framework-agnostic and easy to integrate.
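
The file-based handshake can be sketched as a single loop. This is a minimal illustration, not the contents of run.sh: `suggest` and `train` are hypothetical stand-ins for the LLM optimizer and your model script, and the toy objective is invented so the example runs without Ollama.

```python
import json
import os
import tempfile

def suggest(history):
    """Hypothetical stand-in for the LLM: halve the learning rate each round."""
    if not history:
        return {"learning_rate": 0.1}
    return {"learning_rate": history[-1]["hyperparameters"]["learning_rate"] / 2}

def train(temp_dir):
    """Hypothetical stand-in for model.py: read hyperparameters, write results."""
    with open(os.path.join(temp_dir, "hyperparameters.json")) as f:
        hp = json.load(f)
    # Toy objective (invented): accuracy peaks at learning_rate = 0.025
    acc = 1.0 - abs(hp["learning_rate"] - 0.025)
    results = {"metrics": {"val_accuracy": [acc]}, "epochs": [1]}
    with open(os.path.join(temp_dir, "results.json"), "w") as f:
        json.dump(results, f)

def run_loop(temp_dir, iterations=4):
    """Generate -> train -> read results, exchanging data only through JSON files."""
    history = []
    for _ in range(iterations):
        hp = suggest(history)
        with open(os.path.join(temp_dir, "hyperparameters.json"), "w") as f:
            json.dump(hp, f)
        train(temp_dir)
        with open(os.path.join(temp_dir, "results.json")) as f:
            results = json.load(f)
        history.append({"hyperparameters": hp, "results": results})
    return history

# Run in a throwaway directory so nothing in temp/ is touched
with tempfile.TemporaryDirectory() as tmp:
    history = run_loop(tmp)

best = max(history, key=lambda h: h["results"]["metrics"]["val_accuracy"][-1])
print(best["hyperparameters"])
```

Because the two sides only share files, either stand-in can be swapped for a real component without changing the other.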

Requirements and Installation

Prerequisites

# Python 3.7+ required
python --version

# Install core dependencies
pip install ollama numpy matplotlib

# For deep learning examples (optional)
pip install torch transformers scikit-learn pandas

# For classic ML examples (optional)  
pip install scikit-learn pandas numpy

LLM Setup

# Install and run Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a recommended model (choose one)
ollama pull qwen2.5-coder:32b    # Best performance
ollama pull llama3.1:8b          # Balanced option
ollama pull qwen2.5-coder:7b     # Lightweight option

Quick Start

Step 1: Create Your Configuration

Copy info_template.json to info.json and customize:

{
  "model_info": "Your model description",
  "optimization_goal": "Maximize validation accuracy",
  "metrics": {
    "primary_metric": "val_accuracy",
    "description": "Validation accuracy on held-out data"
  },
  "hyperparameters": {
    "learning_rate": {
      "type": "float",
      "range": [1e-5, 0.1],
      "default": 0.001
    },
    "batch_size": {
      "type": "ordinal", 
      "values": [16, 32, 64, 128],
      "default": 32
    }
  }
}

Step 2: Create Your Model Script

Copy model_template.py to model.py and replace the template sections with your training code.

Step 3: Run Optimization

# Set your LLM model
export LLM_MODEL="qwen2.5-coder:32b"

# Run optimization (default: 10 iterations)
bash run.sh

File Layout and What Each File Does

Core System Files

hyper_optimizer-latest.py - Hyperparameter Generator: LLM-powered optimizer that analyzes results and suggests new hyperparameters
results_analyzer_latest.py - Results Analyzer: Analyzes training trajectories and provides insights for the optimizer
run.sh - Main Pipeline: Orchestrates the optimization loop
plot_trajectories.py - Visualization: Creates plots of optimization progress
enhanced_plotter.py - Advanced Visualization: Detailed analysis plots

Configuration Files

info.json - Main Configuration: Defines hyperparameters, metrics, and optimization goals
info_template.json - Template: Starting point for creating your configuration

Example Models

model_template.py - Template: Framework-agnostic template for any ML model
model_example.py - Deep Learning Example: DistilBERT text classification
model_exapmle2.py - Additional Example: Alternative implementation

Working Directory

temp/ - Temporary Files: Contains hyperparameters.json, results.json, and analysis files during optimization

Integrating Your Own Model

The optimizer works with any model through a simple file-based interface. Your model script must:

Required Functions

Read Configuration:

import json
import os

# Get temp directory (default: "temp")  
TEMP_DIR = os.environ.get("TEMP_DIR", "temp")

# Read optimization target
with open(f"{TEMP_DIR}/info.json", "r") as f:
    config = json.load(f)
primary_metric = config["metrics"]["primary_metric"]

Read Hyperparameters:

# Load hyperparameters generated by optimizer
with open(f"{TEMP_DIR}/hyperparameters.json", "r") as f:
    hyperparams = json.load(f)

learning_rate = hyperparams["learning_rate"] 
batch_size = hyperparams["batch_size"]

Save Results:

# Save results in exact required format
results = {
    "metrics": {
        "val_accuracy": [0.85, 0.87, 0.89],  # Per-epoch values
        "train_accuracy": [0.92, 0.94, 0.95]  # Optional
    },
    "epochs": [1, 2, 3]
}

with open(f"{TEMP_DIR}/results.json", "w") as f:
    json.dump(results, f, indent=2)

Key Integration Points

Environment Variables:

TEMP_DIR - Directory for temporary files (default: "temp")
ITERATION - Current optimization iteration number
LLM_MODEL - LLM model name for the optimizer

Required File Paths:

${TEMP_DIR}/info.json - Configuration (copied from main info.json)
${TEMP_DIR}/hyperparameters.json - Generated hyperparameters to use
${TEMP_DIR}/results.json - Training results you must save

Critical JSON Keys:

metrics.primary_metric in info.json - Must match your results key
metrics.{primary_metric} in results.json - Must contain per-epoch values
epochs in results.json - Must contain corresponding epoch numbers
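
One way to keep the metric key in results.json aligned with info.json is to read the name once and use it as the dictionary key instead of hard-coding it. The helper below is a sketch under that assumption; `save_results` is a hypothetical name, not a function shipped with the framework.

```python
import json
import os
import tempfile

def save_results(temp_dir, per_epoch_values, extra_metrics=None):
    """Write results.json keyed by the primary metric named in info.json."""
    with open(os.path.join(temp_dir, "info.json")) as f:
        primary_metric = json.load(f)["metrics"]["primary_metric"]

    values = [float(v) for v in per_epoch_values]  # float() keeps values JSON-serializable
    metrics = {primary_metric: values}
    for name, series in (extra_metrics or {}).items():
        metrics[name] = [float(v) for v in series]
    results = {"metrics": metrics, "epochs": list(range(1, len(values) + 1))}

    with open(os.path.join(temp_dir, "results.json"), "w") as f:
        json.dump(results, f, indent=2)
    return results

# Demo in a throwaway directory with a minimal info.json
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "info.json"), "w") as f:
        json.dump({"metrics": {"primary_metric": "val_loss"}}, f)
    saved = save_results(tmp, [2.3, 1.8, 1.2])

print(saved)
```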

Importing Hyperparameters Inside Model Scripts

Basic Pattern

import json
import os

def load_hyperparameters():
    """Load hyperparameters with safe defaults"""
    temp_dir = os.environ.get("TEMP_DIR", "temp")
    
    # Set safe defaults first
    defaults = {
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 20,
        "weight_decay": 0.0
    }
    
    # Try to load provided hyperparameters
    try:
        with open(f"{temp_dir}/hyperparameters.json", "r") as f:
            provided = json.load(f)
        # Merge with defaults
        return {**defaults, **provided}
    except (FileNotFoundError, json.JSONDecodeError):
        print("Using default hyperparameters")
        return defaults

# Use in your model
hyperparams = load_hyperparameters()
model = create_model(lr=hyperparams["learning_rate"])

Type-Safe Loading

def get_hyperparameter(hyperparams, key, default, param_type):
    """Safely extract and convert hyperparameter"""
    value = hyperparams.get(key, default)
    
    if param_type == "int":
        return int(float(value))  # Handle "64.0" -> 64
    elif param_type == "float": 
        return float(value)
    elif param_type == "bool":
        return str(value).lower() in ["true", "1", "yes"]
    else:
        return value

# Example usage
hyperparams = load_hyperparameters()
learning_rate = get_hyperparameter(hyperparams, "learning_rate", 0.001, "float")
batch_size = get_hyperparameter(hyperparams, "batch_size", 32, "int")
epochs = get_hyperparameter(hyperparams, "epochs", 20, "int")

Handling Unknown Parameters

def apply_hyperparameters(hyperparams, known_params):
    """Apply only known hyperparameters, ignore others"""
    applied = {}
    
    for key, default_value in known_params.items():
        if key in hyperparams:
            applied[key] = hyperparams[key]
            print(f"Using {key}: {applied[key]}")
        else:
            applied[key] = default_value
            print(f"Using default {key}: {applied[key]}")
    
    # Warn about unknown parameters
    unknown = set(hyperparams.keys()) - set(known_params.keys())
    if unknown:
        print(f"Ignoring unknown parameters: {unknown}")
    
    return applied

Exporting Training Trajectories from Model Scripts

Required Results Format

Your model must save results in this exact format:

import json
import os

# Collect metrics during training
val_accuracies = []
train_accuracies = []
epochs_list = []

for epoch in range(num_epochs):
    # Your training code here...
    train_acc = train_one_epoch()
    val_acc = validate_model()
    
    # Collect results
    val_accuracies.append(float(val_acc))
    train_accuracies.append(float(train_acc))
    epochs_list.append(epoch + 1)

# Save in required format
results = {
    "metrics": {
        "val_accuracy": val_accuracies,     # Primary metric MUST match info.json
        "train_accuracy": train_accuracies  # Optional additional metrics
    },
    "epochs": epochs_list  # Must match length of metric arrays
}

# Save to required location
temp_dir = os.environ.get("TEMP_DIR", "temp")
with open(f"{temp_dir}/results.json", "w") as f:
    json.dump(results, f, indent=2)

Schema Requirements

Critical Rules:

Primary metric name must exactly match metrics.primary_metric from info.json
All metric arrays must have same length as epochs array
Values must be JSON-serializable (use float() for numpy values)
File must be saved to ${TEMP_DIR}/results.json
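
These rules can be checked before the optimizer reads the file. The function below is a minimal sketch of such a check; `validate_results` is a hypothetical helper, and the framework's own analyzer may report problems differently.

```python
def validate_results(results, primary_metric):
    """Return a list of problems; an empty list means the structure looks valid."""
    problems = []
    metrics = results.get("metrics")
    epochs = results.get("epochs")
    if not isinstance(metrics, dict):
        return ["'metrics' must be an object at the top level"]
    if not isinstance(epochs, list):
        return ["'epochs' must be a list at the top level"]
    if primary_metric not in metrics:
        problems.append(f"primary metric '{primary_metric}' missing from 'metrics'")
    for name, series in metrics.items():
        if not isinstance(series, list):
            problems.append(f"'{name}' must be a list of per-epoch values")
        elif len(series) != len(epochs):
            problems.append(
                f"'{name}' has {len(series)} values but there are {len(epochs)} epochs"
            )
    return problems

good = {"metrics": {"val_accuracy": [0.85, 0.87]}, "epochs": [1, 2]}
bad = {"metrics": {"val_accuracy": [0.85], "epochs": [1]}}  # 'epochs' nested by mistake

print(validate_results(good, "val_accuracy"))
print(validate_results(bad, "val_accuracy"))
```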

Example for Different Metrics:

# For loss-based optimization
results = {
    "metrics": {
        "val_loss": [2.3, 1.8, 1.2],      # Primary metric (lower is better)
        "train_loss": [2.1, 1.5, 0.9]
    },
    "epochs": [1, 2, 3]
}

# For accuracy-based optimization  
results = {
    "metrics": {
        "val_accuracy": [0.6, 0.7, 0.85], # Primary metric (higher is better)
        "train_accuracy": [0.8, 0.9, 0.95]
    },
    "epochs": [1, 2, 3]
}

Optional: Print Final Metric for Humans

# Optional: Print final result for human readability
final_metric = results["metrics"]["val_accuracy"][-1]
print(f"Final val_accuracy: {final_metric:.6f}")

How to Run from Command Line and VS Code Integrated Terminal

Command Line Usage

Basic Run:

# Set your LLM model
export LLM_MODEL="qwen2.5-coder:32b"

# Run with defaults (10 iterations, temp/ directory)
bash run.sh

Custom Configuration:

# Custom settings
export LLM_MODEL="llama3.1:8b"
export MAX_ITERATIONS=20
export TEMP_DIR="my_experiment"

# Run optimization
bash run.sh

Single Components:

# Run individual steps manually
export TEMP_DIR="temp"
export ITERATION=1
export LLM_MODEL="qwen2.5-coder:32b"

# Step 1: Generate hyperparameters
python hyper_optimizer-latest.py

# Step 2: Train model  
python model.py

# Step 3: Analyze results
export PREVIOUS_HYPERPARAMETERS="$(cat "$TEMP_DIR/hyperparameters.json")"
python results_analyzer_latest.py

VS Code Integrated Terminal

Setup in VS Code:

Open project in VS Code
Open integrated terminal (Ctrl+`)
Ensure you're in project root directory
Run commands as shown above

Recommended VS Code Settings:

# In VS Code terminal, set up your environment
export LLM_MODEL="qwen2.5-coder:32b"
export MAX_ITERATIONS=10

# Run optimization
bash run.sh

Monitoring Progress:

Watch temp/ directory for generated files
Check temp/hyperparameters.json for current parameters
Monitor temp/results.json after each training run
View plots in temp/trajectory_plots.png

Example Usage with Short Commands

Quick Start Example

# 1. Clone and setup
git clone <repository>
cd llm_hyperopt

# 2. Setup LLM
ollama pull qwen2.5-coder:32b

# 3. Create simple model (copy from template)
cp model_template.py model.py
# Edit model.py with your training code

# 4. Create config (copy from template)  
cp info_template.json info.json
# Edit info.json with your hyperparameters

# 5. Run optimization
export LLM_MODEL="qwen2.5-coder:32b"
bash run.sh

Custom Experiments

# Deep learning experiment (more iterations)
export LLM_MODEL="qwen2.5-coder:32b"
export MAX_ITERATIONS=20
export TEMP_DIR="dl_experiment"
bash run.sh

# Quick test (fewer iterations)
export MAX_ITERATIONS=3
export TEMP_DIR="quick_test"  
bash run.sh

# Different LLM model
export LLM_MODEL="llama3.1:8b"
export TEMP_DIR="llama_test"
bash run.sh

Results and Visualization

# After optimization completes
ls temp/                        # View generated files
cat temp/hyperparameters.json   # See final hyperparameters
python plot_trajectories.py     # Create optimization plots

Minimal Examples for Both Deep Learning and Classic ML Models

Deep Learning Example (PyTorch)

#!/usr/bin/env python3
import torch
import torch.nn as nn
import json
import os

# Load hyperparameters
temp_dir = os.environ.get("TEMP_DIR", "temp")
with open(f"{temp_dir}/hyperparameters.json", "r") as f:
    hp = json.load(f)

# Simple neural network
class Net(nn.Module):
    def __init__(self, input_size=784, hidden_size=128, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.dropout = nn.Dropout(hp.get("dropout_rate", 0.1))
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)

# Training loop
model = Net(hidden_size=hp.get("hidden_size", 128))
optimizer = torch.optim.Adam(model.parameters(), lr=hp["learning_rate"])

val_accuracies = []
epochs_list = []

for epoch in range(hp.get("epochs", 10)):
    # Your training code here
    model.train()
    # ... training loop ...
    
    # Validation
    model.eval()
    val_acc = 0.85 + epoch * 0.01  # Placeholder - use real validation
    
    val_accuracies.append(float(val_acc))
    epochs_list.append(epoch + 1)

# Save results
results = {
    "metrics": {"val_accuracy": val_accuracies},
    "epochs": epochs_list
}

with open(f"{temp_dir}/results.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Final val_accuracy: {val_accuracies[-1]:.6f}")

Classic ML Example (Scikit-learn)

#!/usr/bin/env python3
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
import json
import os
import numpy as np

# Load hyperparameters
temp_dir = os.environ.get("TEMP_DIR", "temp")
with open(f"{temp_dir}/hyperparameters.json", "r") as f:
    hp = json.load(f)

# Generate sample data (replace with your dataset)
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Create model with hyperparameters
model = RandomForestClassifier(
    n_estimators=hp.get("n_estimators", 100),
    max_depth=hp.get("max_depth", 10),
    min_samples_split=hp.get("min_samples_split", 2),
    random_state=42
)

# Cross-validation: each fold's score stands in for one "epoch" in the trajectory
cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

# Record one value per fold (classic ML has no epochs, so folds fill that role)
val_accuracies = []
epochs_list = []

for i, score in enumerate(cv_scores):
    val_accuracies.append(float(score))
    epochs_list.append(i + 1)

# Save results
results = {
    "metrics": {"val_accuracy": val_accuracies},
    "epochs": epochs_list
}

with open(f"{temp_dir}/results.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Mean val_accuracy across folds: {np.mean(val_accuracies):.6f}")

Configuration and Search Space Basics

Hyperparameter Types

Define hyperparameters in info.json with these types:

Float (Continuous):

"learning_rate": {
  "type": "float",
  "range": [1e-6, 0.1],
  "default": 0.001,
  "log_scale": true
}

Integer (Discrete):

"epochs": {
  "type": "integer", 
  "range": [10, 200],
  "default": 50
}

Categorical (String Choices):

"optimizer": {
  "type": "categorical",
  "values": ["adam", "sgd", "rmsprop"],
  "default": "adam"
}

Ordinal (Ordered Discrete):

"batch_size": {
  "type": "ordinal",
  "values": [16, 32, 64, 128, 256],
  "default": 32
}
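
Since the LLM proposes values freely, it can help to check each proposal against the declared search space before training. The sketch below assumes the four types and the `range`/`values` keys shown above; `check_value` is a hypothetical helper, and how the optimizer itself handles out-of-range suggestions is not specified here.

```python
def check_value(spec, value):
    """Return True if value is allowed by a single hyperparameter spec."""
    kind = spec["type"]
    if kind in ("float", "integer"):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if kind == "integer" and int(value) != value:
            return False
        low, high = spec["range"]
        return low <= value <= high
    if kind in ("categorical", "ordinal"):
        return value in spec["values"]
    raise ValueError(f"unknown hyperparameter type: {kind}")

space = {
    "learning_rate": {"type": "float", "range": [1e-6, 0.1], "default": 0.001},
    "epochs": {"type": "integer", "range": [10, 200], "default": 50},
    "optimizer": {"type": "categorical", "values": ["adam", "sgd", "rmsprop"], "default": "adam"},
    "batch_size": {"type": "ordinal", "values": [16, 32, 64, 128, 256], "default": 32},
}

# Example proposal with two values outside the declared space
proposal = {"learning_rate": 0.5, "epochs": 50, "optimizer": "adam", "batch_size": 48}
rejected = sorted(
    name for name, value in proposal.items() if not check_value(space[name], value)
)
print(rejected)
```

A rejected value could then fall back to the spec's `default`, which is one reason to always provide it.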

Search Space Design

Good Practices:

Use a wide range for exploration, enable log scale for learning rates, prefer powers of 2 for batch sizes, and keep dropout within a reasonable range. JSON does not allow comments, so keep notes like these outside info.json:

{
  "hyperparameters": {
    "learning_rate": {
      "type": "float",
      "range": [1e-5, 0.1],
      "default": 0.001,
      "log_scale": true
    },
    "batch_size": {
      "type": "ordinal",
      "values": [16, 32, 64, 128],
      "default": 32
    },
    "dropout_rate": {
      "type": "float",
      "range": [0.0, 0.5],
      "default": 0.1
    }
  }
}

Avoid:

Ranges that are too narrow (limits exploration)
Too many hyperparameters at once (>6-8 can be overwhelming)
Unrealistic ranges (e.g., learning_rate up to 10.0)
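
To see why a log scale suits learning rates, compare uniform sampling with log-uniform sampling over the same range. This is only an illustration of what `log_scale` conveys; whether the optimizer samples this way internally is not stated, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the comparison is repeatable
low, high = 1e-5, 0.1

linear = [rng.uniform(low, high) for _ in range(10_000)]
logged = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

# Fraction of draws below 1e-3, the midpoint of the range on a log scale
frac_linear = sum(v < 1e-3 for v in linear) / len(linear)
frac_logged = sum(v < 1e-3 for v in logged) / len(logged)
print(f"linear: {frac_linear:.3f}, log: {frac_logged:.3f}")
```

Uniform sampling puts roughly 1% of draws below 1e-3, while log-uniform sampling puts about half there, so small learning rates actually get explored.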

Metrics and Objectives

Supported Primary Metrics

The system automatically recognizes these metrics and optimizes appropriately:

Classification Metrics (Higher is Better):

val_accuracy, accuracy - Classification accuracy
f1_score, precision, recall - Classification quality metrics
auc_roc, auc - Area under curve metrics

Loss Metrics (Lower is Better):

val_loss, loss - Training/validation loss
mse, rmse, mae - Regression error metrics
cross_entropy, binary_crossentropy - Cross-entropy losses

Advanced Metrics:

bleu, rouge (NLP) - Text generation quality
iou, miou (Computer Vision) - Segmentation quality
ndcg, map (Information Retrieval) - Ranking quality

Metric Configuration

{
  "metrics": {
    "primary_metric": "val_accuracy",
    "description": "Validation accuracy on held-out test set"
  }
}

The optimizer will:

Maximize metrics like accuracy, f1_score, auc
Minimize metrics like loss, mse, mae
Set appropriate target values automatically

Custom Metrics

For custom metrics, the system assumes higher is better. To use a custom loss-style metric:

{
  "metrics": {
    "primary_metric": "custom_loss", 
    "description": "My custom loss function"
  }
}

Then ensure your metric name contains "loss", "error", or similar keywords for automatic detection.
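
Keyword-based detection of this kind might look like the sketch below. The keyword list and the function names are assumptions chosen to cover the metrics listed in this section; the optimizer's actual list may differ.

```python
# Assumed keyword list; substrings that mark a metric as lower-is-better
LOWER_IS_BETTER_KEYWORDS = ("loss", "error", "mse", "rmse", "mae", "entropy")

def lower_is_better(metric_name):
    """Guess the optimization direction from the metric name."""
    name = metric_name.lower()
    return any(keyword in name for keyword in LOWER_IS_BETTER_KEYWORDS)

def best_value(metric_name, per_epoch_values):
    """Pick the best point of a trajectory according to the guessed direction."""
    if lower_is_better(metric_name):
        return min(per_epoch_values)
    return max(per_epoch_values)

print(lower_is_better("custom_loss"))
print(lower_is_better("val_accuracy"))
print(best_value("val_loss", [2.3, 1.8, 1.2]))
print(best_value("val_accuracy", [0.6, 0.7, 0.85]))
```

A name such as `custom_cost` would not match any keyword here and would be maximized, which is why the naming advice above matters.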

Reproducibility and Seeding

Seed Pattern

Use this pattern in your model scripts for reproducible results:

import random
import numpy as np

def set_seed(seed=42):
    """Set seeds for reproducibility"""
    random.seed(seed)
    np.random.seed(seed)
    
    # PyTorch (if using)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
    
    # TensorFlow (if using) 
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass

# Call at the start of your script
set_seed(42)

Environment Variables for Reproducibility

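# Fix Python's hash seed so hash-dependent ordering is stable across runs
export PYTHONHASHSEED=42

# Make CUDA kernel launches synchronous (simplifies debugging; does not by itself guarantee determinism)
export CUDA_LAUNCH_BLOCKING=1

# For PyTorch on CUDA: needed by cuBLAS when torch.use_deterministic_algorithms(True) is enabled
export CUBLAS_WORKSPACE_CONFIG=:4096:8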

Hyperparameter Seeding

Include seed as a hyperparameter for full reproducibility:

{
  "hyperparameters": {
    "seed": {
      "type": "integer",
      "range": [0, 2147483647], 
      "default": 42
    }
  }
}

Then use in your model:

seed = hyperparams.get("seed", 42)
set_seed(seed)

Troubleshooting and Common Mistakes

File Path Issues

Problem: FileNotFoundError: hyperparameters.json not found

Solution:

# Always check if file exists and use defaults
import json
import os
temp_dir = os.environ.get("TEMP_DIR", "temp")
hp_path = os.path.join(temp_dir, "hyperparameters.json")

if os.path.exists(hp_path):
    with open(hp_path, "r") as f:
        hyperparams = json.load(f)
else:
    print("Using default hyperparameters") 
    hyperparams = {"learning_rate": 0.001}  # Your defaults

Results Format Errors

Problem: KeyError: 'val_accuracy' in results.json

Solution: Ensure metric names match exactly:

# In info.json
"primary_metric": "val_accuracy"

# In your model's results.json - MUST match exactly
results = {
    "metrics": {
        "val_accuracy": [0.85, 0.87, 0.89]  # Exact match required
    },
    "epochs": [1, 2, 3]  # Top-level key, not inside "metrics"
}

Empty Results

Problem: Optimizer says "No trajectory data found"

Solution: Check your results format:

# Wrong - missing required structure
results = {"accuracy": 0.85}

# Right - proper structure
results = {
    "metrics": {"val_accuracy": [0.85]},  # Must be arrays
    "epochs": [1]
}

LLM Connection Issues

Problem: Error in API call: connection refused

Solution:

# Start Ollama service
ollama serve

# In another terminal, verify model is available
ollama list
ollama pull qwen2.5-coder:32b  # If not present

# Check environment variable
echo $LLM_MODEL
export LLM_MODEL="qwen2.5-coder:32b"

Hyperparameter Type Errors

Problem: TypeError: 'str' object cannot be interpreted as an integer

Solution: Always convert types:

# Wrong - direct use
batch_size = hyperparams["batch_size"]  # Might be string "32"

# Right - type conversion
batch_size = int(float(hyperparams["batch_size"]))  # Handles "32.0" -> 32
learning_rate = float(hyperparams["learning_rate"])

Memory Issues

Problem: CUDA out of memory during optimization

Solution:

# Reduce allocator fragmentation and restrict the run to a single GPU
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
export CUDA_VISIBLE_DEVICES=0

# Or in your model script
import torch
torch.cuda.empty_cache()  # Release cached, unused GPU memory (tensors still referenced are not freed)

Permission Errors

Problem: PermissionError: cannot write to temp/

Solution:

# Create temp directory with proper permissions
mkdir -p temp
chmod 755 temp

# Or use a different directory
export TEMP_DIR="my_temp"
mkdir -p my_temp

Common Integration Checklist

Before running optimization, verify:

info.json exists with correct primary_metric
model.py reads from ${TEMP_DIR}/hyperparameters.json
model.py saves to ${TEMP_DIR}/results.json
Results format: {"metrics": {...}, "epochs": [...]}
Metric names match exactly between info.json and results.json
LLM model is running (ollama list)
Environment variables set (LLM_MODEL, etc.)
All required Python packages installed
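
Parts of this checklist can be automated. The script below is a rough sketch covering only the file and environment checks; `preflight` is a hypothetical helper, and it does not verify that Ollama is running or that model.py writes valid results.

```python
import json
import os
import tempfile

def preflight(project_dir, env):
    """Return a list of checklist failures for a project directory."""
    failures = []
    info_path = os.path.join(project_dir, "info.json")
    if not os.path.isfile(info_path):
        failures.append("info.json is missing")
    else:
        try:
            with open(info_path) as f:
                info = json.load(f)
        except json.JSONDecodeError as exc:
            failures.append(f"info.json is not valid JSON: {exc}")
        else:
            if not info.get("metrics", {}).get("primary_metric"):
                failures.append("info.json has no metrics.primary_metric")
            if not info.get("hyperparameters"):
                failures.append("info.json defines no hyperparameters")
    if not os.path.isfile(os.path.join(project_dir, "model.py")):
        failures.append("model.py is missing")
    if not env.get("LLM_MODEL"):
        failures.append("LLM_MODEL is not set")
    return failures

# Demo against a throwaway directory containing only a minimal info.json
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "info.json"), "w") as f:
        json.dump(
            {
                "metrics": {"primary_metric": "val_accuracy"},
                "hyperparameters": {"learning_rate": {"type": "float", "range": [1e-5, 0.1]}},
            },
            f,
        )
    failures = preflight(tmp, env={})

print(failures)
```

In a real project you would call `preflight(".", os.environ)` from the project root and fix anything it lists before starting a run.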

Quick Test:

# Test your integration
export LLM_MODEL="qwen2.5-coder:32b"
export MAX_ITERATIONS=1
bash run.sh

This should complete one full cycle without errors. If successful, increase MAX_ITERATIONS for full optimization.
