4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -5,7 +5,7 @@ Thank you for your interest in contributing to KaiEvolve! This document provides
## Getting Started

1. Fork the repository
2. Clone your fork: `git clone https://github.com/firstbatchxyz/kai-evolve.git`
2. Clone your fork: `git clone https://github.com/YOUR-USERNAME/kai-evolve.git`
3. Install the package in development mode: `pip install -e ".[dev]"`
4. Set up environment for testing:
```bash
@@ -51,7 +51,7 @@ When developing features that interact with LLMs:
python -m unittest discover tests
```
5. Commit your changes: `git commit -m "Add your descriptive commit message"`
6. Push to your fork: `git push origin feature/your-feature-name`
6. Push to your fork: `git push origin feat-your-feature-name`
7. Submit a pull request to the main repository

## Adding Examples
2 changes: 1 addition & 1 deletion NOTICE
@@ -1,5 +1,5 @@
KaiEvolve
Copyright 2025 Dria
Copyright 2025-2026 Dria

This product is a derivative work of OpenEvolve
(https://github.com/codelion/openevolve), Copyright 2025 Asankhaya Sharma
56 changes: 53 additions & 3 deletions README.md
@@ -98,6 +98,14 @@ cd kai-evolve
pip install -e .
```

Optional extras:
```bash
pip install -e ".[viewer]" # the `kai viewer` web UI (FastAPI + Jinja)
pip install -e ".[embeddings]" # local embedding models for novelty detection /
# strategy clustering (pulls in PyTorch; the
# default "api" embedding backend needs no extra)
```

### Quick Start

#### Setting up LLM Access
@@ -128,7 +136,9 @@ KaiEvolve uses the OpenAI SDK, which means it works with any LLM provider that s
This setup ensures KaiEvolve can work with any LLM provider - OpenAI, Anthropic, Google, Cohere, local models via Ollama/vLLM, or any OpenAI-compatible endpoint.

```python
import asyncio
import os

from kaievolve import KaiEvolve

# Ensure API key is set
@@ -143,8 +153,8 @@ evolve = KaiEvolve(
)

# Run the evolution
best_program = await evolve.run(iterations=1000)
print(f"Best program metrics:")
best_program = asyncio.run(evolve.run(iterations=1000))
print("Best program metrics:")
for name, value in best_program.metrics.items():
    print(f" {name}: {value:.4f}")
```
@@ -157,6 +167,22 @@ KaiEvolve can also be run from the command line:
python kaievolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 1000
```

For a concrete first run, try the bundled circle packing example:

```bash
pip install -r examples/circle_packing/requirements.txt # scipy + matplotlib
python kaievolve-run.py examples/circle_packing/initial_program.py \
examples/circle_packing/evaluator.py \
--config examples/circle_packing/config_phase_1.yaml \
--iterations 50
```

> **Note**: if you omit `--config`, KaiEvolve warns and falls back to built-in
> defaults: models `gpt-4o-mini`/`gpt-4o`, with the API base taken from the
> `OPENAI_API_BASE` environment variable (default:
> `https://openrouter.ai/api/v1`). Those model names only resolve if your
> endpoint actually serves them, so passing an explicit config is recommended.
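
The fallback described in the note can be pictured as a plain environment lookup. This is a minimal sketch rather than KaiEvolve's actual code: `resolve_api_base` is a hypothetical helper name, and only the variable name and default URL come from the note itself.

```python
import os

# Default quoted in the note above.
DEFAULT_API_BASE = "https://openrouter.ai/api/v1"


def resolve_api_base(env=None):
    """Return OPENAI_API_BASE if it is set and non-empty, else the documented default.

    Hypothetical helper for illustration; not part of the kaievolve package.
    """
    env = os.environ if env is None else env
    return env.get("OPENAI_API_BASE") or DEFAULT_API_BASE


if __name__ == "__main__":
    print(resolve_api_base())
```

Printing the resolved value before a run is a cheap way to confirm which endpoint the default model names will be sent to.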

### Resuming from Checkpoints

KaiEvolve automatically saves checkpoints at intervals specified by the `checkpoint_interval` config parameter (default is 10 iterations). You can resume an evolution run from a saved checkpoint:
@@ -279,6 +305,30 @@ the solution view. See [`skills/visualization/SKILL.md`](skills/visualization/SK
and the two reference visualizers under `examples/alphaevolve/` (circle packing and
the autocorrelation step function).

### Watching and steering a live run

Active runs append a per-iteration `progress.jsonl` feed to their output
directory; `kai monitor` and the web viewer read it, so both update in real
time while a run is going (and keep working on finished runs).
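
A feed like this can also be read from your own scripts. The sketch below assumes only that `progress.jsonl` is standard JSON Lines (one JSON object per line); the record fields are not documented here, so the reader returns whatever each line contains, and `read_progress` is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_progress(path):
    """Parse a JSON Lines feed, skipping blank or partially written lines."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # The writer may be mid-append; ignore an incomplete line.
            continue
    return records


if __name__ == "__main__":
    import sys

    # Usage: python read_progress.py <output_dir>/progress.jsonl
    if len(sys.argv) > 1:
        for record in read_progress(sys.argv[1]):
            print(record)
```

Tolerating an unparsable line matters for a live run, because the file can be read while the last record is still being written.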

Two steering channels can redirect a run **while it is running**:

- **Human steering brief** — point `prompt.steering_brief_path` at a markdown
file and edit it mid-run: its contents are re-read every iteration and
injected into generation prompts, so directives like "focus on the inner
loop" or "avoid approach X" take effect without restarting or touching
config. `kai steer` sets, appends to, or shows that file from the terminal.
- **Research director** (opt-in) — an automated meta-agent that periodically
reads the population and writes a strategic directive into the same steering
channel. Enable it with `prompt.research_director_enabled: true`; optionally
set `research_director_interval` (defaults to the migration interval) and
`research_director_model` for a dedicated reasoning model (defaults to the
run's model roster). See
[`skills/research-director/SKILL.md`](skills/research-director/SKILL.md).

Both channels compose: the human brief and the director's directive are
injected together.
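
The re-read-every-iteration behaviour can be sketched as follows. This is an illustration under stated assumptions, not KaiEvolve's implementation: `load_steering_brief`, `build_prompt`, and the section headings are hypothetical names chosen for the example.

```python
from pathlib import Path


def load_steering_brief(path):
    """Return the brief's current text, or "" if no path is set or the file is missing."""
    if path is None:
        return ""
    brief = Path(path)
    if not brief.is_file():
        return ""
    return brief.read_text(encoding="utf-8").strip()


def build_prompt(base_prompt, brief_path, director_directive=""):
    """Append the human brief and the director's directive (when present) to a prompt."""
    sections = [base_prompt]
    # Read from disk on every call, so edits made mid-run show up in the next prompt.
    brief = load_steering_brief(brief_path)
    if brief:
        sections.append("## Steering brief\n" + brief)
    if director_directive:
        sections.append("## Research director\n" + director_directive)
    return "\n\n".join(sections)
```

Because nothing is cached between calls, changing the file is enough to change the next prompt; a `null` path or a missing file simply leaves the prompt untouched.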

## Configuration

KaiEvolve is highly configurable with advanced options:
@@ -500,7 +550,7 @@ Below is the optimal packing found by KaiEvolve after 800 iterations:
### AlphaEvolve Benchmark Tasks

#### [examples/alphaevolve/](examples/alphaevolve/)
Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator plus config ready to evolve.
Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator. Run them with the suite config tuned for these tasks, [`configs/bench_alphaevolve.yaml`](configs/bench_alphaevolve.yaml).



59 changes: 58 additions & 1 deletion configs/default_config.yaml
@@ -71,10 +71,43 @@ prompt:
# Feature extraction and program labeling thresholds
# These control how the LLM perceives and categorizes programs
suggest_simplification_after_chars: 500 # Suggest simplifying if program exceeds this many characters
include_changes_under_chars: 100 # Include change descriptions in features if under this length
include_changes_under_chars: 100 # Include change descriptions in features if under this length
concise_implementation_max_lines: 10 # Label as "concise" if program has this many lines or fewer
comprehensive_implementation_min_lines: 50 # Label as "comprehensive" if program has this many lines or more

# HMRD (Hypothesis/Method/Result/Discussion) program summaries
require_hmrd: true # Prompts ask for HYPOTHESIS+METHOD tags; workers parse them and
# attach the 4-section summary to each successful program

# Literature review: a bounded, running "lab notebook" distilled from the HMRD
# summaries of prior programs, injected into every generation prompt
literature_review_enabled: false # Enable the literature review channel
literature_review_path: null # Review file path (null = <output_dir>/literature_review.md)
literature_review_max_chars: 4000 # Bound on review size (~1000 tokens)
literature_review_interval: null # Distill every N iterations (null = checkpoint_interval)

# Human steering brief: a markdown file you edit while a run is in progress;
# it is re-read every iteration and injected into prompts (see `kai steer`)
steering_brief_path: null # Path to the brief file (null = disabled)

# Research director: an automated meta-agent that periodically reads the
# population and writes a strategic directive into the steering channel
# (see skills/research-director/SKILL.md)
research_director_enabled: false # Enable the research director (opt-in)
research_director_interval: null # Fire every N iterations (null = database.migration_interval)
research_director_model: null # Dedicated model for the director (null = run roster)

# Strategy clustering + cluster bandit: clusters programs by the embedding of
# their HMRD summary into emergent strategies, then biases parent selection
# toward promising/under-explored clusters
strategy_clustering_enabled: false # Enable strategy clustering (opt-in)
strategy_num_clusters: 6 # Number of strategy clusters
strategy_epsilon: 0.05 # Selection-probability floor per cluster
strategy_exploration_c: 0.5 # UCB exploration coefficient
strategy_recency_bonus: 0.1 # Additive bonus for recently-improving clusters
strategy_recency_window: 2 # Checkpoints counted as "recent"
strategy_embedding_backend: "local" # "local" (needs `pip install kai-evolve[embeddings]`) or "api"

# Note: meta-prompting features are not yet implemented

# Database configuration
@@ -101,6 +134,12 @@ database:
exploitation_ratio: 0.7 # Ratio of exploitation vs random selection
# Note: diversity_metric is fixed to "edit_distance" (feature_based not implemented)

# Parent-selection shaping: parents are weighted by a robustly-scaled fitness
# and down-weighted by how many children they already produced, so the search
# neither collapses to uniform nor fixates on one heavily-mined lineage
parent_selection_lambda: 2.0 # Sigmoid steepness over normalized score
parent_selection_offspring_penalty: 0.5 # 0 disables the over-mining penalty

# Feature map dimensions for MAP-Elites
# Default if not specified: ["complexity", "diversity"]
#
@@ -135,6 +174,12 @@ evaluator:
timeout: 300 # Maximum evaluation time in seconds
max_retries: 3 # Maximum number of retries for evaluation

# Noise-aware fitness: for STOCHASTIC evaluators, a single evaluation is a
# noisy sample of a program's true quality. When > 1, each program is
# evaluated this many times and the numeric metrics are averaged (at N x the
# evaluation cost). Leave at 1 for deterministic evaluators.
re_evaluations: 1 # Evaluations per program, averaged

# Note: resource limits (memory_limit_mb, cpu_limit) are not yet implemented

# Evaluation strategies
@@ -151,3 +196,15 @@ evaluator:
# LLM-based feedback (experimental)
use_llm_feedback: false # Use LLM to evaluate code quality
llm_feedback_weight: 0.1 # Weight for LLM feedback in final score

# Novelty detection (optional): rejects near-duplicate programs by embedding
# similarity and asks the LLM to regenerate, keeping the population diverse
novelty:
  enabled: false # Enable novelty detection
  embedding_backend: "local" # "local" (sentence-transformers, needs
  # `pip install kai-evolve[embeddings]`) or "api"
  embedding_model: "all-MiniLM-L6-v2" # "api" backend: use e.g. "openai/text-embedding-3-small"
  embedding_device: null # Device for local model: "cpu", "cuda", "mps" (null = auto)
  similarity_threshold: 0.95 # Reject programs with similarity >= this
  max_regeneration_attempts: 3 # Regeneration attempts for a novel program
  temperature_increment: 0.15 # Temperature bump per regeneration attempt
8 changes: 6 additions & 2 deletions examples/alphaevolve/autocorrelation_C1/README.md
@@ -53,15 +53,19 @@ So C=2.0 → 0, C=1.5 → 0.5, C=1.0 → 1.0. The target band is roughly score

- `initial_program.py` — starting point: simple random hill-climber on a uniform sequence
- `evaluator.py` — canonical scorer; returns `combined_score`, `raw_C`, `sequence_length`, `runs_successfully`
- `config.yaml` — 30 iterations, 2 islands × 2 models, diff mode
- `visualize.py` — interactive solution view for the web viewer
- `README.md` — this file

Use the suite config tuned for the AlphaEvolve tasks,
[`configs/bench_alphaevolve.yaml`](../../../configs/bench_alphaevolve.yaml)
(diff mode, 2 islands × 2 models).

## Running

```bash
OPENAI_API_KEY=sk-or-... python scripts/bench.py run \
--tasks examples/alphaevolve/autocorrelation_C1 \
--config examples/alphaevolve/autocorrelation_C1/config.yaml \
--config configs/bench_alphaevolve.yaml \
--output-dir bench_results/autocorrelation_C1 \
--runs 3 --label autocorr
```
32 changes: 32 additions & 0 deletions examples/noisy_optimization/README.md
@@ -0,0 +1,32 @@
# Noisy optimization (stochastic evaluator)

A minimal benchmark for **noise-aware fitness**: the candidate's `construct()`
returns an 8-number vector whose true quality is a deterministic concave
function of its distance to a hidden target — but `evaluate()` reports that
quality with fresh additive Gaussian noise on every call. Repeated evaluations
of the *same* program differ, so selection can be fooled by a single lucky
draw. This is exactly the regime `evaluator.re_evaluations` is for: when set
above 1, each program is evaluated that many times and the metrics are
averaged, shrinking the variance of the fitness estimate.
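
The variance reduction can be checked with a small simulation. The sketch below is not the example's evaluator: the true quality, the noise level `sigma`, and all function names are illustrative assumptions. For independent Gaussian noise, averaging N samples shrinks the standard deviation of the estimate by a factor of about the square root of N.

```python
import random
import statistics


def noisy_evaluate(true_quality, sigma, rng):
    """One stochastic evaluation: the true quality plus fresh Gaussian noise."""
    return true_quality + rng.gauss(0.0, sigma)


def averaged_evaluate(true_quality, sigma, n, rng):
    """Average n independent noisy evaluations of the same program."""
    return statistics.fmean(noisy_evaluate(true_quality, sigma, rng) for _ in range(n))


def estimate_spread(n, trials=2000, true_quality=0.8, sigma=0.1, seed=0):
    """Standard deviation of the fitness estimate across many repeated trials."""
    rng = random.Random(seed)
    estimates = [averaged_evaluate(true_quality, sigma, n, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    for n in (1, 3):
        print(f"re_evaluations={n}: spread of estimate = {estimate_spread(n):.4f}")
```

The trade-off is the one stated in the config comments: each extra sample multiplies evaluation cost, so small values of N are usually the practical choice.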

The evaluator runs `construct()` in a clean subprocess so candidates cannot
introspect the evaluator's memory and read the hidden target — they have to
actually optimize.
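
One way to get this kind of isolation is to load the candidate in a fresh interpreter and pass the result back as JSON. The sketch below is an assumption about the general approach, not the example's `evaluator.py`; `run_construct` and `RUNNER` are hypothetical names.

```python
import json
import subprocess
import sys

# Small driver executed in the child interpreter: load the candidate file,
# call construct(), and print the result as one JSON line.
RUNNER = (
    "import importlib.util, json, sys\n"
    "spec = importlib.util.spec_from_file_location('candidate', sys.argv[1])\n"
    "module = importlib.util.module_from_spec(spec)\n"
    "spec.loader.exec_module(module)\n"
    "print(json.dumps(list(module.construct())))\n"
)


def run_construct(program_path, timeout=30):
    """Run the candidate's construct() in a separate Python process and return its vector."""
    result = subprocess.run(
        [sys.executable, "-c", RUNNER, str(program_path)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    # Use the last stdout line so stray prints from the candidate are ignored.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

A separate process keeps the candidate out of the evaluator's memory, but it is not a security sandbox: the child can still read files and environment variables available to the parent.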

## Files

- `initial_program.py` — starting point: the all-zeros vector
- `evaluator.py` — stochastic scorer (`combined_score` = true quality + noise);
also exposes `true_evaluate()` so benchmarks can grade results noiselessly
- `config.yaml` — small run with `re_evaluations: 3`

## Running

```bash
python kaievolve-run.py examples/noisy_optimization/initial_program.py \
examples/noisy_optimization/evaluator.py \
--config examples/noisy_optimization/config.yaml
```

Compare with `re_evaluations: 1` in the config to see how single noisy samples
mislead selection.
33 changes: 33 additions & 0 deletions examples/noisy_optimization/config.yaml
@@ -0,0 +1,33 @@
# Noisy-optimization benchmark: stochastic evaluator (fresh Gaussian noise on
# every call). re_evaluations > 1 averages repeated evaluations per program so
# selection is not fooled by lucky draws.

max_iterations: 30
checkpoint_interval: 10
log_level: "INFO"
random_seed: 42

diff_based_evolution: true

llm:
  api_base: "https://openrouter.ai/api/v1"
  models:
    - name: "google/gemini-3.1-flash-lite"
      weight: 1.0
    - name: "deepseek/deepseek-v4-flash"
      weight: 1.0
  temperature: 0.7
  timeout: 120
  retries: 2

database:
  population_size: 40
  archive_size: 20
  num_islands: 2

evaluator:
  timeout: 30
  cascade_evaluation: false
  re_evaluations: 3 # average 3 noisy samples per program (the point of this example)
  parallel_evaluations: 2
  use_llm_feedback: false
6 changes: 6 additions & 0 deletions experiments/README.md
@@ -1,5 +1,11 @@
# KaiEvolve AlgoTune Benchmarking System

> **Note**: this directory is benchmarking infrastructure and result data
> inherited from upstream [OpenEvolve](https://github.com/codelion/openevolve);
> the recorded results predate the KaiEvolve fork. The scripts expect the
> `ALGOTUNE_PATH` / `EVOLVEBENCH_PATH` environment variables to point at your
> local checkouts.

This directory contains the benchmarking infrastructure for running KaiEvolve on the AlgoTune dataset.

## Directory Structure
6 changes: 4 additions & 2 deletions experiments/evolvebench/run_evolvebench.py
@@ -19,8 +19,10 @@
import time
from pathlib import Path

# EvolveBench path (adjust if needed)
EVOLVEBENCH_PATH = Path("/Users/codelion/Documents/GitLab/evolve-bench")
# EvolveBench path
if not os.environ.get("EVOLVEBENCH_PATH"):
    sys.exit("Set the EVOLVEBENCH_PATH environment variable to your evolve-bench checkout")
EVOLVEBENCH_PATH = Path(os.environ["EVOLVEBENCH_PATH"])
KAIEVOLVE_PATH = Path(__file__).parent.parent.parent
CONFIGS_DIR = Path(__file__).parent / "configs"
RESULTS_DIR = Path(__file__).parent / "results"
4 changes: 3 additions & 1 deletion experiments/validate_initial_programs.py
@@ -17,7 +17,9 @@
from typing import Dict, List, Any, Tuple

# Add AlgoTune to Python path
ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH', '/Users/codelion/Documents/GitHub/AlgoTune')
ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH')
if not ALGOTUNE_PATH:
    sys.exit("Set the ALGOTUNE_PATH environment variable to your AlgoTune checkout")
sys.path.insert(0, ALGOTUNE_PATH)

from AlgoTuneTasks.factory import TaskFactory
12 changes: 10 additions & 2 deletions pyproject.toml
@@ -18,9 +18,11 @@ dependencies = [
"openai>=1.0.0",
"pyyaml>=6.0",
"numpy>=1.22.0",
# kmeans2 for strategy clustering (kaievolve/strategy_clusters.py); was
# previously pulled in transitively via sentence-transformers.
"scipy>=1.7.0",
"tqdm>=4.64.0",
"flask",
"sentence-transformers>=2.2.0",
# Provides `bson.ObjectId`, used for program ids (kaievolve/utils/id_generator.py).
"pymongo>=4.0.0",
"requests>=2.25.0",
# `kai` CLI: Rich rendering + questionary pick-lists + plotext charts.
@@ -42,6 +44,12 @@ viewer = [
"uvicorn>=0.32",
"jinja2>=3.1",
]
# Local embedding models for novelty detection / strategy clustering
# (embedding_backend: "local"). Off by default; the "api" backend needs no
# extra install. Heavy: pulls in PyTorch.
embeddings = [
"sentence-transformers>=2.2.0",
]

[tool.black]
line-length = 100
2 changes: 1 addition & 1 deletion scripts/requirements.txt
@@ -1 +1 @@
flask
plotly