firstbatchxyz · aktasbatuhan · Jun 11, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -5,7 +5,7 @@ Thank you for your interest in contributing to KaiEvolve! This document provides
 ## Getting Started
 
 1. Fork the repository
-2. Clone your fork: `git clone https://github.com/firstbatchxyz/kai-evolve.git`
+2. Clone your fork: `git clone https://github.com/YOUR-USERNAME/kai-evolve.git`
 3. Install the package in development mode: `pip install -e ".[dev]"`
 4. Set up environment for testing:
    ```bash
@@ -51,7 +51,7 @@ When developing features that interact with LLMs:
    python -m unittest discover tests
    ```
 5. Commit your changes: `git commit -m "Add your descriptive commit message"`
-6. Push to your fork: `git push origin feature/your-feature-name`
+6. Push to your fork: `git push origin feat-your-feature-name`
 7. Submit a pull request to the main repository
 
 ## Adding Examples

diff --git a/NOTICE b/NOTICE
@@ -1,5 +1,5 @@
 KaiEvolve
-Copyright 2025 Dria
+Copyright 2025-2026 Dria
 
 This product is a derivative work of OpenEvolve
 (https://github.com/codelion/openevolve), Copyright 2025 Asankhaya Sharma

diff --git a/README.md b/README.md
@@ -98,6 +98,14 @@ cd kai-evolve
 pip install -e .
 ```
 
+Optional extras:
+```bash
+pip install -e ".[viewer]"      # the `kai viewer` web UI (FastAPI + Jinja)
+pip install -e ".[embeddings]"  # local embedding models for novelty detection /
+                                # strategy clustering (pulls in PyTorch; the
+                                # "api" embedding backend needs no extra)
+```
+
 ### Quick Start
 
 #### Setting up LLM Access
@@ -128,7 +136,9 @@ KaiEvolve uses the OpenAI SDK, which means it works with any LLM provider that s
 This setup ensures KaiEvolve can work with any LLM provider - OpenAI, Anthropic, Google, Cohere, local models via Ollama/vLLM, or any OpenAI-compatible endpoint.
 
 ```python
+import asyncio
 import os
+
 from kaievolve import KaiEvolve
 
 # Ensure API key is set
@@ -143,8 +153,8 @@ evolve = KaiEvolve(
 )
 
 # Run the evolution
-best_program = await evolve.run(iterations=1000)
-print(f"Best program metrics:")
+best_program = asyncio.run(evolve.run(iterations=1000))
+print("Best program metrics:")
 for name, value in best_program.metrics.items():
     print(f"  {name}: {value:.4f}")
 ```
@@ -157,6 +167,22 @@ KaiEvolve can also be run from the command line:
 python kaievolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 1000
 ```
 
+For a concrete first run, try the bundled circle packing example:
+
+```bash
+pip install -r examples/circle_packing/requirements.txt  # scipy + matplotlib
+python kaievolve-run.py examples/circle_packing/initial_program.py \
+  examples/circle_packing/evaluator.py \
+  --config examples/circle_packing/config_phase_1.yaml \
+  --iterations 50
+```
+
+> **Note**: if you omit `--config`, KaiEvolve warns and falls back to built-in
+> defaults: models `gpt-4o-mini`/`gpt-4o`, with the API base taken from the
+> `OPENAI_API_BASE` environment variable (default:
+> `https://openrouter.ai/api/v1`). Those model names only resolve if your
+> endpoint actually serves them, so passing an explicit config is recommended.
+
 ### Resuming from Checkpoints
 
 KaiEvolve automatically saves checkpoints at intervals specified by the `checkpoint_interval` config parameter (default is 10 iterations). You can resume an evolution run from a saved checkpoint:
@@ -279,6 +305,30 @@ the solution view. See [`skills/visualization/SKILL.md`](skills/visualization/SK
 and the two reference visualizers under `examples/alphaevolve/` (circle packing and
 the autocorrelation step function).
 
+### Watching and steering a live run
+
+Active runs append one record per iteration to `progress.jsonl` in their
+output directory; `kai monitor` and the web viewer read that feed, so both
+update in real time during a run (and keep working on finished runs).
+
+Two steering channels can redirect a run **while it is running**:
+
+- **Human steering brief** — point `prompt.steering_brief_path` at a markdown
+  file and edit it mid-run: its contents are re-read every iteration and
+  injected into generation prompts, so directives like "focus on the inner
+  loop" or "avoid approach X" take effect without restarting or touching
+  config. `kai steer` sets, appends to, or shows that file from the terminal.
+- **Research director** (opt-in) — an automated meta-agent that periodically
+  reads the population and writes a strategic directive into the same steering
+  channel. Enable it with `prompt.research_director_enabled: true`; optionally
+  set `research_director_interval` (defaults to the migration interval) and
+  `research_director_model` for a dedicated reasoning model (defaults to the
+  run's model roster). See
+  [`skills/research-director/SKILL.md`](skills/research-director/SKILL.md).
+
+Both channels compose: the human brief and the director's directive are
+injected together.
+
 ## Configuration
 
 KaiEvolve is highly configurable with advanced options:
@@ -500,7 +550,7 @@ Below is the optimal packing found by KaiEvolve after 800 iterations:
 ### AlphaEvolve Benchmark Tasks
 
 #### [examples/alphaevolve/](examples/alphaevolve/)
-Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator plus config ready to evolve.
+Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator. Run them with the suite config tuned for these tasks, [`configs/bench_alphaevolve.yaml`](configs/bench_alphaevolve.yaml).
 
 
 

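The README hunk above says the steering brief is re-read every iteration and injected together with the research director's directive. Here is a minimal sketch of that behavior. The helper names (`read_steering_brief`, `build_prompt`) and the section headings are hypothetical, and KaiEvolve's actual prompt assembly is not part of this diff:

```python
from pathlib import Path
from typing import Optional


def read_steering_brief(path: Optional[str]) -> str:
    """Re-read the brief from disk on every call, so mid-run edits take effect."""
    if path is None:  # steering_brief_path: null disables the channel
        return ""
    brief = Path(path)
    if not brief.is_file():  # assumption: a missing file means "no brief"
        return ""
    return brief.read_text(encoding="utf-8").strip()


def build_prompt(base_prompt: str, brief_path: Optional[str], director_directive: str = "") -> str:
    """Hypothetical assembly: human brief and director directive are injected together."""
    sections = [base_prompt]
    brief = read_steering_brief(brief_path)
    if brief:
        sections.append("## Human steering brief\n" + brief)
    if director_directive.strip():
        sections.append("## Research director directive\n" + director_directive.strip())
    return "\n\n".join(sections)
```

Calling something like `build_prompt` once per iteration, rather than caching the brief at startup, is what lets edits to the file (for example through `kai steer`) take effect without a restart.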
diff --git a/configs/default_config.yaml b/configs/default_config.yaml
@@ -71,10 +71,43 @@ prompt:
   # Feature extraction and program labeling thresholds
   # These control how the LLM perceives and categorizes programs
   suggest_simplification_after_chars: 500     # Suggest simplifying if program exceeds this many characters
-  include_changes_under_chars: 100           # Include change descriptions in features if under this length  
+  include_changes_under_chars: 100           # Include change descriptions in features if under this length
   concise_implementation_max_lines: 10        # Label as "concise" if program has this many lines or fewer
   comprehensive_implementation_min_lines: 50  # Label as "comprehensive" if program has this many lines or more
 
+  # HMRD (Hypothesis/Method/Result/Discussion) program summaries
+  require_hmrd: true                  # Prompts ask for HYPOTHESIS+METHOD tags; workers parse them and
+                                      # attach the 4-section summary to each successful program
+
+  # Literature review: a bounded, running "lab notebook" distilled from the HMRD
+  # summaries of prior programs, injected into every generation prompt
+  literature_review_enabled: false    # Enable the literature review channel
+  literature_review_path: null        # Review file path (null = <output_dir>/literature_review.md)
+  literature_review_max_chars: 4000   # Bound on review size (~1000 tokens)
+  literature_review_interval: null    # Distill every N iterations (null = checkpoint_interval)
+
+  # Human steering brief: a markdown file you edit while a run is in progress;
+  # it is re-read every iteration and injected into prompts (see `kai steer`)
+  steering_brief_path: null           # Path to the brief file (null = disabled)
+
+  # Research director: an automated meta-agent that periodically reads the
+  # population and writes a strategic directive into the steering channel
+  # (see skills/research-director/SKILL.md)
+  research_director_enabled: false    # Enable the research director (opt-in)
+  research_director_interval: null    # Fire every N iterations (null = database.migration_interval)
+  research_director_model: null       # Dedicated model for the director (null = run roster)
+
+  # Strategy clustering + cluster bandit: clusters programs by the embedding of
+  # their HMRD summary into emergent strategies, then biases parent selection
+  # toward promising/under-explored clusters
+  strategy_clustering_enabled: false  # Enable strategy clustering (opt-in)
+  strategy_num_clusters: 6            # Number of strategy clusters
+  strategy_epsilon: 0.05              # Selection-probability floor per cluster
+  strategy_exploration_c: 0.5         # UCB exploration coefficient
+  strategy_recency_bonus: 0.1         # Additive bonus for recently-improving clusters
+  strategy_recency_window: 2          # Checkpoints counted as "recent"
+  strategy_embedding_backend: "local" # "local" (needs `pip install kai-evolve[embeddings]`) or "api"
+
   # Note: meta-prompting features are not yet implemented
 
 # Database configuration
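The strategy-clustering keys above describe a cluster bandit. It uses UCB scores with exploration coefficient `strategy_exploration_c`, an additive `strategy_recency_bonus` for clusters that improved within the last `strategy_recency_window` checkpoints, and a per-cluster probability floor `strategy_epsilon`. The sketch below shows one way these knobs could combine into selection probabilities. The `ClusterStats` type, the reward definition, and the softmax over UCB scores are all assumptions, not KaiEvolve's code:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterStats:
    pulls: int = 0                           # parents drawn from this cluster so far
    total_reward: float = 0.0                # summed reward (assumed: child improvement)
    last_improved_checkpoint: int = -10**9   # checkpoint index of its last improvement


def cluster_probabilities(
    stats: List[ClusterStats],
    current_checkpoint: int,
    exploration_c: float = 0.5,   # strategy_exploration_c
    epsilon: float = 0.05,        # strategy_epsilon (per-cluster floor)
    recency_bonus: float = 0.1,   # strategy_recency_bonus
    recency_window: int = 2,      # strategy_recency_window (in checkpoints)
) -> List[float]:
    k = len(stats)
    if k == 0 or k * epsilon > 1.0:
        raise ValueError("need at least one cluster and k * epsilon <= 1")
    total_pulls = max(1, sum(s.pulls for s in stats))
    untried = [i for i, s in enumerate(stats) if s.pulls == 0]
    if untried:
        # Unvisited clusters have unbounded UCB: share the non-floor mass among them.
        q = [1.0 / len(untried) if s.pulls == 0 else 0.0 for s in stats]
    else:
        scores = []
        for s in stats:
            mean = s.total_reward / s.pulls
            ucb = mean + exploration_c * math.sqrt(math.log(total_pulls) / s.pulls)
            if current_checkpoint - s.last_improved_checkpoint < recency_window:
                ucb += recency_bonus  # recently improving cluster
            scores.append(ucb)
        # Assumption: a softmax turns UCB scores into a distribution.
        top = max(scores)
        exps = [math.exp(v - top) for v in scores]
        z = sum(exps)
        q = [e / z for e in exps]
    # Epsilon floor: every cluster keeps at least `epsilon` probability.
    return [epsilon + (1.0 - k * epsilon) * qi for qi in q]
```

The floor mixing `epsilon + (1 - k * epsilon) * q` keeps the result a valid distribution, because the floors sum to `k * epsilon`. With the defaults (6 clusters, 0.05), 30% of the mass is reserved for exploration.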
@@ -101,6 +134,12 @@ database:
   exploitation_ratio: 0.7             # Ratio of exploitation vs random selection
   # Note: diversity_metric is fixed to "edit_distance" (feature_based not implemented)
 
+  # Parent-selection shaping: parents are weighted by a robustly-scaled fitness
+  # and down-weighted by how many children they already produced, so the search
+  # neither collapses to uniform nor fixates on one heavily-mined lineage
+  parent_selection_lambda: 2.0            # Sigmoid steepness over normalized score
+  parent_selection_offspring_penalty: 0.5 # 0 disables the over-mining penalty
+
   # Feature map dimensions for MAP-Elites
   # Default if not specified: ["complexity", "diversity"]
   # 
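`parent_selection_lambda` and `parent_selection_offspring_penalty` can be read as a two-factor weight: a sigmoid over a robustly normalized score, divided by a discount that grows with offspring count. Here is a minimal sketch. It assumes median/IQR normalization and a `1 / (1 + penalty * children)` discount, since this diff does not specify either formula:

```python
import math
import statistics
from typing import List


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (no overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def parent_weights(
    scores: List[float],
    offspring_counts: List[int],
    lam: float = 2.0,                # database.parent_selection_lambda
    offspring_penalty: float = 0.5,  # database.parent_selection_offspring_penalty (0 disables)
) -> List[float]:
    """Return selection probabilities for candidate parents (sketch under stated assumptions)."""
    if not scores or len(scores) != len(offspring_counts):
        raise ValueError("need one offspring count per score")
    median = statistics.median(scores)
    spread = 0.0
    if len(scores) >= 2:
        q1, _, q3 = statistics.quantiles(scores, n=4)
        spread = q3 - q1
    if spread <= 1e-12:
        spread = 1.0  # flat population: fall back to unit scale
    raw = []
    for score, children in zip(scores, offspring_counts):
        z = (score - median) / spread   # robust normalization (median / IQR)
        fitness = _sigmoid(lam * z)     # lambda sets the sigmoid's steepness
        raw.append(fitness / (1.0 + offspring_penalty * children))  # over-mining penalty
    total = sum(raw)
    return [w / total for w in raw]
```

A larger `lam` sharpens the preference for above-median parents. Setting `offspring_penalty=0.0` removes the lineage discount entirely, which matches the config comment that 0 disables it.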
@@ -135,6 +174,12 @@ evaluator:
   timeout: 300                        # Maximum evaluation time in seconds
   max_retries: 3                      # Maximum number of retries for evaluation
 
+  # Noise-aware fitness: for STOCHASTIC evaluators, a single evaluation is a
+  # noisy sample of a program's true quality. When > 1, each program is
+  # evaluated this many times and the numeric metrics are averaged (at N x the
+  # evaluation cost). Leave at 1 for deterministic evaluators.
+  re_evaluations: 1                   # Evaluations per program, averaged
+
   # Note: resource limits (memory_limit_mb, cpu_limit) are not yet implemented
 
   # Evaluation strategies
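With `re_evaluations: N`, each program is evaluated N times and its numeric metrics are averaged. For independent noise, this cuts the variance of the fitness estimate by a factor of N. Here is a minimal sketch. It assumes the evaluator is a plain `evaluate(program_path) -> dict` callable, and that non-numeric values (including booleans) are kept from the first run. `evaluate_averaged` is a hypothetical name:

```python
import statistics
from typing import Any, Callable, Dict


def _is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def evaluate_averaged(
    evaluate: Callable[[str], Dict[str, Any]],
    program_path: str,
    re_evaluations: int = 1,   # evaluator.re_evaluations
) -> Dict[str, Any]:
    """Evaluate a program N times and average its numeric metrics (N x the evaluation cost)."""
    if re_evaluations < 1:
        raise ValueError("re_evaluations must be >= 1")
    runs = [evaluate(program_path) for _ in range(re_evaluations)]
    merged: Dict[str, Any] = dict(runs[0])  # non-numeric values come from the first run
    for key, value in runs[0].items():
        if _is_number(value):
            samples = [r[key] for r in runs if _is_number(r.get(key))]
            merged[key] = statistics.fmean(samples)
    return merged
```

For example, `re_evaluations: 9` shrinks the standard deviation of the estimate by a factor of 3 for independent noise, at 9 times the evaluation cost. This is why the config comment recommends leaving it at 1 for deterministic evaluators.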
@@ -151,3 +196,15 @@ evaluator:
   # LLM-based feedback (experimental)
   use_llm_feedback: false             # Use LLM to evaluate code quality
   llm_feedback_weight: 0.1            # Weight for LLM feedback in final score
+
+# Novelty detection (optional): rejects near-duplicate programs by embedding
+# similarity and asks the LLM to regenerate, keeping the population diverse
+novelty:
+  enabled: false                      # Enable novelty detection
+  embedding_backend: "local"          # "local" (sentence-transformers, needs
+                                      # `pip install kai-evolve[embeddings]`) or "api"
+  embedding_model: "all-MiniLM-L6-v2" # "api" backend: use e.g. "openai/text-embedding-3-small"
+  embedding_device: null              # Device for local model: "cpu", "cuda", "mps" (null = auto)
+  similarity_threshold: 0.95          # Reject programs with similarity >= this
+  max_regeneration_attempts: 3        # Max regenerations per rejected near-duplicate
+  temperature_increment: 0.15         # Temperature bump per regeneration attempt
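The `novelty` block describes a reject-and-regenerate loop. Here is a minimal sketch. It assumes cosine similarity against the embeddings of existing programs, and uses hypothetical `generate`/`embed` callables to stand in for the LLM call and the configured embedding backend. This diff does not say what happens once all attempts are exhausted, so the sketch simply returns `None`:

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def generate_novel(
    generate: Callable[[float], str],   # hypothetical: LLM call at a given temperature
    embed: Callable[[str], Vector],     # hypothetical: "local" or "api" embedding backend
    population: List[Vector],           # embeddings of programs already in the database
    base_temperature: float = 0.7,
    similarity_threshold: float = 0.95, # novelty.similarity_threshold
    max_regeneration_attempts: int = 3, # novelty.max_regeneration_attempts
    temperature_increment: float = 0.15,  # novelty.temperature_increment
) -> Tuple[Optional[str], float]:
    """Return (program, temperature used), or (None, last temperature) if all were near-duplicates."""
    temperature = base_temperature
    for attempt in range(max_regeneration_attempts + 1):  # first try + regenerations
        temperature = base_temperature + attempt * temperature_increment
        candidate = generate(temperature)
        vec = embed(candidate)
        # Reject when similarity >= threshold against any existing program.
        if all(cosine_similarity(vec, other) < similarity_threshold for other in population):
            return candidate, temperature
    return None, temperature
```

Raising the temperature on each retry pushes the model away from the output it just produced. The threshold check mirrors the config comment "Reject programs with similarity >= this".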
diff --git a/examples/alphaevolve/autocorrelation_C1/README.md b/examples/alphaevolve/autocorrelation_C1/README.md
@@ -53,15 +53,19 @@ So C=2.0 → 0, C=1.5 → 0.5, C=1.0 → 1.0. The target band is roughly score
 
 - `initial_program.py` — starting point: simple random hill-climber on a uniform sequence
 - `evaluator.py` — canonical scorer; returns `combined_score`, `raw_C`, `sequence_length`, `runs_successfully`
-- `config.yaml` — 30 iterations, 2 islands × 2 models, diff mode
+- `visualize.py` — interactive solution view for the web viewer
 - `README.md` — this file
 
+Use the suite config tuned for the AlphaEvolve tasks,
+[`configs/bench_alphaevolve.yaml`](../../../configs/bench_alphaevolve.yaml)
+(diff mode, 2 islands × 2 models).
+
 ## Running
 
 ```bash
 OPENAI_API_KEY=sk-or-... python scripts/bench.py run \
   --tasks examples/alphaevolve/autocorrelation_C1 \
-  --config examples/alphaevolve/autocorrelation_C1/config.yaml \
+  --config configs/bench_alphaevolve.yaml \
   --output-dir bench_results/autocorrelation_C1 \
   --runs 3 --label autocorr
 ```
diff --git a/examples/noisy_optimization/README.md b/examples/noisy_optimization/README.md
@@ -0,0 +1,32 @@
+# Noisy optimization (stochastic evaluator)
+
+A minimal benchmark for **noise-aware fitness**: the candidate's `construct()`
+returns an 8-number vector whose true quality is a deterministic concave
+function of its distance to a hidden target — but `evaluate()` reports that
+quality with fresh additive Gaussian noise on every call. Repeated evaluations
+of the *same* program differ, so selection can be fooled by a single lucky
+draw. This is exactly the regime `evaluator.re_evaluations` is for: when set
+above 1, each program is evaluated that many times and the metrics are
+averaged, shrinking the variance of the fitness estimate.
+
+The evaluator runs `construct()` in a clean subprocess so candidates cannot
+introspect the evaluator's memory and read the hidden target — they have to
+actually optimize.
+
+## Files
+
+- `initial_program.py` — starting point: the all-zeros vector
+- `evaluator.py` — stochastic scorer (`combined_score` = true quality + noise);
+  also exposes `true_evaluate()` so benchmarks can grade results noiselessly
+- `config.yaml` — small run with `re_evaluations: 3`
+
+## Running
+
+```bash
+python kaievolve-run.py examples/noisy_optimization/initial_program.py \
+  examples/noisy_optimization/evaluator.py \
+  --config examples/noisy_optimization/config.yaml
+```
+
+Compare with `re_evaluations: 1` in the config to see how single noisy samples
+mislead selection.
diff --git a/examples/noisy_optimization/config.yaml b/examples/noisy_optimization/config.yaml
@@ -0,0 +1,33 @@
+# Noisy-optimization benchmark: stochastic evaluator (fresh Gaussian noise on
+# every call). re_evaluations > 1 averages repeated evaluations per program so
+# selection is not fooled by lucky draws.
+
+max_iterations: 30
+checkpoint_interval: 10
+log_level: "INFO"
+random_seed: 42
+
+diff_based_evolution: true
+
+llm:
+  api_base: "https://openrouter.ai/api/v1"
+  models:
+    - name: "google/gemini-3.1-flash-lite"
+      weight: 1.0
+    - name: "deepseek/deepseek-v4-flash"
+      weight: 1.0
+  temperature: 0.7
+  timeout: 120
+  retries: 2
+
+database:
+  population_size: 40
+  archive_size: 20
+  num_islands: 2
+
+evaluator:
+  timeout: 30
+  cascade_evaluation: false
+  re_evaluations: 3      # average 3 noisy samples per program (the point of this example)
+  parallel_evaluations: 2
+  use_llm_feedback: false
diff --git a/experiments/README.md b/experiments/README.md
@@ -1,5 +1,11 @@
 # KaiEvolve AlgoTune Benchmarking System
 
+> **Note**: this directory is benchmarking infrastructure and result data
+> inherited from upstream [OpenEvolve](https://github.com/codelion/openevolve);
+> the recorded results predate the KaiEvolve fork. The scripts expect the
+> `ALGOTUNE_PATH` / `EVOLVEBENCH_PATH` environment variables to point at your
+> local checkouts.
+
 This directory contains the benchmarking infrastructure for running KaiEvolve on the AlgoTune dataset.
 
 ## Directory Structure

diff --git a/experiments/evolvebench/run_evolvebench.py b/experiments/evolvebench/run_evolvebench.py
@@ -19,8 +19,10 @@
 import time
 from pathlib import Path
 
-# EvolveBench path (adjust if needed)
-EVOLVEBENCH_PATH = Path("/Users/codelion/Documents/GitLab/evolve-bench")
+# EvolveBench path
+if not os.environ.get("EVOLVEBENCH_PATH"):
+    sys.exit("Set the EVOLVEBENCH_PATH environment variable to your evolve-bench checkout")
+EVOLVEBENCH_PATH = Path(os.environ["EVOLVEBENCH_PATH"])
 KAIEVOLVE_PATH = Path(__file__).parent.parent.parent
 CONFIGS_DIR = Path(__file__).parent / "configs"
 RESULTS_DIR = Path(__file__).parent / "results"

diff --git a/experiments/validate_initial_programs.py b/experiments/validate_initial_programs.py
@@ -17,7 +17,9 @@
 from typing import Dict, List, Any, Tuple
 
 # Add AlgoTune to Python path
-ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH', '/Users/codelion/Documents/GitHub/AlgoTune')
+ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH')
+if not ALGOTUNE_PATH:
+    sys.exit("Set the ALGOTUNE_PATH environment variable to your AlgoTune checkout")
 sys.path.insert(0, ALGOTUNE_PATH)
 
 from AlgoTuneTasks.factory import TaskFactory

diff --git a/pyproject.toml b/pyproject.toml
@@ -18,9 +18,11 @@ dependencies = [
     "openai>=1.0.0",
     "pyyaml>=6.0",
     "numpy>=1.22.0",
+    # kmeans2 for strategy clustering (kaievolve/strategy_clusters.py); was
+    # previously pulled in transitively via sentence-transformers.
+    "scipy>=1.7.0",
     "tqdm>=4.64.0",
-    "flask",
-    "sentence-transformers>=2.2.0",
+    # Provides `bson.ObjectId`, used for program ids (kaievolve/utils/id_generator.py).
     "pymongo>=4.0.0",
     "requests>=2.25.0",
     # `kai` CLI: Rich rendering + questionary pick-lists + plotext charts.
@@ -42,6 +44,12 @@ viewer = [
     "uvicorn>=0.32",
     "jinja2>=3.1",
 ]
+# Local embedding models for novelty detection / strategy clustering
+# (embedding_backend: "local"). Off by default; the "api" backend needs no
+# extra install. Heavy: pulls in PyTorch.
+embeddings = [
+    "sentence-transformers>=2.2.0",
+]
 
 [tool.black]
 line-length = 100

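`sentence-transformers` moves from a core dependency to the `embeddings` extra, so code on the `"local"` embedding-backend path now has to cope with it being absent. Here is a minimal sketch of a guarded import that points users to the extra. `require_optional` and `load_local_embedder` are hypothetical names, not functions from this PR; `SentenceTransformer(model_name, device=...)` is the library's public constructor:

```python
import importlib
from types import ModuleType
from typing import Optional


def require_optional(module_name: str, extra: str, feature: str) -> ModuleType:
    """Import an optional dependency, or fail with a pointer to the matching pip extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} needs the optional '{module_name}' package; install it with "
            f"`pip install kai-evolve[{extra}]`, or switch to the \"api\" embedding backend."
        ) from exc


def load_local_embedder(model_name: str = "all-MiniLM-L6-v2", device: Optional[str] = None):
    """Hypothetical loader for embedding_backend: "local" (device None = auto)."""
    st = require_optional("sentence_transformers", "embeddings", 'embedding_backend: "local"')
    return st.SentenceTransformer(model_name, device=device)
```

Importing lazily inside the loader, rather than at module top level, keeps the default install (and the `"api"` backend) free of the PyTorch dependency.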
diff --git a/scripts/requirements.txt b/scripts/requirements.txt
@@ -1 +1 @@
-flask
+plotly