diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 31d0ec3..a2e43bd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -5,7 +5,7 @@ Thank you for your interest in contributing to KaiEvolve! This document provides ## Getting Started 1. Fork the repository -2. Clone your fork: `git clone https://github.com/firstbatchxyz/kai-evolve.git` +2. Clone your fork: `git clone https://github.com/YOUR-USERNAME/kai-evolve.git` 3. Install the package in development mode: `pip install -e ".[dev]"` 4. Set up environment for testing: ```bash @@ -51,7 +51,7 @@ When developing features that interact with LLMs: python -m unittest discover tests ``` 5. Commit your changes: `git commit -m "Add your descriptive commit message"` -6. Push to your fork: `git push origin feature/your-feature-name` +6. Push to your fork: `git push origin feat-your-feature-name` 7. Submit a pull request to the main repository ## Adding Examples diff --git a/NOTICE b/NOTICE index 847c307..ad38aef 100644 --- a/NOTICE +++ b/NOTICE @@ -1,5 +1,5 @@ KaiEvolve -Copyright 2025 Dria +Copyright 2025-2026 Dria This product is a derivative work of OpenEvolve (https://github.com/codelion/openevolve), Copyright 2025 Asankhaya Sharma diff --git a/README.md b/README.md index 1e1b947..8cfac65 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,14 @@ cd kai-evolve pip install -e . ``` +Optional extras: +```bash +pip install -e ".[viewer]" # the `kai viewer` web UI (FastAPI + Jinja) +pip install -e ".[embeddings]" # local embedding models for novelty detection / + # strategy clustering (pulls in PyTorch; the + # default "api" embedding backend needs no extra) +``` + ### Quick Start #### Setting up LLM Access @@ -128,7 +136,9 @@ KaiEvolve uses the OpenAI SDK, which means it works with any LLM provider that s This setup ensures KaiEvolve can work with any LLM provider - OpenAI, Anthropic, Google, Cohere, local models via Ollama/vLLM, or any OpenAI-compatible endpoint. ```python +import asyncio import os + from kaievolve import KaiEvolve # Ensure API key is set @@ -143,8 +153,8 @@ evolve = KaiEvolve( ) # Run the evolution -best_program = await evolve.run(iterations=1000) -print(f"Best program metrics:") +best_program = asyncio.run(evolve.run(iterations=1000)) +print("Best program metrics:") for name, value in best_program.metrics.items(): print(f" {name}: {value:.4f}") ``` @@ -157,6 +167,22 @@ KaiEvolve can also be run from the command line: python kaievolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 1000 ``` +For a concrete first run, try the bundled circle packing example: + +```bash +pip install -r examples/circle_packing/requirements.txt # scipy + matplotlib +python kaievolve-run.py examples/circle_packing/initial_program.py \ + examples/circle_packing/evaluator.py \ + --config examples/circle_packing/config_phase_1.yaml \ + --iterations 50 +``` + +> **Note**: if you omit `--config`, KaiEvolve warns and falls back to built-in +> defaults: models `gpt-4o-mini`/`gpt-4o`, with the API base taken from the +> `OPENAI_API_BASE` environment variable (default: +> `https://openrouter.ai/api/v1`). Those model names only resolve if your +> endpoint actually serves them, so passing an explicit config is recommended. + ### Resuming from Checkpoints KaiEvolve automatically saves checkpoints at intervals specified by the `checkpoint_interval` config parameter (default is 10 iterations). You can resume an evolution run from a saved checkpoint: @@ -279,6 +305,30 @@ the solution view. See [`skills/visualization/SKILL.md`](skills/visualization/SK and the two reference visualizers under `examples/alphaevolve/` (circle packing and the autocorrelation step function). +### Watching and steering a live run + +Active runs append a per-iteration `progress.jsonl` feed to their output +directory; `kai monitor` and the web viewer read it, so both update in real +time while a run is going (and keep working on finished runs). + +Two steering channels can redirect a run **while it is running**: + +- **Human steering brief** — point `prompt.steering_brief_path` at a markdown + file and edit it mid-run: its contents are re-read every iteration and + injected into generation prompts, so directives like "focus on the inner + loop" or "avoid approach X" take effect without restarting or touching + config. `kai steer` sets, appends to, or shows that file from the terminal. +- **Research director** (opt-in) — an automated meta-agent that periodically + reads the population and writes a strategic directive into the same steering + channel. Enable it with `prompt.research_director_enabled: true`; optionally + set `research_director_interval` (defaults to the migration interval) and + `research_director_model` for a dedicated reasoning model (defaults to the + run's model roster). See + [`skills/research-director/SKILL.md`](skills/research-director/SKILL.md). + +Both channels compose: the human brief and the director's directive are +injected together. + ## Configuration KaiEvolve is highly configurable with advanced options: @@ -500,7 +550,7 @@ Below is the optimal packing found by KaiEvolve after 800 iterations: ### AlphaEvolve Benchmark Tasks #### [examples/alphaevolve/](examples/alphaevolve/) -Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator plus config ready to evolve. +Five open mathematical problems from the AlphaEvolve paper (autocorrelation_C1, packing_circles_max_sum_of_radii, no_isosceles_triangles, happy_ending, unit_distances), each packaged as an initial program plus evaluator. Run them with the suite config tuned for these tasks, [`configs/bench_alphaevolve.yaml`](configs/bench_alphaevolve.yaml). diff --git a/configs/default_config.yaml b/configs/default_config.yaml index a706c4f..7ec5725 100644 --- a/configs/default_config.yaml +++ b/configs/default_config.yaml @@ -71,10 +71,43 @@ prompt: # Feature extraction and program labeling thresholds # These control how the LLM perceives and categorizes programs suggest_simplification_after_chars: 500 # Suggest simplifying if program exceeds this many characters - include_changes_under_chars: 100 # Include change descriptions in features if under this length + include_changes_under_chars: 100 # Include change descriptions in features if under this length concise_implementation_max_lines: 10 # Label as "concise" if program has this many lines or fewer comprehensive_implementation_min_lines: 50 # Label as "comprehensive" if program has this many lines or more + # HMRD (Hypothesis/Method/Result/Discussion) program summaries + require_hmrd: true # Prompts ask for HYPOTHESIS+METHOD tags; workers parse them and + # attach the 4-section summary to each successful program + + # Literature review: a bounded, running "lab notebook" distilled from the HMRD + # summaries of prior programs, injected into every generation prompt + literature_review_enabled: false # Enable the literature review channel + literature_review_path: null # Review file path (null = /literature_review.md) + literature_review_max_chars: 4000 # Bound on review size (~1000 tokens) + literature_review_interval: null # Distill every N iterations (null = checkpoint_interval) + + # Human steering brief: a markdown file you edit while a run is in progress; + # it is re-read every iteration and injected into prompts (see `kai steer`) + steering_brief_path: null # Path to the brief file (null = disabled) + + # Research director: an automated meta-agent that periodically reads the + # population and writes a strategic directive into the steering channel + # (see skills/research-director/SKILL.md) + research_director_enabled: false # Enable the research director (opt-in) + research_director_interval: null # Fire every N iterations (null = database.migration_interval) + research_director_model: null # Dedicated model for the director (null = run roster) + + # Strategy clustering + cluster bandit: clusters programs by the embedding of + # their HMRD summary into emergent strategies, then biases parent selection + # toward promising/under-explored clusters + strategy_clustering_enabled: false # Enable strategy clustering (opt-in) + strategy_num_clusters: 6 # Number of strategy clusters + strategy_epsilon: 0.05 # Selection-probability floor per cluster + strategy_exploration_c: 0.5 # UCB exploration coefficient + strategy_recency_bonus: 0.1 # Additive bonus for recently-improving clusters + strategy_recency_window: 2 # Checkpoints counted as "recent" + strategy_embedding_backend: "local" # "local" (needs `pip install kai-evolve[embeddings]`) or "api" + # Note: meta-prompting features are not yet implemented # Database configuration @@ -101,6 +134,12 @@ database: exploitation_ratio: 0.7 # Ratio of exploitation vs random selection # Note: diversity_metric is fixed to "edit_distance" (feature_based not implemented) + # Parent-selection shaping: parents are weighted by a robustly-scaled fitness + # and down-weighted by how many children they already produced, so the search + # neither collapses to uniform nor fixates on one heavily-mined lineage + parent_selection_lambda: 2.0 # Sigmoid steepness over normalized score + parent_selection_offspring_penalty: 0.5 # 0 disables the over-mining penalty + # Feature map dimensions for MAP-Elites # Default if not specified: ["complexity", "diversity"] # @@ -135,6 +174,12 @@ evaluator: timeout: 300 # Maximum evaluation time in seconds max_retries: 3 # Maximum number of retries for evaluation + # Noise-aware fitness: for STOCHASTIC evaluators, a single evaluation is a + # noisy sample of a program's true quality. When > 1, each program is + # evaluated this many times and the numeric metrics are averaged (at N x the + # evaluation cost). Leave at 1 for deterministic evaluators. + re_evaluations: 1 # Evaluations per program, averaged + # Note: resource limits (memory_limit_mb, cpu_limit) are not yet implemented # Evaluation strategies @@ -151,3 +196,15 @@ evaluator: # LLM-based feedback (experimental) use_llm_feedback: false # Use LLM to evaluate code quality llm_feedback_weight: 0.1 # Weight for LLM feedback in final score + +# Novelty detection (optional): rejects near-duplicate programs by embedding +# similarity and asks the LLM to regenerate, keeping the population diverse +novelty: + enabled: false # Enable novelty detection + embedding_backend: "local" # "local" (sentence-transformers, needs + # `pip install kai-evolve[embeddings]`) or "api" + embedding_model: "all-MiniLM-L6-v2" # "api" backend: use e.g. "openai/text-embedding-3-small" + embedding_device: null # Device for local model: "cpu", "cuda", "mps" (null = auto) + similarity_threshold: 0.95 # Reject programs with similarity >= this + max_regeneration_attempts: 3 # Regeneration attempts for a novel program + temperature_increment: 0.15 # Temperature bump per regeneration attempt diff --git a/examples/alphaevolve/autocorrelation_C1/README.md b/examples/alphaevolve/autocorrelation_C1/README.md index 6aabf40..3a48621 100644 --- a/examples/alphaevolve/autocorrelation_C1/README.md +++ b/examples/alphaevolve/autocorrelation_C1/README.md @@ -53,15 +53,19 @@ So C=2.0 → 0, C=1.5 → 0.5, C=1.0 → 1.0. The target band is roughly score - `initial_program.py` — starting point: simple random hill-climber on a uniform sequence - `evaluator.py` — canonical scorer; returns `combined_score`, `raw_C`, `sequence_length`, `runs_successfully` -- `config.yaml` — 30 iterations, 2 islands × 2 models, diff mode +- `visualize.py` — interactive solution view for the web viewer - `README.md` — this file +Use the suite config tuned for the AlphaEvolve tasks, +[`configs/bench_alphaevolve.yaml`](../../../configs/bench_alphaevolve.yaml) +(diff mode, 2 islands × 2 models). + ## Running ```bash OPENAI_API_KEY=sk-or-... python scripts/bench.py run \ --tasks examples/alphaevolve/autocorrelation_C1 \ - --config examples/alphaevolve/autocorrelation_C1/config.yaml \ + --config configs/bench_alphaevolve.yaml \ --output-dir bench_results/autocorrelation_C1 \ --runs 3 --label autocorr ``` diff --git a/examples/noisy_optimization/README.md b/examples/noisy_optimization/README.md new file mode 100644 index 0000000..3af6a21 --- /dev/null +++ b/examples/noisy_optimization/README.md @@ -0,0 +1,32 @@ +# Noisy optimization (stochastic evaluator) + +A minimal benchmark for **noise-aware fitness**: the candidate's `construct()` +returns an 8-number vector whose true quality is a deterministic concave +function of its distance to a hidden target — but `evaluate()` reports that +quality with fresh additive Gaussian noise on every call. Repeated evaluations +of the *same* program differ, so selection can be fooled by a single lucky +draw. This is exactly the regime `evaluator.re_evaluations` is for: when set +above 1, each program is evaluated that many times and the metrics are +averaged, shrinking the variance of the fitness estimate. + +The evaluator runs `construct()` in a clean subprocess so candidates cannot +introspect the evaluator's memory and read the hidden target — they have to +actually optimize. + +## Files + +- `initial_program.py` — starting point: the all-zeros vector +- `evaluator.py` — stochastic scorer (`combined_score` = true quality + noise); + also exposes `true_evaluate()` so benchmarks can grade results noiselessly +- `config.yaml` — small run with `re_evaluations: 3` + +## Running + +```bash +python kaievolve-run.py examples/noisy_optimization/initial_program.py \ + examples/noisy_optimization/evaluator.py \ + --config examples/noisy_optimization/config.yaml +``` + +Compare with `re_evaluations: 1` in the config to see how single noisy samples +mislead selection. diff --git a/examples/noisy_optimization/config.yaml b/examples/noisy_optimization/config.yaml new file mode 100644 index 0000000..6ccf2da --- /dev/null +++ b/examples/noisy_optimization/config.yaml @@ -0,0 +1,33 @@ +# Noisy-optimization benchmark: stochastic evaluator (fresh Gaussian noise on +# every call). re_evaluations > 1 averages repeated evaluations per program so +# selection is not fooled by lucky draws. + +max_iterations: 30 +checkpoint_interval: 10 +log_level: "INFO" +random_seed: 42 + +diff_based_evolution: true + +llm: + api_base: "https://openrouter.ai/api/v1" + models: + - name: "google/gemini-3.1-flash-lite" + weight: 1.0 + - name: "deepseek/deepseek-v4-flash" + weight: 1.0 + temperature: 0.7 + timeout: 120 + retries: 2 + +database: + population_size: 40 + archive_size: 20 + num_islands: 2 + +evaluator: + timeout: 30 + cascade_evaluation: false + re_evaluations: 3 # average 3 noisy samples per program (the point of this example) + parallel_evaluations: 2 + use_llm_feedback: false diff --git a/experiments/README.md b/experiments/README.md index 8451b65..039a2c6 100644 --- a/experiments/README.md +++ b/experiments/README.md @@ -1,5 +1,11 @@ # KaiEvolve AlgoTune Benchmarking System +> **Note**: this directory is benchmarking infrastructure and result data +> inherited from upstream [OpenEvolve](https://github.com/codelion/openevolve); +> the recorded results predate the KaiEvolve fork. The scripts expect the +> `ALGOTUNE_PATH` / `EVOLVEBENCH_PATH` environment variables to point at your +> local checkouts. + This directory contains the benchmarking infrastructure for running KaiEvolve on the AlgoTune dataset. ## Directory Structure diff --git a/experiments/evolvebench/run_evolvebench.py b/experiments/evolvebench/run_evolvebench.py index 03b5643..724b90b 100644 --- a/experiments/evolvebench/run_evolvebench.py +++ b/experiments/evolvebench/run_evolvebench.py @@ -19,8 +19,10 @@ import time from pathlib import Path -# EvolveBench path (adjust if needed) -EVOLVEBENCH_PATH = Path("/Users/codelion/Documents/GitLab/evolve-bench") +# EvolveBench path +if not os.environ.get("EVOLVEBENCH_PATH"): + sys.exit("Set the EVOLVEBENCH_PATH environment variable to your evolve-bench checkout") +EVOLVEBENCH_PATH = Path(os.environ["EVOLVEBENCH_PATH"]) KAIEVOLVE_PATH = Path(__file__).parent.parent.parent CONFIGS_DIR = Path(__file__).parent / "configs" RESULTS_DIR = Path(__file__).parent / "results" diff --git a/experiments/validate_initial_programs.py b/experiments/validate_initial_programs.py index b61228e..cd1276e 100644 --- a/experiments/validate_initial_programs.py +++ b/experiments/validate_initial_programs.py @@ -17,7 +17,9 @@ from typing import Dict, List, Any, Tuple # Add AlgoTune to Python path -ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH', '/Users/codelion/Documents/GitHub/AlgoTune') +ALGOTUNE_PATH = os.environ.get('ALGOTUNE_PATH') +if not ALGOTUNE_PATH: + sys.exit("Set the ALGOTUNE_PATH environment variable to your AlgoTune checkout") sys.path.insert(0, ALGOTUNE_PATH) from AlgoTuneTasks.factory import TaskFactory diff --git a/pyproject.toml b/pyproject.toml index 6bb24f7..b521169 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -18,9 +18,11 @@ dependencies = [ "openai>=1.0.0", "pyyaml>=6.0", "numpy>=1.22.0", + # kmeans2 for strategy clustering (kaievolve/strategy_clusters.py); was + # previously pulled in transitively via sentence-transformers. + "scipy>=1.7.0", "tqdm>=4.64.0", - "flask", - "sentence-transformers>=2.2.0", + # Provides `bson.ObjectId`, used for program ids (kaievolve/utils/id_generator.py). "pymongo>=4.0.0", "requests>=2.25.0", # `kai` CLI: Rich rendering + questionary pick-lists + plotext charts. @@ -42,6 +44,12 @@ viewer = [ "uvicorn>=0.32", "jinja2>=3.1", ] +# Local embedding models for novelty detection / strategy clustering +# (embedding_backend: "local"). Off by default; the "api" backend needs no +# extra install. Heavy: pulls in PyTorch. +embeddings = [ + "sentence-transformers>=2.2.0", +] [tool.black] line-length = 100 diff --git a/scripts/requirements.txt b/scripts/requirements.txt index 8ab6294..ff0243c 100644 --- a/scripts/requirements.txt +++ b/scripts/requirements.txt @@ -1 +1 @@ -flask \ No newline at end of file +plotly