CLI ergonomics for iterative exploration (status/watch, --init-from, stop-on-target, compare, eval)

## Summary

This issue proposes CLI ergonomics for **iterative exploration**: running many cycles, reseeding from the best program, chasing a target metric, and comparing configs. The proposals come from sustained hands-on use during a multi-cycle benchmark/record-chasing push, where the same manual workarounds kept recurring. The first three are the ones that bit repeatedly; items 4 and 5 are smaller wins.

Related: #26 (more detailed evolution stats). Item 1 below is the *reader/consumer* side of that data, so it is complementary rather than overlapping.

## Proposed additions

### 1. `kaievolve status <run_dir>` / `--watch`  (priority)
Today, checking on a live run means hand-writing `grep -hoE "raw_C=..."` loops against `logs/*.log`. The web viewer already has live streaming (`live.json` / progress feed), but there is no CLI equivalent.

A one-shot `status` and a follow mode (either `status --watch` or a `watch` subcommand, as in the example) that print:
- iteration `N / total`, throughput (iters/min), elapsed
- current best metric + which program id / island produced it
- last few improvements (metric trajectory)
- per-island best (island spread)
- whether the optional research director has fired and its latest directive

```
kaievolve status  bench_results/<run>/run_0
kaievolve watch   bench_results/<run>/run_0     # tails the live feed
```
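Until such a command exists, the grep workaround described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it assumes log lines contain tokens of the form `raw_C=<float>` and that lower is better (as the `raw_C<=1.5029` target in item 3 suggests). `improvements` is a hypothetical helper, not part of kaievolve, and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed log token format: "raw_C=<float>"; the real log layout may differ.
METRIC_RE = re.compile(r"raw_C=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def improvements(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (line_index, value) for each new best, assuming lower is better."""
    best = float("inf")
    trajectory: List[Tuple[int, float]] = []
    for idx, line in enumerate(lines):
        for match in METRIC_RE.finditer(line):
            value = float(match.group(1))
            if value < best:
                best = value
                trajectory.append((idx, value))
    return trajectory


# Made-up sample lines, for illustration only.
sample = [
    "iter 1 island=0 raw_C=1.5200",
    "iter 2 island=1 raw_C=1.5310",
    "iter 3 island=0 raw_C=1.5098",
]
print(improvements(sample))
```

The last entry of the returned list is the current best, and the whole list is the "metric trajectory" bullet above. A real `status` command would presumably read the structured live feed rather than scrape logs.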

### 2. `--init-from <run_dir>`  (priority)
Chaining cycles (evolve → reseed from best → evolve) currently requires manually locating `run_0/best/best_program.py` and copying it over `initial_program.py`. A flag that pulls a previous run's best program as the new seed makes this a one-liner:

```
kaievolve-run.py --init-from bench_results/<prev_run>/run_0  evaluator.py --config cfg.yaml
```
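The manual step this flag would replace can be sketched as follows. The `best/best_program.py` layout is taken from the description above; `seed_from_run` is a hypothetical helper name, and how kaievolve would wire this into its runner is not specified here.

```python
import shutil
import tempfile
from pathlib import Path


def seed_from_run(run_dir: Path, dest: Path) -> Path:
    """Copy <run_dir>/best/best_program.py to dest and return dest."""
    best = run_dir / "best" / "best_program.py"
    if not best.is_file():
        raise FileNotFoundError(f"no best program found at {best}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(best, dest)
    return dest


# Demo against a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run_0"
    (run_dir / "best").mkdir(parents=True)
    (run_dir / "best" / "best_program.py").write_text("x = 1\n")
    seed = seed_from_run(run_dir, Path(tmp) / "next" / "initial_program.py")
    print(seed.read_text(), end="")
```

Failing loudly when the best program is missing seems preferable to silently falling back to the old seed, since a chained cycle that restarts from a stale program is hard to notice after the fact.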

### 3. `--target <metric><op><value>` + stop-on-reach  (priority)
When chasing a known target, there is no way to exit early and clearly flag success; runs just continue, and breakthroughs have to be detected by grep. A target predicate plus an unambiguous `TARGET REACHED` marker in the log and on exit closes the loop:

```
kaievolve-run.py ... --target "raw_C<=1.5029" --stop-on-target
```
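One way the `<metric><op><value>` predicate could be parsed and checked is sketched below. The set of supported operators, the helper names `parse_target` and `target_reached`, and the shape of the metrics dict are all assumptions for illustration, not an existing kaievolve API.

```python
import operator
import re
from typing import Callable, Dict, Tuple

# Assumed operator set; the proposal only shows "<=".
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}

# Two-character operators are listed first so "<=" is not read as "<".
TARGET_RE = re.compile(
    r"^\s*([A-Za-z_]\w*)\s*(<=|>=|==|<|>)\s*"
    r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_target(spec: str) -> Tuple[str, str, float]:
    """Split a spec such as 'raw_C<=1.5029' into (metric, op, value)."""
    match = TARGET_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot parse target spec: {spec!r}")
    return match.group(1), match.group(2), float(match.group(3))


def target_reached(spec: str, metrics: Dict[str, float]) -> bool:
    """True if the named metric is present and satisfies the predicate."""
    name, op, value = parse_target(spec)
    return name in metrics and OPS[op](metrics[name], value)


print(parse_target("raw_C<=1.5029"))
print(target_reached("raw_C<=1.5029", {"raw_C": 1.5031}))
print(target_reached("raw_C<=1.5029", {"raw_C": 1.5027}))
```

Treating a missing metric as "not reached" rather than raising keeps a run alive if an evaluator occasionally omits the metric; a stricter implementation might prefer to fail fast on an unknown metric name at startup.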

### 4. `kaievolve compare <dir1> <dir2> ...`
A/B'ing configs currently means digging through `best_program_info.json` across run dirs. A side-by-side of best metric / config diff / iterations / wall-time would make config comparison trivial.
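A minimal sketch of the best-metric column of such a comparison follows. It assumes each run dir holds `best/best_program_info.json` with a top-level `metrics` mapping; both the location and the schema are assumptions, and `best_metrics` / `compare` are hypothetical names. The demo values are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def best_metrics(run_dir: Path) -> Dict[str, float]:
    """Read the assumed 'metrics' mapping from best_program_info.json."""
    info_path = run_dir / "best" / "best_program_info.json"
    info = json.loads(info_path.read_text())
    return info.get("metrics", {})


def compare(run_dirs: List[Path], metric: str) -> List[str]:
    """Return one formatted row per run dir for the chosen metric."""
    rows = []
    for run_dir in run_dirs:
        value = best_metrics(run_dir).get(metric)
        shown = "n/a" if value is None else f"{value:.4f}"
        rows.append(f"{run_dir.name:<12} {metric}={shown}")
    return rows


# Demo with two throwaway run dirs and made-up metric values.
with tempfile.TemporaryDirectory() as tmp:
    dirs = []
    for name, value in [("cfg_a", 1.51), ("cfg_b", 1.5043)]:
        best_dir = Path(tmp) / name / "best"
        best_dir.mkdir(parents=True)
        payload = {"metrics": {"raw_C": value}}
        (best_dir / "best_program_info.json").write_text(json.dumps(payload))
        dirs.append(Path(tmp) / name)
    print("\n".join(compare(dirs, "raw_C")))
```

The config diff, iteration count, and wall-time columns are omitted because where those values are stored is not stated in this issue.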

### 5. `kaievolve eval <program.py> <evaluator.py> [--config cfg.yaml]`
Scoring a single candidate program against the canonical evaluator currently requires throwaway `python -c` snippets. A one-shot eval command standardizes it (useful for sanity-checking seeds and inspecting individual programs).
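The throwaway snippets typically amount to something like the sketch below. It assumes the evaluator module exposes an `evaluate(program_path)` function that returns the metrics; that entry point is an assumption about the evaluator contract, and `run_evaluator` is a hypothetical helper. The toy evaluator in the demo exists only to make the example self-contained.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Any


def run_evaluator(program_path: Path, evaluator_path: Path) -> Any:
    """Load an evaluator module from a file path and score one program."""
    spec = importlib.util.spec_from_file_location(
        "candidate_evaluator", evaluator_path
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load evaluator from {evaluator_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumed contract: evaluate(program_path) returns the metrics.
    return module.evaluate(str(program_path))


# Demo with a toy evaluator that just counts characters in the program.
with tempfile.TemporaryDirectory() as tmp:
    program = Path(tmp) / "program.py"
    program.write_text("x = 1\n")
    evaluator = Path(tmp) / "evaluator.py"
    evaluator.write_text(
        "def evaluate(program_path):\n"
        "    with open(program_path) as fh:\n"
        "        return {'n_chars': len(fh.read())}\n"
    )
    print(run_evaluator(program, evaluator))
```

A built-in `eval` command would presumably also apply the timeout and config handling that the main run loop uses, which this sketch does not attempt.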

## Why
All five surfaced during real iterative use. Items 1 and 2 especially map to common workflows ("how's the run doing" and "reseed from the best program") that today devolve into log-grepping and manual file-copying.

