Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
"email": "mho@looplia.run"
},
"metadata": {
"version": "1.6.0",
"version": "1.7.0",
"description": "Skills for product planning, project scaffolding, and agentic development workflows."
},
"plugins": [
Expand Down
44 changes: 44 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,50 @@ bug fixes → **patch**; removing or breaking a skill contract → **major**.

_Nothing yet._

## [1.7.0] - 2026-06-11

**Goal-driven autopilot driver**: `/aep-autopilot` now keeps itself ticking with
a **goal driver** by default — the host-native `/goal` primitive (Claude Code
v2.1.139+ and Codex's experimental `goals` feature) — which re-fires a tick when
there is work and **self-terminates** when the current layer is complete or a
human-judgment gate is hit. The fixed-interval `/loop` driver is retained as a
fallback (`--loop`). Only the _driver_ changes; the 7-step CHECK→ACT tick, the
delegated cheap CHECK, the signals protocol, and the orchestrator boundary are
unchanged. Decision record:
[`docs/decisions/goal-driven-autopilot.md`](docs/decisions/goal-driven-autopilot.md).

### Added

- **Goal driver (default)** — `/aep-autopilot` with no `--loop` flag builds a
one-layer goal condition and drives it via `/goal`: "layer N complete (all
stories merged + wrapped) OR autopilot paused". Scoped to **one layer per run**
— it stops at the layer boundary so the human runs the layer gate / `/aep-reflect`
and re-invokes for the next layer. Native and near-symmetric on both hosts
(Claude Code Haiku-evaluator Stop hook; Codex persisted thread goal with
`token_budget`).
- **Per-tick surface + wait tail (step ⑦, goal driver only)** — each tick
surfaces a **signals-only** `AUTOPILOT …` status line for the goal evaluator to
judge (boundary-safe: never workspace code), then waits a bounded **floor**
(default `5m`, `--floor`) before ending the turn. The floor is the anti-hot-loop
mechanism — CC uses `Monitor` with a hard timeout (a raw foreground `sleep` is
blocked in a turn); Codex uses shell `sleep`.
- **Goal-driver flags** — `--floor <dur>` (per-tick wait floor) and
`--max-turns <n>` (runaway backstop, default `200`); on Codex a `token_budget`
is set as the hard wall (soft-stops to `budget_limited`).

### Changed

- **`/aep-autopilot` default behavior** — the default driver is now goal-driven
and self-terminating per layer. The command surface is unchanged; `--loop
<interval>` selects the prior fixed-interval behavior exactly.
- **`/aep-autopilot stop`** — cancels whichever driver is active (`/goal clear`
for the goal driver; `/loop` cancel or cron/launchd removal for the loop
driver).
- **Driver × backend compatibility** (executor `backends.md`) — the long-lived
session class now names two in-session variants, `/goal` (default) and `/loop`;
the goal driver is in-session-only, so the cron/launchd row stays the
`/loop` / `codex exec` path.

## [1.6.0] - 2026-06-10

**Native-first executor backends**: launch/dispatch/build/autopilot/wrap now
Expand Down
144 changes: 144 additions & 0 deletions docs/decisions/goal-driven-autopilot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
# Goal-Driven Autopilot Driver

> Decision record (2026-06-11). Adds a **goal-driven driver** as the default way
> `/aep-autopilot` keeps itself ticking, alongside the existing fixed-interval
> `/loop` driver (retained as a fallback). This changes only the **driver** — how
> the next tick is triggered and when autopilot stops. The 7-step CHECK→ACT tick,
> the delegated cheap CHECK, the signals protocol, the executor abstraction, and
> the orchestrator boundary are all unchanged.

---

## 1. The Problem

Autopilot's real objective is goal-shaped — _"complete the current layer"_ — but
it was expressed as a **fixed-interval poll loop** (`/loop 5m /aep-autopilot
tick`). The 5-minute interval is not intrinsic to the design; it is just a
throttle on polling the workspace `signals/` files, and termination was ad-hoc
("all layers complete → stop"). Two consequences:

- **Idle ticks.** A layer with nothing to do still wakes every 5 minutes.
- **No native stop.** The loop runs forever until a human stops it; there is no
first-class "this layer is done, hand back" signal.

Both hosts now ship a native primitive for exactly this shape.

## 2. Key Research Findings (2026-06, verified against live runtimes)

1. **Both Claude Code and Codex ship a `/goal` primitive**, with near-identical
UX and semantics:
- **Claude Code `/goal`** (v2.1.139+): a session-scoped completion condition.
After each turn a small fast model (Haiku) judges the condition **against
the conversation transcript**; "no" auto-starts the next turn, "yes" clears
the goal. It is a wrapper around a session-scoped prompt-based Stop hook. The
evaluator **cannot run tools** — it judges only what the turn surfaced.
Headless via `claude -p "/goal …"`.
- **Codex `goals`** (experimental flag `goals=true`, confirmed in
`codex features list` on CLI 0.130): a **persisted thread-level goal**
(SQLite state machine with statuses `active / paused / budget_limited /
complete`, events `thread/goal/updated|cleared`, re-check `Goal unmet (…)`).
First-class **`token_budget`** → on exhaustion a soft stop to
`budget_limited` with a wrap-up steer. Continues only when the thread is
**idle**, the goal is active, and within budget. Controls: `/goal`,
`/goal check|pause|resume|clear`.
2. **Both are turn/idle-driven, not time-interval-driven** — the next turn fires
the instant the previous one ends (CC) or the thread goes idle (Codex). This
is the opposite of `/loop`'s fixed cadence.
3. **A raw foreground `sleep` is blocked inside a Claude Code turn**; the
sanctioned bounded wait is the `Monitor` tool with a hard timeout. Codex has
no such restriction.
4. **Background-completion push notifications are unreliable cross-platform**
(Claude Code issue #21048), so an event-driven "wake on worker completion"
design cannot be the foundation.

## 3. The Decision

Add `executor`-level concept of a **goal driver** and make it the autopilot
default; keep the **loop driver** as a fallback selected by `--loop <interval>`.

### 3.1 Driver selection

- No `--loop` flag → **goal driver** (default).
- `--loop <interval>` → **loop driver** (the prior behavior, unchanged).
- `--floor <dur>` (goal driver only) → per-tick wait floor, default `5m`.
- `--max-turns <n>` (goal driver only) → runaway backstop, default `200`.

### 3.2 Scope — one layer per goal

A single `/goal` run drives the **current layer** to completion, then stops and
hands back to the human to run the layer gate / `/aep-reflect` and re-invoke
`/aep-autopilot` for the next layer. This maps 1:1 onto AEP's existing
pause-at-gate boundaries (layer gates, `.5` calibration layers, outcome
contracts) but replaces the infinite loop with crisp termination.

### 3.3 Goal condition

The skill builds the condition for layer `N`:

> Layer `N` is COMPLETE — every story is `status=completed` and its worktree is
> wrapped (none remain under `.feature-workspaces/`), per the surfaced AUTOPILOT
> status line — **OR** autopilot has entered `status=paused` requiring human
> input. Judge **only** from the surfaced status line. Each turn run exactly one
> `/aep-autopilot tick` then end the turn. Stop after `max-turns` turns.

The two stop conditions (`layer_complete` / `paused`) are exactly the conditions
that today end or pause the loop, so pause/escalation semantics are unchanged:
hard-pause escalations (design ambiguity, layer-gate failure, outcome contract,
repeated failure) flip `paused=true` and the goal stops; a single relayed
human-gate does **not** halt the layer (other workspaces keep progressing).

### 3.4 The three adaptations to the tick

The tick body is unchanged; the goal driver adds a tail to step ⑦ (skipped under
the loop driver):

1. **Surface a signals-only status line** into the transcript so the evaluator
can judge `layer_complete` / `paused`. It contains no workspace code or file
contents, so the **orchestrator boundary holds** — the evaluator never reads
code.
2. **Wait the per-tick floor** before ending the turn (CC `Monitor`-with-timeout;
Codex `sleep`). This is mandatory: without it `/goal` re-fires instantly and
hot-loops. Early-wake on a signal change is an allowed optimization, not a
correctness dependency — the timeout alone guarantees a stable cadence.
3. **Keep the delegated cheap CHECK** — the goal session is long-lived, so the
token-isolation reason for running the read-heavy CHECK in a Haiku subagent /
`codex exec` one-shot still holds.

### 3.5 Driver × backend compatibility

The goal driver is a **long-lived in-session** driver, so it is compatible with
every steerable mode (claude-team / claude-bg / codex-subagent / codex-exec /
legacy) — the same row as `/loop`. It **cannot** drive a fresh-session-per-tick
cron/launchd scheduler; fully-unattended OS-scheduled runs therefore stay on the
`/loop` / `codex exec` path. This is why the loop driver is retained, not removed.

## 4. Why reliability over cleverness

The chosen wait mechanism is a **bounded wait with a hard timeout floor**, not
push notifications or event-watch APIs. Rationale: fewest moving parts, no
silent cross-platform failure modes (#21048), and a cadence **behaviorally
identical to the `/loop 5m` users already trust** — the only change is
self-termination. "Slow is fine; stable matters more."

## 5. Alternatives Considered

- **Push-when-available + watch fallback.** Pure event-driven, zero idle turns,
but two code paths and an unreliable push channel (#21048). Rejected for
reliability.
- **Replace `/loop` entirely.** Rejected: `/goal` is in-session-only, so the
cron/launchd unattended path needs `/loop`/`codex exec`. Keeping both also de-
risks hosts on older CLI versions without `/goal`.
- **Whole-backlog goal scope.** More autonomy, but it must pause at every
outcome-contract / gate boundary anyway; per-layer scope gives cleaner
termination and matches existing pause points.

## 6. Consequences

- Autopilot stops on its own at the layer boundary; no more idle 5-minute ticks
within a quiet layer (it still waits the floor, but never loops past `done`).
- Identical command surface (`/aep-autopilot`); the default driver changes
underneath. `--loop` restores the exact prior behavior.
- New invariant: the per-tick status line must stay signals-only. Enforced by
guardrail in the autopilot skill.
- Codex requires the `goals` feature enabled (`--enable goals`); Claude Code
requires v2.1.139+. Hosts lacking `/goal` fall back to `--loop`.
17 changes: 10 additions & 7 deletions docs/workflow/autonomous-loop.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,17 +64,18 @@ topology:
/aep-autopilot
```

One command. Initializes state, runs the first tick, and starts a recurring loop (default: every 5 minutes). Use `--loop` to customize the interval:
One command. Initializes state, runs the first tick, and keeps ticking until the **current layer is complete**, then stops on its own. The default driver is **goal-driven** (`/goal`, native on Claude Code and Codex): it re-fires a tick when there is work and self-terminates at the layer boundary so you can run the layer gate / `/aep-reflect` and re-invoke for the next layer. Use `--floor` to tune the goal driver's per-tick wait (default 5m), or `--loop` for the fixed-interval fallback driver:

```
/aep-autopilot --loop 10m
/aep-autopilot --floor 3m # goal driver, tighter per-tick wait
/aep-autopilot --loop 10m # fixed-interval loop driver instead
```

---

## The Autonomous Cycle

Autopilot runs as a tick-based state machine. Every 5 minutes, a tick executes:
Autopilot runs as a tick-based state machine. Under the **goal driver** each tick runs, surfaces a signals-only status line, and waits a floor (default 5m) before the `/goal` evaluator decides whether to re-fire or stop (layer complete / paused); under the **loop driver** a tick fires every interval. Either way, a tick executes:

```
/aep-autopilot tick
Expand All @@ -98,15 +99,17 @@ Autopilot runs as a tick-based state machine. Every 5 minutes, a tick executes:
│ ├─ /aep-dispatch + /aep-launch for top story
│ └─ Check layer completion → gate test if needed
└─ ⑦ Write state + history + status
└─ ⑦ Write state + history + status (+ surface status line + wait, goal driver)
```

The cycle continues until:
The cycle continues until (goal driver — scoped to the current layer):

- All stories in all layers complete → stop and notify human
- All stories in the **current layer** complete → goal met, stop and hand back to the human (run the layer gate / `/aep-reflect`, then re-invoke `/aep-autopilot` for the next layer)
- Design escalation → pause and notify human
- Layer gate fails → pause and notify human
- No stories ready → wait for in-progress workspaces
- No stories ready → wait for in-progress workspaces (the per-tick floor)

(Under the loop driver the cadence is the fixed interval and it runs across layers until you `/aep-autopilot stop`.)

---

Expand Down
Loading
Loading