diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index a64a982..486b4d9 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -5,7 +5,7 @@ "email": "mho@looplia.run" }, "metadata": { - "version": "1.6.0", + "version": "1.7.0", "description": "Skills for product planning, project scaffolding, and agentic development workflows." }, "plugins": [ diff --git a/CHANGELOG.md b/CHANGELOG.md index e109948..79d93a6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,6 +21,50 @@ bug fixes → **patch**; removing or breaking a skill contract → **major**. _Nothing yet._ +## [1.7.0] - 2026-06-11 + +**Goal-driven autopilot driver**: `/aep-autopilot` now keeps itself ticking with +a **goal driver** by default — the host-native `/goal` primitive (Claude Code +v2.1.139+ and Codex's experimental `goals` feature) — which re-fires a tick when +there is work and **self-terminates** when the current layer is complete or a +human-judgment gate is hit. The fixed-interval `/loop` driver is retained as a +fallback (`--loop`). Only the _driver_ changes; the 7-step CHECK→ACT tick, the +delegated cheap CHECK, the signals protocol, and the orchestrator boundary are +unchanged. Decision record: +[`docs/decisions/goal-driven-autopilot.md`](docs/decisions/goal-driven-autopilot.md). + +### Added + +- **Goal driver (default)** — `/aep-autopilot` with no `--loop` flag builds a + one-layer goal condition and drives it via `/goal`: "layer N complete (all + stories merged + wrapped) OR autopilot paused". Scoped to **one layer per run** + — it stops at the layer boundary so the human runs the layer gate / `/aep-reflect` + and re-invokes for the next layer. Native and near-symmetric on both hosts + (Claude Code Haiku-evaluator Stop hook; Codex persisted thread goal with + `token_budget`). +- **Per-tick surface + wait tail (step ⑦, goal driver only)** — each tick + surfaces a **signals-only** `AUTOPILOT …` status line for the goal evaluator to + judge (boundary-safe: never workspace code), then waits a bounded **floor** + (default `5m`, `--floor`) before ending the turn. The floor is the anti-hot-loop + mechanism — CC uses `Monitor` with a hard timeout (a raw foreground `sleep` is + blocked in a turn); Codex uses shell `sleep`. +- **Goal-driver flags** — `--floor ` (per-tick wait floor) and + `--max-turns ` (runaway backstop, default `200`); on Codex a `token_budget` + is set as the hard wall (soft-stops to `budget_limited`). + +### Changed + +- **`/aep-autopilot` default behavior** — the default driver is now goal-driven + and self-terminating per layer. The command surface is unchanged; `--loop +` selects the prior fixed-interval behavior exactly. +- **`/aep-autopilot stop`** — cancels whichever driver is active (`/goal clear` + for the goal driver; `/loop` cancel or cron/launchd removal for the loop + driver). +- **Driver × backend compatibility** (executor `backends.md`) — the long-lived + session class now names two in-session variants, `/goal` (default) and `/loop`; + the goal driver is in-session-only, so the cron/launchd row stays the + `/loop` / `codex exec` path. + ## [1.6.0] - 2026-06-10 **Native-first executor backends**: launch/dispatch/build/autopilot/wrap now diff --git a/docs/decisions/goal-driven-autopilot.md b/docs/decisions/goal-driven-autopilot.md new file mode 100644 index 0000000..94a075a --- /dev/null +++ b/docs/decisions/goal-driven-autopilot.md @@ -0,0 +1,144 @@ +# Goal-Driven Autopilot Driver + +> Decision record (2026-06-11). Adds a **goal-driven driver** as the default way +> `/aep-autopilot` keeps itself ticking, alongside the existing fixed-interval +> `/loop` driver (retained as a fallback). This changes only the **driver** — how +> the next tick is triggered and when autopilot stops. The 7-step CHECK→ACT tick, +> the delegated cheap CHECK, the signals protocol, the executor abstraction, and +> the orchestrator boundary are all unchanged. + +--- + +## 1. The Problem + +Autopilot's real objective is goal-shaped — _"complete the current layer"_ — but +it was expressed as a **fixed-interval poll loop** (`/loop 5m /aep-autopilot +tick`). The 5-minute interval is not intrinsic to the design; it is just a +throttle on polling the workspace `signals/` files, and termination was ad-hoc +("all layers complete → stop"). Two consequences: + +- **Idle ticks.** A layer with nothing to do still wakes every 5 minutes. +- **No native stop.** The loop runs forever until a human stops it; there is no + first-class "this layer is done, hand back" signal. + +Both hosts now ship a native primitive for exactly this shape. + +## 2. Key Research Findings (2026-06, verified against live runtimes) + +1. **Both Claude Code and Codex ship a `/goal` primitive**, with near-identical + UX and semantics: + - **Claude Code `/goal`** (v2.1.139+): a session-scoped completion condition. + After each turn a small fast model (Haiku) judges the condition **against + the conversation transcript**; "no" auto-starts the next turn, "yes" clears + the goal. It is a wrapper around a session-scoped prompt-based Stop hook. The + evaluator **cannot run tools** — it judges only what the turn surfaced. + Headless via `claude -p "/goal …"`. + - **Codex `goals`** (experimental flag `goals=true`, confirmed in + `codex features list` on CLI 0.130): a **persisted thread-level goal** + (SQLite state machine with statuses `active / paused / budget_limited / +complete`, events `thread/goal/updated|cleared`, re-check `Goal unmet (…)`). + First-class **`token_budget`** → on exhaustion a soft stop to + `budget_limited` with a wrap-up steer. Continues only when the thread is + **idle**, the goal is active, and within budget. Controls: `/goal`, + `/goal check|pause|resume|clear`. +2. **Both are turn/idle-driven, not time-interval-driven** — the next turn fires + the instant the previous one ends (CC) or the thread goes idle (Codex). This + is the opposite of `/loop`'s fixed cadence. +3. **A raw foreground `sleep` is blocked inside a Claude Code turn**; the + sanctioned bounded wait is the `Monitor` tool with a hard timeout. Codex has + no such restriction. +4. **Background-completion push notifications are unreliable cross-platform** + (Claude Code issue #21048), so an event-driven "wake on worker completion" + design cannot be the foundation. + +## 3. The Decision + +Add `executor`-level concept of a **goal driver** and make it the autopilot +default; keep the **loop driver** as a fallback selected by `--loop `. + +### 3.1 Driver selection + +- No `--loop` flag → **goal driver** (default). +- `--loop ` → **loop driver** (the prior behavior, unchanged). +- `--floor ` (goal driver only) → per-tick wait floor, default `5m`. +- `--max-turns ` (goal driver only) → runaway backstop, default `200`. + +### 3.2 Scope — one layer per goal + +A single `/goal` run drives the **current layer** to completion, then stops and +hands back to the human to run the layer gate / `/aep-reflect` and re-invoke +`/aep-autopilot` for the next layer. This maps 1:1 onto AEP's existing +pause-at-gate boundaries (layer gates, `.5` calibration layers, outcome +contracts) but replaces the infinite loop with crisp termination. + +### 3.3 Goal condition + +The skill builds the condition for layer `N`: + +> Layer `N` is COMPLETE — every story is `status=completed` and its worktree is +> wrapped (none remain under `.feature-workspaces/`), per the surfaced AUTOPILOT +> status line — **OR** autopilot has entered `status=paused` requiring human +> input. Judge **only** from the surfaced status line. Each turn run exactly one +> `/aep-autopilot tick` then end the turn. Stop after `max-turns` turns. + +The two stop conditions (`layer_complete` / `paused`) are exactly the conditions +that today end or pause the loop, so pause/escalation semantics are unchanged: +hard-pause escalations (design ambiguity, layer-gate failure, outcome contract, +repeated failure) flip `paused=true` and the goal stops; a single relayed +human-gate does **not** halt the layer (other workspaces keep progressing). + +### 3.4 The three adaptations to the tick + +The tick body is unchanged; the goal driver adds a tail to step ⑦ (skipped under +the loop driver): + +1. **Surface a signals-only status line** into the transcript so the evaluator + can judge `layer_complete` / `paused`. It contains no workspace code or file + contents, so the **orchestrator boundary holds** — the evaluator never reads + code. +2. **Wait the per-tick floor** before ending the turn (CC `Monitor`-with-timeout; + Codex `sleep`). This is mandatory: without it `/goal` re-fires instantly and + hot-loops. Early-wake on a signal change is an allowed optimization, not a + correctness dependency — the timeout alone guarantees a stable cadence. +3. **Keep the delegated cheap CHECK** — the goal session is long-lived, so the + token-isolation reason for running the read-heavy CHECK in a Haiku subagent / + `codex exec` one-shot still holds. + +### 3.5 Driver × backend compatibility + +The goal driver is a **long-lived in-session** driver, so it is compatible with +every steerable mode (claude-team / claude-bg / codex-subagent / codex-exec / +legacy) — the same row as `/loop`. It **cannot** drive a fresh-session-per-tick +cron/launchd scheduler; fully-unattended OS-scheduled runs therefore stay on the +`/loop` / `codex exec` path. This is why the loop driver is retained, not removed. + +## 4. Why reliability over cleverness + +The chosen wait mechanism is a **bounded wait with a hard timeout floor**, not +push notifications or event-watch APIs. Rationale: fewest moving parts, no +silent cross-platform failure modes (#21048), and a cadence **behaviorally +identical to the `/loop 5m` users already trust** — the only change is +self-termination. "Slow is fine; stable matters more." + +## 5. Alternatives Considered + +- **Push-when-available + watch fallback.** Pure event-driven, zero idle turns, + but two code paths and an unreliable push channel (#21048). Rejected for + reliability. +- **Replace `/loop` entirely.** Rejected: `/goal` is in-session-only, so the + cron/launchd unattended path needs `/loop`/`codex exec`. Keeping both also de- + risks hosts on older CLI versions without `/goal`. +- **Whole-backlog goal scope.** More autonomy, but it must pause at every + outcome-contract / gate boundary anyway; per-layer scope gives cleaner + termination and matches existing pause points. + +## 6. Consequences + +- Autopilot stops on its own at the layer boundary; no more idle 5-minute ticks + within a quiet layer (it still waits the floor, but never loops past `done`). +- Identical command surface (`/aep-autopilot`); the default driver changes + underneath. `--loop` restores the exact prior behavior. +- New invariant: the per-tick status line must stay signals-only. Enforced by + guardrail in the autopilot skill. +- Codex requires the `goals` feature enabled (`--enable goals`); Claude Code + requires v2.1.139+. Hosts lacking `/goal` fall back to `--loop`. diff --git a/docs/workflow/autonomous-loop.md b/docs/workflow/autonomous-loop.md index 35e868a..7a5d6e2 100644 --- a/docs/workflow/autonomous-loop.md +++ b/docs/workflow/autonomous-loop.md @@ -64,17 +64,18 @@ topology: /aep-autopilot ``` -One command. Initializes state, runs the first tick, and starts a recurring loop (default: every 5 minutes). Use `--loop` to customize the interval: +One command. Initializes state, runs the first tick, and keeps ticking until the **current layer is complete**, then stops on its own. The default driver is **goal-driven** (`/goal`, native on Claude Code and Codex): it re-fires a tick when there is work and self-terminates at the layer boundary so you can run the layer gate / `/aep-reflect` and re-invoke for the next layer. Use `--floor` to tune the goal driver's per-tick wait (default 5m), or `--loop` for the fixed-interval fallback driver: ``` -/aep-autopilot --loop 10m +/aep-autopilot --floor 3m # goal driver, tighter per-tick wait +/aep-autopilot --loop 10m # fixed-interval loop driver instead ``` --- ## The Autonomous Cycle -Autopilot runs as a tick-based state machine. Every 5 minutes, a tick executes: +Autopilot runs as a tick-based state machine. Under the **goal driver** each tick runs, surfaces a signals-only status line, and waits a floor (default 5m) before the `/goal` evaluator decides whether to re-fire or stop (layer complete / paused); under the **loop driver** a tick fires every interval. Either way, a tick executes: ``` /aep-autopilot tick @@ -98,15 +99,17 @@ Autopilot runs as a tick-based state machine. Every 5 minutes, a tick executes: │ ├─ /aep-dispatch + /aep-launch for top story │ └─ Check layer completion → gate test if needed │ - └─ ⑦ Write state + history + status + └─ ⑦ Write state + history + status (+ surface status line + wait, goal driver) ``` -The cycle continues until: +The cycle continues until (goal driver — scoped to the current layer): -- All stories in all layers complete → stop and notify human +- All stories in the **current layer** complete → goal met, stop and hand back to the human (run the layer gate / `/aep-reflect`, then re-invoke `/aep-autopilot` for the next layer) - Design escalation → pause and notify human - Layer gate fails → pause and notify human -- No stories ready → wait for in-progress workspaces +- No stories ready → wait for in-progress workspaces (the per-tick floor) + +(Under the loop driver the cadence is the fixed interval and it runs across layers until you `/aep-autopilot stop`.) --- diff --git a/skills/patterns/autopilot/SKILL.md b/skills/patterns/autopilot/SKILL.md index 59d9590..e4c0913 100644 --- a/skills/patterns/autopilot/SKILL.md +++ b/skills/patterns/autopilot/SKILL.md @@ -6,20 +6,33 @@ description: |- # Autopilot -One command to go autonomous. Initializes state, runs the first tick, and starts a recurring loop — all in one invocation. +One command to go autonomous. Initializes state, runs the first tick, and keeps +itself ticking until the current layer is complete — all in one invocation. The +default driver is **goal-driven** (`/goal`, native on both Claude Code and +Codex): each tick advances the layer and then **the runtime decides whether to +re-fire** based on a completion condition, so autopilot **stops on its own** when +the layer is done or a human-judgment gate is hit. The fixed-interval `/loop` +driver remains available as a fallback (`--loop`). ``` -/aep-autopilot # start with default 5m interval -/aep-autopilot --loop 10m # start with custom interval +/aep-autopilot # start: goal-driven driver (default) — drives the CURRENT LAYER to completion, then stops +/aep-autopilot --loop 10m # start with the fixed-interval loop driver instead (custom interval) +/aep-autopilot --floor 3m # goal driver with a custom per-tick wait floor (default 5m) /aep-autopilot status # check progress and escalations -/aep-autopilot stop # gracefully stop the loop +/aep-autopilot stop # gracefully stop the driver ``` +> **Scope is one layer per run.** The goal driver completes when every story in +> the **current layer** is merged + wrapped, then hands control back so the human +> can run the layer gate / `/aep-reflect` and re-invoke `/aep-autopilot` for the +> next layer. This mirrors the existing pause-at-gate behavior with crisp +> termination instead of an infinite loop. + **Where this fits:** ``` /aep-envision → /aep-map → /aep-validate - → /aep-autopilot + → /aep-autopilot (goal: "layer N complete") ┌─────────────────────────────────────────────┐ │ tick ① read state │ │ tick ② sync signals │ @@ -27,9 +40,12 @@ One command to go autonomous. Initializes state, runs the first tick, and starts │ tick ④ GUIDE COMPLETION (quality + merge) │ │ tick ⑤ detect stuck workspaces │ │ tick ⑥ dispatch new work (/aep-launch) │ - │ tick ⑦ write state │ - │ ... repeat every 5 min ... │ + │ tick ⑦ write state + SURFACE status + WAIT │ └─────────────────────────────────────────────┘ + │ goal evaluator reads the surfaced status line: + │ "is layer N complete, or is autopilot paused?" + ├─ no → re-fire next tick (after the wait floor) + └─ yes / paused → STOP (hand back to human) → /aep-reflect (after layer completes or autopilot stops) ``` @@ -47,7 +63,7 @@ You are an **ORCHESTRATOR**, not an **EXECUTOR**. All code operations happen ins ### Executor mode + driver (read once, applies throughout) Autopilot steers running workspace workers, so it requires a **steerable mode -whose lifetime is compatible with the periodic driver** (see the driver × +whose lifetime is compatible with the driver** (see the driver × backend matrix in `.claude/skills/aep-executor/references/backends.md`). Every "send to workspace" action in this skill is `executor.nudge(ws, msg)`, and every liveness probe is `executor.liveness(ws)` — the table below is the @@ -61,10 +77,14 @@ are mode-independent; deliver them through your mode's transport. | `present` | teammate pane / `Shift+Down` | `claude attach ` | thread list / `/agent` | signals + PR | cmux tab / `tmux attach` | - **Driver compatibility:** session-bound workers (claude-team teammates, - codex-subagents) die with the orchestrator session. Long-lived driver - (Claude Code `/loop`, a living Codex main thread ticking in-thread) → any - steerable mode. Cron/launchd driver (fresh session per tick) → OS-bound - modes only: **claude-bg**, **codex-exec**, **legacy**. + codex-subagents) die with the orchestrator session. **Long-lived driver** — + the **goal driver** (`/goal`, default) or the fixed-interval `/loop`, both + running in a living Claude Code session or Codex main thread → any steerable + mode. **Cron/launchd driver** (fresh session per tick) → OS-bound modes only + (**claude-bg**, **codex-exec**, **legacy**) **and only via `/loop` or an + external scheduler** — `/goal` is inherently in-session and cannot drive a + fresh-session-per-tick scheduler, so unattended OS-scheduled runs keep using + the `/loop`/`codex exec` path. - **claude-team — the standing team:** at `/aep-autopilot` start, `TeamCreate` **once** (team "aep"). Each tick-⑥ launch spawns one teammate into it; each tick-③ wrap shuts that teammate down (team persists); `/aep-autopilot stop` @@ -148,16 +168,23 @@ The autopilot does NOT evaluate workspace code. It triggers and monitors the wor ## `/aep-autopilot` (default — start) -Initialize autopilot, run the first tick, and start the recurring loop. This is a single command — no second step needed. +Initialize autopilot, run the first tick, and start the driver (goal-driven by default; `--loop` for the fixed-interval fallback). This is a single command — no second step needed. **Usage:** ``` -/aep-autopilot # default: 5 minute tick interval -/aep-autopilot --loop 10m # custom interval -/aep-autopilot --loop 3m # faster for active development +/aep-autopilot # default: goal driver — drives the current layer to completion, then stops +/aep-autopilot --floor 3m # goal driver, custom per-tick wait floor (default 5m) +/aep-autopilot --loop 10m # fixed-interval loop driver instead (custom interval) — the fallback +/aep-autopilot --loop 3m # loop driver, faster for active development ``` +**Driver selection rule:** the presence of `--loop ` selects the +fixed-interval **loop driver**; its absence selects the **goal driver** (the +default). `--floor ` only applies to the goal driver (the bounded per-tick +wait, default `5m`); `--max-turns ` caps the goal driver as a runaway +backstop (default `200`). + ### Prerequisites ```bash @@ -232,10 +259,69 @@ Verify these conditions before proceeding: 5. Run the first tick immediately (see tick protocol below). -6. Start the recurring **periodic driver** that fires `/aep-autopilot tick` every - interval. The driver is host-specific (the tick itself is host-agnostic), - and it constrains the launch mode (session-bound workers need a long-lived - orchestrator): +6. Start the driver. The default is the **goal driver**; `--loop` selects the + fixed-interval **loop driver**. Both keep the orchestrator long-lived (so any + steerable mode works); the goal driver additionally **self-terminates** when + the layer is done. The tick body (the 7-step CHECK→ACT protocol below) is the + same under either driver — only how the next tick is triggered differs. + + ### 6a. Goal driver (default) + + Build the **goal condition** for the current layer and hand it to the host's + native `/goal` primitive. The condition is the success predicate the goal + evaluator judges each turn — against the **status line the tick surfaces** + (signals only — never workspace code) — plus a one-line per-turn directive: + + ``` + Layer of this product is COMPLETE: every story in layer is + status=completed AND its worktree has been wrapped (none remain under + .feature-workspaces/), as shown by the AUTOPILOT status line surfaced this + turn — OR autopilot has entered status=paused requiring human input (design + ambiguity, layer-gate failure, outcome contract, or repeated failure), as + shown by the same status line. Judge ONLY from the surfaced status line, + never from memory. Each turn, run exactly ONE `/aep-autopilot tick`, then end + the turn; never run more than one tick per turn. Stop after turns + if the layer has not completed. + ``` + + - **Claude Code (`/goal`, requires v2.1.139+):** + + ``` + /goal + ``` + + `/goal` starts a turn immediately and, after each turn, a small fast model + (Haiku) checks the condition against the conversation; "no" auto-starts the + next turn with the evaluator's reason as guidance, "yes" clears the goal and + stops. Pair with auto mode so each turn runs unattended. The per-tick wait + floor (step ⑦) is what prevents hot-looping — without it the evaluator + re-fires the instant a turn ends. (The evaluator reads only the surfaced + status line, so the orchestrator boundary holds: it never sees workspace + code.) + + - **Codex (`goals` feature, experimental — enable with `--enable goals`):** + + ``` + /goal + ``` + + Set a `token_budget` as the hard runaway wall (on exhaustion Codex + soft-stops to `budget_limited` with a wrap-up steer rather than dying). + Codex continues the goal only when the thread is **idle**, the goal is + active, and it is within budget; `/goal pause` · `/goal resume` · + `/goal check` · `/goal clear` manage it. + + Under either host, **keep delegating each tick's CHECK** to a cheap + context-isolated agent (Haiku subagent / `codex exec` one-shot) — the goal + session is long-lived, so the token-isolation reason from "Execution model" + still holds. + + ### 6b. Loop driver (fallback — `--loop `) + + The fixed-interval driver, unchanged from earlier versions. Use it for hosts + without `/goal`, for fully-unattended **OS-scheduled** runs (cron/launchd — + `/goal` is in-session-only), or when you simply want a fixed cadence. The + loop driver does not self-terminate; stop it with `/aep-autopilot stop`. **Claude Code — `/loop` (GA, in-session, long-lived):** @@ -245,7 +331,7 @@ Verify these conditions before proceeding: Where `` is from `--loop` flag (default: `5m`). `/loop` invokes `/aep-autopilot tick` each interval; the tick keeps the main session cheap by - delegating its CHECK to a Haiku subagent (see below). This is the driver + delegating its CHECK to a Haiku subagent. This is the driver that supports **claude-team** (the team lives in this session). **Codex — two driver options:** @@ -273,7 +359,7 @@ Verify these conditions before proceeding: ## `/aep-autopilot tick` -The per-tick handler invoked by the periodic driver. Can also be run manually at any time. **Idempotent** — safe to run multiple times with no state change producing no duplicate actions. +The per-tick handler invoked by the driver (goal or loop). Can also be run manually at any time. **Idempotent** — safe to run multiple times with no state change producing no duplicate actions. **Execution model — CHECK → ACT.** A tick is two halves: @@ -396,12 +482,19 @@ The 7-step protocol below is the **content of the CHECK prompt** (steps ①② - If well-specified → run /aep-launch (max ONE launch per tick) - Add new workspace entry to state (with story_ids, wave, readiness_score) -⑦ WRITE STATE [CHECK] +⑦ WRITE STATE + SURFACE + WAIT [CHECK, then driver tail] - Write .dev-workflow/autopilot-state.json (atomic: write .tmp then rename) - Append tick summary to .dev-workflow/autopilot-history.jsonl - Update .dev-workflow/autopilot-status.md - Increment tick_count, set last_tick_at - Release tick lock + - SURFACE the AUTOPILOT status line into the transcript (GOAL DRIVER ONLY) — + signals-only, so the goal evaluator can judge "layer complete? paused?" + - WAIT the per-tick floor before ending the turn (GOAL DRIVER ONLY) — the + anti-hot-loop floor (default 5m, `--floor`). CC → Monitor with a hard + timeout (a raw foreground sleep is blocked); Codex → shell sleep. + Under the loop driver, neither the surface nor the wait runs (the `/loop` + interval is the cadence). ``` --- @@ -427,12 +520,18 @@ Also parse `.dev-workflow/autopilot-state.json` and present: ## `/aep-autopilot stop` -Gracefully stop the autopilot and cancel the recurring loop. +Gracefully stop the autopilot and cancel the active driver (goal or loop). 1. Set `status: "stopped"` in `.dev-workflow/autopilot-state.json` 2. Update `.dev-workflow/autopilot-status.md` with stopped state 3. Log stop event to `.dev-workflow/autopilot-history.jsonl` -4. Cancel the `/loop` (use the loop skill's cancel mechanism) +4. Cancel the driver: + - **Goal driver:** clear the active goal — `/goal clear` (Claude Code; aliases + `stop`/`off`/`reset`/`none`/`cancel`) or `/goal clear` (Codex). No further + turn re-fires. (The goal driver also self-clears when the layer completes, + so `stop` is mainly for early termination.) + - **Loop driver:** cancel the `/loop` (use the loop skill's cancel mechanism), + or remove the cron/launchd job for an OS-scheduled run. 5. **claude-team only:** if no teammates are still building, shut them all down and `TeamDelete` the standing team. If workers are still mid-build, leave the team alive (deleting it would kill session-bound workers) and note in @@ -440,7 +539,7 @@ Gracefully stop the autopilot and cancel the recurring loop. **What happens:** -- The recurring loop is cancelled — no more ticks +- The active driver is cancelled — no more ticks - Running workspaces continue autonomously (they don't depend on autopilot) **What does NOT happen:** @@ -515,7 +614,9 @@ After the human resolves the design issue: /aep-autopilot ``` -This re-reads the product context (now with refined specs), re-initializes the loop, and resumes ticking. +This re-reads the product context (now with refined specs) and re-initializes +the driver — under the goal driver it sets a fresh goal for the current layer; +under the loop driver it restarts the `/loop` — then resumes ticking. --- @@ -535,6 +636,15 @@ This re-reads the product context (now with refined specs), re-initializes the l - **Respect WIP limits** — never exceed `topology.routing.concurrency_limit` - **Atomic state writes** — write to `.tmp` then rename to prevent corruption - **Tick lock** — prevent overlapping ticks via `tick_in_progress` timestamp +- **Goal driver is scoped to ONE layer** — the goal condition completes (or + pauses) at the current layer boundary; never widen it to the whole backlog +- **Goal evaluator sees signals only** — the per-tick surfaced status line MUST + be signals-only (no workspace code, no file contents), preserving the + orchestrator boundary; the evaluator never reads code +- **Per-tick wait floor is mandatory under the goal driver** — without it + `/goal` re-fires the instant a turn ends and hot-loops, burning tokens +- **Always bound the goal** — `--max-turns` (default 200) and, on Codex, a + `token_budget`, so a non-converging layer can never run forever - **Pause on design ambiguity** — unless `auto_design: true`, escalate to human when readiness < 0.7 - **Always escalate on repeated failures** — `attempt_count >= 2` always pauses, even with `auto_design: true` @@ -548,5 +658,5 @@ After autopilot completes a layer or is stopped: | ----------------------- | --------------------------------------------------------------------------------- | | `/aep-reflect` | After layer completes — evaluate outcome contracts (Step 2.75), classify feedback | | `/aep-autopilot status` | Anytime — check progress and escalations | -| `/aep-autopilot` | After resolving a pause — resume the loop | +| `/aep-autopilot` | After resolving a pause — resume the driver (re-sets the layer goal) | | `/aep-dispatch` | Manual mode — pick a specific story interactively | diff --git a/skills/patterns/autopilot/references/tick-protocol.md b/skills/patterns/autopilot/references/tick-protocol.md index f2481b0..2cc0d0f 100644 --- a/skills/patterns/autopilot/references/tick-protocol.md +++ b/skills/patterns/autopilot/references/tick-protocol.md @@ -2,8 +2,11 @@ The 7-step state machine executed on each autopilot tick. Each tick is idempotent — running it twice with no external state change produces the same result and takes no duplicate actions. -**Target duration:** <60 seconds per tick -**Invocation:** `/loop 5m /aep-autopilot tick` or manual `/aep-autopilot tick` +**Target duration:** <60 seconds of work per tick (under the goal driver the +turn then waits the per-tick floor — step ⑦ — before ending) +**Invocation:** goal driver (default) — `/goal ""` re-fires +this tick each turn until the layer completes; loop driver (fallback) — +`/loop 5m /aep-autopilot tick`; or manual `/aep-autopilot tick` > **BOUNDARY REMINDER:** The autopilot is an orchestrator. Every action on a workspace is `executor.nudge()` / `executor.liveness()` — autopilot runs only on **steerable, driver-compatible modes** (claude-team / claude-bg / codex-subagent / codex-exec / legacy; see the per-mode transport table in SKILL.md and `aep-executor/references/backends.md`). The nudge texts in this file are mode-independent — deliver each through the workspace's `backend` transport (`SendMessage` / `feedback.md` / `send_input` / `codex exec resume` / `tmux send-keys`). Never spawn code reviewers from main, never read workspace source code, never call `gh pr merge`. See SKILL.md "STOP — Orchestrator Boundaries". @@ -473,6 +476,41 @@ At natural checkpoints (layer complete, escalation, or autopilot stop), run the 5. **Release tick lock** — set `tick_in_progress` to `null` +### Goal-driver tail (steps 6–7 — skip entirely under the loop driver) + +The fixed-interval `/loop` driver stops here: the interval is the cadence and the +`/loop` skill re-fires the next tick. The **goal driver** instead ends each turn +with two extra actions so the `/goal` evaluator can decide whether to re-fire: + +6. **Surface the AUTOPILOT status line** into the transcript — one compact, + **signals-only** line (no workspace code, no file contents) the goal evaluator + judges the completion condition against. Derive every field from + `autopilot-state.json` + product-context layer data: + + ``` + AUTOPILOT layer= wave= stories= done= in_progress= + wrapped= ready_remaining= paused= escalations= tick= + layer_complete= # true ⟺ done==total AND no worktrees remain + ``` + + `layer_complete=true` (or `paused=true`) is exactly what the goal condition + keys on. Because the line is signals-only, the evaluator never reads workspace + code — the orchestrator boundary holds. + +7. **Wait the per-tick floor**, then end the turn. This is the anti-hot-loop + floor — without it `/goal` re-fires the instant the turn ends. Default `5m` + (`--floor`); implement with the host's sanctioned bounded wait: + - **Claude Code:** the `Monitor` tool with a hard timeout (a raw foreground + `sleep` is blocked inside a turn). An early wake on a `signals/` change is an + allowed optimization but **not** required — the timeout alone guarantees a + bounded, stable cadence. + - **Codex:** a shell `sleep ` (no restriction). + + Ending the turn returns control to `/goal`, whose evaluator reads the surfaced + status line and either re-fires the next tick or stops (layer complete / + paused). Step ⑤'s stuck detection still runs every tick, so a stalled layer + escalates → `paused` → the goal stops for the human. + --- ## Workspace State Derivation diff --git a/skills/patterns/executor/references/backends.md b/skills/patterns/executor/references/backends.md index f301d85..821c240 100644 --- a/skills/patterns/executor/references/backends.md +++ b/skills/patterns/executor/references/backends.md @@ -141,18 +141,25 @@ back pin it: `git config aep.executor-backend tmux`. ## Driver × Backend Compatibility An orchestrator (notably `/aep-autopilot`) is driven either by a **long-lived -session** (Claude Code `/loop`; a living Codex main thread ticking in-thread) -or by a **cron/launchd scheduler** that starts a fresh session per tick. -Session-bound workers cannot outlive their parent, so: - -| Driver | claude-team | claude-bg | codex-subagent | codex-exec | legacy | -| ------------------------------------------------ | ---------------------- | --------------------------------- | --------------------------------------- | ------------------------------------------ | ------ | -| Long-lived session (`/loop`, living main thread) | ✅ | ✅ | ✅ | ✅ | ✅ | -| Cron/launchd (fresh session per tick) | ❌ team dies with lead | ✅ OS-level, any session attaches | ❌ subagents invisible to a new session | ✅ `codex exec resume` works cross-process | ✅ | +session** or by a **cron/launchd scheduler** that starts a fresh session per +tick. The long-lived class has two in-session variants — the **goal driver** +(`/goal`, native on both hosts, the autopilot default, self-terminating per +layer) and the fixed-interval **loop driver** (`/loop`; a living Codex main +thread ticking in-thread). Both are equally compatible with every steerable +mode. Session-bound workers cannot outlive their parent, so: + +| Driver | claude-team | claude-bg | codex-subagent | codex-exec | legacy | +| --------------------------------------------------------------- | ---------------------- | --------------------------------- | --------------------------------------- | ------------------------------------------ | ------ | +| Long-lived session (`/goal` **or** `/loop`, living main thread) | ✅ | ✅ | ✅ | ✅ | ✅ | +| Cron/launchd (fresh session per tick) | ❌ team dies with lead | ✅ OS-level, any session attaches | ❌ subagents invisible to a new session | ✅ `codex exec resume` works cross-process | ✅ | A consumer that needs steering (autopilot) must pick a mode compatible with its -driver: on Claude Code, `/loop` + `claude-team` (or `claude-bg`); on Codex, -in-thread ticks + `codex-subagent`, or cron ticks + `codex-exec`. +driver: on Claude Code, `/goal` (or `/loop`) + `claude-team` (or `claude-bg`); +on Codex, in-thread `/goal` (or manual ticks) + `codex-subagent`, or cron ticks + +- `codex-exec`. **The goal driver is in-session only** — it cannot drive a + fresh-session-per-tick scheduler, so the cron/launchd row is always the + `/loop`/`codex exec` path. ---