7 changes: 4 additions & 3 deletions .claude-plugin/marketplace.json
@@ -5,13 +5,13 @@
"email": "mho@looplia.run"
},
"metadata": {
"version": "1.8.0",
"version": "2.0.0",
"description": "Skills for product planning, project scaffolding, and agentic development workflows."
},
"plugins": [
{
"name": "product-context",
"description": "Product-level planning and iteration: envision, map, dispatch, validate, calibrate, reflect.",
"description": "Product-level planning and iteration: envision, map, dispatch, validate, calibrate, reflect, watch.",
"source": "./",
"strict": false,
"skills": [
@@ -20,7 +20,8 @@
"./skills/product-context/dispatch",
"./skills/product-context/validate",
"./skills/product-context/calibrate",
"./skills/product-context/reflect"
"./skills/product-context/reflect",
"./skills/product-context/watch"
]
},
{
51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,57 @@ bug fixes → **patch**; removing or breaking a skill contract → **major**.

_Nothing yet._

## [2.0.0] - 2026-06-16

The **autonomy loop** release. Closes the loop-engineering gaps identified in
`docs/research/loop-engineering-autonomy-gap.md` (G2–G7) and adds a `full_auto`
master switch. Every new capability defaults to **human-in-the-loop** — autonomy
is opt-in via `topology.routing` flags.

### Added

- **`/aep-watch` skill** (G6) — continuously ingests telemetry / error streams /
bug trackers, classifies findings with the `/aep-reflect` classifier, and
auto-files bug/refinement stories so reflect→dispatch becomes self-feeding.
- **Change-strategy recovery ladder** (G2) — `gen-eval/references/recovery-ladder.md`;
on repeated eval FAIL the build climbs same-fix → re-ground → fresh
`native-bg-subagent` generator → decompose **before** the `eval_not_converging`
human gate.
- **Host-aware post-deploy dogfood** (G4b) — `executor/references/dogfood-validation.md`:
`dogfood_method()` (Claude → agent-browser; Codex → native in-app browser /
computer-use, or Playwright headless) + `target_url()` (config-first, CI fallback).
- **Post-merge guard** (G4a) — `autopilot/references/post-merge-guard.md` + tick
Step ③.5: monitors merged stories' deploy health; dogfood issues → reflect story;
hard regression → conservative `auto_revert` (default off, warn + escalate).
- **Telemetry-driven reflect** (G5) — `reflect/references/telemetry-ingestion.md`:
automated source ingestion + quantitative outcome-contract auto-evaluation.
- **Telemetry source determination** — projects decide sources via a hybrid
metric-driven rule: `/aep-scaffold`/`/aep-onboard` detect the observability stack
(candidate sources); `/aep-map` binds each quantitative `success_metric` +
`health_signal` to a source (`metric_map`); a shared `coverage_check()` lets
`/aep-watch`, `/aep-reflect`, and the post-merge guard **block auto when the
binding is incomplete** instead of silently no-op'ing.
- **Visual Design evaluator dimension** (G3) — vision-model scoring of screenshots
against the design system, for both Claude and Codex (multimodal).
- **`full_auto` master switch** (A1) — `topology.routing.full_auto` (default false)
gates the strategic human pauses (design escalation, qualitative outcome eval);
implies `auto_design` + `auto_outcome_eval` + `watch.auto_create`. New config keys
added to the product-context schema.

### Changed

- `/aep-build` Phase 5 climbs the recovery ladder; Phase 6 dogfood is host-aware
(degrades instead of skipping when agent-browser is absent).
- `/aep-reflect` Step 1 supports automated ingestion; Step 2.75 auto-evaluates
quantitative outcome contracts (qualitative still pauses unless `full_auto`).
- `/aep-autopilot` gains the post-merge guard step and `full_auto`-aware routing;
loop hygiene unified on `--max-turns` (G7).

### Fixed

- Carries forward the v1.8.0 executor fix (claude-team removed; `native-bg-subagent`
default + post-spawn liveness probe). Every new spawn path uses it.

## [1.8.0] - 2026-06-15

### Changed
2 changes: 1 addition & 1 deletion README.md
@@ -615,7 +615,7 @@ These aren't rules we invented — they're patterns extracted from Anthropic's e

## Getting Started

**Brand new to AEP?** Start with the [Orientation Guide](docs/orientation.md) for a 10-minute tour of the mental models, the 16 skills, and the four paths — then run `/aep-onboard`.
**Brand new to AEP?** Start with the [Orientation Guide](docs/orientation.md) for a 10-minute tour of the mental models, the 17 skills, and the four paths — then run `/aep-onboard`.

**New to this plugin?**

7 changes: 4 additions & 3 deletions docs/orientation.md
@@ -1,6 +1,6 @@
# AEP Orientation Guide

**A 10-minute first-hour tour for new users.** Read this before (or right after) running `/aep-onboard`. When you finish, you'll know what AEP is, the three mental models that drive every skill, what each of the 16 skills does, and which of four concrete paths matches your situation.
**A 10-minute first-hour tour for new users.** Read this before (or right after) running `/aep-onboard`. When you finish, you'll know what AEP is, the three mental models that drive every skill, what each of the 17 skills does, and which of four concrete paths matches your situation.

For precise definitions of every term used here, see the [Glossary](glossary.md). For a one-page decision tree, see the [Skills Quick Reference](skills-quick-reference.md).

@@ -94,7 +94,7 @@ More: [README.md "The Feature Lifecycle"](../README.md) and [skills/agentic-deve

---

## 3. The 16 Skills at a Glance
## 3. The 17 Skills at a Glance

| Skill | Plugin | Session | Purpose |
| ------------------------ | ---------------------------- | --------- | --------------------------------------------------------------- |
@@ -107,6 +107,7 @@ More: [README.md "The Feature Lifecycle"](../README.md) and [skills/agentic-deve
| `/aep-dispatch` | product-context | Main | Pick next story + create OpenSpec change + hand off |
| `/aep-calibrate` | product-context | Main | Human alignment checkpoint for any quality dimension |
| `/aep-reflect` | product-context | Main | Classify feedback + update context (close the loop) |
| `/aep-watch` | product-context | Main | Ingest telemetry/errors → auto-file stories (self-feeding loop) |
| `/aep-design` | agentic-development-workflow | Main | Interactive feature design (explore + propose + review) |
| `/aep-launch` | agentic-development-workflow | Main | Spawn autonomous workspace + optional evaluator |
| `/aep-build` | agentic-development-workflow | Workspace | Implement → test → PR → merge (autonomous) |
@@ -237,4 +238,4 @@ One-line pointers so you know what to look up when you hit an unfamiliar term. F

---

**You're done with orientation.** The rest of AEP is discoverable from the three mental models, the 16-skill table, and the four paths. When in doubt, reach for the decision tree in the quick reference — it covers the common forks.
**You're done with orientation.** The rest of AEP is discoverable from the three mental models, the 17-skill table, and the four paths. When in doubt, reach for the decision tree in the quick reference — it covers the common forks.
136 changes: 136 additions & 0 deletions docs/research/g4-dogfood-validation-design.md
@@ -0,0 +1,136 @@
# G4: Host-aware Dogfood Validation Default Design

> **Status:** Design spec (to be expanded into skill changes). A sub-design of **G4** in §3 of [loop-engineering-autonomy-gap](./loop-engineering-autonomy-gap.md): post-deploy validation on staging/production, using each host's native method.
> **Date:** 2026-06-15 **Branch:** `research/loop-engineering-autonomy-gap`

---

## Context

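G4 is the "post-merge production feedback loop." Today, AEP's dogfood (`/aep-build` Phase 6) runs only against **local localhost** (`BASE_URL` from `ports.env`) and **requires agent-browser to be installed**; otherwise the whole phase is skipped. There is **no post-deploy validation on staging/production at all**. This design adds two things:

1. **Host-aware dogfood method**: Claude Code decides automatically whether to use agent-browser; Codex uses its native browser / computer-use.
2. **Post-deploy validation on staging/production**: a new post-deploy dogfood whose target URL comes from config or CI.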

---

## Decisions (finalized)

| Item                       | Decision                                                                                                                                                       |
| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Source of staging/prod URL | **Config first, CI fallback**: `topology.routing.deploy_targets.{staging_url,production_url}`; if missing, read it from CI/deploy output (e.g. a preview URL) |
| Integration point          | **New G4 post-deploy step + upgrade Phase 6 to host-aware** (both coexist)                                                                                     |
| When issues are found      | **Automatically create a story and send it to dispatch** (via the `/aep-reflect` classifier, tied to G6)                                                       |

---

## Research Basis (host-native capabilities)

- **Claude Code:** agent-browser is the native browser tool (CDP-driven Chrome, accessibility-tree `@eN` refs, screenshot `--annotate`, video, auth vault), and `/agent-browser:dogfood` is already the exploratory testing flow used by Phase 6. The health check `agent_browser_healthy()` (`agent-browser navigate about:blank`) already exists in `testing-guide`.
- **Codex:** computer-use (native to GPT-5.4: screenshots + mouse/keyboard + writing Playwright) and the in-app browser (Atlas) are **desktop app only**. `codex exec` (headless) has **neither**, so it can only write and run Playwright scripts or fall back to the agent-browser CLI. Codex therefore needs two separate paths: desktop and headless.

Sources: [OpenAI Codex app](https://developers.openai.com/codex/app), [GPT-5.4 (OpenAI)](https://openai.com/index/introducing-gpt-5-4/), [Codex superapp (MacStories)](https://www.macstories.net/news/openai-unveils-codex-superapp-update-with-computer-use-automations-built-in-browser-and-more/), [Codex for Chrome (eigent.ai)](https://www.eigent.ai/blog/codex-for-chrome).

---

## Default Selection Logic (dogfood method detection)

In the spirit of `executor.detect()`, add one more layer of method detection (host × mode):

```
dogfood_method():
  resolve HOST + mode via executor.detect()

  if HOST == claude:                                          # any mode
    if agent_browser_healthy(): return "agent-browser"        # /agent-browser:dogfood
    else: return "degrade"                                    # non-UI → API/curl; UI → human-eval

  if HOST == codex:
    if mode == codex-subagent and computer_use_enabled:       # desktop app
      return "codex-native"                                   # in-app browser + computer-use
    else:                                                     # codex-exec / headless
      if playwright_available(): return "playwright-script"   # GPT-5.4 writes these natively
      elif agent_browser_healthy(): return "agent-browser"    # CLI fallback
      else: return "degrade"
```
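
The decision tree above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `HostProbe` dataclass and its field names are hypothetical stand-ins for the results of `executor.detect()`, `agent_browser_healthy()`, and `playwright_available()`, and raising on an unknown host is an assumption the spec does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostProbe:
    """Hypothetical snapshot of the environment checks the pseudocode relies on."""
    host: str                          # "claude" or "codex"
    mode: str                          # e.g. "codex-subagent" or "codex-exec"
    agent_browser_healthy: bool = False
    computer_use_enabled: bool = False
    playwright_available: bool = False


def dogfood_method(probe: HostProbe) -> str:
    """Pick a dogfood method for the given host/mode, mirroring the decision tree."""
    if probe.host == "claude":
        # Any Claude mode: agent-browser when healthy, otherwise degrade.
        return "agent-browser" if probe.agent_browser_healthy else "degrade"

    if probe.host == "codex":
        # Desktop app with computer-use enabled: in-app browser + computer-use.
        if probe.mode == "codex-subagent" and probe.computer_use_enabled:
            return "codex-native"
        # Headless (codex-exec), or desktop without computer-use.
        if probe.playwright_available:
            return "playwright-script"
        if probe.agent_browser_healthy:
            return "agent-browser"     # CLI fallback
        return "degrade"

    # Assumption: the spec does not say what happens for other hosts.
    raise ValueError(f"unknown host: {probe.host!r}")
```

Passing the probe results in as plain data keeps the selection logic free of side effects, so each branch can be exercised without a browser or a Codex session.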

| Host / mode                    | Default native method                | Detection                          | Fallback                           |
| ------------------------------ | ------------------------------------ | ---------------------------------- | ---------------------------------- |
| Claude Code (any mode)         | `/agent-browser:dogfood`             | `agent_browser_healthy()`          | non-UI → API/curl; UI → human-eval |
| Codex desktop (codex-subagent) | native in-app browser + computer-use | desktop + computer-use enabled     | Playwright skill → agent-browser   |
| Codex headless (codex-exec)    | write and run Playwright scripts     | playwright available / installable | agent-browser CLI → API checks     |

> All methods emit reports in the same format (the severity/category/repro template of `/agent-browser:dogfood`), so the downstream classifier stays host-agnostic.

---

## Target URL Resolution

```
target_url(env):                          # env ∈ {local, staging, production}
  if env == local:                        # unchanged from today
    source .dev-workflow/ports.env → return $BASE_URL
  else:
    u = product-context: topology.routing.deploy_targets.<env>_url
    if u: return u                        # config first
    else: return <preview/deploy URL read from the CI/deploy step output>   # CI fallback
```
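
A small Python sketch of the same resolution order is shown below. It assumes the caller has already parsed `product-context.yaml` into a nested mapping, read `BASE_URL` from `ports.env`, and extracted any preview URL from the CI output; the parameter names `base_url` and `ci_url` and the error behavior are illustrative assumptions, not part of the spec.

```python
from collections.abc import Mapping
from typing import Optional


def target_url(
    env: str,
    config: Mapping,
    base_url: Optional[str] = None,   # assumed: BASE_URL from .dev-workflow/ports.env
    ci_url: Optional[str] = None,     # assumed: URL parsed from CI/deploy output
) -> str:
    """Resolve the dogfood target: local uses BASE_URL, else config first, then CI."""
    if env == "local":
        if not base_url:
            raise LookupError("BASE_URL is not set")
        return base_url
    if env not in ("staging", "production"):
        raise ValueError(f"unknown env: {env!r}")

    # Walk topology.routing.deploy_targets without assuming any level exists.
    node = config
    for key in ("topology", "routing", "deploy_targets"):
        node = node.get(key) if isinstance(node, Mapping) else None
    configured = node.get(f"{env}_url") if isinstance(node, Mapping) else None

    if configured:
        return str(configured)        # config first
    if ci_url:
        return ci_url                 # CI fallback
    raise LookupError(f"no URL for {env!r} in config or CI output")
```

Failing loudly when neither source yields a URL is one possible choice; it avoids silently validating the wrong environment.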

---

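## Integration Points

### (1) Upgrade Phase 6 (local, pre-merge)

`/aep-build` Phase 6 replaces "skip if agent-browser is not installed" with a call to `dogfood_method()`: Claude → agent-browser, Codex → native. It runs with `env=local`, and the URL comes from `ports.env`. The report is still written to `.dev-workflow/dogfood-<feature>.md`.

### (2) New G4 post-deploy step (staging/prod, post-merge)

After the wrap step of the autopilot tick (or in `post-merge-guard`), add: merge → (trigger / wait for deploy) → `target_url(staging|production)` → run validation via `dogfood_method()` → write the report. The orchestrator boundary is preserved: read signals/reports and run gh/CLI, but do not read workspace code.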
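
The post-deploy sequence can be sketched as a single function with its collaborators injected. Everything here is an assumption for illustration: the function name, the callable parameters, and the choice to return `None` when the deploy never becomes ready are not defined by the spec.

```python
from typing import Callable, Optional


def post_deploy_dogfood(
    env: str,
    wait_for_deploy: Callable[[], bool],       # hypothetical: True once the deploy is ready
    resolve_url: Callable[[str], str],         # stands in for target_url(env)
    pick_method: Callable[[], str],            # stands in for dogfood_method()
    run_dogfood: Callable[[str, str], str],    # (method, url) -> report text
    write_report: Callable[[str], None],       # persists the report for the orchestrator
) -> Optional[str]:
    """Run one post-merge validation pass; return the report, or None if skipped."""
    if env not in ("staging", "production"):
        raise ValueError(f"post-deploy env must be staging or production, got {env!r}")
    if not wait_for_deploy():
        return None                            # nothing deployed, nothing to validate
    url = resolve_url(env)
    method = pick_method()
    report = run_dogfood(method, url)
    write_report(report)
    return report
```

Because the function only calls the injected runner and writer, it never touches workspace code itself, which is one way to keep the orchestrator boundary visible in the code.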

---

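## Behavior When Issues Are Found

- **Issues found by dogfood** → fed to the `/aep-reflect` classifier → a bug/refinement story is created automatically in `product-context.yaml` → dispatch (ties into the G6 self-feeding loop).
- **Hard regression (health signals)** → handled separately by the G4 post-merge guard's `auto_revert` policy (conservative by default: alert first, revert only after human confirmation). The two paths stay separate: dogfood finds UX/functional issues and creates stories, while the guard finds service-level regressions and decides whether to roll back.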
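
The two separate paths above can be sketched as a small routing function. The `Finding` type, the `source` values, and the returned action names are hypothetical labels chosen for this example; only `auto_revert` and the `create_story` / `escalate` options come from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str     # hypothetical: "dogfood" or "health_signal"
    summary: str


def route_finding(
    finding: Finding,
    on_issue: str = "create_story",   # create_story | escalate
    auto_revert: bool = False,        # conservative default: off
) -> str:
    """Return the action for a finding, keeping the dogfood and guard paths apart."""
    if finding.source == "dogfood":
        # UX/functional issues go to the reflect classifier and become stories,
        # unless the project opted to escalate to a human instead.
        return "create_story" if on_issue == "create_story" else "escalate"
    if finding.source == "health_signal":
        # Service-level regressions belong to the post-merge guard.
        return "revert" if auto_revert else "warn_and_escalate"
    raise ValueError(f"unknown finding source: {finding.source!r}")
```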

---

## Config Additions (product-context.yaml)

```yaml
topology:
  routing:
    deploy_targets:
      staging_url: "https://staging.example.com" # optional; falls back to CI if missing
      production_url: "https://example.com"
    dogfood:
      method: auto # auto | agent-browser | codex-native | playwright
      post_deploy_env: staging # staging | production | none
      on_issue: create_story # create_story | escalate
```
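
One way to guard against typos in these keys is to validate the `dogfood` block after the YAML has been parsed. The sketch below is an assumption-laden illustration: it assumes `dogfood` sits under `topology.routing`, treats the values shown in the example as defaults, and uses hypothetical names (`ALLOWED`, `DEFAULTS`, `load_dogfood_settings`) that do not appear in the spec.

```python
# Allowed values, taken from the inline comments in the YAML example.
ALLOWED = {
    "method": {"auto", "agent-browser", "codex-native", "playwright"},
    "post_deploy_env": {"staging", "production", "none"},
    "on_issue": {"create_story", "escalate"},
}

# Assumption: the example values double as defaults.
DEFAULTS = {"method": "auto", "post_deploy_env": "staging", "on_issue": "create_story"}


def load_dogfood_settings(routing: dict) -> dict:
    """Merge the parsed `dogfood` block over the defaults and reject unknown keys/values."""
    raw = routing.get("dogfood") or {}
    settings = {**DEFAULTS, **raw}
    for key, value in settings.items():
        if key not in ALLOWED:
            raise ValueError(f"unknown dogfood key: {key!r}")
        if value not in ALLOWED[key]:
            raise ValueError(f"invalid value for dogfood.{key}: {value!r}")
    return settings
```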

---

## Files Touched During Implementation (to be expanded)

| File                                                                     | Change                                                                                                         |
| ------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------- |
| `agentic-development-workflow/build/SKILL.md`                            | Phase 6 calls `dogfood_method()` (host-aware) instead of "skip if not installed"                               |
| New `patterns/.../references/dogfood-validation.md`                      | `dogfood_method()` detection + `target_url()` resolution + report format                                       |
| `patterns/executor/references/codex-native.md`                           | Add a recipe for codex-subagent to dogfood with the in-app browser / computer-use; codex-exec uses Playwright |
| `patterns/autopilot/references/tick-protocol.md` + `post-merge-guard.md` | Add the post-deploy dogfood step                                                                               |
| `product-context/reflect/SKILL.md`                                       | Receive dogfood report → classify → create story (the classifier already exists; add the new source)          |
| `project-setup/testing-guide/SKILL.md`                                   | Reuse the existing `agent_browser_healthy()`; add playwright detection                                         |

---

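## Verification (after implementation)

1. **Claude Code**: run Phase 6 once with and once without agent-browser installed → confirm it auto-selects agent-browser or degrades correctly.
2. **Codex desktop**: run post-deploy with codex-subagent → confirm it validates the staging URL with the in-app browser + computer-use.
3. **Codex headless**: run with codex-exec → confirm it switches to writing and running Playwright scripts (when computer-use is unavailable).
4. **URL resolution**: set `deploy_targets.staging_url` → it is used; remove it → confirm the fallback obtains the URL from CI output.
5. **on_issue**: deliberately leave a UX bug in place → confirm a bug story is created automatically in `product-context.yaml` and enters dispatch.
6. **boundary**: confirm the post-deploy step only reads reports/signals and runs CLI commands, and does not read workspace code.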