diff --git a/.gitignore b/.gitignore
index 1153341..c5d3e84 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,9 +55,11 @@ ASR.md
 .plan/
 # Tracked docs are explicitly listed below; everything else under docs/
 # is Claude scratch (plans, brainstorm output, etc) and stays gitignored.
+#   - DESIGN.md:         consolidated architecture + decision log.
 #   - AIRGAP_INSTALL.md: Phase 14 (HARD-02) air-gap install path.
 #   - DEVELOPMENT.md:    Phase 16 (BUNDLER-01) contributor workflow.
 docs/*
+!docs/DESIGN.md
 !docs/AIRGAP_INSTALL.md
 !docs/DEVELOPMENT.md
 REVIEW_*.md
diff --git a/.planning/phases/01-concurrency-foundation/01-01-SUMMARY.md b/.planning/phases/01-concurrency-foundation/01-01-SUMMARY.md
deleted file mode 100644
index e619dac..0000000
--- a/.planning/phases/01-concurrency-foundation/01-01-SUMMARY.md
+++ /dev/null
@@ -1,134 +0,0 @@
----
-phase: 01-concurrency-foundation
-plan: 01
-subsystem: infra
-tags: [asyncio, locks, concurrency, fastapi, streamlit, session-management]
-
-# Dependency graph
-requires: []
-provides:
-  - SessionBusy(RuntimeError) exception with session_id attribute
-  - SessionLockRegistry.is_locked(session_id) non-blocking predicate
-  - Per-session task-reentrant lock held across full graph turn including HITL pause
-  - HTTP 429 + Retry-After:1 on all three session-start/approval API callsites
-  - UI retry hint on SessionBusy at investigation form submission
-  - locks.py inlined into dist/ bundles
-affects:
-  - 01-02-concurrency-foundation  # approval_watchdog retry path uses SessionBusy
-
-# Tech tracking
-tech-stack:
-  added: []
-  patterns:
-    - class-name match for exception handling in api.py (no hard import at module load)
-    - task-reentrant asyncio lock with is_locked() fail-fast check before acquire()
-    - D-09: dist/ regeneration in same atomic commit as src/ changes
-
-key-files:
-  created: []
-  modified:
-    - src/runtime/locks.py
-    - src/runtime/service.py
-    - src/runtime/api.py
-    - src/runtime/ui.py
-    - tests/test_session_lock.py
-    - scripts/build_single_file.py
-    - dist/app.py
-    - dist/ui.py
-    - dist/apps/incident-management.py
-    - dist/apps/code-review.py
-
-key-decisions:
-  - "D-01: Lock held across entire graph turn including LangGraph interrupt() HITL pause"
-  - "D-02: Single acquire site inside _run() closure, not at start_session() entry"
-  - "D-03: Fail-fast contention — SessionBusy raised, not queued"
-  - "D-04: Reads stay lock-free throughout"
-  - "D-09: dist/ regenerated in same atomic commit as src/ changes"
-  - "D-10: Direct atomic commit on refactor/prompt-vs-code-remediation branch"
-  - "D-15: Slot eviction deferred to v2 — TODO comment added to _slots dict"
-  - "D-16 (location override): SessionBusy raised inside _run() at acquire site, NOT at start_session() entry — start_session() mints fresh session_id so no pre-existing lock slot exists"
-  - "D-17: EventLog stays lock-free"
-  - "locks.py added to RUNTIME_MODULE_ORDER in build_single_file.py (was missing)"
-
-patterns-established:
-  - "Exception class-name matching pattern: e.__class__.__name__ in ('SessionCapExceeded', 'SessionBusy') — avoids hard import at module load time"
-  - "is_locked() + acquire() pattern: check is_locked() first for fail-fast, then async with acquire() for the body — non-contending in steady state"
-  - "asyncio_mode=auto: new async tests in tests/ do NOT need @pytest.mark.asyncio decorator"
-
-requirements-completed:
-  - PVC-01
-
-# Metrics
-duration: ~35min
-completed: 2026-05-06
----
-
-# Phase 01: Concurrency Foundation — Plan 01 Summary
-
-**Per-session task-reentrant asyncio lock with fail-fast SessionBusy, HTTP 429/Retry-After mapping at all three API callsites, UI retry hint, and locks.py bundled into dist/**
-
-## Performance
-
-- **Duration:** ~35 min
-- **Started:** 2026-05-06T08:00:00Z
-- **Completed:** 2026-05-06T08:35:00Z
-- **Tasks:** 3
-- **Files modified:** 10
-
-## Accomplishments
-- `SessionBusy(RuntimeError)` exception and `is_locked()` predicate added to `locks.py`; 5 new unit tests pass (838 total)
-- `service.py._run()` wrapped with per-session lock acquire; fail-fast contention check via `is_locked()` before `acquire()`
-- All three FastAPI callsites (`/investigate`, `POST /sessions`, approval submission) now map `SessionBusy` → HTTP 429 + `Retry-After: 1`; UI shows `st.warning` + early return
-- `locks.py` added to `RUNTIME_MODULE_ORDER` in `build_single_file.py` (was omitted); all four dist bundles regenerated with `SessionBusy`, `is_locked`, `_locks.acquire` present
-
-## Task Commits
-
-All tasks committed atomically in a single commit per D-09/D-10:
-
-1. **Tasks 1-3: All changes** - `ea43964` (feat)
-
-## Files Created/Modified
-- `src/runtime/locks.py` - Added `SessionBusy` class, `is_locked()` predicate, TODO(v2) eviction note
-- `src/runtime/service.py` - Wrapped `_run()` body with `async with orch._locks.acquire(session_id):`; `is_locked()` fail-fast guard
-- `src/runtime/api.py` - Extended class-name match at 2 existing handlers + 1 new handler at approval submission callsite
-- `src/runtime/ui.py` - SessionBusy try/except at `asyncio.run()` investigation form path
-- `tests/test_session_lock.py` - 5 new tests for `is_locked()` + `SessionBusy` (no `@pytest.mark.asyncio` per asyncio_mode=auto)
-- `scripts/build_single_file.py` - Added `(RUNTIME_ROOT, "locks.py")` before `orchestrator.py` in `RUNTIME_MODULE_ORDER`
-- `dist/app.py`, `dist/ui.py`, `dist/apps/incident-management.py`, `dist/apps/code-review.py` - Regenerated with locks.py inlined
-
-## Decisions Made
-- D-16 location override confirmed: `SessionBusy` raised inside `_run()` not at `start_session()` entry — `start_session()` mints a fresh `session_id` so there is no pre-existing lock slot to check
-- `locks.py` was missing from `RUNTIME_MODULE_ORDER` in the build script — added before `orchestrator.py` which instantiates `SessionLockRegistry`
-- Used `is_locked()` as a pre-check before `acquire()` to satisfy D-03 fail-fast without blocking; the acquire() itself is non-contending in the steady state
-
-## Deviations from Plan
-
-### Auto-fixed Issues
-
-**1. [Rule 3 - Blocking] locks.py missing from build_single_file.py RUNTIME_MODULE_ORDER**
-- **Found during:** Task 3 (dist/ regeneration verification)
-- **Issue:** `def is_locked`, `class SessionBusy` absent from `dist/app.py` after initial build; `locks.py` was not listed in `RUNTIME_MODULE_ORDER`
-- **Fix:** Added `(RUNTIME_ROOT, "locks.py")` to `RUNTIME_MODULE_ORDER` before `orchestrator.py`; rebuilt all four bundles
-- **Files modified:** `scripts/build_single_file.py`, all four dist files
-- **Verification:** `grep -c "def is_locked" dist/app.py` → 1; `grep -c "class SessionBusy" dist/app.py` → 1; `grep -c "_locks\.acquire" dist/app.py` → 2
-- **Committed in:** `ea43964` (same atomic commit)
-
----
-
-**Total deviations:** 1 auto-fixed (1 blocking — missing bundle entry)
-**Impact on plan:** Essential fix for D-09 compliance. No scope creep.
-
-## Issues Encountered
-None beyond the locks.py bundle omission documented above.
-
-## User Setup Required
-None - no external service configuration required.
-
-## Next Phase Readiness
-- Per-session lock foundation complete; `SessionBusy` exception available for 01-02
-- 01-02 (`approval_watchdog.py` retry path) can import `SessionBusy` from `runtime.locks` without circular import risk
-- All 838 tests pass; ruff clean on all modified files
-
----
-*Phase: 01-concurrency-foundation*
-*Completed: 2026-05-06*
diff --git a/.planning/phases/14-reproducible-air-gap-lockfile/14-01-PLAN.md b/.planning/phases/14-reproducible-air-gap-lockfile/14-01-PLAN.md
deleted file mode 100644
index 97986f8..0000000
--- a/.planning/phases/14-reproducible-air-gap-lockfile/14-01-PLAN.md
+++ /dev/null
@@ -1,75 +0,0 @@
----
-phase: 14-reproducible-air-gap-lockfile
-plan: 01
-title: Reproducible air-gap dependency lockfile (HARD-02)
-status: in_progress
-date: 2026-05-07
-requirement: HARD-02 (CONCERNS C2)
----
-
-# Plan 14-01 — Reproducible Air-Gap Dependency Lockfile
-
-## One-liner
-
-Commit a `uv.lock` that pins every transitive dependency with hashes; CI installs from the lockfile and a freshness gate fails the build when `pyproject.toml` drifts from `uv.lock`; document the offline install path so an engineer behind a corporate firewall can reproduce the dependency graph from an internal mirror without public-internet access.
-
-## Tool Selection — `uv` (rationale)
-
-Considered `uv`, `pip-tools`, `poetry`. Selected **`uv`** (locally installed: `uv 0.11.7`).
-
-| Criterion (`~/.claude/rules/dependencies.md`) | `uv` | `pip-tools` | `poetry` |
-| --- | --- | --- | --- |
-| License | Apache-2.0 / MIT (dual) | BSD-3-Clause | MIT |
-| Active maintenance / bus factor | Astral team, daily releases | jazzband collective | python-poetry org |
-| Lockfile format | `uv.lock` (TOML, hashes per platform marker) | `requirements.txt` w/ `--generate-hashes` | `poetry.lock` (TOML) |
-| PEP 621 (`pyproject.toml` `[project]`) native | Yes — already what we use | Reads `pyproject.toml` direct | Requires `[tool.poetry]` rewrite of `[project]` |
-| Resolver speed (171 pkgs) | ~14 ms (measured) | seconds | seconds |
-| Single static binary | Yes (Rust) | No (Python pkg) | No (Python pkg) |
-| Works fully offline (`--offline`, `--frozen`) | Yes (first-class) | Indirect via `pip install --no-index` | Yes |
-| Drift gate (`--check`) | `uv lock --check` | `pip-compile --check` (since 7.4) | `poetry check --lock` |
-| Already adopted in repo | **Yes** (`uv.lock` already present, 4430 lines, 171 pkgs) | No | No |
-
-**Decision:** `uv`. The lockfile already exists in-repo and is in sync (`uv lock --check` exits 0 in 14 ms). `poetry` is rejected because adopting it would require rewriting `[project]` into `[tool.poetry]` — a pyproject-format migration that violates "minimal diff" scope. `pip-tools` would lose the `uv.lock` work already present and forfeit the multi-platform marker pinning that `uv.lock` gives for free.
-
-## Tasks (8)
-
-1. **Confirm lockfile freshness against current `pyproject.toml`** — `uv lock --check` (already passes; recorded as baseline).
-2. **Add `[tool.uv]` block to `pyproject.toml` if needed** — likely no-op; defaults already satisfy our needs. Verify behaviour.
-3. **Rewrite CI install step in `.github/workflows/ci.yml`** — replace `pip install -e ".[dev]"` with `uv sync --frozen --extra dev`, plus `astral-sh/setup-uv@v6` for the runner.
-4. **Add CI lockfile-freshness gate** — new step `uv lock --check` runs before install; fails CI when `pyproject.toml` and `uv.lock` drift.
-5. **Switch CI test/lint/type-check steps to `uv run`** — `uv run pytest …`, `uv run ruff check …`, `uv run pyright …` so tools execute against the locked virtualenv.
-6. **Document the offline install path** — new `docs/AIRGAP_INSTALL.md` (≤50 lines): clone, `UV_INDEX_URL=https://internal-mirror`, `uv sync --frozen --offline`, `uv run pytest tests/ -x`.
-7. **Local verification (acceptance gates)**:
-   - `uv lock --check` → exit 0
-   - `python -m pytest tests/ -x` → all collected tests pass (baseline 1047)
-   - `ruff check src tests` → unchanged from baseline (13 pre-existing errors — NOT regressed)
-   - `pyright src/runtime` → unchanged from baseline (54 pre-existing errors — NOT regressed)
-   - `python scripts/build_single_file.py && git diff --exit-code dist/` → clean
-   - `git grep -nE 'https://ollama\.com|ollama\.com/api' -- src/` → zero matches (HARD-05 ratchet)
-   - `python -c 'import yaml; yaml.safe_load(open(".github/workflows/ci.yml"))'` → no parse error (no local yamllint installed)
-8. **Single atomic commit** on `refactor/framework-flow-control` per phase precedent.
-
-## Files Touched
-
-| File | Status | Why |
-| --- | --- | --- |
-| `pyproject.toml` | possibly add `[tool.uv]` block (else unchanged) | UV config / extras declaration |
-| `uv.lock` | **already present, unchanged** | Pre-existing; freshness re-verified at commit time |
-| `.github/workflows/ci.yml` | modified | Install via `uv sync --frozen`; add lockfile-freshness gate; run tools via `uv run` |
-| `docs/AIRGAP_INSTALL.md` | NEW | Offline install instructions |
-| `.planning/phases/14-reproducible-air-gap-lockfile/14-01-PLAN.md` | NEW | This file |
-| `.planning/phases/14-reproducible-air-gap-lockfile/14-01-SUMMARY.md` | NEW | After-action |
-| `.planning/phases/14-reproducible-air-gap-lockfile/14-VERIFICATION.md` | NEW | Per-success-criterion gates |
-
-## Out of Scope (deferred)
-
-- **Vendored wheels tarball** for true `--no-index` install — separate phase (called out in 14-CONTEXT.md `Deferred Ideas`).
-- **`Makefile` / `make bootstrap`** scaffolding — ROADMAP SC-2 wording mentions `make bootstrap` "or equivalent"; the equivalent is `uv sync --frozen [--offline]`. Documented in `docs/AIRGAP_INSTALL.md`.
-- **Pyright / ruff baseline cleanup** — existing pre-Phase-14 baselines preserved exactly; not a Phase 14 concern.
-
-## Hard-Stop Triggers (HALT, write BLOCKER.md)
-
-- `uv lock --check` reports drift after commit → root-cause and stop.
-- Any test in `tests/` newly fails with the lockfile-driven install AND root cause is the lockfile.
-- CI YAML edits don't validate as YAML.
-- `dist/*` regen produces a non-empty `git diff` after Phase 14 changes.
diff --git a/.planning/phases/14-reproducible-air-gap-lockfile/14-01-SUMMARY.md b/.planning/phases/14-reproducible-air-gap-lockfile/14-01-SUMMARY.md
deleted file mode 100644
index c62278d..0000000
--- a/.planning/phases/14-reproducible-air-gap-lockfile/14-01-SUMMARY.md
+++ /dev/null
@@ -1,83 +0,0 @@
----
-status: completed
-phase: 14-reproducible-air-gap-lockfile
-plan: 01
-subsystem: build / ci / dependencies
-tags: [hardening, air-gap, build, ci, lockfile]
-requires: [phase-13-llm-provider-hardening]
-provides: [uv.lock-CI-install, uv-lock-check-freshness-gate, docs/AIRGAP_INSTALL.md]
-affects: [pyproject.toml, .github/workflows/ci.yml, .gitignore, docs/AIRGAP_INSTALL.md, uv.lock]
-tech-stack:
-  added: [uv (Apache-2.0/MIT, single static binary, Astral)]
-  patterns: [pin+hash transitive lockfile, --frozen install, lockfile-drift CI gate]
-key-files:
-  created:
-    - docs/AIRGAP_INSTALL.md
-  modified:
-    - .github/workflows/ci.yml
-    - .gitignore
-  unchanged-but-canonical:
-    - pyproject.toml         # already PEP 621; no [tool.uv] needed
-    - uv.lock                # already in sync (uv lock --check exit 0)
-decisions:
-  - "Tool: uv 0.11.7 (Apache-2.0/MIT). Picked over pip-tools (loses uv.lock investment, no per-marker pinning) and poetry (would require [project] -> [tool.poetry] rewrite, violates minimal diff)."
-  - "uv.lock already exists (171 packages, 4430 lines, in sync per `uv lock --check`); Phase 14 wires CI to install from it, adds the freshness gate, and documents the offline path. No new lockfile generation required."
-  - "CI install: `uv sync --frozen --extra dev` (replaces `pip install -e .[dev]`). `--frozen` forbids re-resolving."
-  - "CI lockfile-drift gate: `uv lock --check` runs as the FIRST step inside the job (before install) so a stale uv.lock fails the build before anything else."
-  - "Tools (ruff, pyright, pytest) run via `uv run` so they execute against the locked virtualenv."
-  - "Pinned uv version 0.11.7 in CI (matches local) — bumps are deliberate, not silent."
-  - "Documented offline path in `docs/AIRGAP_INSTALL.md` (38 lines): clone -> UV_INDEX_URL=internal-mirror -> `uv sync --frozen [--offline]`. Negation rule added to .gitignore so docs/AIRGAP_INSTALL.md is the single shipped doc."
-  - "Single atomic commit per phase precedent (Phase 9-13)."
-metrics:
-  duration: "~15 min"
-  tasks-completed: 8
-  files-touched: 4    # (1 new, 2 modified, 1 planning .md whitelisted)
-  tests-added: 0       # pure infra, no new test surface
-  tests-total: 1044    # (1044 passed, 3 skipped — same as Phase 13)
-  ratchet-status: green
-  bundle-determinism: deterministic (`git diff --exit-code dist/` clean after regen)
-gates:
-  uv-lock-check: "Resolved 171 packages in 2ms — exit 0"
-  yaml-valid: "9 steps, parses clean"
-  ollama-grep-src: "0 matches (HARD-05 ratchet preserved)"
-  ruff: "13 errors (pre-Phase-14 baseline, unchanged)"
-  pyright-runtime: "54 errors (pre-Phase-14 baseline, unchanged)"
-  pyright-full: "329 errors (pre-Phase-14 baseline, unchanged)"
-  dist-regen-diff: "clean (exit 0)"
-  pytest: "1044 passed, 3 skipped"
----
-
-# Phase 14 Plan 01 Summary — Reproducible Air-Gap Dependency Lockfile
-
-## One-liner
-
-Wired the existing in-repo `uv.lock` into CI via `uv sync --frozen`, added a `uv lock --check` lockfile-freshness gate that fails the build on `pyproject.toml`/`uv.lock` drift, and documented the offline install path in `docs/AIRGAP_INSTALL.md` so an engineer behind a corporate firewall can reproduce the exact dependency graph from an internal mirror without public-internet access. Closes HARD-02 (CONCERNS C2).
-
-## What changed
-
-| File | Change |
-| --- | --- |
-| `.github/workflows/ci.yml` | Added `astral-sh/setup-uv@v6` (uv 0.11.7); added `uv lock --check` gate as first job step; replaced `pip install -e ".[dev]"` with `uv sync --frozen --extra dev`; rewrote `ruff` / `pyright` / `pytest` invocations as `uv run …` so they hit the locked venv. |
-| `docs/AIRGAP_INSTALL.md` (new) | 38-line offline-install recipe: clone → set `UV_INDEX_URL` → `uv sync --frozen [--offline]` → `uv run pytest tests/ -x`. |
-| `.gitignore` | Added `!docs/AIRGAP_INSTALL.md` negation so the air-gap install doc ships while the rest of `docs/` (Claude artefacts) stays ignored. |
-| `pyproject.toml` | Unchanged — already PEP 621; uv reads `[project]` natively, no `[tool.uv]` block required. |
-| `uv.lock` | Unchanged — already present, 4430 lines, 171 packages, in sync. Verified by `uv lock --check` exit 0. |
-
-## Acceptance gates (all green)
-
-```
-uv lock --check                                          : EXIT 0 (171 pkgs, 2 ms)
-python -c 'import yaml; yaml.safe_load(open(ci.yml))'    : 9 steps, parses
-git grep -nE 'https://ollama\.com|ollama\.com/api' src/  : 0 matches  (HARD-05 ratchet)
-ruff check src tests                                     : 13 errors  (pre-existing baseline)
-pyright src/runtime                                      : 54 errors  (pre-existing baseline)
-pyright                                                  : 329 errors (pre-existing baseline)
-python scripts/build_single_file.py && git diff dist/    : clean (exit 0)
-pytest tests/ -x                                         : 1044 passed, 3 skipped
-```
-
-## Out of scope (deferred)
-
-- A vendored-wheels tarball (truly `--no-index` install kit) — separate phase.
-- Pyright / ruff baseline cleanup — pre-existing baselines, not Phase 14 territory.
-- `Makefile` `make bootstrap` shim — `uv sync --frozen [--offline]` is the documented equivalent (ROADMAP SC-2 wording allows "or equivalent").
diff --git a/.planning/phases/14-reproducible-air-gap-lockfile/14-VERIFICATION.md b/.planning/phases/14-reproducible-air-gap-lockfile/14-VERIFICATION.md
deleted file mode 100644
index 57bca93..0000000
--- a/.planning/phases/14-reproducible-air-gap-lockfile/14-VERIFICATION.md
+++ /dev/null
@@ -1,141 +0,0 @@
----
-status: passed
-phase: 14
-phase_name: Reproducible Air-Gap Lockfile
-date: 2026-05-07
-verified: 2026-05-07T09:35:00Z
-score: 5/5 ROADMAP success criteria + 8/8 plan tasks verified
-overrides_applied: 0
-re_verification:
-  previous_status: null
-  is_re_verification: false
----
-
-# Phase 14 Verification Report — Reproducible Air-Gap Dependency Lockfile
-
-**Phase Goal (ROADMAP):** An engineer behind a corporate firewall can clone the repo, point at an internal package mirror, and reproduce the exact dependency graph used in CI / dev. Today `pyproject.toml` resolves freshly on every install — non-deterministic and breaks `~/.claude/rules/build.md`'s "vendor all dependencies" rule.
-
-**Requirement:** HARD-02 (CONCERNS C2)
-**Verified:** 2026-05-07
-**Status:** passed
-
----
-
-## Goal-Backward Verification (ROADMAP Success Criteria)
-
-### SC-1 — Committed lockfile pins every direct + transitive dep with version + hash — VERIFIED
-
-**Evidence:**
-- `uv.lock` present at repo root: 4430 lines, **171 packages** pinned (verified via `grep -E '^(name|version) = ' uv.lock | head`).
-- Every entry includes `source`, `version`, and per-distribution `sha256` hash (sample: `aiofile==3.9.0` with sdist + wheel hashes).
-- `requires-python = ">=3.11"` matches `pyproject.toml`.
-- `uv lock --check` exit code: **0** ("Resolved 171 packages in 2ms") — lockfile is in sync with `pyproject.toml`.
-
-### SC-2 — `make bootstrap` (or equivalent) installs from lockfile alone via internal mirror — VERIFIED
-
-**Evidence:**
-- `docs/AIRGAP_INSTALL.md` (NEW, 38 lines) documents the recipe:
-  ```
-  export UV_INDEX_URL="https://<internal-mirror>/simple/"
-  uv sync --frozen --extra dev
-  # or, fully offline (cache pre-warmed):
-  uv sync --frozen --offline --extra dev
-  ```
-- `uv sync --frozen` is the documented equivalent of `make bootstrap` (ROADMAP wording: "make bootstrap or equivalent"). It refuses to re-resolve and installs the exact set in `uv.lock` with hash verification.
-- `UV_INDEX_URL` env override redirects all package resolution to an internal mirror (no hardcoded public URLs).
-
-### SC-3 — CI installs from the lockfile, not the `pyproject.toml` solver — VERIFIED
-
-**Evidence (`.github/workflows/ci.yml`):**
-- New step `Set up uv` pins uv `0.11.7` via `astral-sh/setup-uv@v6`.
-- Replaced `run: pip install -e ".[dev]"` with `run: uv sync --frozen --extra dev`.
-- All downstream tool invocations (`ruff`, `pyright`, `pytest`) use `uv run`, ensuring they execute inside the locked virtualenv rather than a side-installed Python.
-- `--frozen` flag forbids re-resolution: any drift between `pyproject.toml` and `uv.lock` would fail this step (also caught earlier by SC-4).
-
-### SC-4 — Lockfile-drift CI gate fails the build on `pyproject.toml` change without lockfile update — VERIFIED
-
-**Evidence (`.github/workflows/ci.yml`):**
-- New step `Lockfile freshness gate (HARD-02)` runs `uv lock --check` BEFORE the install step.
-- `uv lock --check` exits non-zero when `pyproject.toml` and `uv.lock` are out of sync (would attempt to update the lockfile in dry-run mode).
-- Gate is positioned first so a stale lockfile fails fast.
-- Local invocation against current tree: exit 0 (clean baseline).
-
-### SC-5 — `dist/*` regenerated; existing test suite passes — VERIFIED
-
-**Evidence:**
-- `python scripts/build_single_file.py` ran clean; `git diff --exit-code dist/` exit code: **0** (no drift).
-- `python -m pytest tests/ -x` result: **1044 passed, 3 skipped, 0 failed** — matches Phase 13 baseline (`tests-total: 1044` per `13-01-SUMMARY.md` metrics).
-
----
-
-## Cross-Phase Ratchet Gates (preserved, not regressed)
-
-| Gate | Baseline (pre-Phase-14) | Phase 14 result | Status |
-| --- | --- | --- | --- |
-| `git grep -nE 'https://ollama\.com|ollama\.com/api' -- src/` (HARD-05) | 0 matches | 0 matches (exit 1) | Preserved |
-| `ruff check src tests` | 13 errors | 13 errors | Preserved (pre-existing baseline; not a Phase 14 deliverable) |
-| `pyright src/runtime` | 54 errors | 54 errors | Preserved (pre-existing baseline) |
-| `pyright` (full) | 329 errors | 329 errors | Preserved (pre-existing baseline) |
-| `pytest tests/ -x` | 1044 passed / 3 skipped | 1044 passed / 3 skipped | Preserved |
-| `git diff --exit-code dist/` after `build_single_file.py` | clean | clean | Preserved |
-| `uv lock --check` | exit 0 | exit 0 | Preserved (still in sync) |
-
----
-
-## Hard-Constraint Verification (from prompt)
-
-| Constraint | Verdict | Notes |
-| --- | --- | --- |
-| Air-gapped target — no new public-internet calls | PASS | uv reads from `UV_INDEX_URL` (internal mirror); `--frozen` + `--offline` documented. |
-| No `curl | sh` in any script | PASS | `docs/AIRGAP_INSTALL.md` explicitly says "ship via your internal artifact store — do not `curl | sh`". |
-| Permissive license for new tooling | PASS | uv: Apache-2.0 / MIT (dual-licensed). |
-| No version downgrades vs `pyproject.toml` `>=` | PASS | uv.lock unchanged from already-resolved state; `uv lock --check` exit 0 confirms no rewrite. |
-| Reproducible — same inputs same dep set | PASS | uv.lock pins version + sha256 per platform marker. |
-| Existing test suite passes | PASS | 1044 passed / 3 skipped. |
-| CI builds successfully from lockfile | PASS (locally validated; CI run will land on next push) | YAML parses; steps in correct order; `uv sync --frozen` is the canonical install command. |
-| No code outside Phase 14 scope touched | PASS | Only `.github/workflows/ci.yml`, `.gitignore`, new `docs/AIRGAP_INSTALL.md`, plus phase planning files. |
-
----
-
-## Tool Selection Audit (`~/.claude/rules/dependencies.md`)
-
-| Criterion | uv (chosen) |
-| --- | --- |
-| License: MIT/Apache/BSD only | Apache-2.0 + MIT (dual) — PASS |
-| Active maintenance | Astral, weekly releases — PASS |
-| Single-maintainer bus factor | Backed by Astral team — PASS |
-| Low transitive footprint | Zero Python deps (Rust binary) — PASS |
-| Works fully offline once installed | `--offline`, `--frozen` first-class flags — PASS |
-| Lockfile with full hashes | `uv.lock` pins sha256 per dist per platform marker — PASS |
-| PEP 621 (`pyproject.toml` `[project]`) compatible | Native, no rewrite — PASS |
-| Generates lockfile reproducibly | Same `pyproject.toml` + uv version → identical `uv.lock` — PASS |
-
-Rejected alternatives:
-- **pip-tools** — Would forfeit `uv.lock` (already in repo, 171 pkgs) and per-marker hash pinning.
-- **poetry** — Would require rewriting `[project]` → `[tool.poetry]`, violating minimal-diff scope.
-
----
-
-## Hard-Stop Triggers Checklist (none triggered)
-
-- Selected tool requires public internet at runtime/CI: **NO** — uv supports `--offline` and reads from `UV_INDEX_URL`.
-- Lockfile downgrades a dep below `pyproject.toml` `>=`: **NO** — `uv lock --check` exit 0 means no resolution changes occurred.
-- Test suite fails after lockfile in place AND root cause is the lockfile: **NO** — 1044 passed / 3 skipped, identical to Phase 13 baseline.
-- CI YAML edits don't validate: **NO** — `python -c 'import yaml; yaml.safe_load(open(...))'` parses cleanly; 9 steps detected.
-- Selected tool requires non-permissive license: **NO** — uv is Apache-2.0 + MIT.
-- `dist/*` not deterministic: **NO** — `git diff --exit-code dist/` clean.
-
----
-
-## Files of Record
-
-- `pyproject.toml` (unchanged — already PEP 621; uv reads `[project]` natively)
-- `uv.lock` (unchanged — already in sync, 171 packages, sha256-pinned)
-- `.github/workflows/ci.yml` (modified — uv setup + lockfile gate + `uv sync --frozen` + `uv run` for tools)
-- `.gitignore` (modified — `!docs/AIRGAP_INSTALL.md` negation so the install doc ships)
-- `docs/AIRGAP_INSTALL.md` (NEW — 38-line offline install recipe)
-- `.planning/phases/14-reproducible-air-gap-lockfile/14-01-PLAN.md` (NEW)
-- `.planning/phases/14-reproducible-air-gap-lockfile/14-01-SUMMARY.md` (NEW)
-- `.planning/phases/14-reproducible-air-gap-lockfile/14-VERIFICATION.md` (NEW — this file)
-
-**Verdict:** All 5 ROADMAP success criteria, all 8 plan tasks, all 7 cross-phase ratchet gates, and all 8 hard constraints verified. Phase 14 status: **passed**.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..046a5d5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,49 @@
+# ASR — Multi-Agent Runtime Framework
+
+Python multi-agent runtime built on **LangGraph** (orchestration) and
+**FastMCP** (tool dispatch), with a human-in-the-loop (HITL) gate, a
+markdown turn-output contract, and a single-file deploy bundle for
+air-gapped corporate targets.
+
+Two reference apps live in the same repo to prove the runtime is
+generic:
+
+- **`examples/incident_management/`** — 4-skill investigation
+  pipeline (intake → triage → deep_investigator → resolution) with
+  ASR memory layers (Knowledge Graph, Release Context, Playbook Store).
+- **`examples/code_review/`** — 3-skill PR review pipeline (intake
+  → analyzer → recommender).
+
+## Quick start
+
+```bash
+uv sync --frozen --extra dev
+uv run pytest tests/ -x
+
+# Run the incident-management app via the CLI entrypoint
+uv run python -m runtime --config config/incident_management.yaml
+
+# Streamlit UI
+ASR_LOG_LEVEL=INFO uv run streamlit run src/runtime/ui.py --server.port 37777
+```
+
+Set provider keys in `.env` (`OLLAMA_API_KEY`, `OPENROUTER_API_KEY`,
+`AZURE_OPENAI_KEY`, …), then set `llm.default` (plus any per-skill
+`skill.model` overrides) in `config/config.yaml`.
+
+## Documentation
+
+- **[`docs/DESIGN.md`](docs/DESIGN.md)** — architecture, core
+  abstractions, runtime model, storage, deployment, decision log,
+  milestone history. **Start here** if you're new to the codebase.
+- **[`docs/DEVELOPMENT.md`](docs/DEVELOPMENT.md)** — day-to-day
+  contributor loop: setup, regenerating `dist/`, adding a runtime
+  module.
+- **[`docs/AIRGAP_INSTALL.md`](docs/AIRGAP_INSTALL.md)** —
+  air-gapped / internal-mirror install procedure.
+
+## Status
+
+`main` carries milestones v1.0 through v1.5. v2.0 (a React UI replacing
+the Streamlit prototype) is the next major milestone. See `docs/DESIGN.md`
+§ 13 for the milestone history and § 14 for pending work.
diff --git a/docs/DESIGN.md b/docs/DESIGN.md
new file mode 100644
index 0000000..a9d5296
--- /dev/null
+++ b/docs/DESIGN.md
@@ -0,0 +1,938 @@
+# ASR Multi-Agent Runtime Framework — Design & Decisions
+
+> **Audience.** New contributors and operators who need one document
+> covering what the framework is, how it composes, and *why* the
+> non-obvious decisions are the way they are.
+>
+> **Scope.** Architecture, core abstractions, runtime model, storage,
+> deployment, and a decision log. Operational how-tos live in
+> `docs/DEVELOPMENT.md` (dev workflow) and `docs/AIRGAP_INSTALL.md`
+> (corporate-mirror install).
+
+---
+
+## 1. What it is
+
+ASR is a generic Python multi-agent runtime that wraps **LangGraph**
+for orchestration and **FastMCP** for tool dispatch, adds a
+human-in-the-loop (HITL) gateway and a markdown turn-output contract on
+top, and ships as a single-file bundle into air-gapped corporate environments.
+
+Two reference apps live in the same repo to prove the runtime is
+genuinely generic:
+
+- **`examples/incident_management/`** — 4-skill investigation
+  pipeline (intake → triage → deep_investigator → resolution) with
+  ASR memory layers (L2 Knowledge Graph, L5 Release Context, L7
+  Playbook Store) and a remediation workflow that pauses on
+  high-risk actions.
+- **`examples/code_review/`** — 3-skill PR review pipeline (intake
+  → analyzer → recommender). Built specifically to expose every place
+  where incident-specific assumptions had leaked into the framework;
+  each leak was fixed by generalizing the framework rather than by
+  working around it in the app.
+
+What the framework owns: session lifecycle, agent dispatch, tool
+gateway, HITL pause/resume, telemetry, storage, deployment bundling.
+
+What an app owns: domain `Session` subclass, MCP servers, skill
+prompts + per-skill YAML, `App*Config` for cross-cutting domain
+knobs (severity aliases, escalation roster, similarity thresholds).
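+
+A minimal sketch of that framework/app split, assuming plain stdlib
+dataclasses as stand-ins. `Session` here is a hypothetical simplification
+of the framework base class, and `IncidentSession` / `IncidentAppConfig`
+are illustrative names, not the incident app's real types.
+
+```python
+from dataclasses import dataclass, field
+
+
+# Hypothetical stand-in for the framework-owned base class; the real
+# runtime Session has its own fields, which are not reproduced here.
+@dataclass
+class Session:
+    id: str
+    status: str = "new"
+
+
+# App-owned domain subclass: adds fields only the incident app needs.
+@dataclass
+class IncidentSession(Session):
+    severity: str = "unknown"
+    affected_services: list[str] = field(default_factory=list)
+
+
+# App-owned cross-cutting knobs (illustrative shape of an App*Config).
+@dataclass(frozen=True)
+class IncidentAppConfig:
+    severity_aliases: dict[str, str] = field(default_factory=dict)
+    escalation_roster: tuple[str, ...] = ()
+    similarity_threshold: float = 0.8
+
+    def normalize_severity(self, raw: str) -> str:
+        # Map free-text severities ("P1", "sev1") onto canonical labels;
+        # unknown values pass through stripped and lower-cased.
+        key = raw.strip().lower()
+        return self.severity_aliases.get(key, key)
+
+
+cfg = IncidentAppConfig(severity_aliases={"p1": "high", "sev1": "high"})
+session = IncidentSession(id="inc-1", severity=cfg.normalize_severity("P1"))
+```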
+
+---
+
+## 2. Architecture at a glance
+
+Layers, from the app at the top down to the LLM providers at the bottom:
+
+```
++------------------------------------------------------------+
+| App layer: examples/{incident_management,code_review}       |
+| - state.py, config.py, skills/, mcp_server.py, ui.py       |
++------------------------------------------------------------+
+| Framework — runtime/                                       |
+| - Session, Skill, AgentRun, ToolCall, AgentTurnOutput      |
+| - Orchestrator, OrchestratorService                        |
+| - Gateway (wrap_tool), policies, ToolRegistry              |
+| - SessionStore, HistoryStore, EventLog                     |
+| - graph.py: build_graph + make_agent_node                  |
+| - llm.py: provider abstraction                             |
+| - ui.py: Streamlit shell                                   |
+| - api.py: FastAPI surface                                  |
++------------------------------------------------------------+
+| LangGraph 1.x  (orchestration / state / checkpointing)     |
+| LangChain 1.x  (chat models, agents.create_agent, tools)   |
+| FastMCP        (in-process / stdio / http MCP servers)     |
++------------------------------------------------------------+
+| Providers: Ollama Cloud · OpenRouter · Azure OpenAI · …    |
++------------------------------------------------------------+
+```
+
+**Control flow for one session** (steady state):
+
+```
+UI / API  ──start_session──▶  OrchestratorService  ──▶  Orchestrator
+                                                            │
+                                                            ▼
+                                  build_graph (langgraph StateGraph)
+                                                            │
+                                       per-agent step       ▼
+                              ┌───────────────────────────────────┐
+                              │ make_agent_node                   │
+                              │ - reload session from store       │
+                              │ - emit agent_started event        │
+                              │ - wrap_tool(s) with gateway       │
+                              │ - create_agent (langchain.agents) │
+                              │ - _drive_agent_with_resume        │
+                              │     loop: ainvoke / handle pause  │
+                              │ - parse_envelope_from_result      │
+                              │ - record AgentRun                 │
+                              │ - decide route from signal        │
+                              └───────────────────────────────────┘
+                                                            │
+                              gate node? (low confidence) ──▼
+                              terminal tool? ──▶ status set by tool
+                              else ──▶ default_terminal_status
+                                                            │
+                                                            ▼
+                                  finalize_session_status_async
+```
+
+---
+
+## 3. Core abstractions
+
+### 3.1 `Session` (`src/runtime/state.py`)
+
+The framework's unit of work. All apps subclass it; the framework
+itself only reads/writes the fields declared on the base class.
+
+```python
+class Session(BaseModel):
+    id: str
+    status: str
+    created_at: str
+    updated_at: str
+    deleted_at: str | None
+    agents_run: list[AgentRun]
+    tool_calls: list[ToolCall]
+    findings: dict[str, Any]
+    token_usage: TokenUsage
+    pending_intervention: dict | None
+    user_inputs: list[str]
+    parent_session_id: str | None       # dedup linkage
+    dedup_rationale: str | None
+    extra_fields: dict[str, Any]        # bag for app-specific domain data
+    version: int                        # optimistic concurrency
+    turn_confidence_hint: float | None  # transient (excluded from persistence)
+```
+
+Apps add domain fields on a subclass:
+
+```python
+class IncidentState(Session):
+    query: str
+    environment: str
+    reporter: Reporter
+    severity: str
+    summary: str
+    resolution: str | None
+```
+
+Fields that have no column in the row schema round-trip through the
+`extra_fields` JSON bag; see [§ 8 Storage](#8-storage).
+
+### 3.2 `Skill` (`src/runtime/skill.py`)
+
+YAML-driven configuration unit:
+
+```yaml
+name: triage
+description: Hypothesis-loop triage agent
+kind: responsive               # responsive | supervisor | monitor
+model: gpt_oss_cheap           # optional per-agent override
+tools:
+  local_inc: [submit_hypothesis, update_incident]
+  local_observability: [get_logs, get_metrics, ...]
+routes:
+  - when: success
+    next: deep_investigator
+  - when: needs_input
+    next: __end__
+    gate: confidence
+  - when: default
+    next: deep_investigator
+system_prompt: |
+  ...
+```
+
+Three `kind`s:
+
+- `responsive` — ReAct LLM agent (the default; uses
+  `langchain.agents.create_agent`).
+- `supervisor` — non-LLM rule-based dispatcher (or LLM-dispatched
+  via `dispatch_strategy: llm`); used by intake to pre-filter.
+- `monitor` — out-of-band runner (e.g. `MonitorRunner`); not a graph
+  node.
+
+### 3.3 `AgentRun` + `ToolCall` (`src/runtime/state.py`)
+
+Append-only audit rows:
+
+```python
+class AgentRun(BaseModel):
+    agent: str
+    started_at: str
+    ended_at: str
+    summary: str                      # final_text, or "agent failed: <exc>"
+    token_usage: TokenUsage
+    confidence: float | None
+    confidence_rationale: str | None
+    signal: str | None
+
+class ToolCall(BaseModel):
+    agent: str
+    tool: str
+    args: dict
+    result: dict | str | list | int | float | bool | None
+    ts: str
+    risk: ToolRisk | None             # low | medium | high
+    status: ToolStatus                # executed | executed_with_notify
+                                      # | pending_approval | approved
+                                      # | rejected | timeout
+    approver: str | None
+    approved_at: str | None
+    approval_rationale: str | None
+```
+
+### 3.4 `Orchestrator` + `OrchestratorService`
+
+- `Orchestrator` (`src/runtime/orchestrator.py`) — owns the compiled
+  langgraph, the `SessionStore`, the per-session async lock
+  registry, and the synchronous lifecycle methods (`start_session`,
+  `stream_session`, `resume_session`, `retry_session`).
+- `OrchestratorService` (`src/runtime/service.py`) — long-lived
+  asyncio loop wrapper around `Orchestrator`. Owns the loop thread,
+  registers in-flight sessions, exposes a thread-safe `submit_async`
+  / `submit_and_wait` bridge so the Streamlit UI thread and the
+  FastAPI request handlers can both schedule work without fighting
+  over the same FastMCP / SQLAlchemy transports.
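+
+The loop-thread bridge can be sketched with the standard library
+alone. This is a minimal illustration, assuming one dedicated
+event-loop thread and `asyncio.run_coroutine_threadsafe`;
+`LoopBridge` is a hypothetical name, not the actual
+`OrchestratorService`, which also registers in-flight sessions.
+
+```python
+import asyncio
+import threading
+from concurrent.futures import Future
+from typing import Any, Coroutine, Optional
+
+
+class LoopBridge:
+    """Hypothetical stand-in for the submit_async / submit_and_wait bridge."""
+
+    def __init__(self) -> None:
+        self._loop = asyncio.new_event_loop()
+        self._thread = threading.Thread(target=self._run, daemon=True)
+        self._thread.start()
+
+    def _run(self) -> None:
+        # One loop for the process lifetime: every coroutine reuses the
+        # same transports instead of each caller opening its own.
+        asyncio.set_event_loop(self._loop)
+        self._loop.run_forever()
+
+    def submit_async(self, coro: Coroutine[Any, Any, Any]) -> Future:
+        # Callable from any thread; returns a concurrent.futures.Future.
+        return asyncio.run_coroutine_threadsafe(coro, self._loop)
+
+    def submit_and_wait(self, coro: Coroutine[Any, Any, Any],
+                        timeout: Optional[float] = None) -> Any:
+        # Blocks the calling thread (e.g. a Streamlit rerun) until done.
+        return self.submit_async(coro).result(timeout=timeout)
+
+    def close(self) -> None:
+        self._loop.call_soon_threadsafe(self._loop.stop)
+        self._thread.join()
+        self._loop.close()
+
+
+if __name__ == "__main__":
+    async def double(x: int) -> int:
+        await asyncio.sleep(0)
+        return 2 * x
+
+    bridge = LoopBridge()
+    print(bridge.submit_and_wait(double(21)))  # 42
+    bridge.close()
+```
+
+Because every coroutine runs on the same loop, the UI thread and the
+API handlers never drive the shared transports from two loops at
+once.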
+
+### 3.5 `wrap_tool` Gateway (`src/runtime/tools/gateway.py`)
+
+Every `BaseTool` an agent sees is wrapped by the gateway. The
+wrapper:
+
+1. Injects session-derived args (e.g. `environment` from the
+   session row) that are kept off the LLM-visible arg surface, so
+   the LLM physically cannot fabricate them.
+2. Consults the risk policy:
+   - `low` → run, emit `tool_invoked` with `status=executed`.
+   - `medium` → run, append an `executed_with_notify` audit row.
+   - `high` → call `langgraph.types.interrupt(payload)`, append a
+     `pending_approval` row, save to DB, pause the graph.
+3. After resume:
+   - On `approve` → run the inner tool, update the pending row to
+     `approved`, save.
+   - On `reject` / `timeout` → return a marker dict, update the
+     pending row to the matching status, save.
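+
+The decision table above fits in a few lines. The sketch below is an
+assumption-laden illustration, not `wrap_tool` itself: `GatewaySketch`,
+`AuditRow`, and the marker-dict shape are hypothetical, `ask_verdict`
+stands in for `langgraph.types.interrupt()`, and the DB saves are
+replaced by appends to in-memory lists.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Callable
+
+
+@dataclass
+class AuditRow:
+    tool: str
+    args: dict
+    status: str          # executed_with_notify | pending_approval | approved | rejected | timeout
+    result: Any = None
+
+
+@dataclass
+class GatewaySketch:
+    policy: dict[str, str]                          # tool name -> "low" | "medium" | "high"
+    ask_verdict: Callable[[dict], dict]             # stands in for interrupt(payload)
+    session_args: dict = field(default_factory=dict)   # e.g. {"environment": "production"}
+    audit: list[AuditRow] = field(default_factory=list)
+    events: list[dict] = field(default_factory=list)
+
+    def call(self, tool: str, llm_args: dict, run: Callable[..., Any]) -> Any:
+        # Step 1: session-derived args override anything the LLM supplied.
+        args = {**llm_args, **self.session_args}
+        risk = self.policy.get(tool, "low")          # unlisted tools default to low
+        if risk == "low":
+            result = run(**args)
+            self.events.append({"kind": "tool_invoked", "tool": tool, "status": "executed"})
+            return result
+        if risk == "medium":
+            result = run(**args)
+            self.audit.append(AuditRow(tool, args, "executed_with_notify", result))
+            return result
+        # high: write the pending row first, then pause for the operator's verdict.
+        row = AuditRow(tool, args, "pending_approval")
+        self.audit.append(row)
+        decision = self.ask_verdict({"tool": tool, "args": args}).get("decision", "timeout")
+        if decision == "approve":
+            row.result = run(**args)
+            row.status = "approved"
+            return row.result
+        row.status = "rejected" if decision == "reject" else "timeout"
+        return {"status": row.status, "tool": tool}  # hypothetical marker dict
+```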
+
+### 3.6 `AgentTurnOutput` envelope (`src/runtime/agents/turn_output.py`)
+
+The structured output every agent must produce per turn:
+
+```python
+class AgentTurnOutput(BaseModel):
+    content: str
+    confidence: float                 # [0.0, 1.0], reconciled
+    confidence_rationale: str
+    signal: str | None                # success | failed | needs_input | None
+```
+
+How the envelope is sourced — see [§ 6 Markdown turn output](#6-markdown-turn-output-contract-phase-22).
+
+---
+
+## 4. Runtime model
+
+### 4.1 Session lifecycle
+
+States a session walks through:
+
+```
+new ─▶ in_progress ─▶ <terminal>
+                       resolved | escalated | needs_review |
+                       awaiting_input | error | stopped | duplicate
+```
+
+- `new` — row created, graph not yet entered.
+- `in_progress` — at least one agent has run; non-terminal.
+- Terminal states are set by:
+  - **Terminal tool calls** (e.g. `mark_resolved` → `resolved`,
+    `mark_escalated` → `escalated`); the tool registry maps tool
+    names to status transitions.
+  - **`default_terminal_status`** (`needs_review` for incident
+    management) when the graph completes without a terminal tool.
+  - **`_handle_agent_failure`** → `error` on agent exceptions.
+  - **`stop_session()`** → `stopped` on explicit cancellation.
+  - **`dedup_check`** → `duplicate` (with `parent_session_id`) when
+    stage-2 LLM dedup confirms a match against a prior closed
+    session.
+
+### 4.2 Per-agent dispatch (`src/runtime/graph.py:_build_agent_nodes`)
+
+For every skill in `cfg.orchestrator.skills`:
+
+```python
+llm = get_llm(cfg.llm, skill.model, role=agent_name, ...)
+node = make_agent_node(skill=skill, llm=llm, tools=run_tools, ...)
+sg.add_node(agent_name, node)
+```
+
+`skill.model` is the per-agent override; when it is `None`, the
+lookup falls through to `cfg.llm.default`. This is what lets intake
+run on Ollama while triage / deep_investigator / resolution run on
+OpenRouter; see the
+v1.5-C decision below.
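+
+A minimal sketch of the name-resolution step, assuming the
+`llm.models` and `llm.providers` maps from `config/config.yaml` have
+already been loaded; `resolve_model` and `ModelEntry` are hypothetical
+names, and the real `get_llm` goes on to build a LangChain chat model
+from the result.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class ModelEntry:
+    provider: str
+    model: str
+
+
+def resolve_model(
+    skill_model: str | None,
+    default: str,
+    models: dict[str, ModelEntry],
+    providers: dict[str, dict],
+) -> tuple[dict, ModelEntry]:
+    """Return (provider config, model entry) for one skill."""
+    name = skill_model or default          # per-agent override, else llm.default
+    try:
+        entry = models[name]
+    except KeyError:
+        raise KeyError(f"skill references unknown model {name!r}") from None
+    return providers[entry.provider], entry
+```
+
+With the § 5.2 config, a skill declaring `model: gpt_oss` resolves to
+the `ollama_cloud` provider, while a skill with no `model` resolves to
+`workhorse` on `openrouter`.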
+
+### 4.3 Routing
+
+`skill.routes` is a list of `(when, next, gate?)` rules. The
+runtime evaluates them after each agent step:
+
+```yaml
+routes:
+  - when: success           # signal value
+    next: deep_investigator
+  - when: needs_input
+    next: __end__
+    gate: confidence        # route through gate node first
+  - when: default           # fallback
+    next: triage
+```
+
+The framework's gate node fires when the upstream agent's confidence
+is below `framework.confidence_threshold` (default 0.75). The gate
+emits a `pending_intervention` and the session moves to
+`awaiting_input` until the operator supplies a `resume_with_input`
+verdict. Agents emit signals via the `signal` arg of typed-terminal
+or patch tools.
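+
+The rule evaluation can be sketched as a first-match lookup with a
+`default` fallback. `Route` and `pick_route` are hypothetical; folding
+the confidence check into the router (returning `"gate"`) and ending
+the graph when no rule matches are assumptions made to keep the
+example self-contained.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Route:
+    when: str                 # a signal value, or "default"
+    next: str                 # next node name, or "__end__"
+    gate: str | None = None   # "confidence" routes through the gate node first
+
+
+def pick_route(routes: list[Route], signal: str | None,
+               confidence: float | None, threshold: float = 0.75) -> str:
+    """Return the next node; "gate" means the confidence gate fires first."""
+    chosen = next((r for r in routes if r.when == signal), None)
+    if chosen is None:
+        chosen = next((r for r in routes if r.when == "default"), None)
+    if chosen is None:
+        return "__end__"      # assumption: no matching rule ends the graph
+    if chosen.gate == "confidence" and confidence is not None and confidence < threshold:
+        return "gate"
+    return chosen.next
+```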
+
+### 4.4 Termination
+
+Three independent paths:
+
+1. **Tool-driven** — an agent calls a tool the registry recognises
+   as terminal (`local_inc:mark_resolved`, `…:mark_escalated`).
+   The tool sets `inc.status` directly.
+2. **Inferred** — `_finalize_session_status` walks `tool_calls`,
+   matching each call against `cfg.orchestrator.terminal_tools` rules.
+3. **Default** — falls through to
+   `cfg.orchestrator.default_terminal_status` when no rule fires
+   AND the graph wasn't paused on a HITL gate.
+
+The pause-aware guard
+(`Orchestrator._is_graph_paused`) is what keeps a paused HITL
+session from being coerced to `default_terminal_status` while the
+operator is still deciding.
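+
+The three paths and the pause guard can be sketched as one pure
+function. `finalize_status` is a hypothetical name; the status names
+come from § 4.1, while "the latest matching terminal tool wins" is an
+assumption.
+
+```python
+from __future__ import annotations
+
+TERMINAL = {"resolved", "escalated", "needs_review", "awaiting_input",
+            "error", "stopped", "duplicate"}
+
+
+def finalize_status(current: str, tool_names: list[str], terminal_tools: dict[str, str],
+                    default_status: str, graph_paused: bool) -> str:
+    # Pause-aware guard: a session parked on a HITL gate keeps its status.
+    if graph_paused:
+        return current
+    # Path 1: a terminal tool already set the status directly.
+    if current in TERMINAL:
+        return current
+    # Path 2: infer from recorded tool calls (assumption: latest match wins).
+    for name in reversed(tool_names):
+        if name in terminal_tools:
+            return terminal_tools[name]
+    # Path 3: no rule fired and the graph is not paused.
+    return default_status
+```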
+
+---
+
+## 5. LLM provider story
+
+### 5.1 Three layers
+
+```
++----------------------------------------------------------+
+| Skill (YAML)         model: gpt_oss_cheap                |
++----------------------------------------------------------+
+| runtime.llm.get_llm  resolves name → cfg.models[name]    |
+|                      → ProviderConfig → BaseChatModel    |
++----------------------------------------------------------+
+| LangChain provider class                                 |
+|   - ChatOpenAI         openai_compat (OpenRouter)        |
+|   - ChatOllama         ollama (Ollama Cloud + local)     |
+|   - AzureChatOpenAI    azure_openai                      |
++----------------------------------------------------------+
+| Driven by langchain.agents.create_agent                  |
+| (which builds a langgraph subgraph)                      |
++----------------------------------------------------------+
+```
+
+### 5.2 Provider config
+
+`config/config.yaml` declares providers + named models:
+
+```yaml
+llm:
+  default: workhorse
+  providers:
+    ollama_cloud:
+      kind: ollama
+      base_url: https://ollama.com
+      api_key: ${OLLAMA_API_KEY}
+    azure:
+      kind: azure_openai
+      endpoint: ${AZURE_ENDPOINT}
+      api_version: 2024-08-01-preview
+      api_key: ${AZURE_OPENAI_KEY}
+    openrouter:
+      kind: openai_compat
+      base_url: https://openrouter.ai/api/v1
+      api_key: ${OPENROUTER_API_KEY}
+  models:
+    workhorse:
+      provider: openrouter
+      model: inclusionai/ring-2.6-1t:free
+    gpt_oss:
+      provider: ollama_cloud
+      model: gpt-oss:20b
+    smart:
+      provider: azure
+      model: gpt-4o
+      deployment: gpt-4o
+```
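+
+A minimal sketch of filling the `${...}` placeholders from the process
+environment after the YAML has been parsed. This is an assumption
+about the loader, not a description of ASR's config code;
+`expand_env` is a hypothetical helper built on `os.path.expandvars`.
+
+```python
+import os
+from typing import Any
+
+
+def expand_env(node: Any) -> Any:
+    """Recursively expand ${VAR} references in the string leaves of a parsed config."""
+    if isinstance(node, str):
+        return os.path.expandvars(node)      # unset variables are left untouched
+    if isinstance(node, dict):
+        return {key: expand_env(value) for key, value in node.items()}
+    if isinstance(node, list):
+        return [expand_env(value) for value in node]
+    return node                               # numbers, booleans, None
+```
+
+An unset variable stays as the literal `${NAME}` string, so a missing
+key is visible in the resolved config rather than silently empty.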
+
+### 5.3 429 retry regime (v1.5-D)
+
+`_ainvoke_with_retry` (`src/runtime/graph.py`) splits transient
+errors into two classes:
+
+| Class | Markers | Backoff | Total (3 retries) |
+|---|---|---|---|
+| 5xx + connection | `internal server error`, `status code: 5xx`, `connection reset`, `remoteprotocolerror`, `incomplete chunked read` | 1.5s × attempt | ~9s |
+| 429 / rate-limit | `status code: 429`, `error code: 429`, ` 429`, `429 `, `ratelimiterror`, `rate limit`, `rate-limited`, `too many requests` | 7.5s × attempt | ~45s |
+
+Non-429 4xx (auth, validation) propagates immediately so quota /
+schema problems fail fast.
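+
+The classification and backoff can be sketched as below. `classify`
+and `ainvoke_with_retry` are hypothetical stand-ins for
+`_ainvoke_with_retry`; matching on the exception class name plus
+message, and using `status code: 5` for the table's `5xx` pattern, are
+assumptions. With three retries the sleeps are 1.5/3/4.5 s (9 s) for
+5xx and 7.5/15/22.5 s (45 s) for 429.
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, Optional
+
+# Marker lists copied from the table above; matched case-insensitively.
+TRANSIENT_MARKERS = ("internal server error", "status code: 5", "connection reset",
+                     "remoteprotocolerror", "incomplete chunked read")
+RATE_LIMIT_MARKERS = ("status code: 429", "error code: 429", " 429", "429 ",
+                      "ratelimiterror", "rate limit", "rate-limited", "too many requests")
+BASE_DELAY = {"transient": 1.5, "rate_limit": 7.5}   # seconds, multiplied by attempt
+MAX_RETRIES = 3
+
+
+def classify(exc: BaseException) -> Optional[str]:
+    # Include the class name so e.g. RateLimitError matches "ratelimiterror".
+    text = f"{type(exc).__name__} {exc}".lower()
+    if any(marker in text for marker in RATE_LIMIT_MARKERS):
+        return "rate_limit"
+    if any(marker in text for marker in TRANSIENT_MARKERS):
+        return "transient"
+    return None                                      # e.g. 401 / 422: fail fast
+
+
+async def ainvoke_with_retry(call: Callable[[], Awaitable[Any]],
+                             sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep) -> Any:
+    attempt = 0
+    while True:
+        try:
+            return await call()
+        except Exception as exc:
+            kind = classify(exc)
+            if kind is None or attempt >= MAX_RETRIES:
+                raise
+            attempt += 1
+            await sleep(BASE_DELAY[kind] * attempt)
+```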
+
+### 5.4 Live verification
+
+`tests/test_integration_driver_s1.py` parametrises three legs
+(`local`, `workhorse`, `azure`); each skips independently if its
+keys are absent. Run with `OLLAMA_API_KEY + OLLAMA_BASE_URL`,
+`OPENROUTER_API_KEY`, and/or `AZURE_OPENAI_KEY + AZURE_ENDPOINT`
+exported.
+
+---
+
+## 6. Markdown turn-output contract (Phase 22)
+
+### 6.1 Why
+
+Before Phase 22 the framework forced agents through
+`response_format=AgentTurnOutput` (a JSON schema), which caused
+several problems:
+
+- gpt-oss / Ollama models drifted on JSON schema adherence.
+- LangGraph's `with_structured_output` second pass interacted badly
+  with the ReAct END signal under `recursion_limit=25`.
+- Adding tools to the schema confused some providers' tool dispatch.
+
+Phase 22 dropped `response_format` and made the agent close its turn
+with a markdown contract block. Markdown is the format every chat
+model writes well; the parse step happens in the framework where
+leniency is in our control.
+
+### 6.2 The contract
+
+Every skill prompt ends with:
+
+```
+## Output contract — REQUIRED
+
+Every final reply MUST end with these three sections, in order, each
+preceded by a level-2 markdown header:
+
+  ## Response
+  <body>
+
+  ## Confidence
+  <0.0-1.0 float> -- <one-line rationale>
+
+  ## Signal
+  <success|failed|needs_input|none>
+
+**CRITICAL — final-reply rule:** the markdown envelope is mandatory;
+the framework hard-fails if it is missing.
+```
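+
+The section split can be sketched procedurally. `split_contract` is a
+hypothetical helper, not `parse_envelope_from_result`; it treats only
+the three contract headers as boundaries, so other `## ` headers inside
+the response body survive, and it does not enforce section order.
+
+```python
+from __future__ import annotations
+
+REQUIRED = ("response", "confidence", "signal")
+
+
+def split_contract(text: str) -> dict[str, str] | None:
+    """Return the Response / Confidence / Signal bodies, or None if any is missing."""
+    sections: dict[str, list[str]] = {}
+    current: str | None = None
+    for line in text.splitlines():
+        stripped = line.strip()
+        name = stripped[3:].strip().lower() if stripped.startswith("## ") else None
+        if name in REQUIRED:
+            current = name
+            sections[current] = []          # a repeated header restarts that section
+        elif current is not None:
+            sections[current].append(line)  # other "## ..." headers stay in the body
+    if any(key not in sections for key in REQUIRED):
+        return None                         # caller falls through to the next parse path
+    return {key: "\n".join(body).strip() for key, body in sections.items()}
+```
+
+In the real pipeline the `Confidence` body is what
+`_parse_confidence_line` consumes on Path 4.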
+
+### 6.3 Parse paths
+
+`parse_envelope_from_result` tries five parse paths in order and
+falls through to a hard fail (Path 7) when none of them succeeds:
+
+| Path | Source | When it fires |
+|---|---|---|
+| 1 | `result["structured_response"]` | Pre-Phase-22 stub fixtures and explicit-schema callers |
+| 2 | JSON-decode last AIMessage content | Models that still emit valid JSON |
+| 4 | `_parse_confidence_line` over the `## Confidence` body | Markdown-primary path; the production happy path |
+| 5 | Typed-terminal-tool args (`confidence`, `confidence_rationale`, `resolution_summary`) | Models that treat a terminal tool call as completion |
+| 6 | Permissive: any tool was called → synthesise a 0.30-confidence placeholder | Last-ditch fallback so the session reaches a terminal status instead of hard-failing |
+| 7 | `raise EnvelopeMissingError` | Truly nothing parseable |
+
+(Path 3 was the original location for what became Path 4; the
+numbering is preserved in code comments to keep historical commits
+diff-friendly.)
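+
+The path ordering amounts to a first-success chain. `first_envelope`,
+`permissive_placeholder`, and `EnvelopeMissing` are hypothetical
+names, and reading tool activity from a `tool_calls` key is an
+assumption about the result shape; only the 0.30 placeholder
+confidence comes from the table.
+
+```python
+from __future__ import annotations
+
+from typing import Callable, Optional
+
+Parser = Callable[[dict], Optional[dict]]
+
+
+class EnvelopeMissing(Exception):
+    """Local stand-in for the framework's EnvelopeMissingError."""
+
+
+def permissive_placeholder(result: dict) -> dict | None:
+    # Path-6 idea: if any tool ran, synthesise a low-confidence envelope.
+    if result.get("tool_calls"):
+        return {"content": "", "confidence": 0.30,
+                "confidence_rationale": "no envelope; placeholder", "signal": None}
+    return None
+
+
+def first_envelope(result: dict, paths: list[tuple[str, Parser]]) -> tuple[str, dict]:
+    """Try each (label, parser) in order; the first non-None envelope wins."""
+    for label, parse in paths:
+        envelope = parse(result)
+        if envelope is not None:
+            return label, envelope
+    raise EnvelopeMissing("no parse path produced an envelope")   # Path 7
+```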
+
+### 6.4 gpt-oss compatibility quirks
+
+- gpt-oss prefers EN DASH (`–`, U+2013) over EM DASH (`—`,
+  U+2014); the dash separator accepts any character in the Unicode
+  `Pd` (Dash Punctuation) general category.
+- gpt-oss sometimes emits an empty closing AIMessage after a tool
+  call; Path 5 / Path 6 cover that.
+- The skill prompts carry an explicit
+  `**CRITICAL — final-reply rule:**` paragraph because gpt-oss
+  initially treated the first tool result as completion.
+
+The procedural confidence-line parser
+(`_parse_confidence_line`) replaces an earlier regex that Sonar's
+S5852 (regex DoS) flagged; the procedural form has no backtracking
+surface to attack.
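+
+A procedural parser of this kind can be sketched with two linear scans
+and `unicodedata.category`. This is a hypothetical reconstruction, not
+`_parse_confidence_line` itself; clamping out-of-range values into
+[0.0, 1.0] is an assumption about what "reconciled" means in § 3.6.
+
+```python
+from __future__ import annotations
+
+import unicodedata
+
+
+def parse_confidence_line(line: str) -> tuple[float, str] | None:
+    """Parse '<float> <dash> <rationale>' with linear scans and no regex."""
+    s = line.strip()
+    i = 0
+    while i < len(s) and (s[i].isdigit() or s[i] == "."):
+        i += 1
+    try:
+        value = float(s[:i])
+    except ValueError:
+        return None                       # no leading number (covers empty input)
+    j = i
+    while j < len(s) and s[j].isspace():
+        j += 1
+    # Skip one run of dash punctuation: "-", "--", EN DASH, EM DASH, ...
+    while j < len(s) and unicodedata.category(s[j]) == "Pd":
+        j += 1
+    rationale = s[j:].strip()
+    return min(max(value, 0.0), 1.0), rationale   # assumption: clamp into [0.0, 1.0]
+```
+
+Each loop only moves an index forward, so running time is linear in
+the line length; there is no backtracking for S5852 to flag.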
+
+---
+
+## 7. HITL approve / reject
+
+### 7.1 The risk-rated gateway
+
+Tools are policy-gated per
+`cfg.runtime.gateway.policy`:
+
+```yaml
+runtime:
+  gateway:
+    policy:
+      apply_fix: high                 # gate
+      restart_service: medium         # notify-only audit
+      get_logs: low                   # default; no row written
+```
+
+Apps configure `cfg.orchestrator.gate_policy` for cross-cutting
+behaviour:
+
+```yaml
+gate_policy:
+  threshold: 0.75
+  gated_environments: [production]
+  gated_risk_actions: [approve]
+  resolution_trigger_tools: ['local_remediation:apply_*']
+```
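+
+One plausible reading of these knobs as a pure function is sketched
+below. `should_gate_sketch` and `GatePolicy` are hypothetical; the
+real `should_gate(session, tool_call, confidence, cfg)` (DEC-004)
+takes whole objects, and the way the rules combine here (high risk
+always gates; trigger tools gate in gated environments or below the
+threshold) is an assumption. `gated_risk_actions` is not modelled.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from fnmatch import fnmatchcase
+
+
+@dataclass
+class GatePolicy:
+    threshold: float = 0.75
+    gated_environments: list[str] = field(default_factory=list)
+    resolution_trigger_tools: list[str] = field(default_factory=list)   # glob patterns
+
+
+def should_gate_sketch(environment: str, tool: str, risk: str,
+                       confidence: float | None, policy: GatePolicy) -> bool:
+    """Pure: same inputs, same answer; no I/O and no session mutation."""
+    if risk == "high":                               # per-tool policy always gates
+        return True
+    if not any(fnmatchcase(tool, pat) for pat in policy.resolution_trigger_tools):
+        return False
+    if environment in policy.gated_environments:     # e.g. apply_* in production
+        return True
+    return confidence is not None and confidence < policy.threshold
+```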
+
+### 7.2 Pause / resume on langgraph 1.x (PR #6)
+
+langgraph 1.x changed the `interrupt()` contract: a tool calling
+`interrupt()` no longer raises `GraphInterrupt` to the caller —
+`agent.ainvoke()` returns a normal result with
+`result["__interrupt__"]` populated. The framework's wrapper had to
+catch up:
+
+- `_drive_agent_with_resume` (`src/runtime/graph.py`) detects an
+  inner pause via `agent_executor.aget_state(inner_cfg).next` being
+  non-empty, calls outer `interrupt()` to fetch the verdict, and
+  forwards via `agent_executor.ainvoke(Command(resume=verdict),
+  config=inner_cfg)`.
+- The inner `create_agent` now receives the orchestrator's
+  checkpointer + a deterministic per-invocation thread id
+  (`f"{inc_id}:agent:{skill.name}:turn{len(agents_run)}"`). Without
+  these, `Command(resume=…)` raises and the gated tool gets silently
+  skipped.
+- `make_agent_node` reloads from `store.load(inc_id)` at entry —
+  defends against stale `state["session"]` snapshots from outer
+  Pregel checkpoints (which capture state at step boundaries, not
+  mid-step).
+- `gateway.wrap_tool` calls `store.save` after every status
+  transition (rejected / timeout / approved) so the audit row in
+  the DB matches the operator's actual decision.
+- `Orchestrator._is_graph_paused` guards
+  `_finalize_session_status_async` in `stream_session` /
+  `retry_session` / the API approval handler — a HITL pause must
+  not be coerced into `default_terminal_status`.
+
+These five fixes shipped together as PR #6; before them, clicking
+Approve would do nothing because the framework had already moved
+past the pause point.
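+
+The resume loop from the first bullet can be sketched against
+stand-ins. `Command` and `Snapshot` below are local placeholders for
+`langgraph.types.Command` and the snapshot returned by `aget_state`
+(only its `.next` attribute is read); `drive_with_resume`,
+`FakeAgent`, and the async `fetch_verdict` callback are hypothetical,
+and the real node calls the outer `interrupt()` where this sketch
+awaits `fetch_verdict`.
+
+```python
+from __future__ import annotations
+
+import asyncio
+from dataclasses import dataclass
+from typing import Any, Awaitable, Callable
+
+
+@dataclass
+class Command:            # stand-in for langgraph.types.Command
+    resume: Any
+
+
+@dataclass
+class Snapshot:           # stand-in for a state snapshot; only .next is read
+    next: tuple
+
+
+async def drive_with_resume(executor: Any, first_input: Any, inner_cfg: dict,
+                            fetch_verdict: Callable[[], Awaitable[Any]]) -> Any:
+    result = await executor.ainvoke(first_input, config=inner_cfg)
+    # On interrupt() the inner agent returns normally; a non-empty .next
+    # on its thread is what says "parked on a gated tool".
+    while (await executor.aget_state(inner_cfg)).next:
+        verdict = await fetch_verdict()      # real node: outer interrupt()
+        result = await executor.ainvoke(Command(resume=verdict), config=inner_cfg)
+    return result
+
+
+class FakeAgent:
+    """Pauses once on a gated tool, then finishes when resumed."""
+
+    def __init__(self) -> None:
+        self.paused = False
+        self.log: list[tuple[str, Any]] = []
+
+    async def ainvoke(self, inp: Any, config: dict) -> dict:
+        if isinstance(inp, Command):
+            self.paused = False
+            self.log.append(("resumed", inp.resume["decision"]))
+            return {"messages": ["apply_fix ran"]}
+        self.paused = True
+        return {"__interrupt__": [{"tool": "apply_fix"}]}
+
+    async def aget_state(self, config: dict) -> Snapshot:
+        return Snapshot(next=("tools",) if self.paused else ())
+
+
+if __name__ == "__main__":
+    async def approve() -> dict:
+        return {"decision": "approve"}
+
+    cfg = {"configurable": {"thread_id": "INC-1:agent:triage:turn0"}}
+    print(asyncio.run(drive_with_resume(FakeAgent(), {"messages": ["go"]}, cfg, approve)))
+    # {'messages': ['apply_fix ran']}
+```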
+
+### 7.3 Approval surface
+
+Two ways to resolve a `pending_approval`:
+
+- **UI** — `_render_pending_approvals_block` shows the Approve /
+  Reject buttons and rationale field; click drives
+  `Command(resume={"decision": "approve", ...})` via
+  `OrchestratorService.submit_and_wait`.
+- **API** — `POST /sessions/{sid}/approvals/{tcid}` does the same
+  resume, scoped under the per-session lock so two concurrent
+  approvals on the same thread can't race.
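+
+The fail-fast lock behaviour can be sketched with one `asyncio.Lock`
+per session. This is a simplified illustration: `LockRegistrySketch`
+is hypothetical and, unlike the runtime's registry, not
+task-reentrant; `SessionBusy` mirrors the runtime's
+`SessionBusy(RuntimeError)` with a `session_id` attribute, which the
+API maps to HTTP 429 with `Retry-After: 1`.
+
+```python
+from __future__ import annotations
+
+import asyncio
+from collections import defaultdict
+from contextlib import asynccontextmanager
+from typing import AsyncIterator
+
+
+class SessionBusy(RuntimeError):
+    def __init__(self, session_id: str) -> None:
+        super().__init__(f"session {session_id} is busy")
+        self.session_id = session_id
+
+
+class LockRegistrySketch:
+    """One asyncio.Lock per session id (not task-reentrant)."""
+
+    def __init__(self) -> None:
+        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
+
+    def is_locked(self, session_id: str) -> bool:
+        lock = self._locks.get(session_id)      # .get() does not create a lock
+        return lock is not None and lock.locked()
+
+    @asynccontextmanager
+    async def hold(self, session_id: str) -> AsyncIterator[None]:
+        # Fail fast instead of queueing behind the current holder.
+        if self.is_locked(session_id):
+            raise SessionBusy(session_id)
+        async with self._locks[session_id]:
+            yield
+```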
+
+### 7.4 Approval watchdog (`src/runtime/tools/approval_watchdog.py`)
+
+Background task that scans `pending_approval` rows older than
+`framework.approval_timeout` and resolves them with `verdict=timeout`,
+freeing operators from manual intervention on stale rows. Triggered
+by the lifespan startup hook.
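+
+The scan half of the watchdog can be sketched as a filter over
+pending rows. `PendingRow` and `find_stale` are hypothetical; the real
+watchdog then resumes each stale row with `verdict=timeout`, which
+this sketch leaves to the caller. Timestamps are assumed to be
+timezone-aware ISO-8601 strings.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import datetime, timedelta
+
+
+@dataclass
+class PendingRow:
+    session_id: str
+    tool: str
+    status: str
+    ts: str                  # ISO-8601 with offset, like ToolCall.ts
+
+
+def find_stale(rows: list[PendingRow], now: datetime, timeout: timedelta) -> list[PendingRow]:
+    """Return pending_approval rows older than `timeout`."""
+    return [
+        row for row in rows
+        if row.status == "pending_approval"
+        and now - datetime.fromisoformat(row.ts) > timeout
+    ]
+```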
+
+---
+
+## 8. Storage
+
+### 8.1 SessionStore (`src/runtime/storage/session_store.py`)
+
+CRUD for the row schema. Owns:
+
+- `_next_id` — monotonic per-day sequence; respects
+  `state_cls.id_format(seq=…, prefix=…)` so each app picks its own
+  ID namespace (`INC-…`, `CR-…`, etc.).
+- `save` — optimistic-version update. Bumps `version`; raises
+  `StaleVersionError` on mismatch so the caller can reload + retry.
+- `_row_to_incident` / `_incident_to_row_dict` — round-trip
+  between `IncidentRow` (SQLAlchemy) and the app's `Session`
+  subclass. Fields the row schema has columns for go to typed
+  fields; everything else lands in `extra_fields` JSON.
+- Vector write-through — `_persist_vector` / `_add_vector` /
+  `_refresh_vector` keep a FAISS index aligned with the row
+  table.
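+
+The optimistic-version contract of `save` can be sketched with an
+in-memory store. `InMemoryStore`, `Row`, and `save_with_retry` are
+hypothetical, and the local `StaleVersionError` only mirrors the name
+of the runtime's exception.
+
+```python
+from __future__ import annotations
+
+import copy
+from dataclasses import dataclass, field
+from typing import Callable
+
+
+class StaleVersionError(Exception):
+    """Stand-in for the runtime exception of the same name."""
+
+
+@dataclass
+class Row:
+    id: str
+    version: int
+    data: dict = field(default_factory=dict)
+
+
+class InMemoryStore:
+    def __init__(self) -> None:
+        self._rows: dict[str, Row] = {}
+
+    def create(self, sid: str, data: dict) -> None:
+        self._rows[sid] = Row(sid, 1, dict(data))
+
+    def load(self, sid: str) -> Row:
+        return copy.deepcopy(self._rows[sid])      # callers get a private snapshot
+
+    def save(self, row: Row) -> Row:
+        stored = self._rows[row.id]
+        if stored.version != row.version:
+            raise StaleVersionError(f"{row.id}: caller has v{row.version}, store has v{stored.version}")
+        saved = Row(row.id, row.version + 1, copy.deepcopy(row.data))
+        self._rows[row.id] = saved
+        row.version = saved.version                # keep the caller's copy in step
+        return copy.deepcopy(saved)
+
+
+def save_with_retry(store: InMemoryStore, sid: str, mutate: Callable[[Row], None],
+                    attempts: int = 3) -> Row:
+    for _ in range(attempts):
+        row = store.load(sid)                      # reload picks up the other writer's change
+        mutate(row)
+        try:
+            return store.save(row)
+        except StaleVersionError:
+            continue
+    raise StaleVersionError(f"{sid}: still stale after {attempts} attempts")
+```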
+
+### 8.2 IncidentRow (`src/runtime/storage/models.py`)
+
+The persistent schema. Fields are deliberately broad enough to host
+the example apps' typed fields (`severity`, `reporter_id`,
+`reporter_team`, `summary`, `tags`, `parent_session_id`,
+`dedup_rationale`, `extra_fields` JSON) without forcing every app
+to declare them. An app's `Session` subclass declares whichever
+typed fields it cares about; the rest stay in `extra_fields`.
+
+The `severity` / `reporter_*` columns ARE incident-shaped — the
+v1.5-B generic-noun pass left them in place because renaming would
+require a schema migration. Apps that don't model severity or a
+human submitter ignore those columns; the round-trip silently
+omits them.
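+
+The column-versus-bag split can be sketched with plain dicts.
+`ROW_COLUMNS`, `to_row_dict`, `from_row_dict`, and the `repo` /
+`pr_number` fields are hypothetical; the real `_incident_to_row_dict`
+/ `_row_to_incident` work on `IncidentRow` and pydantic models.
+
+```python
+from __future__ import annotations
+
+from typing import Any
+
+# Hypothetical subset of the typed columns on the row table.
+ROW_COLUMNS = {"id", "status", "summary", "severity", "parent_session_id"}
+
+
+def to_row_dict(session: dict[str, Any]) -> dict[str, Any]:
+    """Typed columns keep their own keys; everything else joins extra_fields."""
+    row = {k: v for k, v in session.items() if k in ROW_COLUMNS}
+    extra = dict(session.get("extra_fields") or {})
+    extra.update({k: v for k, v in session.items()
+                  if k not in ROW_COLUMNS and k != "extra_fields"})
+    row["extra_fields"] = extra
+    return row
+
+
+def from_row_dict(row: dict[str, Any], declared: set[str]) -> dict[str, Any]:
+    """Rebuild a session dict for a subclass that declares `declared` fields."""
+    # Columns the subclass doesn't declare (e.g. severity) are dropped.
+    out = {k: v for k, v in row.items() if k in declared and k != "extra_fields"}
+    leftovers: dict[str, Any] = {}
+    for k, v in (row.get("extra_fields") or {}).items():
+        if k in declared:
+            out[k] = v           # lift declared fields back out of the bag
+        else:
+            leftovers[k] = v
+    out["extra_fields"] = leftovers
+    return out
+```
+
+For a code-review-style session, `severity` is never written and is
+dropped again on read, while app-only fields ride in `extra_fields`.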
+
+### 8.3 HistoryStore (`src/runtime/storage/history_store.py`)
+
+Read-only similarity search over the same engine + vector store.
+Used by intake's similarity retrieval (`lookup_similar_incidents`).
+Filter dimensions are pluggable — apps construct a
+`HistoryStore(filter_resolver=…)` matching their own row shape.
+
+### 8.4 LangGraph checkpointer (`src/runtime/checkpointer.py`)
+
+Separate from the SessionStore. SQLite default (`sqlite:////tmp/asr.db`),
+Postgres optional via `runtime.checkpointer_postgres`. Holds langgraph
+Pregel state + pending interrupts. The HITL approve / reject path
+relies on this checkpointer being durable.
+
+### 8.5 EventLog (`src/runtime/storage/event_log.py`)
+
+Append-only `session_events` table. Records:
+
+- `agent_started`, `agent_finished`, `confidence_emitted`,
+  `route_decided`
+- `tool_invoked` (every wrapped tool call, with latency + result_kind)
+- `gate_fired` (HITL gate decisions)
+- `status_changed` (terminal-status transitions with cause)
+- `lesson_extracted` (M5/M6 auto-learning)
+
+Per-step events feed any external observability stack and the
+auto-learning pipeline.
+
+---
+
+## 9. Memory layers (incident_management example)
+
+The incident-management app ships an ASR (Automated Site Reliability)
+memory bundle hydrated by the supervisor at intake:
+
+| Layer | What | Backend |
+|---|---|---|
+| L2 | Knowledge Graph — services, owners, runbooks, dependencies | `examples/incident_management/asr/kg_store.py` (filesystem JSON) |
+| L5 | Release Context — recent deploys per service | `release_store.py` (filesystem JSON) |
+| L7 | Playbooks — known-good remediation steps per failure mode | `playbook_store.py` (filesystem JSON) |
+
+`hydrate_and_gate` (in the example's MCP server) walks the user's
+query, extracts mentioned components, and returns a `MemoryLayerState`
+bundle that the triage agent reads as additional context.
+
+This is **app-level**, not framework: the runtime stays
+memory-agnostic. A different app can ship its own L1/L2/L3 memory
+layers without touching `runtime/`.
+
+---
+
+## 10. Deployment
+
+### 10.1 Air-gapped target
+
+The deployment env is corporate / air-gapped: no public-internet
+runtime calls, no CDN fetches, no `pip install` at deploy time.
+
+### 10.2 Single-file bundle (BUNDLER-01)
+
+`scripts/build_single_file.py` flattens the runtime + each app into
+self-contained `.py` files under `dist/`:
+
+| File | Contents |
+|---|---|
+| `dist/app.py` | framework only — no example code |
+| `dist/apps/incident-management.py` | framework + incident_management example |
+| `dist/apps/code-review.py` | framework + code_review example |
+| `dist/ui.py` | Streamlit shell |
+
+CI gate `Bundle staleness gate (HARD-08)` rebuilds the bundles
+from `src/` and fails the build if they don't match the committed
+`dist/*` — this keeps the deploy bundles "repaired by construction"
+on every merge.
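+
+The comparison at the heart of such a gate can be sketched as a hash
+check between a fresh rebuild and the committed tree. `stale_bundles`
+and `digest` are hypothetical, and the real CI job may compare
+differently. This direction only catches bundles that changed or are
+missing, not stray committed files.
+
+```python
+from __future__ import annotations
+
+import hashlib
+from pathlib import Path
+
+
+def digest(path: Path) -> str:
+    return hashlib.sha256(path.read_bytes()).hexdigest()
+
+
+def stale_bundles(rebuilt_dir: Path, committed_dir: Path) -> list[str]:
+    """Relative paths whose freshly rebuilt bundle differs from the committed copy."""
+    stale = []
+    for rebuilt in sorted(rebuilt_dir.rglob("*.py")):
+        rel = rebuilt.relative_to(rebuilt_dir)
+        committed = committed_dir / rel
+        if not committed.is_file() or digest(rebuilt) != digest(committed):
+            stale.append(rel.as_posix())
+    return stale
+```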
+
+### 10.3 7-file deploy payload
+
+Copy onto the target host:
+
+```
+app.py                    (renamed from dist/apps/<app>.py)
+ui.py                     (dist/ui.py)
+config/config.yaml        (framework: LLM, MCP, storage)
+config/<app>.yaml         (app: severity aliases, escalation roster, …)
+config/skills/            (skill prompts, optional override)
+.env                      (provider keys)
+```
+
+Boot:
+
+```bash
+python -m runtime --config config/<app>.yaml
+streamlit run ui.py --server.port 37777
+```
+
+### 10.4 Reproducible install (HARD-02)
+
+`uv.lock` pins direct + transitive deps with sha256 hashes. CI
+installs from the lock with `uv sync --frozen`; an internal
+package mirror is sufficient for a fully offline build. See
+`docs/AIRGAP_INSTALL.md`.
+
+---
+
+## 11. Telemetry + auto-learning (M1–M9)
+
+### 11.1 Per-step events
+
+Every meaningful boundary emits an `EventLog` row keyed by
+`session_id`. The four agent-boundary events
+(`agent_started → confidence_emitted → route_decided →
+agent_finished`) fire in order; `tool_invoked` and `gate_fired`
+fire at the gateway boundary.
+
+### 11.2 Lesson store
+
+`src/runtime/learning/extractor.py` runs at session finalize and
+distills outcome + winning hypothesis + applied fix into a
+`Lesson` row. The intake supervisor reads recent lessons via
+`LessonStore.find_relevant(query, …)` to prime the next session.
+
+### 11.3 Lesson refresher
+
+`src/runtime/learning/scheduler.py` runs an APScheduler job
+nightly (configurable) that walks recent sessions and extracts
+lessons missed at finalize time (e.g. sessions resolved manually
+in the UI long after the agent's run).
+
+---
+
+## 12. Decision log
+
+Compact rationale for the non-obvious calls. Each entry is a single
+"why".
+
+### DEC-001. LangGraph as orchestration engine
+
+**When.** From the start.
+**Why.** Out-of-the-box Pregel-style step boundaries +
+checkpointing + first-class HITL `interrupt()` semantics. We don't
+maintain a graph engine ourselves; we just wrap it.
+
+### DEC-002. `langchain.agents.create_agent` for the per-agent loop (Phase 15)
+
+**When.** v1.3 hardening, after `langgraph.prebuilt.create_react_agent`
+was deprecated.
+**Why.** A single tool loop with a native ToolStrategy fallback; it
+removes the `recursion_limit=25` workaround we previously needed.
+
+### DEC-003. Markdown contract over `response_format` JSON (Phase 22)
+
+**When.** v1.5-A.
+**Why.** JSON-schema-shaped output via `response_format` triggered a
+class of brittleness across providers (model-specific JSON drift,
+tool-strategy + ReAct END interaction, recursion_limit ceilings).
+Markdown is the native format every chat model writes well; the parse
+step happens in the framework where leniency is in our control. Path
+5 / Path 6 fallbacks cover models that occasionally drop the
+contract.
+
+### DEC-004. Pure-policy HITL gating (Phase 11)
+
+**When.** v1.2.
+**Why.** The gate decision (high-risk tool? gated env? low
+confidence?) was previously scattered across the gateway, the
+orchestrator, and the skill prompts. Phase 11 moved it into a single
+pure function `should_gate(session, tool_call, confidence, cfg)` so
+auditing what gates is a one-grep operation.
+
+### DEC-005. Generic `Session` base + `extra_fields` JSON (v1.1 decoupling)
+
+**When.** v1.1.
+**Why.** Pre-v1.1 the framework had `IncidentState` baked in. Adding
+a second app (code_review) was the forcing function — every
+"incident-shaped" leak that surfaced moved into the framework as
+`Session.extra_fields` (the JSON bag) or the row schema's existing
+typed columns. Apps now subclass `Session` and write whatever fields
+they need; the framework stays domain-agnostic.
+
+### DEC-006. Per-agent `skill.model` override (v1.5-C / M8)
+
+**When.** v1.5-C.
+**Why.** The intake supervisor can run on a fast / cheap model
+while the deep-investigator agent needs a smarter (more expensive)
+one. `_build_agent_nodes` resolves `get_llm(cfg.llm, skill.model,
+role=agent_name)` per skill; falls back to `cfg.llm.default` when
+`model` is `None`.
+
+### DEC-007. Single-file bundle for air-gap deploy (BUNDLER-01)
+
+**When.** v1.3.
+**Why.** Corporate deploy env is copy-only. A multi-file
+`pip install` step is out of scope. The bundler turns the
+multi-file source tree into the smallest possible deploy payload
+(7 files total).
+
+### DEC-008. Concept-leak ratchet for framework genericity (v1.5-B)
+
+**When.** v1.5-B.
+**Why.** The decoupling work (DEC-005) wasn't binary — `incident` /
+`severity` / `reporter` tokens kept creeping into `src/runtime/` via
+local variables, docstrings, and helper names. The ratchet test
+counts those tokens and fails the build if the count grows. v1.5-B
+took it from 156 down to 39 (the residual 39 are
+schema-coupled / public-API / intentional example-app callouts).
+
+### DEC-009. 429 separate retry regime with longer backoff (v1.5-D)
+
+**When.** v1.5-D.
+**Why.** Free / shared upstream tiers (e.g. OpenRouter `…:free`)
+throttle on 30-60s windows; the 5xx default backoff (1.5s/3s/4.5s)
+exhausted retries before the window cleared. Now 429 retries on
+7.5s/15s/22.5s (~45s total).
+
+### DEC-010. Inner agent checkpointer + reload-on-entry to fix HITL stale state (PR #6)
+
+**When.** v1.5-A.
+**Why.** Outer Pregel checkpoints at step boundaries, not mid-step.
+On resume, `state["session"]` reflects the prior step's output, NOT
+the gateway's pending_approval row + version bump that happened
+mid-step. Without `make_agent_node` reloading from store at entry,
+the gateway sees no pending row, double-appends, and `store.save`
+raises `StaleVersionError`. The reload + the inner checkpointer
+together are what make Approve / Reject actually drive the gated
+tool to completion.
+
+### DEC-011. Two example apps to prove genericity
+
+**When.** v1.1 (incident_management lifted), Phase 8
+(code_review added).
+**Why.** Without a second app, "is the framework generic?" is
+unanswerable. The code_review app was built specifically to surface
+every incident-shaped assumption that hadn't been lifted yet — id
+format, row schema, build pipeline, intra-bundle imports. Each
+leak became a framework PR rather than an app workaround.
+
+### DEC-012. Bundle staleness CI gate (HARD-08)
+
+**When.** v1.3.
+**Why.** `dist/` files drift if a contributor updates `src/runtime/`
+or `examples/` without re-running the bundler. The drift turns into
+a deploy-time bug ("works in dev, broken in prod"). The CI gate
+rebuilds the bundles from source on every PR and refuses the merge
+if they differ from the committed `dist/*`.
+
+---
+
+## 13. Milestone history
+
+| Milestone | Title | PR | Squash SHA | Headline change |
+|---|---|---|---|---|
+| v1.0 | Prompt-vs-Code Remediation | #1 | `02378dd` | Code becomes the authority — skill prompts no longer carry policy logic |
+| v1.1 | Framework De-coupling | #2 | `0ff8914` | Generic runtime, ASR as use case |
+| v1.2 | Framework Owns Flow Control | bundled into #5 | `9018371` | FOC-01..06 — gate / retry / signal / dedup all framework-owned |
+| v1.3 | Hardening + Real-LLM Compatibility | bundled into #5 | `9018371` | HARD-01..09 + LLM-COMPAT-01 + BUNDLER-01 + SKILL-LINTER-01 |
+| v1.4 | Per-step telemetry + auto-learning intake + React-ready API | #5 | `9018371` | M1..M9 telemetry + LessonStore + generic /sessions/* + SSE + WebSocket + CORS + structured error envelope |
+| v1.5-A | Markdown turn output (Phase 22) + HITL approve/reject end-to-end on langgraph 1.x | #6 + #7 | `f0586a8`, `3f0eb5f` | DEC-003 + DEC-010 |
+| v1.5-B | Generic-noun pass — concept-leak ratchet 156 → 39 | #8 | `25e363c` | DEC-008 |
+| v1.5-C | Per-agent LLM proof point — intake on Ollama Cloud, downstream on `llm.default` | #9 | `54a830d` | DEC-006 |
+| v1.5-D | 429 rate-limit retry + multi-provider integration driver | #10 | `adefae6` | DEC-009 |
+
+Per-phase artefacts under `.planning/phases/<NN>-<slug>/` (gitignored
+working state; selected artefacts are committed for historical record).
+
+---
+
+## 14. Pending / known gaps
+
+### v2.0 — React UI (the long pole)
+
+Stack pick + scaffold + parity-port against the v1.4
+`/sessions/*` REST + SSE/WebSocket API. ~1–2 weeks. The Streamlit
+shell stays as the prototype until React reaches parity.
+
+### Smaller cleanups
+
+- **Duplicate ToolCall audit rows.** The gateway records the gated
+  tool under the FastMCP composite name (`local_remediation:apply_fix`,
+  colon form), while the harvester records the same tool call under the
+  LLM-visible name (`local_remediation__apply_fix`, double-underscore
+  form). Cosmetic in the UI; matters if any consumer aggregates tool
+  counts. Fix: align both on the `__` form. ~30 min.
+- **`ApprovalWatchdog` regression test.** PR #6 added gateway saves
+  on resolution transitions; the watchdog should observe a faster
+  cleanup signal but no focused test was added. ~15 min.
+- **`ASR_LOG_LEVEL` env var documentation.** Added in PR #6, no
+  README mention. One-line doc fix.
+- **`src/runtime/locks.py:49` — `TODO(v2)`.** Evict idle slots to cap
+  memory in long-running servers. Real concern for production; not
+  urgent for HITL-paced workloads.
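+
+A minimal sketch of the `__` alignment, assuming FastMCP composite
+names always take the form `<server>:<tool>` with no colon inside the
+server name. `normalize_tool_name` and `tool_counts` are hypothetical
+helpers, not existing framework functions; whether the fix lands in the
+gateway or the harvester is left open.
+
+```python
+from collections import Counter
+from typing import Iterable
+
+
+def normalize_tool_name(name: str) -> str:
+    """Map a FastMCP composite name (`server:tool`) to `server__tool`."""
+    server, sep, tool = name.partition(":")
+    if not sep:
+        # No colon: already in the `__` form (or unqualified); leave as-is.
+        return name
+    return f"{server}__{tool}"
+
+
+def tool_counts(names: Iterable[str]) -> Counter:
+    """Aggregate tool-call counts with both spellings folded together."""
+    return Counter(normalize_tool_name(n) for n in names)
+
+
+print(tool_counts(["local_remediation:apply_fix", "local_remediation__apply_fix"]))
+# Counter({'local_remediation__apply_fix': 2})
+```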
+
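+One possible shape for the `locks.py` `TODO(v2)` eviction, sketched
+under assumptions: the registry below keeps a plain `asyncio.Lock` per
+session rather than the task-reentrant lock the framework uses, and
+`IdleEvictingLockRegistry`, `idle_ttl_s`, and `evict_idle` are
+hypothetical names. Only slots that are currently unlocked and idle
+past the TTL are dropped, so a session paused at HITL (lock held) is
+never evicted.
+
+```python
+from __future__ import annotations
+
+import asyncio
+import time
+from typing import Callable
+
+
+class IdleEvictingLockRegistry:
+    """Hypothetical per-session lock registry with idle-slot eviction."""
+
+    def __init__(self, idle_ttl_s: float = 3600.0,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self._locks: dict[str, asyncio.Lock] = {}
+        self._last_used: dict[str, float] = {}
+        self._ttl = idle_ttl_s
+        self._clock = clock
+
+    def get(self, session_id: str) -> asyncio.Lock:
+        """Return the session's lock (created on first use) and touch its timestamp."""
+        lock = self._locks.get(session_id)
+        if lock is None:
+            lock = self._locks[session_id] = asyncio.Lock()
+        self._last_used[session_id] = self._clock()
+        return lock
+
+    def evict_idle(self) -> int:
+        """Drop unlocked slots idle longer than the TTL; return how many were dropped."""
+        now = self._clock()
+        stale = [
+            sid for sid, lock in self._locks.items()
+            if not lock.locked() and now - self._last_used[sid] > self._ttl
+        ]
+        for sid in stale:
+            del self._locks[sid]
+            del self._last_used[sid]
+        return len(stale)
+```
+
+A task that has fetched a lock but not yet acquired it could still be
+holding an evicted object; a TTL far above normal HITL pause times keeps
+that window small, but a production version would need to close it
+(for example by tracking pending acquirers per slot).
+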
+---
+
+## 15. Where to find what
+
+| You want to… | Look at |
+|---|---|
+| Add a new skill | `examples/<app>/skills/<name>/{config.yaml, system.md}` |
+| Add a new app | New folder under `examples/`; subclass `Session` in `state.py`; declare `App*Config` in `config.py`; write MCP servers and skills |
+| Add a tool | App's `mcp_server.py`; register in YAML; gateway picks up risk policy from `cfg.runtime.gateway.policy` |
+| Change LLM provider | `config/config.yaml` `llm.providers` / `llm.models`; per-agent override on `skill.model` |
+| Change HITL policy | `cfg.orchestrator.gate_policy` (cross-cutting), `cfg.runtime.gateway.policy` (per-tool) |
+| Trace one session end-to-end | `EventLog` rows for that `session_id`; `agents_run` and `tool_calls` on the row; `session_events` table |
+| Update the bundle | `uv run python scripts/build_single_file.py`; commit `dist/*` |
+| Add a new framework module | `RUNTIME_MODULE_ORDER` in `scripts/build_single_file.py` (after deps); regen + commit |
+| Run live LLM tests | Set `OLLAMA_API_KEY` + `OLLAMA_BASE_URL`, `OPENROUTER_API_KEY`, `AZURE_OPENAI_KEY` + `AZURE_ENDPOINT`; `uv run pytest tests/test_integration_driver_s1.py -v` |
+| Reset state for a fresh run | `rm /tmp/asr.db /tmp/asr.db-{wal,shm}; rm -rf /tmp/asr-faiss` then restart |
+
+---
+
+## 16. Document map
+
+- **`docs/DESIGN.md`** (this file) — architecture, abstractions,
+  decisions, milestone history.
+- **`docs/DEVELOPMENT.md`** — day-to-day contributor loop (setup,
+  bundle regeneration, adding modules).
+- **`docs/AIRGAP_INSTALL.md`** — corporate-mirror install procedure.
+- **`README.md`** (repo root) — one-screen overview pointing at the
+  three docs above.
+- **`examples/incident_management/README.md`** — incident-management
+  app surface; per-skill prompts under `skills/`.
+- **`examples/code_review/README.md`** — code-review app surface;
+  per-skill prompts under `skills/`.
+- **`.planning/`** (gitignored) — working state for the GSD planning
+  workflow (`STATE.md`, `ROADMAP.md`, `phases/<NN>-<slug>/`). Not
+  shipped; selected phase artefacts are committed for the historical
+  record.
diff --git a/examples/code_review/README.md b/examples/code_review/README.md
index 6f8cc7c..6a4a694 100644
--- a/examples/code_review/README.md
+++ b/examples/code_review/README.md
@@ -1,120 +1,87 @@
 # Code Review — Example Application
 
-Second example app for the `runtime` framework. Built in Phase 8 to *prove* the framework is genuinely generic — every framework leak that surfaced while this app was being built (id format, row schema, build pipeline, intra-bundle imports) was lifted into the framework rather than worked around.
+Second example app for the framework. A 3-skill PR review pipeline
+(intake → analyzer → recommender) that walks a diff, files structured
+findings, and emits an approve / request-changes / comment verdict.
+
+This app exists to **prove the framework is genuinely generic** —
+it was built specifically to surface every incident-shaped
+assumption that hadn't yet been lifted out of `src/runtime/`. Each
+leak became a framework PR rather than an app workaround.
+
+For framework-wide design + decisions, see
+[`docs/DESIGN.md`](../../docs/DESIGN.md). This README only covers
+the bits specific to this app.
 
 ## Run
 
 ```bash
-python -m runtime --config config/code_review.yaml
+# Long-lived orchestrator service for this app
+uv run python -m runtime --config config/code_review.yaml
+# Streamlit UI (run in a second terminal)
+ASR_LOG_LEVEL=INFO uv run streamlit run src/runtime/ui.py --server.port 37777
 ```
 
-Boots the long-lived orchestrator service against this app's config.
-The Streamlit UI is the framework's generic shell at
-`ui/streamlit_app.py` (`streamlit run ui/streamlit_app.py`) — it
-duck-types on `Session.extra_fields` for code-review rows and renders
-them in the same accordion shell the incident app uses.
-
-## Architecture
-
-A 3-skill responsive pipeline (`intake → analyzer → recommender`) that consumes a PR description, walks the diff, files structured `ReviewFinding`s, and emits an approve / request-changes / comment recommendation. The framework owns session lifecycle, agent dispatch, and tool gateway; this example owns domain shape, skill prompts, and MCP tools.
+## Layout
 
 ```
 examples/code_review/
 ├── state.py             CodeReviewState(Session) + PullRequest + ReviewFinding
 ├── config.py            CodeReviewAppConfig + load_code_review_app_config
 ├── config.yaml          severity_categories, auto_request_changes_on, repos_in_scope
-├── mcp_server.py        CodeReviewMCPServer with 3 tools
+├── mcp_server.py        CodeReviewMCPServer + 3 tools
 ├── skills/              3 agent YAML configs + _common/ shared style prompt
 │   ├── _common/style.md
 │   ├── intake/
 │   ├── analyzer/
 │   └── recommender/
-├── ui.py                Streamlit read-only viewer (mirrors incident UI patterns)
-├── __main__.py          Entry point
-└── README.md            this file
+├── ui.py                Streamlit read-only viewer
+└── __main__.py          entry point
 ```
 
-## State Model
+## Domain shape
 
-`CodeReviewState(Session)` extends the framework's generic `Session` with:
+`CodeReviewState(Session)` adds `pr: PullRequest`,
+`review_findings: list[ReviewFinding]`, `overall_recommendation`,
+`review_summary`, `review_token_budget`. Session ids look like
+`CR-YYYYMMDD-NNN` (e.g. `CR-20260503-001`).
 
-| Field | Type | Purpose |
-|---|---|---|
-| `pr` | `PullRequest` | repo, PR number, title, author, base/head SHAs, line counts |
-| `review_findings` | `list[ReviewFinding]` | severity, file, line, category, message, optional suggestion |
-| `overall_recommendation` | `"approve" \| "request_changes" \| "comment" \| None` | final verdict |
-| `review_summary` | `str` | rolled-up narrative for the human reviewer |
-| `review_token_budget` | `int` | telemetry — running token spend on this review |
-
-The framework only reads/writes the inherited `Session` lifecycle/telemetry fields (`id`, `status`, `created_at`, `agents_run`, `tool_calls`, `findings`, `pending_intervention`, `token_usage`). Every domain field above lands in the row's `extra_fields` JSON column on save and is hydrated back into the model on load — no incident-shaped row schema leaks here (P8-J).
+`PullRequest` carries repo / number / title / author / base+head SHAs
+/ line counts. `ReviewFinding` carries severity / file / line /
+category / message / optional suggestion. Both are pydantic models
+declared in this app's `state.py`.
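+
+A dependency-free sketch of these shapes, for orientation only. The
+real models are pydantic and `Session` comes from the framework; here
+both are replaced by standard-library dataclasses. The `Session`
+stand-in is hypothetical, and the `PullRequest` field names
+(`base_sha`, `head_sha`, `additions`, `deletions`) and the `today`
+parameter on `id_format` are assumptions.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from datetime import date
+from typing import Literal, Optional
+
+Recommendation = Literal["approve", "request_changes", "comment"]
+
+
+@dataclass
+class PullRequest:
+    repo: str
+    number: int
+    title: str
+    author: str
+    base_sha: str  # assumed name
+    head_sha: str  # assumed name
+    additions: int = 0
+    deletions: int = 0
+
+
+@dataclass
+class ReviewFinding:
+    severity: str
+    file: str
+    line: int
+    category: str
+    message: str
+    suggestion: Optional[str] = None
+
+
+@dataclass
+class Session:  # hypothetical stand-in for the framework base class
+    id: str = ""
+    status: str = "new"
+
+    @classmethod
+    def id_format(cls, seq: int, today: Optional[date] = None) -> str:
+        raise NotImplementedError
+
+
+@dataclass
+class CodeReviewState(Session):
+    pr: Optional[PullRequest] = None
+    review_findings: list[ReviewFinding] = field(default_factory=list)
+    overall_recommendation: Optional[Recommendation] = None
+    review_summary: str = ""
+    review_token_budget: int = 0
+
+    @classmethod
+    def id_format(cls, seq: int, today: Optional[date] = None) -> str:
+        """Mint a `CR-YYYYMMDD-NNN` id, disjoint from incident `INC-...` ids."""
+        d = today or date.today()
+        return f"CR-{d:%Y%m%d}-{seq:03d}"
+
+
+print(CodeReviewState.id_format(seq=1, today=date(2026, 5, 3)))  # CR-20260503-001
+```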
 
-## ID Format
+## MCP tools
 
-Session ids look like `CR-YYYYMMDD-NNN` (e.g. `CR-20260503-001`). The format is owned by `CodeReviewState.id_format(seq=...)` (P8-C) so the code-review id namespace is disjoint from incident-management's `INC-...` namespace — both apps can share the same metadata DB without collisions.
+`CodeReviewMCPServer` exposes:
 
-## Configuration
+- `fetch_pr_diff(repo, number)` — **mock**: reads from
+  `tests/fixtures/code_review/<repo>/<number>.json` if present,
+  otherwise returns a small canned diff so the example runs offline.
+- `add_review_finding(session_id, severity, file, line, category,
+  message, suggestion=None)` — append a structured finding to
+  `state.review_findings`. Severity is validated against
+  `severity_categories` from `CodeReviewAppConfig`.
+- `set_recommendation(session_id, recommendation, summary)` —
+  finalize the review. Sets `state.overall_recommendation` +
+  `state.review_summary`.
 
-Two layers, in order of precedence:
-
-| Layer | File | What it owns |
-|---|---|---|
-| Framework | `config/config.yaml` | LLM providers + models, MCP servers, storage URL, paths, `runtime.state_class` |
-| App | `examples/code_review/config.yaml` | `severity_categories`, `auto_request_changes_on`, `repos_in_scope`, `review_max_diff_kb` |
-
-Set `runtime.state_class: examples.code_review.state.CodeReviewState` in the framework config so row hydration produces `CodeReviewState` instances and `id_format` is called on the right class.
-
-## MCP Tools
-
-`CodeReviewMCPServer` (FastMCP, name `"code_review"`) exposes three tools to the agents:
-
-- `fetch_pr_diff(repo, number)` — returns `{diff, files_changed, additions, deletions}`. Reads from `tests/fixtures/code_review/<repo>/<number>.json` if present; otherwise synthesises a tiny canned diff so the example runs offline. **Mock — not a real GitHub fetch.**
-- `add_review_finding(session_id, severity, file, line, category, message, suggestion=None)` — append a structured finding to `state.review_findings`. Validated against `severity_categories` from `CodeReviewAppConfig`.
-- `set_recommendation(session_id, recommendation, summary)` — set `state.overall_recommendation` + `state.review_summary` and finalize the review.
-
-The MCP loader picks this server up via `mcp.servers[*].module = examples.code_review.mcp_server` in the framework config.
+No real GitHub/GitLab integration; tools are mocks for demonstration.
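+
+A minimal sketch of the severity check `add_review_finding` performs,
+assuming the tool has already resolved `session_id` to that session's
+findings list. The signature, the local `ReviewFinding` dataclass, the
+`ValueError` on rejection, and the category values in the usage line
+are illustrative stand-ins, not the server's actual code.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Collection, Optional
+
+
+@dataclass
+class ReviewFinding:
+    severity: str
+    file: str
+    line: int
+    category: str
+    message: str
+    suggestion: Optional[str] = None
+
+
+def add_review_finding(findings: list[ReviewFinding],
+                       severity_categories: Collection[str], *,
+                       severity: str, file: str, line: int, category: str,
+                       message: str, suggestion: Optional[str] = None) -> ReviewFinding:
+    """Validate severity, then append a structured finding to the session's list."""
+    if severity not in severity_categories:
+        raise ValueError(
+            f"unknown severity {severity!r}; expected one of {sorted(severity_categories)}"
+        )
+    finding = ReviewFinding(severity, file, line, category, message, suggestion)
+    findings.append(finding)
+    return finding
+
+
+findings: list[ReviewFinding] = []
+add_review_finding(findings, {"high", "medium", "low"},  # assumed categories
+                   severity="high", file="app.py", line=42,
+                   category="bug", message="possible None dereference")
+```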
 
 ## Skills
 
-| Skill | Kind | Tools | Routes (success / default → fail) |
-|---|---|---|---|
-| `intake` | responsive | `fetch_pr_diff` | `→ analyzer` / `→ analyzer` / `→ __end__` |
-| `analyzer` | responsive | `fetch_pr_diff`, `add_review_finding` | `→ recommender` / `→ recommender` / `→ __end__` |
-| `recommender` | responsive | `set_recommendation` | `→ __end__` |
-
-All three are `kind: responsive` (no supervisor / monitor) — Phase-6 supervisor support is not exercised here. Common prompt fragments (severity calibration, output shape) live in `skills/_common/style.md` and are inherited by every skill.
-
-## Bundle
-
-Like incident-management, code-review ships as a single self-contained file: `dist/apps/code-review.py`. Build via:
-
-```bash
-python scripts/build_single_file.py
-```
-
-This produces `dist/app.py` (framework-only), `dist/apps/incident-management.py`, **and** `dist/apps/code-review.py` from the same flattening pipeline (P8-K). All three are `ast.parse`-clean and runnable on a clean venv with only vendored deps.
-
-## Limits / Out of Scope
-
-- Tools are **mocked** — there is no real GitHub or GitLab integration. `fetch_pr_diff` reads a JSON fixture or returns synthetic data; `add_review_finding` and `set_recommendation` write only to the in-process session state.
-- No incremental re-review — re-firing the trigger creates a new session.
-- No supervisor skills — the diff is walked sequentially by the analyzer agent.
-- No PR-author identity model — the framework does not ship a generic `Reporter` / `Actor` concept; each app names its own (`pr.author` here, `Reporter(id, team)` for incident-management).
-
-## How This Proves the Framework Is Generic
-
-Phase 8 was written *to surface and fix* framework leaks. The fixes that landed because this app needed them:
-
-- **P8-C** — `Session.id_format()` classmethod hook. Every `Session` subclass mints its own id format (`INC-...` for incidents, `CR-...` here, anything for future apps). `SessionStore._next_id` no longer hard-codes the incident shape.
-- **P8-J** — `extra_fields: JSON` column on the row schema. Round-trip is driven by `state_cls.model_fields`; typed-column fields stay typed, everything else round-trips through the JSON bag. Incident round-trip is preserved; code-review's `pr` / `review_findings` / `overall_recommendation` / `review_summary` / `review_token_budget` now persist losslessly.
-- **P8-K** — bundler emits `dist/apps/code-review.py` from the same flattening pipeline as `dist/apps/incident-management.py`.
-- **P8-L** — integration test: both apps run side-by-side on isolated metadata DBs without colliding on id space, leaking field shapes, or sharing state.
-
-Phase 9 (ASR) builds on the framework as it stands after these fixes.
+| Skill | Tools | Routes |
+|---|---|---|
+| `intake` | `fetch_pr_diff` | → `analyzer` |
+| `analyzer` | `fetch_pr_diff`, `add_review_finding` | → `recommender` |
+| `recommender` | `set_recommendation` | → `__end__` |
 
-## Testing
+All three are `kind: responsive`. Common prompt fragments live in
+`skills/_common/style.md` and are inherited.
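+
+The happy-path routes in the table form a linear map from each skill to
+its successor. The sketch below walks that map from the entry agent to
+`__end__`; `ROUTES` and `walk_routes` are hypothetical illustrations,
+not the framework's API, which compiles skills into a LangGraph graph
+and may carry additional routes.
+
+```python
+from __future__ import annotations
+
+ROUTES = {
+    "intake": "analyzer",
+    "analyzer": "recommender",
+    "recommender": "__end__",
+}
+
+
+def walk_routes(routes: dict[str, str], entry: str = "intake",
+                end: str = "__end__", max_steps: int = 10) -> list[str]:
+    """Return the skills visited from `entry` until `end`, guarding against cycles."""
+    visited: list[str] = []
+    node = entry
+    while node != end:
+        if node not in routes:
+            raise KeyError(f"no route out of {node!r}")
+        if len(visited) >= max_steps:
+            raise RuntimeError("route table loops or exceeds max_steps")
+        visited.append(node)
+        node = routes[node]
+    return visited
+
+
+print(walk_routes(ROUTES))  # ['intake', 'analyzer', 'recommender']
+```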
 
-```bash
-pytest tests/test_code_review_*.py tests/test_two_apps_coexist.py tests/test_generic_round_trip.py tests/test_session_id_format.py tests/test_bundle_code_review.py -q --no-cov
-```
+## Limits / Out of scope
 
-App-level pin tests live alongside `tests/test_code_review_*.py`; the Phase-8 framework-leak fixes are pinned by `tests/test_session_id_format.py`, `tests/test_generic_round_trip.py`, `tests/test_bundle_code_review.py`, and `tests/test_two_apps_coexist.py`.
+- Tools are mocked (no real GitHub/GitLab API calls).
+- No incremental re-review (re-firing the trigger creates a new
+  session).
+- No supervisor / monitor skills exercised.
+- No PR-author identity model — each app names its own
+  (`pr.author` here, `Reporter(id, team)` in incident-management).
diff --git a/examples/incident_management/README.md b/examples/incident_management/README.md
index d8cc3b0..dc2d89a 100644
--- a/examples/incident_management/README.md
+++ b/examples/incident_management/README.md
@@ -1,229 +1,84 @@
 # Incident Management — Example Application
 
-The flagship example app for the `runtime` framework. Demonstrates how to layer a domain-specific agent application on top of the generic orchestration runtime.
+The flagship example app for the framework. A 4-skill investigation
+pipeline (intake → triage → deep_investigator → resolution) with
+ASR memory layers (L2 Knowledge Graph, L5 Release Context, L7
+Playbook Store).
+
+For framework-wide design + decisions, see
+[`docs/DESIGN.md`](../../docs/DESIGN.md). This README only covers
+the bits specific to this app.
 
 ## Run
 
 ```bash
-python -m runtime --config config/incident_management.yaml
+# Long-lived orchestrator service for this app
+uv run python -m runtime --config config/incident_management.yaml
+# Streamlit UI (run in a second terminal)
+ASR_LOG_LEVEL=INFO uv run streamlit run src/runtime/ui.py --server.port 37777
 ```
 
-That boots the long-lived orchestrator service against this app's
-config. The Streamlit UI ships separately under `ui/streamlit_app.py`
-(`streamlit run ui/streamlit_app.py`) and binds to the same service.
-
-## Architecture
-
-This example extends the generic `Session` model with incident-specific state and provides a 4-agent investigation pipeline (intake → triage → deep_investigator → resolution). The framework owns session lifecycle, agent dispatch, and tool gateway; this example owns domain shape, skill prompts, and MCP tools.
+## Layout
 
 ```
 examples/incident_management/
-├── state.py             IncidentState(Session) + Reporter + IncidentStatus
-├── config.py            IncidentAppConfig + load_incident_app_config
-├── config.yaml          severity_aliases, escalation_teams, environments, thresholds
-├── mcp_server.py        IncidentMCPServer with 3 tools
-├── asr/                 ASR memory layers (Phase 9)
-│   ├── memory_state.py    MemoryLayerState + L2/L5/L7 pydantic models
-│   ├── kg_store.py        L2 Knowledge Graph (filesystem)
-│   ├── release_store.py   L5 Release Context (filesystem)
-│   ├── playbook_store.py  L7 Playbook Store (filesystem)
-│   └── seeds/             bundled JSON / YAML seed data per layer
-├── skills/              4 agent YAML configs + _common/ shared prompts
+├── state.py                 IncidentState(Session) + Reporter + IncidentStatus
+├── config.py                IncidentAppConfig + load_incident_app_config
+├── config.yaml              severity_aliases, escalation_teams, environments, thresholds
+├── mcp_server.py            IncidentMCPServer + 3 tools
+├── mcp_servers/             observability + remediation + user_context tools
+├── asr/                     ASR memory layers
+│   ├── memory_state.py        MemoryLayerState + L2/L5/L7 pydantic models
+│   ├── kg_store.py            L2 Knowledge Graph (filesystem)
+│   ├── release_store.py       L5 Release Context (filesystem)
+│   ├── playbook_store.py      L7 Playbook Store (filesystem)
+│   └── seeds/                 seed data per layer
+├── skills/                  4 agent YAML configs + _common/ shared prompts
 │   ├── _common/
-│   ├── intake/
-│   ├── triage/
-│   ├── deep_investigator/
-│   └── resolution/
-├── ui.py                Streamlit accordion-per-incident UI
-├── __main__.py          Entry point
-└── README.md            this file
-```
-
-## Configuration
-
-Two layers, in order of precedence:
-
-| Layer | File | What it owns |
-|---|---|---|
-| Framework | `config/config.yaml` | LLM providers + models, MCP servers, storage URL, paths |
-| App | `examples/incident_management/config.yaml` | severity_aliases, escalation_teams, environments, similarity_threshold, confidence_threshold |
-
-The framework's `AppConfig` does **not** contain incident-flavored keys — they all live in `IncidentAppConfig`. Adding a new domain field is a one-line addition to `IncidentAppConfig`, never to `runtime.config.AppConfig`.
-
-## State Model
-
-`IncidentState(Session)` extends the framework's `Session` base with:
-
-- `query: str` — initial user description
-- `environment: str` — production/staging/dev/local
-- `reporter: Reporter` — who filed the incident
-- `summary: str` — agent-produced narrative
-- `tags: list[str]`
-- `severity: str | None` — high/medium/low after triage
-- `category: str | None`
-- `matched_prior_inc: str | None` — id of similar resolved incident, if any
-- `resolution: Any` — final outcome
-- `memory: MemoryLayerState` — ASR memory-layer slots (L2 KG / L5 Release / L7 Playbooks); see "ASR memory layers" below
-
-The framework only reads/writes the inherited `Session` fields (id, status, created_at, agents_run, tool_calls, findings, pending_intervention, token_usage). Domain fields above are read/written exclusively by example-app code.
-
-## MCP Tools
-
-`IncidentMCPServer` exposes three tools to the agents:
-
-- `lookup_similar_incidents(query, environment)` — embedding similarity over closed incidents
-- `create_incident(query, environment, reporter_id, reporter_team)` — start a new investigation
-- `update_incident(incident_id, patch)` — write to status, severity, category, summary, tags, findings, resolution
-
-The MCP loader reads the registry from `config/config.yaml` (`mcp.servers[*].module`), which points at `examples.incident_management.mcp_server` for this app.
-
-## Skills
-
-Each agent (intake/triage/deep_investigator/resolution) is a `Skill` defined by a `config.yaml` + `system.md` pair under `skills/<agent>/`. The `_common/` directory holds shared snippets all skills inherit. The framework's skill loader (`runtime.skill.load_all_skills`) takes a directory; `paths.skills_dir` in the framework config points at this directory.
-
-## Durable Memory
-
-Sessions survive cold restart. The framework wires LangGraph's `AsyncSqliteSaver` (or `AsyncPostgresSaver` for production) to the same database URL declared in `config.yaml`'s `storage.metadata.url`, on a separate connection pool with WAL + `busy_timeout=30s` configured on both sides. Resume after a crash: load the session by id and call `Orchestrator.resume_session(incident_id, user_input)` — it dispatches `Command(resume=...)` against the persisted graph state. Pending interventions are dual-written to both the LangGraph checkpoint and `IncidentRow.pending_intervention` so dashboards reading the relational row stay accurate.
-
-The state class is configurable. `config/config.yaml` sets `runtime.state_class: examples.incident_management.state.IncidentState` so row hydration produces `IncidentState` instances, not bare `Session` instances. A different app subclassing `Session` simply points this key at its own state class — no framework changes.
-
-## Multi-Session
-
-The orchestrator runs as a long-lived `OrchestratorService` (single asyncio loop on a background thread, single shared FastMCP client pool). Each session is an asyncio task on that loop, started via `service.start_session(query=..., environment=..., reporter_id=..., reporter_team=...)` which returns the session id immediately while the agent run continues in the background.
-
-Concurrent sessions are isolated at the row level (each writes to its own `IncidentRow`) but share the MCP client pool. `service.list_active_sessions()` returns a thread-safe snapshot of in-flight sessions; `service.stop_session(session_id)` cancels a task and marks the row `status="stopped"`. Default cap is 8 concurrent sessions; raise `SessionCapExceeded` (HTTP 429) on overflow. Configure via `runtime.max_concurrent_sessions` in `config.yaml`.
-
-Three new HTTP endpoints expose this surface: `POST /sessions` (start), `GET /sessions` (list active), `DELETE /sessions/{id}` (stop). Legacy `POST /investigate` is preserved as a deprecated alias delegating to the same code path.
-
-The Streamlit UI now shows two sections in the sidebar: **In-flight** (live, polled from `list_active_sessions()`) and **History** (closed sessions). The detail pane auto-polls every 1.5s while a session's status is non-terminal; polling stops once status is `resolved` / `escalated` / `stopped`.
-
-## Risk-rated tool gateway
-
-Phase 4 adds a per-tool risk gateway that sits between every agent and every MCP tool call. Each tool is tagged in `runtime.gateway.policy` (`low` / `medium` / `high`) and the gateway dispatches on the resolved action: `low` runs without overhead, `medium` runs and persists `ToolCall(status="executed_with_notify")` for soft audit, and `high` raises `langgraph.types.interrupt(...)` to pause the graph for human approval — the wrap closure captures the live `Session` per agent invocation so audit lands on the right row.
-
-A prod-environment override tightens the policy further: when a session's `environment` is in `prod_environments` and the tool name matches a `resolution_trigger_tools` glob, the gateway forces `approve` regardless of the tool's risk tier. This guarantees that "blast-radius" tools (apply_fix, deploy, mass_update_*) always get a human in the loop in production, even when the underlying tier is `low` or `medium`.
-
-Operators resolve pending approvals via `POST /sessions/{sid}/approvals/{tool_call_id}` (decision=approve|reject + approver + optional rationale) or via the **Pending Approvals** cards in the Streamlit detail pane. Both paths drive `Command(resume={...})` against the same graph thread_id so HTTP and UI clients share the resume contract.
-
-Legacy `tool_calls` rows from before Phase 4 are migrated lazily by `runtime.storage.migrate_tool_calls_audit` — idempotent JSON walk that fills the new audit fields with their defaults. Run once at orchestrator startup or as a one-off ops job.
-
-## ASR memory layers (Phase 9)
-
-Phase 9 lays the foundation for ASR's 7-layer memory architecture (see `ASR.md` §3 / §6). Three of those layers ship in this batch as filesystem-backed read-only stores under `examples/incident_management/asr/`. No Neo4j / Redis / pgvector dependency — air-gapped friendly per `rules/build.md`.
-
-| Layer | Class | Backing files | Surface |
-|---|---|---|---|
-| **L2** Knowledge Graph | `KGStore` | `incidents/kg/{components,edges}.json` | `get_component` / `find_by_name` / `neighbors` / `subgraph` |
-| **L5** Release Context | `ReleaseStore` | `incidents/releases/recent.json` | `recent_for_service` / `suspect_at` / `context` |
-| **L7** Playbook Store | `PlaybookStore` | `incidents/playbooks/*.yaml` | `get` / `list_all` / `match` |
-
-Each store accepts a `root: Path` for testability. When the configured layer directory is empty, the store falls back to the seed bundle at `examples/incident_management/asr/seeds/<layer>/` so a fresh checkout has working data without provisioning `incidents/`.
-
-Investigations attach context fetched from each layer to `IncidentState.memory` — the `MemoryLayerState` container with `l2_kg: L2KGContext | None`, `l5_release: L5ReleaseContext | None`, `l7_playbooks: list[L7PlaybookSuggestion]`. The whole bundle round-trips through the P8-J `extra_fields` JSON column, so no row schema change is needed. Mutation paths (writes from agents, playbook authoring) are deferred to later sub-phases (9e–9g).
-
-## ASR MVP investigation flow (Phase 9 — 9h/9i/9k/9m)
-
-The MVP slice wires a deliberate, end-to-end investigation pipeline on top of the memory-layer foundation. Three new skills + helpers + UI panels:
-
+│   ├── intake/                kind: supervisor, runs similarity + memory hydration
+│   ├── triage/                hypothesis-loop investigator
+│   ├── deep_investigator/     evidence gathering
+│   └── resolution/            propose / apply fix or escalate
+├── ui.py                    Streamlit accordion-per-incident view
+└── __main__.py              entry point
 ```
-intake → triage (hypothesis loop) → deep_investigator → resolution (L7 + gateway)
-```
-
-**1. Supervisor (`intake`, P9-9h + 2026-05-03 generalisation).** Default entry agent (framework default `entry_agent='intake'` matches; no override needed in `config/config.yaml`). The `intake` skill is `kind: supervisor` whose runner composes `runtime.intake.default_intake_runner` (framework — similarity retrieval + dedup gate) with `examples.incident_management.asr.supervisor_node:default_supervisor_runner`'s memory hydration. Hydrates `session.memory` with L2 KG / L5 Release / L7 Playbook context fetched from the affected service set (extracted heuristically from the query). Applies the **single-active-investigation gate**: if another in-flight session is already covering the same components, the new session is tagged `status="duplicate"` with `parent_session_id` pointing at the active one (reuses P7 dedup linkage), and routed to `__end__`. Helper module: `examples/incident_management/asr/supervisor_node.py`.
-
-**2. Triage hypothesis loop (P9-9i).** The triage skill now runs a bounded inner loop: generate hypothesis → gather evidence (L1 current findings, L3-equivalent past similar incidents via `lookup_similar_incidents`, L5 recent suspect deploys from `session.memory.l5_release`) → score → refine or accept. Hard cap of 3 iterations. The deterministic scorer (`asr.hypothesis_loop.score_hypothesis` — token-overlap, no LLM) and the `should_refine` predicate are unit-tested separately so the loop's safety net isn't LLM-dependent. Each iteration writes `{iteration, hypothesis, score, rationale}` to `findings.findings_triage` for the UI's hypothesis trail panel.
-
-**3. Resolution + prod-HITL (P9-9k).** The resolution skill consults `session.memory.l7_playbooks`, picks the top match, and translates `playbook.remediation` into tool calls via `asr.resolution_helpers.playbook_to_tool_calls`. Every call routes through the framework gateway. The `runtime.gateway` block in `config/config.yaml` locks the prod-environment override: `update_incident` (medium) and any `remediation:*` tool ALWAYS require approval in `production`, regardless of risk tier. The override only TIGHTENS — it can never relax a higher-risk tool to `auto`.
-
-**4. UI panels (P9-9m-sliver).** Two read-only views on the incident detail page:
-
-- **Approval Inbox** — already shipped in P4-H; surfaces every tool call with `status="pending_approval"` as an Approve / Reject card.
-- **Hypothesis Trail** — collapsed accordion showing the triage agent's iterative `{iteration, hypothesis, score, rationale}` log, sourced from `session.findings`. No new persistent state.
-
-## Agent kinds
-
-Phase 6 introduces a `kind` discriminator on every `Skill`, allowing three execution models behind a single config schema:
-
-| `kind`       | Where it runs                                | Writes `AgentRun`? |
-|--------------|----------------------------------------------|--------------------|
-| `responsive` | LangGraph node, on a session turn (today's path) | yes |
-| `supervisor` | LangGraph node, dispatches to a subordinate via `Send()` | **no** (dispatch log only) |
-| `monitor`    | Out-of-band, scheduled via `MonitorRunner`   | no (signals only) |
-
-Each existing skill in this example carries `kind: responsive` explicitly; the loader still defaults the field to `responsive` when omitted, so legacy YAML keeps working unchanged. A `supervisor` skill declares `subordinates`, `dispatch_strategy: llm|rule`, and either a `dispatch_prompt` (for `llm`) or a `dispatch_rules` list (for `rule`); supervisor dispatches emit a structured `supervisor_dispatch` log entry instead of bloating `agents_run` with router rows. A `monitor` skill declares a 5-field `schedule:` cron expression, an `observe:` list of tool names, an `emit_signal_when:` safe-eval expression, and a `trigger_target:` naming a Phase-5 trigger to fire when the expression is true. Monitors run on a small bounded thread pool (`max_workers=4`); each tick has a per-monitor `tick_timeout_seconds` so one slow `observe` tool cannot stall the others. Dangerous expression constructs (calls, attribute access, comprehensions, lambda) are rejected by an AST allowlist at skill-load time — `eval()`/`exec()` are never used on user-supplied strings.
-
-## Triggers
-
-Phase 5 adds a declarative trigger registry that generalises session-start beyond the legacy `POST /investigate` route. After Phase 5 the framework can fire `Orchestrator.start_session` from four transport flavours: `api` (back-compat), `webhook` (third-party POST `/triggers/{name}`), `schedule` (in-process APScheduler cron), and `plugin` (custom transport registered via setuptools entry-points or explicit `plugin_transports={"kind": Class}` on `TriggerRegistry.create`). All four are wired off a single `triggers:` block in `config.yaml`.
-
-```yaml
-triggers:
-  - name: pagerduty-incident
-    transport: webhook
-    target_app: incident_management
-    payload_schema: examples.incident_management.triggers.PagerDutyPayload
-    transform: examples.incident_management.triggers.transform_pagerduty
-    auth: bearer
-    auth_token_env: PAGERDUTY_WEBHOOK_TOKEN
-    idempotency_ttl_hours: 24
-
-  - name: nightly-prod-scan
-    transport: schedule
-    target_app: incident_management
-    transform: examples.incident_management.triggers.transform_schedule_heartbeat
-    schedule: "0 2 * * *"        # 5-field cron (UTC by default)
-    timezone: UTC
-    payload:
-      query: "Nightly health check"
-      environment: production
-```
-
-**Webhook routing:** the registry mounts one `POST /triggers/{name}` route per webhook trigger. Each trigger config declares a Pydantic `payload_schema` (validated on every request — bad body returns 422) and a `transform` callable that maps the parsed payload to `start_session(**kwargs)`. The transform error policy is fail-closed: any exception from `transform` returns `422 Unprocessable Entity` and is **not** cached for idempotency, so a retried request gets a fresh attempt.
 
-**Bearer auth:** when `auth: bearer`, the route requires `Authorization: Bearer $auth_token_env`. The token is read from the named env var **at app startup** — rotating the secret requires a process restart. No raw secrets ever land in YAML. Constant-time comparison (`hmac.compare_digest`) guards against timing oracles. HMAC signature transports (PagerDuty `x-pagerduty-signature`, Slack `x-slack-signature`) are deferred to a later phase via the same `auth:` discriminator.
+## Domain shape
 
-**Idempotency-Key:** webhook clients can include `Idempotency-Key: <token>` to dedupe retries. The registry stores `(trigger_name, key)` -> `session_id` in a per-process LRU and a SQLite-backed table `trigger_idempotency_keys` on the same DB used for session metadata (`storage.metadata.url`). Cold restart is survived: on LRU miss, the disk row is read; entries past `ttl_hours` are purged opportunistically. Content-based dedup (hash of body) is **out of scope until Phase 7**; only the explicit `Idempotency-Key` header is honoured in Phase 5.
+`IncidentState(Session)` extends the framework `Session` with
+`query`, `environment`, `reporter`, `summary`, `tags`, `severity`,
+`category`, `matched_prior_inc`, `resolution`, and
+`memory: MemoryLayerState`. Session ids take the form
+`INC-YYYYMMDD-NNN`.
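+
+A minimal sketch of that id shape follows. It assumes the `NNN`
+part is a zero-padded, three-digit per-day sequence number.
+`format_incident_id` is a hypothetical helper for illustration. It
+is not the framework's own id generator.
+
+```python
+from datetime import date
+
+
+def format_incident_id(day: date, seq: int) -> str:
+    """Hypothetical helper: build an id shaped like INC-YYYYMMDD-NNN.
+
+    Assumes `seq` is a per-day counter that fits in three digits.
+    """
+    if not 0 <= seq <= 999:
+        raise ValueError(f"seq out of range for NNN: {seq}")
+    # date.__format__ delegates to strftime, so %Y%m%d yields e.g. 20240507.
+    return f"INC-{day:%Y%m%d}-{seq:03d}"
+
+
+print(format_incident_id(date(2024, 5, 7), 12))  # INC-20240507-012
+```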
 
-**Schedule cron:** `schedule:` is a standard 5-field cron string interpreted via APScheduler's `CronTrigger.from_crontab`. The 6-field APScheduler-native form is rejected at config-load time. Drift: in-process APScheduler is good for ±1 minute under normal load — tighter SLOs need an external scheduler (Celery beat, k8s `CronJob`).
+## ASR memory layers
 
-**Plugin transports:** to ship a transport for SQS / Kafka / NATS, subclass `runtime.triggers.base.TriggerTransport` and register the class either via the `runtime.triggers` setuptools entry-point group or by passing `plugin_transports={"kind": Class}` to `TriggerRegistry.create`. Explicit registrations win on key collision.
-
-**Provenance:** every session started via a trigger receives a `TriggerInfo(name, transport, target_app, received_at)` stamped onto `inc.findings['trigger']` before the graph runs, so dashboards and audit logs can answer "where did this session come from?" without re-deriving from disjoint sources.
-
-## Adding a new app
-
-The framework is genuinely generic — Phase 8 lifted every domain-specific assumption out of `src/runtime/` and pinned it with the second example at `examples/code_review/`. To stand up your own app, mirror this structure under `examples/<your_app>/` (no framework changes required):
-
-| File | What it owns | Hook into the framework |
+| Layer | Class | Backing |
 |---|---|---|
-| `state.py` | Your `Session` subclass with domain fields | `runtime.state_class` (dotted path) |
-| `state.py` (`id_format` classmethod) | Your session id shape (e.g. `MYAPP-NNN`) | `Session.id_format(seq=...)` (P8-C) |
-| `config.py` / `config.yaml` | Your `AppConfig` subclass for app-specific tunables | Loaded by your own loader; framework doesn't touch it |
-| `mcp_server.py` | Your domain MCP tools | `mcp.servers[*].module` |
-| `skills/<name>/{config,system}.{yaml,md}` | Per-skill prompt + tool wiring | `paths.skills_dir` |
-| (none) | Entry point lives in the framework: `python -m runtime --config config/<your_app>.yaml` | n/a |
-
-**Round-trip:** any field you declare on your `Session` subclass that is *not* an incident-shaped typed column on `IncidentRow` (`query`, `environment`, `severity`, `tags`, ...) lands in the row's `extra_fields` JSON column on save and is hydrated back via `state_cls.model_fields` on load (P8-J). You don't need to touch the framework's row schema or converters.
-
-**Bundle:** add a `<YOUR_APP>_APP_MODULE_ORDER` and a `build_<your_app>_app()` function in `scripts/build_single_file.py`, then call it from `main()`. The flattening pipeline + intra-import stripping pattern is the same for every app (see how `examples.code_review` does it).
-
-The second example at [`examples/code_review/`](../code_review/README.md) is a deliberate non-incident-flavored app (PR review). It exists to *prove* the framework is generic by being a second concrete instance of the same pattern. If you're stuck on how a piece should land, check what code-review does first — the pattern is almost always already there.
-
-## Testing
-
-```bash
-pytest tests/ -q --no-cov
-```
-
-Pin tests for this example live in `tests/test_incident_state.py` (state shape), `tests/test_mcp_incident_server.py` (MCP server), and the broader integration suite under `tests/test_*`.
-
-## Genericity ratchet
-
-`scripts/check_genericity.py` counts occurrences of incident-flavored tokens (`incident`, `severity`, `reporter`) inside `src/runtime/`. `tests/test_genericity_ratchet.py` enforces that the total stays at or below `BASELINE_TOTAL` — so new domain leaks into the framework layer fail CI.
-
-```bash
-python scripts/check_genericity.py            # print current counts
-python scripts/check_genericity.py --baseline 140  # exit non-zero if exceeded
-```
-
-To lower the baseline: refactor a leak out of `src/runtime/`, then update `BASELINE_TOTAL` in `tests/test_genericity_ratchet.py` in the same commit. Raising the baseline requires an architecture rationale in the commit message and is a code-review red flag.
+| L2 Knowledge Graph | `KGStore` | `incidents/kg/{components,edges}.json` (or seeds) |
+| L5 Release Context | `ReleaseStore` | `incidents/releases/recent.json` (or seeds) |
+| L7 Playbook Store | `PlaybookStore` | `incidents/playbooks/*.yaml` (or seeds) |
+
+The intake supervisor hydrates `IncidentState.memory` from these
+stores, keyed on the components it extracts from the user's query.
+The triage, DI, and resolution agents then read the hydrated bundle
+as additional context. Write-back (mutation) paths are deferred.
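+
+The hydration step can be pictured roughly as follows. This sketch
+rests on loud assumptions. The keyword-match extractor, the
+dict/tuple record shapes, and the names `MemoryBundle`,
+`extract_components`, and `hydrate` are all hypothetical. The real
+stores and `MemoryLayerState` may look quite different.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class MemoryBundle:  # hypothetical stand-in for MemoryLayerState
+    components: list[str] = field(default_factory=list)
+    edges: list[tuple[str, str]] = field(default_factory=list)  # L2
+    releases: list[dict] = field(default_factory=list)  # L5
+    playbooks: list[dict] = field(default_factory=list)  # L7
+
+
+def extract_components(query: str, known: set[str]) -> list[str]:
+    # Naive case-insensitive substring match (assumption).
+    q = query.lower()
+    return sorted(c for c in known if c.lower() in q)
+
+
+def hydrate(query: str, known: set[str], edges: list[tuple[str, str]],
+            releases: list[dict], playbooks: list[dict]) -> MemoryBundle:
+    comps = extract_components(query, known)
+    wanted = set(comps)
+    return MemoryBundle(
+        components=comps,
+        # Keep any KG edge touching a mentioned component.
+        edges=[e for e in edges if e[0] in wanted or e[1] in wanted],
+        releases=[r for r in releases if r["component"] in wanted],
+        playbooks=[p for p in playbooks if wanted & set(p["components"])],
+    )
+
+
+bundle = hydrate(
+    "checkout latency spike after deploy",
+    known={"checkout", "payments", "search"},
+    edges=[("checkout", "payments"), ("search", "catalog")],
+    releases=[{"component": "checkout", "version": "1.4.2"}],
+    playbooks=[{"name": "rollback", "components": ["checkout"]}],
+)
+print(bundle.components)  # ['checkout']
+```
+
+Filtering by component keeps the extra context small. A production
+extractor would probably need aliases or fuzzy matching rather than
+plain substring checks.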
+
+## MCP tools
+
+`IncidentMCPServer` exposes `lookup_similar_incidents`,
+`create_incident`, `update_incident`, `submit_hypothesis`,
+`mark_resolved`, `mark_escalated`. Sibling MCP servers under
+`mcp_servers/` add observability (`get_logs`, `get_metrics`,
+`get_service_health`, `check_deployment_history`) and remediation
+(`propose_fix`, `apply_fix`, `notify_oncall`).
+
+The risk-rated gateway (`runtime.gateway.policy`) rates `apply_fix`
+as `high` risk, so production runs pause for operator approval
+before any fix is applied. See [DESIGN § 7](../../docs/DESIGN.md#7-hitl-approve--reject)
+for the HITL pause/resume mechanics.
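+
+The following is a minimal sketch of that approval decision. It
+assumes a simple tool-to-risk lookup in which unlisted tools default
+to `low` and only `production` sessions pause. `Risk`, `TOOL_RISK`,
+and `requires_approval` are hypothetical names. Only the rating of
+`apply_fix` as `high` comes from this README.
+
+```python
+from enum import Enum
+
+
+class Risk(str, Enum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+
+
+# Hypothetical policy table; unlisted tools are treated as LOW.
+TOOL_RISK = {"apply_fix": Risk.HIGH}
+
+
+def requires_approval(tool: str, environment: str) -> bool:
+    """Pause for operator approval on high-risk tools in production."""
+    risk = TOOL_RISK.get(tool, Risk.LOW)
+    return environment == "production" and risk is Risk.HIGH
+
+
+print(requires_approval("apply_fix", "production"))  # True
+print(requires_approval("get_logs", "production"))  # False
+```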
+
+## Skill model
+
+Per-agent LLM override: intake declares `model: gpt_oss_cheap` (a
+fast, cheap model on Ollama Cloud) so the supervisor's pre-filter
+step stays inexpensive; downstream agents fall back to
+`llm.default`. See
+[DESIGN § 5.3](../../docs/DESIGN.md#5-llm-provider-story) for the
+per-agent dispatch.
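+
+The fallback rule works roughly as follows. `resolve_model` and the
+dict-shaped configs are hypothetical stand-ins for the framework's
+real config objects. `some_default_model` is a placeholder name.
+
+```python
+def resolve_model(skill_cfg: dict, llm_cfg: dict) -> str:
+    """Pick the model for one agent: skill-level override, else llm.default."""
+    return skill_cfg.get("model") or llm_cfg["default"]
+
+
+llm_cfg = {"default": "some_default_model"}  # placeholder name
+print(resolve_model({"name": "intake", "model": "gpt_oss_cheap"}, llm_cfg))  # gpt_oss_cheap
+print(resolve_model({"name": "triage"}, llm_cfg))  # some_default_model
+```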