diff --git a/.gitignore b/.gitignore
index c5d3e84..21a3985 100644
--- a/.gitignore
+++ b/.gitignore
@@ -58,10 +58,26 @@ ASR.md
 #   - DESIGN.md:         consolidated architecture + decision log.
 #   - AIRGAP_INSTALL.md: Phase 14 (HARD-02) air-gap install path.
 #   - DEVELOPMENT.md:    Phase 16 (BUNDLER-01) contributor workflow.
+#   - 00-…-11-…:         brownfield documentation set (per-topic).
+#   - adr/*.md:          Architecture Decision Records.
 docs/*
 !docs/DESIGN.md
 !docs/AIRGAP_INSTALL.md
 !docs/DEVELOPMENT.md
+!docs/00-project-overview.md
+!docs/01-local-setup.md
+!docs/02-architecture.md
+!docs/03-code-map.md
+!docs/04-main-flows.md
+!docs/05-configuration.md
+!docs/06-data-model.md
+!docs/07-integrations.md
+!docs/08-testing.md
+!docs/09-build-deploy-release.md
+!docs/10-known-risks-and-todos.md
+!docs/11-agent-handoff.md
+!docs/adr/
+!docs/adr/*.md
 REVIEW_*.md
 review_*.md
 .planning/
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e3776ff
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,199 @@
+# CLAUDE.md — project context for AI agents
+
+> Loaded automatically by Claude Code (and equivalent agents) for
+> every session in this repo. Companion to
+> [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md), which has
+> the longer "action card" format with explanations.
+
+## What this project is
+
+Generic Python multi-agent runtime framework on **LangGraph**
+(orchestration) + **LangChain** (provider + agent factory) +
+**FastMCP** (tools). Single-file deploy bundle for air-gapped
+corporate environments. Two reference apps in `examples/`:
+`incident_management` (flagship) and `code_review` (proves the
+framework is generic).
+
+`main` is at v1.5 (see [`docs/DESIGN.md`](docs/DESIGN.md) § 13 for
+milestone history).
+
+## Read these first
+
+In order:
+1. [`docs/DESIGN.md`](docs/DESIGN.md) — long-form architecture +
+   12 numbered DEC-NNN decisions + milestone history
+2. [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) — top
+   20 files to read, command allowlist / denylist, common traps
+3. [`docs/02-architecture.md`](docs/02-architecture.md) —
+   quick-scan layered diagram
+4. [`docs/04-main-flows.md`](docs/04-main-flows.md) — entry points
+   + failure modes per flow
+
+## Always-on commands
+
+```bash
+# install / sync deps (uses uv.lock)
+uv sync --frozen --extra dev
+
+# tests (full)
+uv run pytest -x
+
+# tests (single file, fast)
+uv run pytest tests/<file>.py -xvs --no-cov
+
+# lint + type check + ratchets
+uv run ruff check src/ tests/
+uv run pyright src/runtime
+python scripts/check_genericity.py
+uv run python scripts/lint_skill_prompts.py
+
+# regenerate single-file bundle (REQUIRED after touching src/runtime/ or examples/)
+uv run python scripts/build_single_file.py
+
+# coverage gate
+uv run pytest --cov=src/runtime --cov-fail-under=85 -x
+```
+
+## DO
+
+- Use `uv run pytest …` (NOT bare `pytest`) — pythonpath is in
+  `pyproject.toml`.
+- Regenerate `dist/*` after ANY change to `src/runtime/` or
+  `examples/`. CI's "Bundle staleness gate (HARD-08)" fails
+  otherwise.
+- Run `uv lock` and commit `uv.lock` if you change `pyproject.toml`.
+  CI's "Lockfile freshness gate (HARD-02)" fails otherwise.
+- Work on a feature branch, open a PR, squash-merge.
+  Conventional-commit subjects: `feat(area): …`, `fix(area): …`,
+  `refactor(area): …`, `docs: …`, `build: …`, `chore(area): …`.
+- Use `extra_fields` JSON for app-specific fields. Do NOT add
+  app-specific columns to `IncidentRow`.
+- Use stub LLMs (`LLMConfig.stub()` + `EnvelopeStubChatModel` from
+  `tests/_envelope_helpers.py`) in tests. Live LLM tests are
+  env-gated.
+- Re-read [`docs/DESIGN.md`](docs/DESIGN.md) § 12 (decision log)
+  before any architectural change.
+
+## DO NOT
+
+- Do NOT `pip install …` — bypasses uv lockfile. Use `uv add` +
+  `uv sync`.
+- Do NOT edit `dist/*` directly — they're generated.
+- Do NOT add `TODO`/`FIXME`/`HACK` comments — fix root cause or
+  open an issue. The only intentional `TODO(v2)` is in
+  `src/runtime/locks.py:49` (slot eviction; documented).
+- Do NOT add `except Exception: pass` — Phase 18 / HARD-04
+  removed all of these. Log + re-raise or catch a typed exception.
+- Do NOT touch SQLAlchemy column names on `IncidentRow` —
+  destructive migration. Add to `extra_fields` instead.
+- Do NOT commit anything in `.planning/` — gitignored;
+  local-only working state for the GSD planning workflow.
+- Do NOT commit agent-generated `*.md` outside `docs/` unless
+  the user explicitly asks for them to ship. `docs/*` is gitignored
+  except for the explicit allowlist in `.gitignore`.
+- Do NOT call live LLM providers in CI tests — keys are dummy in
+  `.github/workflows/ci.yml`.
+- Do NOT introduce a public-internet runtime dependency in
+  `src/runtime/`. Air-gap is the deploy target. The hardcoded
+  `https://ollama.com` fallback was explicitly removed in
+  Phase 13 (HARD-05); don't re-introduce.
+- Do NOT force-push or rewrite history on `main` (or any branch
+  with collaborators). PRs only.
+- Do NOT skip the bundle regeneration step ("I'll do it before
+  PR" leads to CI failures and time wasted on rebases).
+- Do NOT bypass the concept-leak ratchet by raising
+  `BASELINE_TOTAL` without a rationale entry. Lowering it is
+  encouraged; raising requires architectural justification in the
+  commit message.
+
+## Architectural rules (load-bearing)
+
+See [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) §
+"Architectural rules" for the 8 rules. Quick recap:
+
+1. Framework stays domain-agnostic
+2. One source of truth per concern (`should_gate`, `should_retry`,
+   `_finalize_session_status`)
+3. HITL pause is NOT an error
+4. Append-only audit trails
+5. The bundle is the deploy unit
+6. Provider abstraction stays in `runtime.llm`
+7. Tests use stubs by default
+8. No public-internet runtime calls in air-gap path
+
+## Common traps (skim before debugging)
+
+- `pytest` (bare) → `ModuleNotFoundError: runtime`. Use `uv run pytest …`.
+- Touching `src/` without regenerating `dist/` → CI bundle gate fails.
+- Approving a HITL session created on pre-PR-#6 code → silent no-op.
+  Tell the user to start a fresh session.
+- Live OpenRouter `:free` model 429s on first call → retry usually
+  works (v1.5-D 429 backoff is 7.5s/15s/22.5s).
+- Streamlit `AssertionError: scope["type"] == "http"` storm under
+  Python 3.14 → cosmetic Starlette compat bug; HTTP traffic still
+  works.
+
+## Repo conventions
+
+- **Branches:** `feat/`, `fix/`, `refactor/`, `docs/`, `chore/`,
+  `build/`. Squash-merge into `main`.
+- **Commits:** Conventional Commits style. Verbose body with the
+  "why" + key file references when non-trivial.
+- **PRs:** Use `gh pr create` with title + body; CI runs lint /
+  type / test / sonar / bundle / skill-lint. Squash-merge with
+  `gh pr merge <n> --squash --delete-branch --subject "…"`.
+- **Tests:** `tests/test_*.py`. Async tests need no decorator
+  (`asyncio_mode=auto`). Stub LLMs from `tests/_envelope_helpers.py`.
+- **Coverage:** ≥ 85% on `src/runtime/`. UI / `__main__` /
+  postgres saver / plugin transport are excluded
+  (`pyproject.toml:[tool.coverage.run].omit`).
+- **Type-checker:** pyright fail-on-error (Phase 19 / HARD-03);
+  use `# pyright: ignore[<rule>] -- <rationale>` for legitimate
+  stub gaps.
+- **Skill prompts:** `examples/<app>/skills/<name>/{config.yaml, system.md}`.
+  Must include the markdown turn-output contract block (see
+  `_common/output.md`).
+
+## Worktree workflow
+
+This repo is set up for parallel-agent worktrees under
+`.claude/worktrees/`. If you're given the EnterWorktree tool:
+
+- Use it BEFORE making any code changes — keeps the user's main
+  checkout clean.
+- After CI passes and the PR merges, call ExitWorktree with
+  `action=remove, discard_changes=true` (the squashed commit is
+  on `main`; the original SHAs are dropped, but content is preserved).
+
+If you're not given EnterWorktree, work in the main checkout but
+let the user know.
+
+## Current state snapshot (as of last update)
+
+- Tests: 1265 passing, 8 skipped
+- Coverage: 87.04%
+- Concept-leak ratchet: 39 (down from 156 pre-v1.5-B)
+- Ruff: clean
+- SonarCloud quality gate: green
+- Latest milestone: v1.5 (markdown turn output + HITL fix +
+  generic-noun pass + per-agent LLM + 429 retry)
+- Next big move: v2.0 React UI (Streamlit retirement)
+
+## Where to find what
+
+| You want to … | Read |
+|---|---|
+| Understand the architecture | [`docs/DESIGN.md`](docs/DESIGN.md), [`docs/02-architecture.md`](docs/02-architecture.md) |
+| Local setup | [`docs/01-local-setup.md`](docs/01-local-setup.md) |
+| Find a file by purpose | [`docs/03-code-map.md`](docs/03-code-map.md) |
+| Understand a flow end-to-end | [`docs/04-main-flows.md`](docs/04-main-flows.md) |
+| Configure deployment | [`docs/05-configuration.md`](docs/05-configuration.md) |
+| Inspect storage / data | [`docs/06-data-model.md`](docs/06-data-model.md) |
+| External integrations | [`docs/07-integrations.md`](docs/07-integrations.md) |
+| Run / write tests | [`docs/08-testing.md`](docs/08-testing.md) |
+| Build / deploy / release | [`docs/09-build-deploy-release.md`](docs/09-build-deploy-release.md) |
+| Risk / debt inventory | [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md) |
+| Action card for AI agents | [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) |
+| Architectural baseline | [`docs/adr/0001-current-architecture.md`](docs/adr/0001-current-architecture.md) |
+| Dev workflow (regenerate dist, add module) | [`docs/DEVELOPMENT.md`](docs/DEVELOPMENT.md) |
+| Air-gap install | [`docs/AIRGAP_INSTALL.md`](docs/AIRGAP_INSTALL.md) |
diff --git a/README.md b/README.md
index 046a5d5..0bb84cd 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,15 @@
 # ASR — Multi-Agent Runtime Framework
 
+[![Python](https://img.shields.io/badge/python-3.11%2B-blue?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
+[![LangGraph](https://img.shields.io/badge/LangGraph-1.x-orange?style=for-the-badge)](https://github.com/langchain-ai/langgraph)
+[![FastMCP](https://img.shields.io/badge/FastMCP-2.x-purple?style=for-the-badge)](https://github.com/jlowin/fastmcp)
+[![CI](https://img.shields.io/github/actions/workflow/status/RandomCodeSpace/asr/ci.yml?branch=main&style=for-the-badge&logo=github)](https://github.com/RandomCodeSpace/asr/actions/workflows/ci.yml)
+[![Quality Gate](https://img.shields.io/sonar/quality_gate/RandomCodeSpace_asr?server=https%3A%2F%2Fsonarcloud.io&style=for-the-badge&logo=sonarcloud)](https://sonarcloud.io/project/overview?id=RandomCodeSpace_asr)
+[![Coverage](https://img.shields.io/sonar/coverage/RandomCodeSpace_asr?server=https%3A%2F%2Fsonarcloud.io&style=for-the-badge&logo=sonarcloud)](https://sonarcloud.io/component_measures?id=RandomCodeSpace_asr&metric=coverage)
+[![Tests](https://img.shields.io/badge/tests-1265%20passing-brightgreen?style=for-the-badge)](https://github.com/RandomCodeSpace/asr/actions)
+[![Ruff](https://img.shields.io/badge/lint-ruff-261230?style=for-the-badge&logo=ruff)](https://github.com/astral-sh/ruff)
+[![Pyright](https://img.shields.io/badge/types-pyright-yellow?style=for-the-badge)](https://github.com/microsoft/pyright)
+
 Python multi-agent runtime built on **LangGraph** (orchestration) +
 **FastMCP** (tool dispatch), with HITL gate, markdown turn-output
 contract, and a single-file deploy bundle for air-gapped corporate
diff --git a/docs/00-project-overview.md b/docs/00-project-overview.md
new file mode 100644
index 0000000..4d9a1e5
--- /dev/null
+++ b/docs/00-project-overview.md
@@ -0,0 +1,97 @@
+# 00 — Project overview
+
+## What it does
+
+ASR is a generic Python multi-agent runtime framework that wraps
+**LangGraph** (orchestration), **LangChain** (LLM provider
+abstraction + agent factory), and **FastMCP** (tool dispatch). It
+adds a risk-rated HITL gateway, a markdown turn-output contract,
+per-step telemetry, an auto-learning lesson store, and a single-file
+deploy bundle for air-gapped corporate targets.
+
+Two reference apps live in the same repo to prove the runtime is
+genuinely generic:
+
+- **`examples/incident_management/`** — 4-skill investigation pipeline
+  (intake → triage → deep_investigator → resolution) with ASR memory
+  layers (L2 Knowledge Graph, L5 Release Context, L7 Playbook Store).
+- **`examples/code_review/`** — 3-skill PR review pipeline (intake
+  → analyzer → recommender). Built specifically to surface every
+  framework leak that would have made the runtime
+  incident-shaped — those leaks were lifted into the framework.
+
+References: [`docs/DESIGN.md`](DESIGN.md), [`pyproject.toml`](../pyproject.toml).
+
+## Target users
+
+- **Operators** of internal SRE / on-call automation in regulated /
+  air-gapped corporate environments. The deployment story is a
+  copy-only 7-file payload (no `pip install` at deploy time, no
+  runtime CDN/internet calls).
+- **Application authors** building domain-specific agent apps on top
+  of the framework. Add a folder under `examples/<your_app>/` with a
+  `Session` subclass, MCP servers, and skill prompts.
+- **Framework contributors** working on the `src/runtime/` layer.
+
+## Core features
+
+| Feature | Implemented in |
+|---|---|
+| LangGraph-driven multi-agent dispatch | `src/runtime/graph.py`, `src/runtime/agents/*.py` |
+| LangChain-driven LLM provider abstraction (Ollama, Azure OpenAI, OpenAI-compat) | `src/runtime/llm.py` |
+| FastMCP tool servers (in-process / stdio / http) | `src/runtime/mcp_loader.py` |
+| Risk-rated HITL gateway with `interrupt()` / `Command(resume=…)` | `src/runtime/tools/gateway.py` |
+| Markdown turn-output contract + 6-path parser + permissive fallback | `src/runtime/agents/turn_output.py` |
+| Per-step telemetry events (agent_started, tool_invoked, gate_fired, etc.) | `src/runtime/storage/event_log.py` |
+| Auto-learning lesson store + nightly refresher | `src/runtime/learning/extractor.py`, `src/runtime/learning/scheduler.py` |
+| Two-stage dedup (embedding + LLM) | `src/runtime/dedup.py` |
+| Optimistic-concurrency `SessionStore` over SQLAlchemy | `src/runtime/storage/session_store.py` |
+| Read-only similarity store | `src/runtime/storage/history_store.py` |
+| Trigger registry (api / webhook / schedule / plugin) | `src/runtime/triggers/` |
+| Single-file deploy bundle (`dist/`) | `scripts/build_single_file.py` |
+| Streamlit UI shell | `src/runtime/ui.py`, `ui/streamlit_app.py` |
+| FastAPI surface (`/sessions/*`, SSE/WebSocket, approvals) | `src/runtime/api.py` |
+| Concept-leak ratchet (CI-enforced framework genericity) | `tests/test_genericity_ratchet.py`, `scripts/check_genericity.py` |
+
+## Current status
+
+`main` is at v1.5 (last squash commit `b97ddb3`). All milestones
+shipped:
+
+| Milestone | Title | PR |
+|---|---|---|
+| v1.0 | Prompt-vs-Code Remediation | #1 |
+| v1.1 | Framework De-coupling (generic runtime) | #2 |
+| v1.2 + v1.3 + v1.4 | FOC + HARD + telemetry + auto-learning + React-ready API | bundled into #5 |
+| v1.5-A | Markdown turn output + HITL fix on langgraph 1.x | #6 / #7 |
+| v1.5-B | Generic-noun pass (concept-leak ratchet 156 → 39) | #8 |
+| v1.5-C | Per-agent LLM proof point | #9 |
+| v1.5-D | 429 rate-limit retry + multi-provider integration driver | #10 |
+
+**1265 tests passing**, **87% coverage**, **ratchet at 39**, ruff
+clean, SonarCloud quality gate green. See [`docs/DESIGN.md` § 13](DESIGN.md#13-milestone-history)
+for the full history.
+
+## Production-ready vs experimental
+
+| Surface | Status | Notes |
+|---|---|---|
+| Framework runtime (`src/runtime/`) | **production** | Used in air-gapped corporate environments |
+| `incident_management` example | **production** | Flagship use case |
+| `code_review` example | **demo / proof-of-genericity** | Tools are mocks (no real GitHub/GitLab fetch) — `examples/code_review/README.md` |
+| Streamlit UI | **prototype** | Stable but slated for replacement by React in v2.0 |
+| FastAPI surface | **production-ready** | v1.4 added generic `/sessions/*` REST + SSE/WebSocket + CORS + structured error envelope |
+| Postgres checkpointer | **optional / opt-in** | Default is SQLite; enable by installing the `asr[postgres]` extra (`pyproject.toml:39`) |
+| Trigger registry (webhook / schedule) | **functional, lightly exercised** | Used by the example apps; no large-scale fan-in tested |
+| Trigger registry (plugin transport) | **stub** | `src/runtime/triggers/transports/plugin.py`. Inference: scaffold for future SQS/Kafka/NATS transports |
+| ASR memory layers (incident_management) | **read-only** | Mutation paths (write-back) deferred per `examples/incident_management/README.md` |
+| Auto-learning lesson refresher | **production** | Nightly APScheduler job, gated on config |
+
+## What's next
+
+- **v2.0 — React UI**, replacing the Streamlit prototype, parity-port
+  against the v1.4 `/sessions/*` API surface. The long pole.
+- Smaller cleanups: duplicate `ToolCall` audit rows
+  (gateway colon-form vs harvester `__`-form), `ApprovalWatchdog`
+  regression test, `ASR_LOG_LEVEL` doc, `src/runtime/locks.py:49`
+  TODO. See [`docs/10-known-risks-and-todos.md`](10-known-risks-and-todos.md).
diff --git a/docs/01-local-setup.md b/docs/01-local-setup.md
new file mode 100644
index 0000000..cba0a50
--- /dev/null
+++ b/docs/01-local-setup.md
@@ -0,0 +1,167 @@
+# 01 — Local setup
+
+## Prerequisites
+
+- **Python 3.11+** (`pyproject.toml:7` requires `>=3.11`; `pyrightconfig` /
+  CI also pin 3.11). The dev environment in this repo runs Python
+  3.13 / 3.14 successfully. Inference: 3.11 is the *floor*, and newer
+  3.x versions work in practice.
+- **`uv`** package manager `>= 0.11.7` (CI pins this exact version
+  in `.github/workflows/ci.yml`). Install via `pipx install uv` or
+  the `uv` binary; do not `curl | sh`.
+- **git** (for branch / PR workflow).
+- **Optional, for live LLM smoke**: provider API keys —
+  `OLLAMA_API_KEY`, `OPENROUTER_API_KEY`, `AZURE_OPENAI_KEY` (+
+  `AZURE_ENDPOINT`, `AZURE_DEPLOYMENT`). Stub-mode tests do NOT
+  need any keys.
+- **Optional, for Postgres deployments**: install the
+  `asr[postgres]` extra to pull `langgraph-checkpoint-postgres`
+  and `psycopg-pool`. SQLite is the default and CI-tested path.
+
+## Install
+
+From a clean checkout:
+
+```bash
+git clone <repo-url>
+cd asr
+uv sync --frozen --extra dev
+```
+
+`--frozen` forbids re-resolving: it installs the exact set pinned in
+`uv.lock` with hash verification (HARD-02 reproducibility gate). For
+a fully air-gapped install with an internal mirror, see
+[`docs/AIRGAP_INSTALL.md`](AIRGAP_INSTALL.md).
+
+`--extra dev` pulls in the test runner, type checker, and linters per
+`pyproject.toml:42-50`.
+
+## Run
+
+Two entry points share the same orchestrator service.
+
+### CLI / API
+
+```bash
+uv run python -m runtime --config config/incident_management.yaml
+```
+
+Boots the long-lived `OrchestratorService` and FastAPI surface (`/sessions/*`
+REST, SSE, WebSocket). Source: `src/runtime/__main__.py`.
+
+### Streamlit UI
+
+```bash
+ASR_LOG_LEVEL=INFO uv run streamlit run src/runtime/ui.py --server.port 37777
+```
+
+`ASR_LOG_LEVEL` env var enables structured logs at the chosen level
+(`DEBUG` / `INFO` / `WARNING` / `ERROR`). Source:
+`src/runtime/ui.py:46-65` (`_maybe_configure_logging`).
+
+The UI binds to the same `OrchestratorService` instance as the CLI;
+both can run in the same process (the Streamlit script imports the
+service lazily on first session).
+
+## Test
+
+```bash
+# Full suite
+uv run pytest -x
+
+# Without coverage (faster)
+uv run pytest -x --no-cov
+
+# A single file or test
+uv run pytest tests/test_interrupt_detection.py -x -v
+
+# With coverage gate (fails below 85%)
+uv run pytest --cov=src/runtime --cov-fail-under=85 -x
+```
+
+Pytest config: `pyproject.toml:53-58` — `asyncio_mode = "auto"`,
+`testpaths = ["tests"]`, `pythonpath = ["src", "."]`.
+
+Coverage omits: `src/runtime/ui.py`,
+`src/runtime/__main__.py`, `src/runtime/checkpointer_postgres.py`,
+`src/runtime/triggers/transports/plugin.py`
+(`pyproject.toml:71-76`).
+
+## Lint + type check
+
+```bash
+uv run ruff check src/ tests/
+uv run pyright src/runtime
+```
+
+CI runs both with `fail-on-error`.
+
+## Concept-leak ratchet
+
+```bash
+python scripts/check_genericity.py            # current count
+python scripts/check_genericity.py --baseline 39  # exit non-zero if exceeded
+```
+
+Enforced by `tests/test_genericity_ratchet.py` — the count must stay
+at or below `BASELINE_TOTAL` (currently 39).
+
+## Bundle regeneration
+
+After ANY change to `src/runtime/` or `examples/*/`:
+
+```bash
+uv run python scripts/build_single_file.py
+git add dist/
+```
+
+CI's "Bundle staleness gate (HARD-08)" rebuilds and fails the build
+if `dist/*` doesn't match. See `docs/DEVELOPMENT.md` for the full
+flow.
+
+## Required services
+
+Default config uses local-only services:
+
+- **SQLite** at `/tmp/asr.db` (auto-created on first run)
+- **FAISS** vector index at `/tmp/asr-faiss/` (auto-created)
+- **Ollama Cloud** (when `llm.default` points there) — needs
+  `OLLAMA_API_KEY`
+
+To start fresh after testing:
+```bash
+rm -f /tmp/asr.db /tmp/asr.db-wal /tmp/asr.db-shm
+rm -rf /tmp/asr-faiss
+```
+
+## Common setup issues
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `ModuleNotFoundError: runtime` when running tests | `pythonpath` not picked up | Run via `uv run pytest …` (NOT bare `pytest`); pytest reads `[tool.pytest.ini_options].pythonpath` from `pyproject.toml` |
+| CI fails "Lockfile freshness gate" | `pyproject.toml` changed without `uv lock` | Run `uv lock` and commit `uv.lock` |
+| CI fails "Bundle staleness gate" | `src/runtime/` or `examples/*/` changed without `dist/` regen | Run `uv run python scripts/build_single_file.py` and commit `dist/*` |
+| Live LLM tests fail with `Error code: 402` (OpenRouter) | Account out of credits | Switch `llm.default` to `gpt_oss` (Ollama) or another model in `config/config.yaml` |
+| Live LLM tests fail with `Connection error` (Azure) | `.env` `AZURE_ENDPOINT` is a placeholder or unreachable | Set a real Azure endpoint, or skip Azure leg (test gates on `AZURE_OPENAI_KEY` + `AZURE_ENDPOINT` per `tests/test_integration_driver_s1.py`) |
+| Streamlit logs `AssertionError: scope["type"] == "http"` on every WebSocket request under Python 3.14 | Streamlit ≤ x + Starlette static-files compat bug under Python 3.14 | Cosmetic: HTTP traffic still works. Filter logs with `grep -v "AssertionError\|scope.*type"`. Inference: fixed in newer Streamlit; not yet retested. |
+| `gpt-oss:20b` returns errors / no envelope | Model dropped the markdown contract | Path 6 permissive synthesis (`turn_output.py`) emits a 0.30-confidence placeholder so the session still finalizes; retry the session for a real envelope |
+
+## Environment variables
+
+| Var | Required when | Notes |
+|---|---|---|
+| `OLLAMA_API_KEY` | `ollama_cloud` provider used | `config/config.yaml` references via `${OLLAMA_API_KEY}` |
+| `OPENROUTER_API_KEY` | OpenRouter provider used | |
+| `AZURE_OPENAI_KEY` | Azure provider used | |
+| `AZURE_ENDPOINT` | Azure provider used | Full URL incl. trailing `/` |
+| `AZURE_DEPLOYMENT` | Azure provider used | Defaults to `gpt-4o` in test driver |
+| `EXTERNAL_MCP_URL` | external HTTP MCP server configured | (see `tests/fixtures/sample_config.yaml`) |
+| `EXT_TOKEN` | external HTTP MCP server with bearer auth | |
+| `ASR_LOG_LEVEL` | optional | `DEBUG` / `INFO` / `WARNING` / `ERROR`; UI uses it via `force=True` basicConfig |
+| `APP_CONFIG` | optional | overrides default `config/config.yaml` path; read by `src/runtime/ui.py:68` |
+| `OLLAMA_LIVE` | optional | gates live-LLM smoke tests in `tests/test_llm_providers_smoke.py` |
+| `OLLAMA_BASE_URL` | required for `tests/test_integration_driver_s1.py` | Typically `https://ollama.com` for cloud or `http://localhost:11434` for local |
+
+CI uses dummy values for the API keys (see `ci.yml` — they only need
+to *exist* for the strict-mode `_interpolate` check; tests don't call
+live providers).
diff --git a/docs/02-architecture.md b/docs/02-architecture.md
new file mode 100644
index 0000000..146f3fc
--- /dev/null
+++ b/docs/02-architecture.md
@@ -0,0 +1,146 @@
+# 02 — Architecture
+
+> Companion to [`docs/DESIGN.md`](DESIGN.md), which carries the
+> long-form design narrative. This file is the quick-scan summary.
+
+## Major components
+
+```
++------------------------------------------------------------+
+| App layer: examples/{incident_management,code_review}      |
+| - state.py, config.py, skills/, mcp_server.py, ui.py       |
++------------------------------------------------------------+
+| Framework — runtime/                                       |
+| - Session, Skill, AgentRun, ToolCall, AgentTurnOutput      |
+| - Orchestrator, OrchestratorService                        |
+| - Gateway (wrap_tool), policies, ToolRegistry              |
+| - SessionStore, HistoryStore, EventLog                     |
+| - graph.py: build_graph + make_agent_node                  |
+| - llm.py: provider abstraction                             |
+| - ui.py: Streamlit shell                                   |
+| - api.py: FastAPI surface                                  |
++------------------------------------------------------------+
+| LangGraph 1.x  (orchestration / state / checkpointing)     |
+| LangChain 1.x  (chat models, agents.create_agent, tools)   |
+| FastMCP        (in-process / stdio / http MCP servers)     |
++------------------------------------------------------------+
+| Providers: Ollama Cloud · OpenRouter · Azure OpenAI · …    |
++------------------------------------------------------------+
+```
+
+| Component | Source | Responsibility |
+|---|---|---|
+| `Session` (model) | `src/runtime/state.py:70-172` | Lifecycle + telemetry fields. Apps subclass. |
+| `Skill` (config) | `src/runtime/skill.py` | YAML-driven agent declaration: kind, model, tools, routes, system_prompt |
+| `Orchestrator` | `src/runtime/orchestrator.py` | Owns compiled langgraph + SessionStore + per-session lock |
+| `OrchestratorService` | `src/runtime/service.py` | Long-lived asyncio loop wrapper. Thread-safe `submit_async` / `submit_and_wait` bridge |
+| `make_agent_node` | `src/runtime/graph.py:539+` and `src/runtime/agents/responsive.py:49+` | Builds one langgraph node per skill |
+| `_drive_agent_with_resume` | `src/runtime/graph.py:202+` | Drives `langchain.agents.create_agent` executor with HITL pause/resume |
+| `wrap_tool` (Gateway) | `src/runtime/tools/gateway.py:224+` | Risk-rated tool wrapper; injects session-derived args; calls `interrupt()` on high-risk tools |
+| `parse_envelope_from_result` | `src/runtime/agents/turn_output.py` | 6-path envelope parser (markdown-primary, with synthesis fallbacks) |
+| `SessionStore` | `src/runtime/storage/session_store.py` | CRUD over `IncidentRow` + FAISS write-through |
+| `HistoryStore` | `src/runtime/storage/history_store.py` | Read-only similarity search over the same engine |
+| `EventLog` | `src/runtime/storage/event_log.py` | Append-only `session_events` table |
+| `ApprovalWatchdog` | `src/runtime/tools/approval_watchdog.py` | Background task that times out stale `pending_approval` rows |
+
+## Request / data flow (one session, end-to-end)
+
+```
+UI / API ──start_session(query, environment, …)──▶ OrchestratorService
+                                                       │
+                                                       ▼
+                                  Orchestrator (per-session lock)
+                                                       │
+                              new IncidentRow inserted ▼
+                                  langgraph compiled graph (Pregel)
+                                                       │
+                              ┌────────────────────────┴────────────────┐
+                              │ for each skill in topological order:    │
+                              │   make_agent_node:                      │
+                              │     reload session from store           │
+                              │     wrap tools (gateway)                │
+                              │     create_agent (langgraph subgraph)   │
+                              │     _drive_agent_with_resume:           │
+                              │       inner.ainvoke(messages)           │
+                              │       if __interrupt__:                 │
+                              │         raise GraphInterrupt → outer    │
+                              │         pauses; UI sees pending_approval│
+                              │       else parse envelope, record       │
+                              │         AgentRun, route on signal       │
+                              │   gate node (low-confidence)?           │
+                              │   route to next skill / __end__         │
+                              └─────────────────────────────────────────┘
+                                                       │
+                                  finalize: terminal-tool match? ▼
+                                  default_terminal_status?
+                                  agent failure? → status='error'
+                                  paused on HITL? → SKIP finalize
+```
+
+Detailed contract: `docs/DESIGN.md` § 4 + § 7.
+
+## Storage choices
+
+| Store | Backend | Default URL / path | Owner |
+|---|---|---|---|
+| Session metadata | SQLAlchemy (SQLite or Postgres) | `sqlite:////tmp/asr.db` | `SessionStore`, `HistoryStore` |
+| Vector similarity | FAISS (filesystem) | `/tmp/asr-faiss/` | `SessionStore._add_vector` |
+| LangGraph checkpoints | `langgraph-checkpoint-sqlite` (default) or `langgraph-checkpoint-postgres` (opt-in) | Same SQLite DB as session metadata | `make_checkpointer` (`src/runtime/checkpointer.py`) |
+| Event log | SQLAlchemy `session_events` table | Same SQLite DB | `EventLog.append` |
+| Memory layers (incident_management only) | Filesystem JSON | `incidents/{kg,releases,playbooks}/` (or seed bundle) | `examples/incident_management/asr/*_store.py` |
+| Lesson store (auto-learning) | SQLAlchemy `session_lessons` table | Same SQLite DB | `LessonStore` |
+
+The whole framework runs against ONE durable backend (SQLite or
+Postgres) carrying four separate concerns. Apps don't get to choose
+backends per-store — the storage URL is a single config knob.
+
+## External systems
+
+The runtime in production reaches out to:
+
+- **LLM providers** (variable): Ollama Cloud, Azure OpenAI,
+  OpenAI-compatible endpoints (OpenRouter, etc.). Configured per
+  `llm.providers` in `config/config.yaml`. Stub provider for tests.
+- **MCP servers**: in-process (Python module) by default; `stdio`
+  and `http` transports also supported per `mcp.servers[*].transport`.
+  Schema: `MCPServerConfig` in `src/runtime/config.py`.
+- **APScheduler** (in-process): drives nightly `LessonRefresher`
+  jobs and any `schedule:` triggers from the trigger registry.
+
+The runtime does NOT reach out to:
+
+- The public internet at boot or runtime in air-gapped deploys —
+  every provider URL is configurable; the hardcoded
+  `https://ollama.com` fallback was removed in Phase 13 (HARD-05).
+- Any package mirror at deploy time — the deploy is copy-only;
+  `uv sync` runs at *install* time inside CI / the dev box.
+
+## Important tradeoffs
+
+| Decision | Trade | Where decided |
+|---|---|---|
+| LangGraph as orchestration engine | Don't maintain a graph engine; pay for langgraph version churn | `docs/DESIGN.md` DEC-001 |
+| `langchain.agents.create_agent` for the per-agent loop | Single tool-loop with native ToolStrategy fallback; we're tied to langchain v1.x's agent API | `docs/DESIGN.md` DEC-002, Phase 15 |
+| Markdown contract over `response_format` JSON | Lenient parsing in our code; 6 parse paths instead of 1 schema | DEC-003, Phase 22 |
+| Pure-policy HITL gating | One source of truth (`should_gate`); everywhere else just calls it | DEC-004, Phase 11 |
+| Generic `Session` + `extra_fields` JSON | Apps can extend without schema migrations; loses some type safety on app fields | DEC-005, v1.1 |
+| Per-agent `skill.model` override | Cheap models for cheap agents; one more config knob to think about | DEC-006, v1.5-C |
+| Single-file bundle | Air-gap deployable; large files for review (~600KB each) | DEC-007, BUNDLER-01 |
+| Concept-leak ratchet | CI gate keeps framework generic; some legitimate `incident` references look like leaks until cleaned | DEC-008, v1.5-B |
+| 429 separate retry regime (longer backoff) | Free-tier OpenRouter survives transient throttles; non-429 4xx still fail fast | DEC-009, v1.5-D |
+| Inner agent checkpointer + reload-on-entry | HITL Approve/Reject actually drives the gated tool; more state per agent invocation | DEC-010, PR #6 |
+
+## What this architecture is NOT
+
+- **Not a workflow engine** — agents are LLM-driven, not declarative
+  state machines. Routing is signal-based, not condition-tree.
+- **Not multi-tenant by default** — one process, one orchestrator,
+  one storage URL. Multi-tenant deployments need a separate
+  process/DB per tenant.
+- **Not horizontally scalable** — `OrchestratorService` is a
+  single-process / single-loop model. The lock registry
+  (`SessionLockRegistry`) prevents concurrent writes per session
+  but assumes one orchestrator per DB.
+- **Not authenticated** — there's no built-in user authentication on
+  the FastAPI surface. Air-gap deploys live behind corporate
+  network controls; trigger webhook auth is bearer-token only.
diff --git a/docs/03-code-map.md b/docs/03-code-map.md
new file mode 100644
index 0000000..1fbb72d
--- /dev/null
+++ b/docs/03-code-map.md
@@ -0,0 +1,169 @@
+# 03 — Code map
+
+> Approximate line counts and per-folder purpose. Verify with
+> `find <path> -name '*.py' -exec wc -l {} +`.
+
+## Top-level directories
+
+| Path | Purpose |
+|---|---|
+| `src/runtime/` | Framework code — the only thing the bundler reads to produce `dist/app.py` |
+| `examples/incident_management/` | Flagship example app: SRE incident investigation pipeline |
+| `examples/code_review/` | Second example app: PR review pipeline (proves framework genericity) |
+| `tests/` | Pytest suite (149 test files; 1265 tests; 87% coverage on `src/runtime/`) |
+| `scripts/` | Build + lint utilities |
+| `config/` | Default framework config + per-app config + skill prompt directory |
+| `docs/` | This documentation set + DESIGN narrative + dev / install how-tos |
+| `dist/` | **Generated** by `scripts/build_single_file.py`; never hand-edit |
+| `ui/` | Streamlit launcher shim (`streamlit_app.py`) |
+| `.github/workflows/` | CI: lint + type-check + test + sonar (`ci.yml`) |
+| `.planning/` | **Gitignored** local working state (GSD planning workflow); selected artifacts can be committed but rarely should be |
+
+Top-level files:
+
+| File | Purpose |
+|---|---|
+| `pyproject.toml` | Project metadata, deps, pytest/ruff/pyright/coverage config |
+| `uv.lock` | Pinned dependency graph with hashes — reproducible installs |
+| `pyrightconfig.json` | Pyright typing config (CI gate fails on errors per `ci.yml`) |
+| `sonar-project.properties` | SonarCloud analysis config (sources, exclusions, CPD exclusions, coverage paths) |
+| `README.md` | Repo intro pointing at `docs/DESIGN.md` |
+
+---
+
+## `src/runtime/` (~18 200 lines total)
+
+### Top-level modules
+
+| File | LOC | Purpose | Related |
+|---|---|---|---|
+| `__main__.py` | ~70 | argparse-only CLI entry: `python -m runtime --config <yaml>` | `orchestrator.py`, `service.py`, `api.py` |
+| `__init__.py` | 0 | empty | |
+| `state.py` | 173 | `Session`, `AgentRun`, `ToolCall`, `TokenUsage` pydantic models | All app `Session` subclasses extend the model here |
+| `state_resolver.py` | ~70 | Loads the app's `state_class` from a dotted path (`runtime.state_class` config) | Wave-2 generic-runtime decoupling |
+| `skill.py` | ~520 | `Skill`, `RouteRule`, `DispatchRule`, skill loader (reads YAML + system.md per skill folder) | `examples/*/skills/*/config.yaml` |
+| `config.py` | ~1100 | All pydantic config schemas: `AppConfig`, `LLMConfig`, `MCPConfig`, `OrchestratorConfig`, `GatewayConfig`, `GatePolicy`, etc. | `config/config.yaml` |
+| `errors.py` | ~50 | Typed exceptions: `LLMTimeoutError`, `LLMConfigError`, `EnvelopeMissingError` | |
+| `llm.py` | ~600 | `get_llm`, `get_embedding` — provider abstraction + `StubChatModel` for tests | `langchain-openai`, `langchain-ollama` |
+| `mcp_loader.py` | ~270 | Loads MCP servers per `mcp.servers[*]`, builds `ToolRegistry` | `fastmcp`, `langchain-mcp-adapters` |
+| `orchestrator.py` | ~1400 | `Orchestrator` class — `start_session`, `stream_session`, `resume_session`, `retry_session`, `_finalize_session_status_async`, `_is_graph_paused` | `graph.py`, `service.py` |
+| `service.py` | ~830 | `OrchestratorService` — long-lived asyncio loop; thread-safe bridge | Used by both UI + API |
+| `api.py` | ~880 | FastAPI surface — `/sessions/*` REST + SSE + WebSocket + approvals | `service.py` |
+| `api_dedup.py` | ~110 | API endpoint for retracting dedup matches | `dedup.py` |
+| `graph.py` | ~1430 | LangGraph build (`build_graph`, `_build_agent_nodes`), `make_agent_node`, `_drive_agent_with_resume`, `_ainvoke_with_retry`, `parse_envelope_from_result` callers | `langgraph`, `langchain.agents.create_agent` |
+| `intake.py` | ~250 | Default intake supervisor runner — similarity retrieval + dedup gate | `dedup.py`, `LessonStore` |
+| `dedup.py` | ~430 | Two-stage dedup pipeline (embedding similarity + LLM stage 2) | `HistoryStore` |
+| `similarity.py` | ~50 | Cosine similarity helper | |
+| `policy.py` | ~270 | Pure functions — `should_gate`, `should_retry`, gate decision dataclass | `gateway.py`, `orchestrator.py` |
+| `locks.py` | ~120 | `SessionLockRegistry` — per-session asyncio locks; `SessionBusy` exception; D-01 contract | `orchestrator.py`, `service.py` |
+| `checkpointer.py` | ~120 | LangGraph checkpointer factory (sqlite default); `make_checkpointer` | `langgraph-checkpoint-sqlite` |
+| `checkpointer_postgres.py` | ~80 | Postgres checkpointer (lazy-imported; `pip install asr[postgres]`) | `langgraph-checkpoint-postgres` |
+| `dedup.py` | ~430 | Two-stage dedup (embedding + LLM) | (above) |
+| `terminal_tools.py` | ~80 | Maps terminal tool names → status transitions per `cfg.orchestrator.terminal_tools` | |
+| `skill_validator.py` | ~110 | Validates `skill.model` references against `LLMConfig.models` at orchestrator boot | |
+
+### Subpackages
+
+| Path | Purpose | Key files |
+|---|---|---|
+| `agents/` | Agent-kind factories | `responsive.py` (default LLM agent — mirrors `graph.py:make_agent_node`), `supervisor.py` (rule/llm dispatch), `monitor.py` (out-of-band runner), `turn_output.py` (envelope parser + `AgentTurnOutput` model) |
+| `tools/` | Gateway + arg-injection + watchdog | `gateway.py` (~830 lines — risk-rated wrap + interrupt/resume), `arg_injection.py` (session-derived args), `approval_watchdog.py` (~320 lines — stale-approval timeout), `__init__.py` |
+| `storage/` | Persistence | `models.py` (SQLAlchemy `IncidentRow` + `EventRow` + `LessonRow`), `engine.py` (engine factory), `embeddings.py` (FAISS-backed embedder), `vector.py` (vector store), `session_store.py` (~660 lines — CRUD), `history_store.py` (~230 lines — read-only similarity), `event_log.py` (~135 lines), `lesson_store.py` (~150 lines), `migrations.py` (~210 lines), `checkpoint_gc.py` (~50 lines) |
+| `learning/` | Auto-learning (M5/M6) | `extractor.py` (lesson extraction at finalize), `scheduler.py` (~160 lines — APScheduler nightly refresher) |
+| `memory/` | App-overridable memory hooks | `session_state.py`, `hypothesis.py` (triage hypothesis loop), `knowledge_graph.py`, `release_context.py`, `playbook_store.py`, `resolution.py` — these are runtime-agnostic helpers; the L2/L5/L7 stores in `examples/incident_management/asr/` use them |
+| `triggers/` | Trigger registry | `base.py` (TriggerTransport ABC), `config.py`, `registry.py` (~320 lines), `idempotency.py` (~210 lines), `auth.py` (bearer), `resolve.py`, `transports/api.py`, `transports/webhook.py` (~140 lines), `transports/schedule.py` (~85 lines), `transports/__init__.py` |
+
+---
+
+## `examples/incident_management/`
+
+| File | Purpose |
+|---|---|
+| `__init__.py` | empty |
+| `state.py` | `IncidentState(Session)` subclass — `query`, `environment`, `reporter`, `summary`, `tags`, `severity`, `category`, `matched_prior_inc`, `resolution`, `memory: MemoryLayerState` |
+| `mcp_server.py` | `IncidentMCPServer` — `lookup_similar_incidents`, `create_incident`, `update_incident`, `submit_hypothesis`, `mark_resolved`, `mark_escalated`, `hydrate_and_gate` (memory hydration + dedup gate) |
+| `mcp_servers/observability.py` | Observability tools: `get_logs`, `get_metrics`, `get_service_health`, `check_deployment_history` |
+| `mcp_servers/remediation.py` | Remediation tools: `propose_fix`, `apply_fix` (gated `high`), `notify_oncall` |
+| `mcp_servers/user_context.py` | User-context tools |
+| `asr/` | L2 Knowledge Graph + L5 Release Context + L7 Playbook stores (filesystem-backed) |
+| `skills/intake/` | Supervisor skill: rule-dispatch to triage; runs similarity + memory hydration |
+| `skills/triage/` | Hypothesis-loop investigator |
+| `skills/deep_investigator/` | Evidence gathering |
+| `skills/resolution/` | Propose / apply fix or escalate |
+| `skills/_common/` | Shared prompt fragments (output contract, confidence calibration) |
+
+Per-skill structure: `<skill>/config.yaml` + `<skill>/system.md`.
+
+---
+
+## `examples/code_review/`
+
+| File | Purpose |
+|---|---|
+| `state.py` | `CodeReviewState(Session)` — `pr: PullRequest`, `review_findings: list[ReviewFinding]`, `overall_recommendation`, `review_summary`, `review_token_budget` |
+| `mcp_server.py` | `CodeReviewMCPServer` — `fetch_pr_diff` (mock), `add_review_finding`, `set_recommendation` |
+| `skills/intake/`, `skills/analyzer/`, `skills/recommender/` | 3-skill responsive pipeline |
+
+This app is a demonstration with a mocked diff fetch, which reads `tests/fixtures/code_review/<repo>/<number>.json` if present.
+
+---
+
+## `tests/` (149 files)
+
+Test groups by topic (sample):
+
+| Pattern | Topic |
+|---|---|
+| `test_agent_node*.py`, `test_real_llm_tool_loop_termination.py`, `test_integration_driver_s1.py` | Agent runner contract, live-LLM smoke |
+| `test_interrupt_detection.py`, `test_gateway_persist_resolution.py`, `test_orchestrator_pause_detection.py`, `test_approval_*.py` | HITL approve/reject end-to-end |
+| `test_markdown_turn_output.py` | Phase 22 envelope parser (36 tests) |
+| `test_ainvoke_retry_429.py` | 429 retry backoff regime |
+| `test_per_agent_model_dispatch.py` | v1.5-C per-agent dispatch contract |
+| `test_genericity_ratchet.py`, `test_concept_leak_ratchet.py` | Framework-leak counters |
+| `test_session_store.py`, `test_incident_store.py`, `test_history_store.py`, `test_dedup_*.py` | Storage layer |
+| `test_telemetry_integration.py`, `test_event_log.py` | Per-step events |
+| `test_api*.py`, `test_approval_api.py`, `test_session_lock.py` | FastAPI surface + lock contract |
+| `test_bundle_*.py`, `test_build_*.py` | Bundler + bundle completeness |
+| `test_triggers/` | Trigger registry transports |
+| `test_ui_*.py`, `test_render_*.py` | Streamlit UI helpers |
+| `test_skill*.py` | Skill loader, model override resolution |
+
+Helpers: `tests/_envelope_helpers.py`, `tests/_policy_helpers.py`,
+`tests/conftest.py` (if present), `tests/fixtures/` (sample
+configs, mock PR diffs).
+
+---
+
+## `scripts/`
+
+| Script | Purpose |
+|---|---|
+| `build_single_file.py` | The bundler. Reads `RUNTIME_MODULE_ORDER` + per-app order lists, flattens into `dist/`. **Must run after any change to `src/runtime/` or `examples/`** |
+| `check_genericity.py` | Counts `incident` / `severity` / `reporter` tokens in `src/runtime/`. Powers the ratchet test |
+| `lint_skill_prompts.py` | Phase 21 (SKILL-LINTER-01) — walks every `examples/*/skills/*/system.md` and asserts referenced tool names + arg fields exist in the inventory |
+| `migrate_jsonl_to_sql.py` | One-off migration for legacy JSONL incident store → SQLAlchemy |
+| `seed_demo_incidents.py` | Seeds the FAISS index + sqlite DB with demo data for UI walkthroughs |
+
+---
+
+## `config/`
+
+| File | Purpose |
+|---|---|
+| `config.yaml` | Default framework config — LLM providers + models, MCP servers, storage URL, trigger registry, gateway policy |
+| `config.yaml.example` | Annotated template for new deploys |
+| `incident_management.yaml` | Incident-app composite config (framework + app keys) |
+| `code_review.yaml`, `code_review.runtime.yaml` | Code-review composite config |
+| `skills/` | Optional shared skill prompts (rare; usually skills live under `examples/<app>/skills/`) |
+
+---
+
+## `docs/`
+
+| File | Purpose |
+|---|---|
+| `DESIGN.md` | Long-form architecture + decisions narrative |
+| `DEVELOPMENT.md` | Day-to-day dev workflow |
+| `AIRGAP_INSTALL.md` | Air-gap install procedure |
+| `00-…` through `11-…` | This brownfield documentation set (you're reading it) |
+| `adr/0001-…` | Architecture Decision Record |
diff --git a/docs/04-main-flows.md b/docs/04-main-flows.md
new file mode 100644
index 0000000..bfd2e36
--- /dev/null
+++ b/docs/04-main-flows.md
@@ -0,0 +1,288 @@
+# 04 — Main flows
+
+For each flow: **entry points**, **key files**, and **failure
+modes**. Companion to `docs/DESIGN.md` § 2 (architecture overview)
+and § 7 (HITL).
+
+---
+
+## Auth / login
+
+**Status: not present in framework.** The runtime has no user
+login or session auth; air-gap deploys rely on corporate network
+controls instead.
+
+The only auth mechanism in the framework is **bearer-token
+auth on webhook trigger endpoints** (`auth: bearer` in
+`triggers:` config; token read from env var at startup;
+constant-time comparison via `hmac.compare_digest`).
+
+Entry point: `src/runtime/triggers/auth.py`,
+`src/runtime/triggers/transports/webhook.py`.
+
+Failure modes:
+- Missing/empty token env → trigger refuses to start
+  (`LLMConfigError` analogue at config-load)
+- Wrong bearer → `HTTP 401`
+- Token check is a timing-safe comparison only; the framework does no rate limiting
+
+---
+
+## Session lifecycle (request → terminal)
+
+Entry points (any of):
+
+- **CLI** — `python -m runtime --config <yaml>` (boots the FastAPI surface)
+- **API** — `POST /sessions` (`src/runtime/api.py`)
+- **Streamlit UI** — "Start Investigation" button (`src/runtime/ui.py`)
+- **Webhook trigger** — `POST /triggers/{name}` (configured per `triggers:` block in YAML)
+- **Schedule trigger** — APScheduler cron (in-process)
+- **Plugin trigger** — custom transport via setuptools entry-point
+
+All entry points converge on
+`OrchestratorService.start_session(query=…, environment=…, …)`,
+which:
+
+1. Allocates the session ID synchronously on the loop
+2. Inserts the row (`status='new'`)
+3. Spawns an `asyncio.Task` for `Orchestrator.graph.ainvoke(...)`
+4. Returns the session ID immediately (caller polls or streams)
+
+Key files:
+
+- `src/runtime/service.py:start_session` (entry point)
+- `src/runtime/orchestrator.py:start_session` (per-session lock + graph kick-off)
+- `src/runtime/graph.py:make_agent_node` (per-skill agent step)
+- `src/runtime/agents/turn_output.py:parse_envelope_from_result` (envelope contract enforcement)
+- `src/runtime/orchestrator.py:_finalize_session_status_async` (terminal status assignment)
+
+Per-step events emitted to `EventLog`:
+`agent_started → tool_invoked* → confidence_emitted → route_decided
+→ agent_finished` per agent; `gate_fired` at HITL boundaries;
+`status_changed` on terminal transitions.
+
+Failure modes:
+
+| What | Symptom | Where caught |
+|---|---|---|
+| LLM 5xx / connection reset | Retried 3× with 1.5s/3s/4.5s backoff | `_ainvoke_with_retry` |
+| LLM 429 rate-limit | Retried 3× with 7.5s/15s/22.5s backoff | `_ainvoke_with_retry` |
+| LLM 4xx (non-429) | Fail immediately → `_handle_agent_failure` → `status='error'` | `make_agent_node` exception arm |
+| LLM dropped markdown contract (no envelope) | Path 5 (terminal-tool args) → Path 6 (permissive synthesis) → 0.30-confidence placeholder | `parse_envelope_from_result` |
+| LLM dropped contract AND no tool calls | Hard fail → `EnvelopeMissingError` → `status='error'` | Path 7 |
+| HITL high-risk tool gate fires | `interrupt()` raised, session stays `in_progress`, `pending_approval` row written | `gateway.wrap_tool` |
+| Approval times out with no operator decision | `ApprovalWatchdog` resolves with `verdict=timeout` | `tools/approval_watchdog.py` |
+| Stale-version save (concurrent writers) | `StaleVersionError` raised; caller reloads + retries | `SessionStore.save` |
+| Recursion limit hit on inner agent | LangGraph `GraphRecursionError` propagates → `_handle_agent_failure` | langgraph default bound |
+
+---
+
+## HITL approve / reject (high-risk tool)
+
+Trigger: an agent calls a tool rated `high` in
+`runtime.gateway.policy`, or a tool listed in
+`gate_policy.resolution_trigger_tools` while running in a production env.
+
+Flow:
+
+```
+agent calls apply_fix
+  └─ gateway _arun
+       ├─ inject session-derived args (e.g. environment)
+       ├─ should_gate → GateDecision(gate=True, reason=…)
+       ├─ append ToolCall(status='pending_approval') + store.save
+       └─ langgraph.types.interrupt(payload)  ← pauses inner agent
+            ↓
+inner.ainvoke returns with __interrupt__ in result dict
+  └─ _drive_agent_with_resume detects, raises GraphInterrupt
+       ↓
+outer Pregel pauses (state checkpointed)
+  └─ ainvoke returns with __interrupt__ on outer state
+       ↓
+finalize SKIPPED (Orchestrator._is_graph_paused → True)
+       ↓
+[UI / API: operator clicks Approve or POSTs to /approvals/{tcid}]
+       ↓
+graph.ainvoke(Command(resume={"decision": "approve", ...}))
+  └─ outer node re-runs
+       └─ _drive_agent_with_resume: aget_state(inner_cfg).next non-empty
+            └─ outer interrupt() → returns the verdict dict
+                 └─ inner.ainvoke(Command(resume=verdict), config=inner_cfg)
+                      └─ gateway _arun re-enters
+                           └─ verdict == "approve" → run apply_fix
+                           └─ update pending row → status='approved' + save
+                                ↓
+inner agent finishes
+  └─ envelope parsed, AgentRun recorded, route to next node / END
+       ↓
+outer ainvoke returns
+  └─ finalize runs (no longer paused) → terminal status set
+```
+
+Key files:
+
+- `src/runtime/tools/gateway.py:_arun` (and `_run` mirror) — the
+  pause + resume entry points
+- `src/runtime/graph.py:_drive_agent_with_resume` — the
+  langgraph 1.x `__interrupt__` plumbing
+- `src/runtime/orchestrator.py:_is_graph_paused` — finalize guard
+- `src/runtime/api.py:submit_approval_decision` — HTTP approval handler
+- `src/runtime/ui.py:_submit_approval_via_service` — UI approval handler
+- `src/runtime/tools/approval_watchdog.py` — stale-approval timeout
+
+Failure modes:
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `Command(resume=…)` raises `Cannot use Command(resume=...) without checkpointer` | Inner agent missing checkpointer | Inner `create_agent` always gets `checkpointer=` per PR #6 |
+| Stale `state["session"]` on resume → gateway double-appends → `StaleVersionError` | Outer Pregel checkpoint at step boundaries doesn't reflect mid-step gateway saves | `make_agent_node` reloads from store at entry per PR #6 |
+| Operator approves but DB row stays `pending_approval` | Gateway didn't save after status transition | `_record_pending_resolution` saves after every transition (approved/rejected/timeout) per PR #6 |
+| Session goes to `error` instead of resuming | Pre-PR-#6 langgraph 1.x silently swallowed `interrupt()` and finalized the session | Fixed by `_drive_agent_with_resume` |
+
+---
+
+## Background jobs
+
+### `LessonRefresher` (auto-learning, M5/M6)
+
+Source: `src/runtime/learning/scheduler.py`.
+
+Runs an APScheduler job (default: nightly 02:00 UTC; configurable
+via `learning.scheduler` block in YAML). For each session resolved
+since the last run, extracts a `Lesson` row capturing the winning
+hypothesis + applied fix.
+
+Entry: `LessonRefresher.start()` (called by lifespan hook in
+`src/runtime/api.py`).
+
+Failure modes:
+- Job exception → APScheduler logs and continues to next tick
+  (defensive `try/except` around the per-session extraction)
+- Long-running extraction blocks subsequent ticks within the same
+  scheduler — bounded by per-session timeout
+
+### `ApprovalWatchdog`
+
+Source: `src/runtime/tools/approval_watchdog.py`.
+
+Polls the DB for `pending_approval` rows older than
+`framework.approval_timeout`. Resolves them with
+`verdict=timeout` so operators don't end up with permanently-paused
+sessions.
+
+Entry: `ApprovalWatchdog.start()` (called by lifespan hook).
+
+Failure modes:
+- DB unreachable → logged, retried next tick
+- Resolution race with concurrent operator approval →
+  `StaleVersionError`; watchdog reloads + retries
+
+---
+
+## Data ingestion / sync
+
+### Trigger registry
+
+Source: `src/runtime/triggers/`.
+
+Three transport flavours, plus a deprecated back-compat alias
+(`api`), are configurable in `config.yaml`'s `triggers:` block:
+
+| Transport | Entry | Per-trigger config |
+|---|---|---|
+| `webhook` | `POST /triggers/{name}` (FastAPI route) | `payload_schema`, `transform`, `auth`, `idempotency_ttl_hours` |
+| `schedule` | APScheduler in-process cron | `schedule:` 5-field cron, `payload:` static |
+| `plugin` | custom (`TriggerTransport` ABC, setuptools entry-point) | per-plugin |
+| `api` | back-compat for `POST /investigate` | (deprecated alias) |
+
+Each trigger fires `OrchestratorService.start_session(...)` with
+a synthetic payload. Provenance is stamped on
+`session.findings['trigger']` so dashboards can answer "where did
+this come from?"
+
+Failure modes:
+
+| What | Symptom | Where caught |
+|---|---|---|
+| Transform raises | `HTTP 422 Unprocessable Entity`, NOT cached for idempotency | `transports/webhook.py` |
+| Auth fails | `HTTP 401` | `triggers/auth.py` |
+| Idempotency-Key replay | First request's `session_id` returned | `triggers/idempotency.py` |
+| Schedule drift | ±1 minute under normal load (in-process APScheduler limit) | Inference: not measured; documented in legacy README |
+
+### Two-stage dedup pipeline
+
+Source: `src/runtime/dedup.py`.
+
+Stage 1: embedding similarity over closed sessions
+(`HistoryStore.find_similar`). Stage 2: LLM judge confirms (or
+rejects) the match. Confirmed matches mark the new session
+`status='duplicate'` with `parent_session_id` linkage.
+
+Entry: `Orchestrator._run_dedup_check` called early in
+`start_session`.
+
+Failure modes:
+- LLM stage 2 throws → degrades to "not a duplicate" so dedup
+  never crashes intake (`Orchestrator._run_dedup_check` catches `Exception`)
+- No similar sessions → returns False, normal flow proceeds
+
+---
+
+## Deployment
+
+Source: `scripts/build_single_file.py`,
+`docs/AIRGAP_INSTALL.md`, `docs/DEVELOPMENT.md`.
+
+**Build (CI / dev box):**
+```bash
+uv sync --frozen --extra dev
+uv run python scripts/build_single_file.py
+git add dist/ && git commit
+```
+
+**Deploy (target host, copy-only):**
+
+Payload (six entries listed below):
+```
+app.py                    (renamed from dist/apps/<app>.py)
+ui.py                     (dist/ui.py)
+config/config.yaml        (framework: LLM, MCP, storage)
+config/<app>.yaml         (app: severity aliases, escalation roster, …)
+config/skills/            (optional skill prompt overrides)
+.env                      (provider keys)
+```
+
+Boot:
+```bash
+python -m runtime --config config/<app>.yaml
+streamlit run ui.py --server.port 37777
+```
+
+The CI gate `Bundle staleness gate (HARD-08)` rebuilds the bundles
+from source on every PR and refuses the merge if `dist/*` differs
+from a fresh build. This means `dist/*` on `main` is always
+deploy-ready.
+
+Failure modes:
+
+| What | Symptom | Where caught |
+|---|---|---|
+| New `src/runtime/` module not in `RUNTIME_MODULE_ORDER` | `tests/test_bundle_completeness.py` fails | Local pytest before push |
+| Bundle drift (changed src without dist regen) | CI's "Bundle staleness gate" fails | CI |
+| Bundle doesn't boot from a clean tmpdir | `tests/test_build_single_file.py` smoke check | Local |
+| Lockfile drift | CI's "Lockfile freshness gate" fails | CI (`uv lock --check`) |
+
+---
+
+## Error handling (cross-cutting patterns)
+
+| Pattern | Example | Source |
+|---|---|---|
+| Typed exception hierarchy | `LLMTimeoutError`, `LLMConfigError`, `EnvelopeMissingError`, `SessionBusy`, `StaleVersionError` | `src/runtime/errors.py`, `storage/session_store.py`, `locks.py`, `agents/turn_output.py` |
+| Bounded retries on transient cloud errors | `_ainvoke_with_retry` (5xx + 429) | `src/runtime/graph.py` |
+| Fail-fast on policy errors | `should_gate` raises before tool runs | `src/runtime/policy.py` |
+| Defensive try/except around telemetry | EventLog failures NEVER break a tool call | `gateway.py` `_emit_invoked` |
+| `_handle_agent_failure` for caught LLM exceptions | Marks session `error` + records failure agent_run | `src/runtime/graph.py` |
+| Per-session async lock prevents concurrent writes | `SessionLockRegistry.acquire(session_id)` | `src/runtime/locks.py`, used by `service.py` + `api.py` |
+| Optimistic concurrency on save | `version` column on `IncidentRow`; `StaleVersionError` on mismatch | `storage/session_store.py:save` |
+| Silent-failure sweep (Phase 18 / HARD-04) | All `except Exception: pass` blocks replaced with logged re-raise or typed handler | `tests/test_silent_failure_sweep.py` (Inference: name based on phase) |
diff --git a/docs/05-configuration.md b/docs/05-configuration.md
new file mode 100644
index 0000000..9687f7c
--- /dev/null
+++ b/docs/05-configuration.md
@@ -0,0 +1,285 @@
+# 05 — Configuration
+
+## Layered config
+
+Two layers, in order of precedence:
+
+| Layer | File(s) | Owns |
+|---|---|---|
+| **Framework** | `config/config.yaml` (or `${APP_CONFIG}`) | LLM providers + models, MCP servers, storage URL, gateway policy, framework knobs (confidence threshold, escalation roster, dedup), trigger registry, runtime tunables |
+| **App** | `examples/<app>/config.yaml`, `config/<app>.yaml` (composite) | Domain-specific knobs: severity aliases, escalation teams, environments, similarity thresholds |
+
+Source: `src/runtime/config.py` (~1100 lines) holds every pydantic
+schema. Framework reads + validates at orchestrator boot via
+`load_config(path)`.
+
+The framework's `AppConfig` does **not** contain incident-shaped
+keys — they live on `IncidentAppConfig`. Adding a new domain
+field is a one-line addition to `IncidentAppConfig`, never to
+`runtime.config.AppConfig`.
+
+---
+
+## Environment variables
+
+Used in `config.yaml` via `${VAR_NAME}` interpolation
+(`src/runtime/config.py:_interpolate`). Strict-mode resolver
+**fails at config-load** if a referenced var is missing — this
+is by design, so missing keys can't silently fall through to
+"use default model".
+
+| Var | Used by | Default | Notes |
+|---|---|---|---|
+| `OLLAMA_API_KEY` | `ollama_cloud` provider | none | Required if any `llm.providers.*.kind: ollama` entry references it |
+| `OPENROUTER_API_KEY` | `openai_compat` provider via OpenRouter | none | |
+| `AZURE_OPENAI_KEY` | `azure_openai` provider | none | |
+| `AZURE_ENDPOINT` | `azure_openai` provider | none | Full URL incl. trailing `/` |
+| `AZURE_DEPLOYMENT` | `smart` model in default config | `gpt-4o` (test driver default) | Per-deployment Azure name |
+| `EXTERNAL_MCP_URL` | external HTTP MCP server | none | See `tests/fixtures/sample_config.yaml` |
+| `EXT_TOKEN` | external HTTP MCP server bearer auth | none | |
+| `ASR_LOG_LEVEL` | `src/runtime/ui.py:46-65` | unset (silent) | `DEBUG` / `INFO` / `WARNING` / `ERROR`; takes effect via `force=True` `logging.basicConfig` |
+| `APP_CONFIG` | `src/runtime/ui.py:68` | `config/config.yaml` | Path override |
+| `OLLAMA_LIVE` | `tests/test_llm_providers_smoke.py` | unset (skip) | Set to `1` to opt into live Ollama smoke |
+| `OLLAMA_BASE_URL` | `tests/test_integration_driver_s1.py` | unset | Required for the integration driver `local` arm |
+
+CI config (`.github/workflows/ci.yml:71-83`) sets dummy values for
+all the above so the strict `_interpolate` check passes — tests
+don't call live providers.
+
+---
+
+## Config file: `config/config.yaml`
+
+Top-level structure (see
+`config/config.yaml.example` for an annotated template):
+
+```yaml
+storage:
+  metadata:
+    url: "sqlite:////tmp/asr.db"     # SQLAlchemy URL
+    pool_size: 5                     # postgres only; sqlite uses NullPool
+    echo: false                      # SQL echo to stdout
+  vector:
+    backend: faiss                   # faiss | pgvector | none
+    path: "/tmp/asr-faiss"           # FAISS only
+    collection_name: "incidents"
+    distance_strategy: cosine        # cosine | euclidean | inner_product
+
+llm:
+  default: workhorse                 # name from llm.models below
+  providers:
+    ollama_cloud:
+      kind: ollama
+      base_url: https://ollama.com
+      api_key: ${OLLAMA_API_KEY}
+    azure:
+      kind: azure_openai
+      endpoint: ${AZURE_ENDPOINT}
+      api_version: 2024-08-01-preview
+      api_key: ${AZURE_OPENAI_KEY}
+    openrouter:
+      kind: openai_compat
+      base_url: https://openrouter.ai/api/v1
+      api_key: ${OPENROUTER_API_KEY}
+    stub:
+      kind: stub                     # in-memory canned responses for tests
+  models:
+    workhorse:
+      provider: openrouter
+      model: inclusionai/ring-2.6-1t:free
+      temperature: 0.0
+    gpt_oss:
+      provider: ollama_cloud
+      model: gpt-oss:20b
+      temperature: 0.0
+    gpt_oss_cheap:
+      provider: ollama_cloud
+      model: gpt-oss:20b
+      temperature: 0.4
+    smart:
+      provider: azure
+      model: gpt-4o
+      deployment: gpt-4o
+      temperature: 0.0
+  embedding:
+    provider: ollama_cloud
+    model: nomic-embed-text          # single embedding model
+
+mcp:
+  servers:
+    - name: local_inc
+      transport: in_process          # in_process | stdio | http | sse
+      module: examples.incident_management.mcp_server
+      category: incident_management
+    - name: local_observability
+      transport: in_process
+      module: examples.incident_management.mcp_servers.observability
+      category: observability
+    # ...
+
+runtime:
+  state_class: examples.incident_management.state.IncidentState
+  gateway:
+    policy:                          # tool_name -> low | medium | high
+      apply_fix: high
+      restart_service: medium
+      get_logs: low
+  max_concurrent_sessions: 8         # SessionCapExceeded → HTTP 429
+
+orchestrator:
+  entry_agent: intake                # name of the first skill in the graph
+  default_terminal_status: needs_review
+  signals: [success, failed, needs_input]
+  injected_args:
+    environment: state.environment   # session-derived args injected before LLM-visible signature
+  terminal_tools:                    # tool_name -> status transition rules
+    - tool_name: mark_resolved
+      status: resolved
+      kind: terminal
+    - tool_name: mark_escalated
+      status: escalated
+      kind: escalation
+      extract_fields: { team: args.team }
+  patch_tools: [submit_hypothesis, update_incident]
+  default_llm_request_timeout: 120.0
+
+framework:
+  confidence_threshold: 0.75
+  escalation_teams: [payments-oncall, infra-oncall, ...]
+  approval_timeout: 1800             # seconds; ApprovalWatchdog timeout
+  intake_context: {}                 # generic intake bag
+  session_id_prefix: INC             # apps override (CR for code-review)
+
+dedup:
+  enabled: true
+  stage1_top_k: 5
+  stage1_threshold: 0.82
+  stage2_model: workhorse
+  prompt_template: |                 # LLM judge prompt (defaultable)
+    ...
+
+triggers:                            # optional; trigger registry transports
+  - name: pagerduty-incident
+    transport: webhook
+    target_app: incident_management
+    payload_schema: examples.incident_management.triggers.PagerDutyPayload
+    transform: examples.incident_management.triggers.transform_pagerduty
+    auth: bearer
+    auth_token_env: PAGERDUTY_WEBHOOK_TOKEN
+    idempotency_ttl_hours: 24
+
+learning:
+  scheduler:
+    enabled: true
+    cron: "0 2 * * *"                # nightly 02:00 UTC
+```
+
+Inference: not every block above is required for a minimal boot;
+the `triggers`, `dedup`, and `learning` blocks are optional and can
+be omitted.
+
+---
+
+## Per-skill config
+
+Each skill is a `<skill_dir>/config.yaml` + `<skill_dir>/system.md`
+pair under `examples/<app>/skills/`.
+
+```yaml
+# examples/incident_management/skills/triage/config.yaml
+description: Hypothesis-loop triage agent
+kind: responsive                     # responsive | supervisor | monitor
+model: gpt_oss_cheap                 # optional per-agent override; falls back to llm.default
+tools:
+  local_inc:
+    - submit_hypothesis
+    - update_incident
+  local_observability:
+    - get_logs
+    - get_metrics
+    - get_service_health
+    - check_deployment_history
+routes:
+  - when: success
+    next: deep_investigator
+  - when: needs_input
+    next: __end__
+    gate: confidence
+  - when: default
+    next: deep_investigator
+```
+
+The accompanying `system.md` is the system prompt template. It must
+include the markdown turn-output contract block (see
+`examples/incident_management/skills/_common/output.md`) — failure
+to include it will trip the envelope parser unless gpt-oss
+synthesises something Path 6 can salvage.
+
+---
+
+## Feature flags
+
+There are no first-class feature flags. Toggles are config-driven:
+
+| Toggle | Mechanism |
+|---|---|
+| Disable dedup | `dedup.enabled: false` |
+| Disable auto-learning scheduler | `learning.scheduler.enabled: false` |
+| Disable HITL gating per env | `gate_policy.gated_environments: []` |
+| Disable a tool's risk tier | Remove from `runtime.gateway.policy` (defaults to `auto`) |
+| Disable a trigger | Remove from `triggers:` block; restart |
+| Switch checkpointer to postgres | Install `asr[postgres]`; change `storage.metadata.url` to a postgres URL |
+
+---
+
+## Secrets required (production)
+
+For a typical incident-management deploy:
+
+| Secret | Purpose |
+|---|---|
+| `OLLAMA_API_KEY` (or `OPENROUTER_API_KEY`, etc.) | LLM provider auth |
+| `AZURE_OPENAI_KEY` + `AZURE_ENDPOINT` | If Azure provider used |
+| Webhook bearer tokens (e.g. `PAGERDUTY_WEBHOOK_TOKEN`) | If webhook triggers configured |
+| Postgres credentials in the SQLAlchemy URL | If `storage.metadata.url` points at postgres |
+
+**Do NOT commit secrets.** The framework reads them from env vars
+via `${VAR_NAME}` interpolation; bind them via your deploy's
+secret manager (k8s secret / docker `--env-file` / etc.).
+
+`.env` is gitignored at the repo root. CI uses dummy values.
+
+---
+
+## Safe defaults
+
+The shipped `config/config.yaml.example` documents safe defaults:
+
+- `llm.default: stub_default` — runs without any LLM
+  provider keys (useful for first boot / smoke)
+- `storage.metadata.url: sqlite:///incidents/incidents.db` — local
+  SQLite, no external service
+- `vector.backend: faiss` — local FAISS, no external service
+- No `triggers:` block — trigger registry off; only `POST /sessions`
+  works
+- No `dedup:` block — dedup off
+- No `learning.scheduler` block — scheduler off
+
+These give a working framework boot with zero external dependencies.
+Production deploys swap in a real LLM provider and (optionally)
+real triggers / dedup / scheduler.
+
+---
+
+## Validators
+
+`src/runtime/config.py` enforces:
+
+- `LLMConfig.default` must exist in `llm.models`
+- Every `llm.models[*].provider` must exist in `llm.providers`
+- Every `${VAR}` placeholder must resolve at config-load (strict)
+- Every `skill.model` must exist in `llm.models` (skill-level
+  validator, separate from `LLMConfig`)
+
+Errors raise typed exceptions (`LLMConfigError`, `ValueError`) at
+boot — the framework refuses to start with a misconfigured registry.
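+
+A minimal sketch of the strict `${VAR}` interpolation and the
+registry cross-checks listed above. It is not the code in
+`src/runtime/config.py`: the function names `interpolate_env` and
+`validate_llm_registry` and the plain-dict config shape are
+assumptions for illustration; only the `LLMConfigError` name comes
+from this document.
+
+```python
+import os
+import re
+from typing import Mapping, Optional
+
+_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
+
+
+class LLMConfigError(ValueError):
+    """Raised when the LLM registry is internally inconsistent."""
+
+
+def interpolate_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
+    """Replace every ${VAR} in value; fail loudly on an unset variable."""
+    env = os.environ if env is None else env
+
+    def _sub(match: re.Match) -> str:
+        name = match.group(1)
+        if name not in env:
+            raise ValueError(f"unresolved placeholder: ${{{name}}}")
+        return env[name]
+
+    return _PLACEHOLDER.sub(_sub, value)
+
+
+def validate_llm_registry(llm: dict) -> None:
+    """Check the default -> models and models -> providers references."""
+    models = llm.get("models", {})
+    providers = llm.get("providers", {})
+    if llm.get("default") not in models:
+        raise LLMConfigError(f"llm.default {llm.get('default')!r} not in llm.models")
+    for name, model in models.items():
+        if model.get("provider") not in providers:
+            raise LLMConfigError(f"model {name!r} references an unknown provider")
+```
+
+Raising at load time, rather than on first use, is what lets a
+misconfigured registry stop the boot instead of failing mid-session.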
diff --git a/docs/06-data-model.md b/docs/06-data-model.md
new file mode 100644
index 0000000..9b96998
--- /dev/null
+++ b/docs/06-data-model.md
@@ -0,0 +1,292 @@
+# 06 — Data model
+
+## Storage backends in use
+
+| Concern | Backend | Default URL/path | Source |
+|---|---|---|---|
+| Session metadata | SQLAlchemy (SQLite default; Postgres optional via `asr[postgres]`) | `sqlite:////tmp/asr.db` | `src/runtime/storage/models.py`, `engine.py`, `session_store.py` |
+| Vector similarity | FAISS (filesystem) | `/tmp/asr-faiss/` | `src/runtime/storage/vector.py`, `embeddings.py` |
+| LangGraph checkpoints | `langgraph-checkpoint-sqlite` (default) or `langgraph-checkpoint-postgres` | Same SQLite DB as session metadata | `src/runtime/checkpointer.py` |
+| Per-step events | SQLAlchemy `session_events` table | Same SQLite DB | `src/runtime/storage/event_log.py` |
+| Lessons (auto-learning) | SQLAlchemy `session_lessons` table | Same SQLite DB | `src/runtime/storage/lesson_store.py` |
+| Dedup retractions | SQLAlchemy `dedup_retractions` table | Same SQLite DB | `storage/session_store.py:un_duplicate` |
+| Trigger idempotency keys | SQLAlchemy `trigger_idempotency_keys` table | Same SQLite DB | `src/runtime/triggers/idempotency.py` |
+| Memory layers (incident_management) | Filesystem JSON / YAML | `incidents/{kg,releases,playbooks}/` (or seed bundle) | `examples/incident_management/asr/*_store.py` |
+
+All SQLAlchemy concerns share the **same engine**
+(`storage.metadata.url`). One DB, one connection pool, five
+logical tables.
+
+---
+
+## Entities
+
+### `IncidentRow` — primary table
+
+Source: `src/runtime/storage/models.py`.
+
+```python
+class IncidentRow(Base):
+    __tablename__ = "incidents"
+    id: str                          # PK; format: "<PREFIX>-YYYYMMDD-NNN"
+    status: str                      # new | in_progress | resolved | escalated |
+                                     # needs_review | awaiting_input | error |
+                                     # stopped | duplicate
+    created_at: datetime
+    updated_at: datetime
+    deleted_at: datetime | None      # soft delete
+    query: str
+    environment: str
+    reporter_id: str                 # incident-shaped column; apps without
+    reporter_team: str               # the concept ignore (round-trip omits)
+    summary: str
+    severity: str | None             # incident-shaped column
+    category: str | None             # incident-shaped column
+    matched_prior_inc: str | None    # FK to another row; dedup linkage
+    resolution: str | None
+    tags: list[str]                  # JSON
+    agents_run: list[AgentRun]       # JSON; append-only audit
+    tool_calls: list[ToolCall]       # JSON; append-only audit
+    findings: dict[str, Any]         # JSON; per-agent finding bag
+    pending_intervention: dict | None # JSON; gate node payload when paused
+    user_inputs: list[str]           # JSON
+    input_tokens: int                # accumulated TokenUsage
+    output_tokens: int
+    total_tokens: int
+    parent_session_id: str | None    # dedup linkage to confirmed parent
+    dedup_rationale: str | None      # stage-2 LLM rationale text
+    extra_fields: dict[str, Any]     # JSON; per-app extension bag
+    version: int                     # optimistic concurrency token
+```
+
+**Why so many incident-shaped columns?** History — the framework was
+born incident-management-shaped. v1.1 (DEC-005) lifted the runtime
+out of the incident shape, but renaming the schema columns would
+have required a destructive migration. The columns are tolerated: an
+app whose `Session` subclass doesn't declare `severity` or `reporter`
+just leaves those columns NULL (round-trip silently omits them per
+`_row_to_incident`).
+
+The v1.5-B generic-noun pass (DEC-008) renamed local variables and
+docstrings but **left the SQLAlchemy columns alone** — they would
+require a migration. See `docs/DESIGN.md` § 8.2 for rationale.
+
+### `EventRow` — per-step telemetry
+
+Source: `src/runtime/storage/models.py`, `event_log.py`.
+
+```python
+class EventRow(Base):
+    __tablename__ = "session_events"
+    id: int                          # autoincrement
+    session_id: str                  # FK to incidents.id
+    kind: EventKind                  # tool_invoked | gate_fired |
+                                     # agent_started | agent_finished |
+                                     # confidence_emitted | route_decided |
+                                     # status_changed | lesson_extracted | ...
+    payload: dict                    # JSON; per-event shape
+    ts: datetime
+```
+
+Append-only. Every meaningful boundary in the runtime emits a row.
+
+### `LessonRow` — auto-learning corpus
+
+Source: `src/runtime/storage/models.py`, `lesson_store.py`.
+
+```python
+class LessonRow(Base):
+    __tablename__ = "session_lessons"
+    id: int
+    source_session_id: str           # FK to incidents.id
+    title: str
+    body: str                        # extracted narrative
+    embedding: list[float] | None    # JSON; for similarity lookup
+    metadata: dict                   # JSON
+    created_at: datetime
+    updated_at: datetime
+    deleted_at: datetime | None      # soft delete (intake's "still relevant?" gate)
+```
+
+Built by `LessonExtractor` at session finalize; refreshed nightly by
+`LessonRefresher` for sessions resolved manually after the fact.
+
+### `DedupRetractionRow` — operator un-duplicate audit
+
+Source: `src/runtime/storage/models.py`, `session_store.py:un_duplicate`.
+
+```python
+class DedupRetractionRow(Base):
+    __tablename__ = "dedup_retractions"
+    id: int
+    session_id: str
+    original_match_id: str
+    retracted_at: datetime
+    retracted_by: str | None
+    note: str | None
+```
+
+### `TriggerIdempotencyRow`
+
+Source: `src/runtime/triggers/idempotency.py`.
+
+```python
+class TriggerIdempotencyRow(Base):
+    __tablename__ = "trigger_idempotency_keys"
+    trigger_name: str                # PK part 1
+    key: str                         # PK part 2 (Idempotency-Key header)
+    session_id: str                  # session minted by the original request
+    created_at: datetime
+```
+
+Inference: rows expire opportunistically per `idempotency_ttl_hours`
+on each trigger config.
+
+---
+
+## Pydantic models (in-memory; round-trip via `extra_fields`)
+
+The `Session` base class (`src/runtime/state.py:70-117`) corresponds
+roughly to the typed columns on `IncidentRow`. Apps subclass to add
+domain fields:
+
+```python
+class IncidentState(Session):
+    query: str
+    environment: str
+    reporter: Reporter
+    summary: str
+    tags: list[str]
+    severity: str | None
+    category: str | None
+    matched_prior_inc: str | None
+    resolution: Any
+    memory: MemoryLayerState         # ASR memory bundle (read-only)
+
+class CodeReviewState(Session):
+    pr: PullRequest
+    review_findings: list[ReviewFinding]
+    overall_recommendation: Literal["approve", "request_changes", "comment"] | None
+    review_summary: str
+    review_token_budget: int
+```
+
+Round-trip pattern (`SessionStore._row_to_incident` /
+`_incident_to_row_dict`):
+
+- For each field declared on the state class:
+  - If `IncidentRow` has a typed column for it → write to that column
+  - Else → write to `extra_fields` JSON
+- On load, fields with typed columns hydrate from those columns;
+  everything else reads from `extra_fields[name]`.
+
+This keeps row schema migrations rare — apps freely add domain
+fields without touching the row schema.
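+
+The round-trip rule above can be sketched as follows. This is an
+illustration, not `SessionStore`'s code: `TYPED_COLUMNS` is a
+hypothetical subset of the `IncidentRow` columns, and `to_row_dict`
+/ `from_row_dict` are made-up names that work on plain dicts
+rather than pydantic models.
+
+```python
+from typing import Any
+
+# Hypothetical subset of IncidentRow's typed columns, for illustration only.
+TYPED_COLUMNS = {"id", "status", "query", "environment", "summary", "severity"}
+
+
+def to_row_dict(state_fields: dict[str, Any]) -> dict[str, Any]:
+    """Split state fields into typed columns plus an extra_fields JSON bag."""
+    row: dict[str, Any] = {"extra_fields": {}}
+    for name, value in state_fields.items():
+        if name in TYPED_COLUMNS:
+            row[name] = value
+        else:
+            row["extra_fields"][name] = value
+    return row
+
+
+def from_row_dict(row: dict[str, Any], declared: set[str]) -> dict[str, Any]:
+    """Rebuild the declared fields; columns the state never declared are skipped."""
+    extra = row.get("extra_fields") or {}
+    out: dict[str, Any] = {}
+    for name in declared:
+        if name in TYPED_COLUMNS:
+            out[name] = row.get(name)
+        elif name in extra:
+            out[name] = extra[name]
+    return out
+```
+
+With this split, a code-review field such as `review_summary` lands
+in `extra_fields` and comes back unchanged on load, without any new
+column.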
+
+---
+
+## Relationships
+
+```
+incidents (PK: id)
+    │
+    ├──< session_events.session_id (one-to-many, append-only)
+    │
+    ├──< session_lessons.source_session_id (one-to-many, soft-deletable)
+    │
+    ├──< dedup_retractions.session_id (one-to-many)
+    │
+    ├──> incidents.parent_session_id (self-FK; dedup linkage)
+    │
+    └──> incidents.matched_prior_inc (self-FK; legacy linkage)
+
+trigger_idempotency_keys (PK: trigger_name + key)
+    │
+    └──> incidents.id (loose ref; not enforced FK)
+
+LangGraph checkpointer state
+    └─ keyed by `configurable.thread_id`
+       (= session_id by default; bumped to "<sid>:retry-N" on retry)
+```
+
+---
+
+## Migrations
+
+Source: `src/runtime/storage/migrations.py` (~210 lines).
+
+The framework runs **idempotent JSON-walk migrations** at orchestrator
+boot, not Alembic. Pre-existing rows get their new fields filled with
+defaults so the audit history reads consistently after a schema
+extension.
+
+Two named migrations exist (Inference: based on tests +
+`migrations.py` content):
+
+- `migrate_tool_calls_audit` — added when Phase 4 introduced the
+  risk-rated gateway audit fields (`risk`, `status`, `approver`,
+  `approved_at`, `approval_rationale`). Walks every `tool_calls`
+  JSON and fills missing audit fields with their pydantic defaults.
+- `migrate_extra_fields` (Inference) — for the v1.1 decoupling
+  (DEC-005) extension column.
+
+There is no Alembic / SQLAlchemy migration framework — schema
+changes are additive (new column, new table) and rely on
+`Base.metadata.create_all(engine)` at boot for new tables. **Risk:
+destructive schema changes (drop column, change type, rename)
+require a hand-rolled migration script.**
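+
+A minimal sketch of what an idempotent JSON-walk backfill looks
+like for one row's `tool_calls` column. The function name
+`backfill_tool_calls` and the default values in `AUDIT_DEFAULTS`
+are assumptions; per the text above, the real defaults come from
+the pydantic model, and the real code lives in `migrations.py`.
+
+```python
+import json
+from typing import Any
+
+# Hypothetical defaults; the real values come from the pydantic model.
+AUDIT_DEFAULTS: dict[str, Any] = {
+    "risk": None,
+    "status": "executed",
+    "approver": None,
+    "approved_at": None,
+    "approval_rationale": None,
+}
+
+
+def backfill_tool_calls(tool_calls_json: str) -> tuple[str, bool]:
+    """Return (possibly updated JSON, changed?) for one row's tool_calls."""
+    calls = json.loads(tool_calls_json or "[]")
+    changed = False
+    for call in calls:
+        for field, default in AUDIT_DEFAULTS.items():
+            if field not in call:  # never overwrite a value already present
+                call[field] = default
+                changed = True
+    return json.dumps(calls), changed
+```
+
+Because only missing keys are filled, running the walk a second
+time reports no change, which is what makes it safe to run at every
+boot.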
+
+---
+
+## Persistence assumptions
+
+- **Single writer per session** — enforced by `SessionLockRegistry`
+  (`src/runtime/locks.py`); `SessionBusy` raised on contention.
+- **Optimistic concurrency on save** — every `SessionStore.save`
+  bumps `version` and rejects stale-version writes with
+  `StaleVersionError`. Caller's contract is reload + retry.
+- **Append-only audit logs** — `agents_run`, `tool_calls`, and
+  `session_events` only ever gain entries. The one exception: the
+  gateway updates individual `tool_calls[idx]` entries in place for
+  status transitions; existing entries are otherwise left untouched.
+- **Soft delete** — `deleted_at` column on `IncidentRow` and
+  `LessonRow`. Hard delete is rare; the `delete_session` API is a
+  soft delete + vector-store removal.
+- **Dual write for pending intervention** — both LangGraph
+  checkpoint AND `IncidentRow.pending_intervention` are written
+  when a gate pauses, so dashboards reading the relational row
+  stay accurate.
+- **No cross-session transactions** — the framework doesn't model
+  workflows that span multiple sessions (the `parent_session_id`
+  link is the only inter-session reference, and it's a passive
+  pointer).
+- **Retry creates a new langgraph thread** — `Orchestrator.retry_session`
+  bumps the `active_thread_id` (e.g. `INC-…:retry-2`); the
+  original thread's checkpoint stays at the failed state so the
+  retry runs fresh.
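+
+The optimistic-concurrency contract (bump `version` on save, reject
+stale writes, caller reloads and retries) can be sketched with a
+toy store. `InMemoryStore` and `save_with_retry` are hypothetical
+stand-ins, not the `SessionStore` API; only the `StaleVersionError`
+name and the `version` field come from this document.
+
+```python
+class StaleVersionError(Exception):
+    """The row changed since the caller loaded it."""
+
+
+class InMemoryStore:
+    """Toy stand-in for a session store: one dict row per session id."""
+
+    def __init__(self) -> None:
+        self._rows: dict[str, dict] = {}
+
+    def load(self, session_id: str) -> dict:
+        return dict(self._rows[session_id])  # copy, so callers hold a snapshot
+
+    def save(self, row: dict) -> dict:
+        current = self._rows.get(row["id"])
+        if current is not None and current["version"] != row["version"]:
+            raise StaleVersionError(row["id"])
+        stored = {**row, "version": row["version"] + 1}
+        self._rows[row["id"]] = stored
+        return dict(stored)
+
+
+def save_with_retry(store, session_id, mutate, attempts=3):
+    """Reload and re-apply the mutation when another writer won the race."""
+    for _ in range(attempts):
+        row = store.load(session_id)
+        mutate(row)
+        try:
+            return store.save(row)
+        except StaleVersionError:
+            continue
+    raise StaleVersionError(session_id)
+```
+
+The mutation is passed as a callable so it can be re-applied to the
+freshly loaded row on each attempt instead of to a stale snapshot.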
+
+---
+
+## Vector index
+
+FAISS is the default (`vector.backend: faiss`); pgvector and "none"
+are also supported (`src/runtime/storage/vector.py`). Vectors are
+written through on every `SessionStore.save` so the index stays
+aligned with the row table.
+
+Index is keyed on `session_id`; each row carries a single embedding
+of `_embed_source` (the session's query text, falling back to
+`extra_fields["query"]`).
+
+---
+
+## Backup / restore
+
+Inference: not formally documented. Practical recovery:
+
+- **SQLite**: copy `/tmp/asr.db` (and `*-wal`, `*-shm` if mid-write).
+- **FAISS**: copy `/tmp/asr-faiss/` directory.
+- The two MUST be backed up together — a vector index pointing at
+  rows that no longer exist will surface "ghost" similar-incidents
+  matches. The reverse (rows without vectors) silently degrades
+  similarity to "no matches".
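+
+A minimal sketch of taking both backups in one step. It swaps the
+plain file copy described above for Python's
+`sqlite3.Connection.backup`, which yields a consistent snapshot
+without copying the `-wal` / `-shm` files by hand. The function
+name `backup_together` and the destination layout are assumptions,
+and the DB and FAISS copies are still not atomic with respect to
+each other, so pausing writers first remains the safer option.
+
+```python
+import shutil
+import sqlite3
+from pathlib import Path
+
+
+def backup_together(db_path: str, faiss_dir: str, dest_dir: str) -> Path:
+    """Snapshot the SQLite DB and the FAISS directory into one folder."""
+    dest = Path(dest_dir)
+    dest.mkdir(parents=True, exist_ok=True)
+    src = sqlite3.connect(db_path)
+    dst = sqlite3.connect(str(dest / "asr.db"))
+    try:
+        src.backup(dst)  # page-level copy through SQLite's backup API
+    finally:
+        dst.close()
+        src.close()
+    # Copy the vector index next to the DB so the pair is restored together.
+    shutil.copytree(faiss_dir, dest / "asr-faiss", dirs_exist_ok=True)
+    return dest
+```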
diff --git a/docs/07-integrations.md b/docs/07-integrations.md
new file mode 100644
index 0000000..2701758
--- /dev/null
+++ b/docs/07-integrations.md
@@ -0,0 +1,196 @@
+# 07 — Integrations
+
+External systems the framework talks to, plus their dev / local
+alternatives.
+
+---
+
+## LLM providers
+
+Source: `src/runtime/llm.py:get_llm`. Each provider kind maps to a
+LangChain chat-model class.
+
+| Provider kind | Production class | Auth | Local alternative |
+|---|---|---|---|
+| `ollama` | `langchain_ollama.ChatOllama` | `api_key` (Ollama Cloud) or none (local Ollama) | Run Ollama locally (`ollama serve`); set `base_url: http://localhost:11434` |
+| `azure_openai` | `langchain_openai.AzureChatOpenAI` | `api_key`, `endpoint`, `deployment` | None — Azure is cloud-only. Use `stub` for tests. |
+| `openai_compat` | `langchain_openai.ChatOpenAI` (with `base_url=`) | `api_key` | Any OpenAI-compatible endpoint (LM Studio, vLLM, OpenRouter, …) |
+| `stub` | `runtime.llm.StubChatModel` | none | Built-in canned-response chat model for tests / smoke |
+
+Switching providers: edit `llm.providers` + `llm.models` in
+`config/config.yaml`; per-skill override via `skill.model` in the
+skill's YAML.
+
+429 retry: free / shared upstream tiers (e.g. OpenRouter `…:free`)
+are protected by the rate-limit retry regime added in v1.5-D
+(`_RATE_LIMIT_MARKERS` in `src/runtime/graph.py`).
+
+Live verification: `tests/test_integration_driver_s1.py` parametrises
+three legs (`local`, `workhorse`, `azure`); each independently skips
+on missing keys. `tests/test_llm_providers_smoke.py` is the
+single-call smoke gated on `OLLAMA_LIVE=1`.
+
+---
+
+## MCP servers
+
+Source: `src/runtime/mcp_loader.py`,
+`src/runtime/config.py:MCPServerConfig`.
+
+Four transports:
+
+| Transport | Connection | Use case |
+|---|---|---|
+| `in_process` | Loads a Python module that exports a `mcp = FastMCP(...)` instance | Default for example apps; zero network cost |
+| `stdio` | Spawns a subprocess command, talks JSON-RPC over stdio | Wrapping a 3rd-party MCP CLI |
+| `http` | Talks JSON-RPC over HTTP | Remote MCP server (often with bearer auth via `headers`) |
+| `sse` | Server-sent events transport | Inference: present in `MCPServerConfig.transport` literal but not exercised in tests; status: scaffold |
+
+Configuration:
+
+```yaml
+mcp:
+  servers:
+    - name: local_inc
+      transport: in_process
+      module: examples.incident_management.mcp_server
+      category: incident_management
+    - name: ext_metrics
+      transport: http
+      url: ${EXTERNAL_MCP_URL}
+      headers:
+        Authorization: "Bearer ${EXT_TOKEN}"
+      category: observability
+```
+
+The example apps' MCP servers all use `in_process` — the bundle
+ships with the MCP code in the same process. The test fixture at
+`tests/fixtures/sample_config.yaml` covers `http` + bearer auth.
+
+---
+
+## Auth providers
+
+The framework does not integrate with external auth providers
+(no SSO, OIDC, SAML, …). Air-gap deploys live behind corporate
+network controls.
+
+The only auth touched by the framework:
+
+- **MCP server bearer auth** — `headers.Authorization: "Bearer
+  ${EXT_TOKEN}"` per server config.
+- **Webhook trigger bearer auth** — `auth: bearer` +
+  `auth_token_env: <ENV_VAR>` per trigger config; constant-time
+  comparison via `hmac.compare_digest`.
+
+Both read tokens from env vars at process start; rotating a secret
+requires a process restart.
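+
+A minimal sketch of a constant-time bearer check in the spirit of
+the webhook auth described above. The helper name `bearer_token_ok`
+is an assumption, and for brevity it reads the env var on each call,
+whereas the framework is described as reading tokens at process
+start.
+
+```python
+import hmac
+import os
+from typing import Optional
+
+
+def bearer_token_ok(authorization: Optional[str], token_env: str) -> bool:
+    """Constant-time check of an `Authorization: Bearer <token>` header."""
+    expected = os.environ.get(token_env, "")
+    if not expected or not authorization:
+        return False  # an unset secret never authenticates anyone
+    scheme, _, presented = authorization.partition(" ")
+    if scheme.lower() != "bearer":
+        return False
+    # compare_digest avoids leaking the match position through timing
+    return hmac.compare_digest(presented.encode(), expected.encode())
+```
+
+Treating an empty or unset secret as a hard failure avoids the case
+where a missing env var would let an empty token through.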
+
+---
+
+## Queues / messaging
+
+The framework has no built-in queue. The closest thing is the
+**trigger registry** (`src/runtime/triggers/`), which can fire a
+session start from:
+
+- HTTP POST (webhook)
+- APScheduler cron (in-process)
+- Custom plugin transport (entry-point or explicit registration)
+
+There is no SQS / Kafka / NATS / RabbitMQ integration shipped, but
+the `TriggerTransport` ABC and `plugin_transports` kwarg on
+`TriggerRegistry.create` exist for adding one. The
+`src/runtime/triggers/transports/plugin.py` file is a stub —
+Inference: scaffold for future SQS/Kafka work.
+
+---
+
+## Observability / external services (referenced by the incident_management example)
+
+Source: `examples/incident_management/mcp_servers/observability.py`,
+`mcp_servers/remediation.py`, `mcp_servers/user_context.py`.
+
+The example app's MCP servers expose **mock** versions of operational
+tools:
+
+| Tool | Purpose | Real backend (production) | Mock (this repo) |
+|---|---|---|---|
+| `get_logs(service, minutes)` | Recent logs | Datadog / Loki / Splunk | Returns canned WARN/ERROR/INFO lines |
+| `get_metrics(service, minutes)` | CPU/latency/error-rate samples | Prometheus / Datadog | Returns canned numeric envelope |
+| `get_service_health(env)` | Service-level health | Service registry / k8s health | Returns canned per-service health dict |
+| `check_deployment_history(hours, env)` | Recent deploys | ArgoCD / Spinnaker / Octopus | Returns canned recent-release list |
+| `notify_oncall(team, message)` | Page oncall | PagerDuty / Opsgenie | Returns synthesised page id |
+| `apply_fix(proposal_id, env)` | Run a remediation script | Ansible / Salt / custom | Returns deterministic success/failure |
+| `propose_fix(hypothesis, env)` | Generate a fix proposal | LLM-driven (this remains LLM-only in production) | Returns canned proposal_id |
+
+To wire real backends: replace the `_impl` body in the corresponding
+`mcp_servers/<name>.py` file with the real client call, keeping the
+function signature stable (the LLM-visible tool surface comes from
+the signature + docstring).
+
+---
+
+## Code review tools
+
+`examples/code_review/mcp_server.py` ships **mocked**:
+
+- `fetch_pr_diff(repo, number)` — reads from
+  `tests/fixtures/code_review/<repo>/<number>.json` if present;
+  otherwise returns a tiny synthetic diff.
+- `add_review_finding(...)` and `set_recommendation(...)` —
+  in-process state mutation only.
+
+There is no real GitHub or GitLab integration. To wire one up,
+replace the body of `fetch_pr_diff` with a `gh` API call or a
+PyGithub / python-gitlab client call.
+
+---
+
+## Memory layers (incident_management example)
+
+Source: `examples/incident_management/asr/`.
+
+| Layer | Backing files | Lifecycle |
+|---|---|---|
+| L2 Knowledge Graph | `incidents/kg/{components,edges}.json` (or seed bundle at `examples/incident_management/asr/seeds/kg/`) | Read-only; populated by ops, consumed by intake |
+| L5 Release Context | `incidents/releases/recent.json` (or seed bundle) | Read-only; populated by deploy pipeline (out of scope), consumed by triage |
+| L7 Playbook Store | `incidents/playbooks/*.yaml` (or seed bundle) | Read-only; authored by SREs, consumed by resolution |
+
+Filesystem-backed by design — no Neo4j / Redis / pgvector dependency
+keeps the framework air-gap-friendly. When the configured layer
+directory is empty, each store falls back to the bundled seeds so a
+fresh checkout has working data.
+
+Mutation paths (write-back from agents, playbook authoring) are
+deferred — Inference: planned for a later milestone.
+
+---
+
+## CI / external services for development
+
+| Service | Purpose | Configuration |
+|---|---|---|
+| GitHub Actions | CI (lint / type-check / test / sonar / bundle freshness) | `.github/workflows/ci.yml` |
+| SonarCloud | Code quality + coverage gate | `sonar-project.properties`, `SONAR_TOKEN` repo secret |
+| CodeQL | Security analysis | Default GitHub setup; `.github/workflows/` (auto-generated) |
+| Socket Security | Dependency security scan | Auto-detected on PRs |
+| OpenRouter | Live LLM smoke (when keys present) | `OPENROUTER_API_KEY` repo secret (Inference: project owner controls) |
+
+CI does not call live LLM providers — the test suite is
+stub-mode-only. Live integration smokes (`tests/test_integration_driver_s1.py`,
+`tests/test_llm_providers_smoke.py`) are gated on env vars and skipped
+in CI.
+
+---
+
+## Where to override for local dev
+
+| Want to | Override |
+|---|---|
+| Use local Ollama instead of Ollama Cloud | `llm.providers.ollama.base_url: http://localhost:11434` |
+| Use SQLite in `/var/lib/asr/` instead of `/tmp` | `storage.metadata.url: sqlite:////var/lib/asr/asr.db`, `storage.vector.path: /var/lib/asr/faiss` |
+| Use Postgres instead of SQLite | `pip install asr[postgres]`; `storage.metadata.url: postgresql://…` |
+| Skip MCP entirely for an integration test | Use `LLMConfig.stub()` + an empty `MCPConfig` (see `tests/_envelope_helpers.py`) |
+| Test webhook trigger locally | Set `triggers:` in a local `config.yaml`; `curl -H 'Authorization: Bearer …' -X POST http://localhost:8000/triggers/<name>` |
diff --git a/docs/08-testing.md b/docs/08-testing.md
new file mode 100644
index 0000000..a978dc8
--- /dev/null
+++ b/docs/08-testing.md
@@ -0,0 +1,169 @@
+# 08 — Testing
+
+## Framework
+
+**pytest** with `pytest-asyncio` (asyncio_mode=auto), `pytest-cov`,
+`pytest-repeat` (for D-13 stability gate). Config in
+`pyproject.toml:53-58`.
+
+```toml
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+testpaths = ["tests"]
+addopts = "-v --cov=src/runtime --cov-report=term-missing --cov-report=xml"
+pythonpath = ["src", "."]
+```
+
+Coverage gate **fails below 85%** when run with
+`--cov-fail-under=85`. Current coverage: **87.04%** (post v1.5).
+
+## How to run
+
+```bash
+# Full suite, fail-fast
+uv run pytest -x
+
+# Without coverage (faster iteration)
+uv run pytest -x --no-cov
+
+# Single file
+uv run pytest tests/test_interrupt_detection.py -x -v
+
+# Single test
+uv run pytest tests/test_interrupt_detection.py::test_resume_forwards_verdict_to_inner_tool_and_completes -xvs
+
+# With coverage gate
+uv run pytest --cov=src/runtime --cov-fail-under=85 -x
+
+# Stability check (50 iterations of one test — D-13 local gate)
+uv run pytest tests/test_session_lock.py -x --count=50
+
+# Live integration smoke (gated on env vars)
+OLLAMA_API_KEY=... OLLAMA_BASE_URL=https://ollama.com \
+  uv run pytest tests/test_integration_driver_s1.py -v
+```
+
+CI runs the full suite + coverage XML + JUnit XML for SonarCloud.
+
+## Suite structure
+
+149 test files; ~1265 tests; ~140s for the full suite.
+
+### By topic
+
+| Topic | Sample files |
+|---|---|
+| Agent runner contract + live-LLM smoke | `test_agent_node*.py`, `test_real_llm_tool_loop_termination.py`, `test_integration_driver_s1.py`, `test_per_agent_model_dispatch.py` |
+| HITL approve/reject + gateway | `test_interrupt_detection.py`, `test_gateway_persist_resolution.py`, `test_orchestrator_pause_detection.py`, `test_approval_*.py`, `test_gateway_*.py`, `test_interrupt_status_handling.py` |
+| Markdown turn-output parser | `test_markdown_turn_output.py` (36 tests) |
+| Retry behaviour | `test_ainvoke_retry_429.py` (5 tests) |
+| Storage layer | `test_session_store.py`, `test_incident_store.py`, `test_history_store.py`, `test_dedup_*.py`, `test_event_log.py` |
+| FastAPI surface + locks | `test_api*.py`, `test_approval_api.py`, `test_session_lock.py`, `test_retry_concurrency.py` |
+| Triggers | `test_triggers/test_*.py` (transport per file) |
+| Bundler + bundle | `test_build_*.py`, `test_bundle_*.py` |
+| Genericity ratchets | `test_genericity_ratchet.py`, `test_concept_leak_ratchet.py` |
+| Skill loader | `test_skill*.py` |
+| Telemetry + auto-learning | `test_telemetry_integration.py`, `test_lesson_*.py` |
+| UI helpers | `test_ui_*.py`, `test_render_*.py` |
+| Memory layers (incident_management) | `test_asr_*.py`, `test_kg_store.py`, `test_release_store.py`, `test_playbook_store.py` |
+| Per-app tests | `test_code_review_*.py`, `test_two_apps_coexist.py`, `test_session_id_format.py`, `test_generic_round_trip.py` |
+
+### Helpers + fixtures
+
+| File | Purpose |
+|---|---|
+| `tests/_envelope_helpers.py` | `EnvelopeStubChatModel` — pydantic stub LLM that emits the markdown contract, used across HITL + agent tests |
+| `tests/_policy_helpers.py` | Helpers for building synthetic gate decisions |
+| `tests/fixtures/sample_config.yaml` | Reference config for config-loader tests |
+| `tests/fixtures/code_review/<repo>/<number>.json` | Mock PR diffs for the code-review example app |
+
+There is no shared `tests/conftest.py`; fixtures are defined
+per file.
+
+## What's covered well
+
+- **Markdown envelope parser** — 36 tests covering 6 paths,
+  Unicode dash variants, gpt-oss empty-closing pattern, terminal-tool
+  args synthesis, permissive synthesis fallback.
+- **HITL pause/resume on langgraph 1.x** — `test_interrupt_detection.py`
+  proves the GraphInterrupt re-raise + Command(resume) forwarding;
+  `test_gateway_persist_resolution.py` (10 tests) proves the DB row
+  reflects the verdict for both sync + async paths.
+- **Retry regimes** — `test_ainvoke_retry_429.py` pins both backoff
+  windows (5xx and 429) plus fast-fail on non-transient errors.
+- **Per-agent LLM dispatch** — `test_per_agent_model_dispatch.py`
+  proves `_build_agent_nodes` calls `get_llm` with `model_name=skill.model`.
+- **Storage round-trip** — `test_generic_round_trip.py` proves
+  `extra_fields` JSON survives full save/load cycles for arbitrary
+  `Session` subclasses.
+- **Session locking** — `test_session_lock.py` (over 1000
+  lines) covers the D-01 / D-20 contracts: per-session lock holds
+  across HITL pause; resume re-acquires cleanly; concurrent retry
+  is rejected.
+- **API surface** — `test_api_react_surface.py` covers `/sessions/*`
+  + SSE + WebSocket + structured error envelope.
+- **Two apps coexist** — `test_two_apps_coexist.py` proves an
+  incident session and a code-review session can share the same
+  metadata DB without collisions (per `Session.id_format`).
+
+## What's covered weakly or not at all
+
+| Gap | Why it matters | Where to start |
+|---|---|---|
+| `src/runtime/ui.py` (~1700 lines, 0% coverage) | Streamlit shell — exercised by manual smoke. Phase 20 (HARD-09) scaffolded `tests/test_ui_*.py` but UI parity coverage is a milestone. | `tests/test_ui_*.py` exists; extend with `streamlit.testing.v1.AppTest` |
+| `src/runtime/__main__.py` | argparse-only CLI; covered by smoke only | Inference: low risk |
+| `src/runtime/checkpointer_postgres.py` | Postgres saver; CI is sqlite-only | Run a postgres container in CI for a one-test postgres smoke |
+| `src/runtime/triggers/transports/plugin.py` | Stub for future transports | n/a |
+| `ApprovalWatchdog` × `gateway` saves on transition | PR #6 added gateway saves on transitions; the watchdog should observe a faster cleanup signal, but no focused test verifies that. Estimated effort: ~15 min. | New test asserting the watchdog resolves a row faster after a gateway save |
+| Live integration with all 3 providers green simultaneously | OpenRouter is out of credits and Azure has a placeholder endpoint in this dev `.env` | Operator-side issue, not framework |
+| `test_silent_failure_sweep.py` | Should assert no `except Exception: pass` survives | Inference: name based on Phase 18 / HARD-04; verify the test exists and passes |
+
+## Risky areas needing more tests
+
+1. **Multi-agent live runs against real providers** — only the
+   single-agent S1 driver is live-gated. Multi-agent E2E (intake →
+   triage → DI → resolution) only runs in stub mode. A live multi-
+   agent driver would catch provider-quirk regressions earlier.
+2. **`HistoryStore` filter dimensions** — apps build their own
+   `filter_resolver`; the framework only tests the incident-shaped
+   one. A code-review-shaped filter test would prove the seam holds.
+3. **`OrchestratorService.stop_session` mid-pause** — what happens
+   if the operator cancels a session that's currently `pending_approval`?
+   `test_session_lock.py` covers locks; explicit cancellation
+   semantics during HITL deserve a focused test.
+4. **`migrations.py` rollback** — the migrations are forward-only
+   and idempotent. A backward-compat regression test (run the new
+   code against an old-shape DB) exists for `migrate_tool_calls_audit`;
+   adding similar tests for future migrations would lock the
+   contract.
+5. **Trigger registry under concurrency** — `test_triggers/`
+   covers each transport in isolation; a fan-in test (50 webhooks
+   firing concurrently) would catch idempotency-key races.
+
+## CI gates
+
+`.github/workflows/ci.yml`:
+
+| Gate | Tool | Failure behavior |
+|---|---|---|
+| Lockfile freshness (HARD-02) | `uv lock --check` | Fails if `pyproject.toml` drifts from `uv.lock` |
+| Bundle staleness (HARD-08) | `python scripts/build_single_file.py && git diff --exit-code dist/` | Fails if `dist/` would change |
+| Lint | `ruff check src/ tests/` | Fails on any rule violation |
+| Type check (HARD-03) | `pyright src/runtime` | Fail-on-error since Phase 19 |
+| Test + coverage | `pytest --cov=src/runtime --cov-report=xml --junitxml=junit.xml` | Default fail on test failure; coverage gate via SonarCloud |
+| Skill-prompt-vs-schema lint (SKILL-LINTER-01) | `python scripts/lint_skill_prompts.py` | Fails if any skill prompt references a tool name / arg field that doesn't exist |
+| SonarCloud scan | `SonarSource/sonarqube-scan-action@v8.0.0` | Quality gate (coverage / hotspots / duplications) reported back to the PR |
+
+## How to add a test
+
+1. Pick the file matching the topic (or create a new one if cross-cutting).
+2. If async, no decorator needed (`asyncio_mode=auto`).
+3. If you need a stub LLM, use `EnvelopeStubChatModel` from
+   `tests/_envelope_helpers.py` — it emits the markdown contract
+   automatically.
+4. If you need a `Session` instance with a particular state class,
+   use `runtime.storage.session_store.SessionStore.create(...)`
+   over a tmp_path engine (see `_make_repo` patterns in existing
+   tests).
+5. Run the new test with `-xvs` to iterate; then `-x` for the full
+   suite to catch regressions.
diff --git a/docs/09-build-deploy-release.md b/docs/09-build-deploy-release.md
new file mode 100644
index 0000000..5c060e5
--- /dev/null
+++ b/docs/09-build-deploy-release.md
@@ -0,0 +1,227 @@
+# 09 — Build / deploy / release
+
+## Build commands
+
+| Step | Command | Source |
+|---|---|---|
+| Install dependencies (frozen, hash-verified) | `uv sync --frozen --extra dev` | `uv.lock`, `pyproject.toml:42-50` |
+| Regenerate single-file bundle | `uv run python scripts/build_single_file.py` | `scripts/build_single_file.py` |
+| Lint | `uv run ruff check src/ tests/` | |
+| Type-check | `uv run pyright src/runtime` | `pyrightconfig.json` |
+| Test + coverage | `uv run pytest --cov=src/runtime --cov-fail-under=85` | `pyproject.toml:53-58` |
+| Skill-prompt linter | `uv run python scripts/lint_skill_prompts.py` | |
+| Concept-leak ratchet | `uv run python scripts/check_genericity.py --baseline 39` | |
+| Lockfile freshness | `uv lock --check` | |
+
+The "build" of this project is **not a wheel** — wheels exist
+(`pyproject.toml:[tool.hatch.build.targets.wheel]` declares
+`packages = ["src/runtime", "examples"]`) but the deployed artifact
+is the **single-file bundle** under `dist/`. Wheels are useful for
+local `pip install -e .` development; the deployed shape is
+copy-only.
+
+## Packaging — the bundler
+
+Source: `scripts/build_single_file.py`. Runs in three steps:
+
+1. Read `RUNTIME_MODULE_ORDER` (a list of `(root, relpath)` tuples
+   topologically ordered so each module's body sees its
+   dependencies' symbols already in scope).
+2. For each module: read source, strip intra-bundle imports
+   (the bundle is one big namespace — `from runtime.config import X`
+   becomes a no-op when `X` is already defined above).
+3. Concatenate + emit four bundles:
+
+| Output | Contents |
+|---|---|
+| `dist/app.py` (~660KB) | Framework only. Used to demonstrate the runtime stands on its own. |
+| `dist/apps/incident-management.py` (~707KB) | Framework + `incident_management` example. The deploy artifact for the incident app. |
+| `dist/apps/code-review.py` (~670KB) | Framework + `code_review` example. The second example, demonstrating genericity. |
+| `dist/ui.py` (~68KB) | Streamlit shell. Sits next to whichever `app.py` you deployed; its `from app import …` reaches into the deploy bundle's flattened namespace. |
+
+The bundler also runs an `ast.parse` smoke on each output so a
+broken bundle fails the script (rather than failing at deploy).
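+
+As a rough illustration of steps 2 and 3 and the `ast.parse` smoke
+check, the flatten-and-strip idea can be sketched as follows. This is
+not the code in `scripts/build_single_file.py`: the `MODULES` list, the
+regex, and the helper names are assumptions made for the example, and
+the regex only handles single-line, top-level imports.
+
+```python
+import ast
+import re
+
+# Hypothetical stand-in for RUNTIME_MODULE_ORDER: (name, source) pairs that
+# are already in dependency order. The real script reads files from disk.
+MODULES = [
+    ("runtime.config", "DEFAULT_PORT = 37777\n"),
+    (
+        "runtime.api",
+        "from runtime.config import DEFAULT_PORT\n\n"
+        "def port():\n    return DEFAULT_PORT\n",
+    ),
+]
+
+# Matches single-line `import runtime...` / `from runtime... import ...`.
+_INTRA_IMPORT = re.compile(
+    r"^[ \t]*(?:from|import)[ \t]+runtime(?:\.\w+)*\b.*$", re.MULTILINE
+)
+
+
+def strip_intra_imports(source: str) -> str:
+    """Drop imports that would point back into the flattened bundle."""
+    return _INTRA_IMPORT.sub("", source)
+
+
+def build_bundle(modules) -> str:
+    parts = [f"# ---- {name} ----\n{strip_intra_imports(src)}" for name, src in modules]
+    bundle = "\n".join(parts)
+    ast.parse(bundle)  # smoke check: raises SyntaxError on a broken bundle
+    return bundle
+
+
+bundle = build_bundle(MODULES)
+namespace: dict = {}
+exec(bundle, namespace)  # the flattened modules now share one namespace
+```
+
+Because `runtime.config` is emitted first, `port()` in the flattened
+namespace resolves `DEFAULT_PORT` without any import, which is why the
+module order has to be topological.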
+
+## CI/CD
+
+Source: `.github/workflows/ci.yml`.
+
+Single workflow `quality:` runs on every push to `main` and on every
+PR. Steps:
+
+```
+checkout (fetch-depth: 0 for SonarCloud blame)
+  ↓
+setup-python @ 3.11
+  ↓
+setup-uv @ 0.11.7
+  ↓
+Lockfile freshness gate (uv lock --check)            # HARD-02
+  ↓
+Install deps (uv sync --frozen --extra dev)
+  ↓
+Bundle staleness gate (build + git diff --exit-code dist/)  # HARD-08
+  ↓
+Lint (ruff check src/ tests/)
+  ↓
+Type check (pyright src/runtime)                     # HARD-03 fail-on-error
+  ↓
+Test with coverage (pytest --cov=src/runtime --cov-report=xml --junitxml=junit.xml)
+  ↓
+Skill-prompt-vs-schema lint (lint_skill_prompts.py)  # SKILL-LINTER-01
+  ↓
+SonarCloud Scan
+```
+
+Total CI time: ~2-3 minutes (mostly spent in the test suite).
+
+CI environment variables (dummy values for the
+`_interpolate` strict check; tests don't call live providers):
+- `OLLAMA_API_KEY=""`
+- `OPENROUTER_API_KEY=""`
+- `AZURE_OPENAI_KEY=""`
+- `AZURE_DEPLOYMENT=""`
+- `AZURE_ENDPOINT=https://ci-dummy.example/`
+- `EXTERNAL_MCP_URL=https://ci-dummy.example/`
+- `EXT_TOKEN=ci-dummy`
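+
+To see why CI has to define even empty values, here is a minimal
+sketch of a strict `${VAR}` interpolation pass. It assumes the strict
+check only requires each variable to be present; `interpolate_strict`
+and its regex are hypothetical, and the real `_interpolate` in
+`src/runtime/config.py` may differ.
+
+```python
+import re
+
+_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
+
+
+def interpolate_strict(text: str, env: dict[str, str]) -> str:
+    """Replace each ${VAR} with env[VAR]; raise if any VAR is not set."""
+    missing = sorted({m.group(1) for m in _VAR.finditer(text) if m.group(1) not in env})
+    if missing:
+        raise KeyError(f"unset environment variables: {', '.join(missing)}")
+    return _VAR.sub(lambda m: env[m.group(1)], text)
+
+
+# Dummy values in the style of the CI list above; an empty string still
+# counts as "set" in this sketch.
+ci_env = {
+    "AZURE_ENDPOINT": "https://ci-dummy.example/",
+    "EXT_TOKEN": "ci-dummy",
+    "OLLAMA_API_KEY": "",
+}
+rendered = interpolate_strict(
+    "endpoint: ${AZURE_ENDPOINT}\nkey: '${OLLAMA_API_KEY}'", ci_env
+)
+```
+
+In practice the `env` argument would be `os.environ`; passing a dict
+keeps the sketch testable without touching the process environment.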
+
+## Quality gates
+
+Beyond CI's pass/fail, these soft gates guide PR review:
+
+| Gate | Source | Threshold |
+|---|---|---|
+| Coverage | SonarCloud `new_coverage` | ≥ 80% on new code |
+| Duplications | SonarCloud `new_duplicated_lines_density` | < 3% (with `sonar.cpd.exclusions` for intentional sync/async + responsive/graph mirrors) |
+| Reliability | SonarCloud `new_reliability_rating` | A (=1) |
+| Security | SonarCloud `new_security_rating` | A (=1) |
+| Maintainability | SonarCloud `new_maintainability_rating` | A (=1) |
+| Hotspots reviewed | SonarCloud `new_security_hotspots_reviewed` | 100% |
+| Concept-leak ratchet | `tests/test_genericity_ratchet.py` | ≤ `BASELINE_TOTAL` (currently 39) |
+| Bundle freshness | `tests/test_bundle_completeness.py` + CI gate | exit-code clean |
+| Type errors | `pyright` fail-on-error | zero new errors |
+| Lockfile drift | `uv lock --check` | clean |
+| Skill prompts | `scripts/lint_skill_prompts.py` | binary pass |
+
+## Containerisation
+
+There is **no Dockerfile** in the repo (verified via
+`find . -name Dockerfile`). Inference: the deploy target is bare-VM
+or systemd, not a container. A container deploy would need a
+hand-rolled `Dockerfile`:
+
+```dockerfile
+FROM python:3.11-slim
+WORKDIR /app
+COPY dist/apps/incident-management.py app.py
+COPY dist/ui.py ui.py
+COPY config/ config/
+ENV PYTHONUNBUFFERED=1
+CMD ["python", "app.py", "--config", "config/incident_management.yaml"]
+```
+
+(Inference: above is illustrative; not tested in this repo.)
+
+## Deployment model — air-gap copy
+
+Source: `docs/AIRGAP_INSTALL.md`,
+`docs/DEVELOPMENT.md`, `docs/DESIGN.md` § 10.
+
+**The deploy target has NO public-internet access** at runtime. Three
+phases:
+
+### Phase A — install dependencies (one-time, on the dev/CI box or behind an internal mirror)
+
+```bash
+export UV_INDEX_URL="https://<internal-mirror>/simple/"
+uv sync --frozen --extra dev          # populates ~/.cache/uv from the mirror
+# or fully offline if the cache is pre-warmed:
+uv sync --frozen --offline --extra dev
+```
+
+### Phase B — copy the 7-file payload onto the target host
+
+```
+app.py                    (renamed from dist/apps/<app>.py)
+ui.py                     (dist/ui.py)
+config/config.yaml        (framework: LLM, MCP, storage)
+config/<app>.yaml         (app: severity aliases, escalation roster, …)
+config/skills/            (optional skill prompt overrides)
+.env                      (provider keys; secrets manager preferred)
+```
+
+### Phase C — boot
+
+```bash
+python -m runtime --config config/<app>.yaml &
+streamlit run ui.py --server.port 37777 &
+```
+
+Or systemd units; or k8s `Pod`s. The framework doesn't care.
+
+## Release flow
+
+Source: git history + `docs/DESIGN.md` § 13.
+
+The release pattern in this repo is **squash merge into `main`** via
+GitHub PRs. Each milestone is a sequence of small PRs:
+
+```
+PR opened → CI runs (lint / type / test / sonar / bundle / skill-lint)
+         → all green → squash merge with verbose subject
+         → branch deleted
+         → main moves to the squash SHA
+```
+
+There is **no separate release branch**, no semver tags, and no
+release notes infrastructure. The "release" is `main` itself.
+
+The milestone history (v1.0 → v1.5) is recorded in
+`docs/DESIGN.md` § 13. New work goes on a feature branch (`feat/…`,
+`fix/…`, `refactor/…`, `docs/…`); merge via PR.
+
+## Rollback
+
+Inference: rollback is not formally documented. Practical options:
+
+- **Code rollback** — `git revert <squash-sha>` and merge a revert
+  PR. CI will re-run.
+- **Bundle rollback** — copy the previous bundle from a known-good
+  `main` commit; the deploy is copy-only so rolling back is just
+  copying older files.
+- **Schema rollback** — there's no Alembic. New columns / tables
+  added via `Base.metadata.create_all` are forward-only;
+  rolling back code that introduced a new column doesn't delete
+  the column from the DB (harmless — old code ignores it). New
+  rows in new tables are abandoned (also harmless).
+- **Stuck session rollback** — operator can `DELETE /sessions/{sid}`
+  (soft delete) or set `status='stopped'` via `stop_session(sid)`.
+
+## Versioning
+
+`pyproject.toml:8` declares `version = "0.1.0"`. The version has
+not been bumped despite v1.0 → v1.5 of the **product** milestones —
+Inference: the package version is independent of the milestone
+labelling. There are no git tags pinning the milestones; the
+squash SHAs in `docs/DESIGN.md` § 13 are the canonical reference.
+
+## Operational concerns
+
+- **Process lifecycle** — `OrchestratorService` runs a single
+  asyncio loop on a background thread. SIGTERM cancels in-flight
+  session tasks; the lifespan shutdown hook closes the FastMCP +
+  SQLAlchemy + checkpointer transports.
+- **Session capacity** — `runtime.max_concurrent_sessions: 8`
+  (default); raises `SessionBusy → HTTP 429` on overflow.
+- **Long-running approval** — `framework.approval_timeout` (default
+  1800 seconds, by inference) drives `ApprovalWatchdog`; sessions with
+  pending approvals beyond that age get auto-resolved with
+  `verdict=timeout`.
+- **DB growth** — `EventLog` and `LessonStore` are append-only.
+  No automatic pruning. Operators should periodically GC closed
+  sessions via `delete_session(sid)` (soft delete) or run a
+  manual VACUUM on SQLite. Inference: not documented; needs a
+  runbook.
+- **FAISS index growth** — vectors are written through on every
+  save and removed on `delete_session`. The index size scales
+  linearly with active sessions.
diff --git a/docs/10-known-risks-and-todos.md b/docs/10-known-risks-and-todos.md
new file mode 100644
index 0000000..4c5c25d
--- /dev/null
+++ b/docs/10-known-risks-and-todos.md
@@ -0,0 +1,146 @@
+# 10 — Known risks and TODOs
+
+## Source-code TODO/FIXME/HACK markers
+
+Verified via `grep -rnE "TODO|FIXME|XXX|HACK|DEPRECATED" src/ examples/`
+on this branch (excluding `__pycache__` and the
+`deprecated_kwargs` legitimate name).
+
+| File | Marker | What |
+|---|---|---|
+| `src/runtime/locks.py:49` | `TODO(v2)` | Evict idle slots in `SessionLockRegistry` to cap memory in long-running servers |
+| `src/runtime/locks.py:53` | `TODO(v2)` | Same — placement note on `_slots: dict[str, _Slot]` |
+
+That's it. The codebase is otherwise free of TODO/FIXME debt — a
+deliberate result of Phase 18 (HARD-04 silent-failure sweep) and the
+overall "fix root cause, not workaround" project rule.
+
+## Hardcoded values worth flagging
+
+| Where | Value | Risk |
+|---|---|---|
+| `src/runtime/config.py` (default `MetadataConfig.url`) | `"sqlite:///incidents/incidents.db"` (relative path) | Default points at a relative path; CWD-dependent. The framework's actual default `config/config.yaml` overrides to `sqlite:////tmp/asr.db` (absolute). Operators who skip `config.yaml` get the relative-path default. |
+| `src/runtime/config.py` (`storage.vector.path`) | `"incidents/faiss"` (relative) | Same as above |
+| `src/runtime/llm.py` Phase 13 default request_timeout | `120.0` seconds | A 2-minute timeout is generous for LLM calls; some providers can hang longer on long-context responses. Per-provider override available |
+| `runtime.locks.SessionLockRegistry` | unbounded dict | See `TODO(v2)` above |
+| Bundle file sizes | ~660-700KB each | Large for code review. Inference: the flatten + intra-import-strip pattern is the only viable single-file deploy path. |
+| `_RATE_LIMIT_MARKERS` in `src/runtime/graph.py` | string-match heuristic | If a provider invents a new 429 phrasing, retries fall back to fast-fail. The markers list records, in comments, the variants observed in the wild. |
+
+## Weak / incomplete features
+
+### v1.5-D Azure leg of the integration driver
+The `azure` parametrize arm in
+`tests/test_integration_driver_s1.py` is wired but the dev
+`.env` carries placeholder values for `AZURE_ENDPOINT`. Live
+verification requires a real Azure deployment; framework code path
+(`AzureChatOpenAI` construction) is intact.
+
+### Duplicate ToolCall audit rows
+The HITL fix in PR #6 left a known cosmetic duplication: when the
+gateway records a high-risk tool, it stores the row under the
+FastMCP composite name (`local_remediation:apply_fix`, colon form),
+while the harvester later records the same tool call under the
+LLM-visible name (`local_remediation__apply_fix`, double-underscore
+form). Two rows for one logical event. Cosmetic in the UI; matters
+if any consumer aggregates tool counts. Fix: align both on the `__`
+form (~30 min). Out of scope for v1.5; deferred.
+
+### `ApprovalWatchdog` regression test
+PR #6 added gateway saves on resolution transitions. The watchdog
+should observe a faster cleanup signal but no focused test verifies
+that. Add a 1-test regression. ~15 min.
+
+### `ASR_LOG_LEVEL` env var documentation
+Added in PR #6, mentioned in `docs/01-local-setup.md` and
+`docs/05-configuration.md` of this brownfield set, but not in the
+main `README.md` or `docs/DEVELOPMENT.md`. One-line note worth
+adding for operator visibility.
+
+### Streamlit UI test coverage
+`src/runtime/ui.py` is ~1700 lines, 0% coverage. Phase 20 (HARD-09)
+scaffolded `tests/test_ui_*.py` with a few smoke tests but reaching
+parity with backend coverage requires a dedicated UI-testing
+milestone. Excluded from the coverage gate via
+`pyproject.toml:[tool.coverage.run].omit`.
+
+### Trigger registry plugin transport
+`src/runtime/triggers/transports/plugin.py` is a stub —
+Inference: scaffold for future SQS / Kafka / NATS work. The
+`TriggerTransport` ABC + `plugin_transports` kwarg on
+`TriggerRegistry.create` are usable today by external code, but there is no
+in-repo transport beyond api / webhook / schedule.
+
+### Postgres checkpointer
+Optional via `pip install asr[postgres]`. CI is sqlite-only; the
+postgres saver code (`src/runtime/checkpointer_postgres.py`) is
+excluded from coverage. Production postgres deploys exist but
+aren't exercised in the test suite. Risk: a postgres-specific bug
+ships unnoticed.
+
+### ASR memory layer write-back
+The L2 / L5 / L7 stores in `examples/incident_management/asr/`
+are read-only. Mutation paths (write-back from agents, playbook
+authoring) are deferred. Inference: planned for a future
+milestone; no roadmap entry confirms this.
+
+### Dedup pipeline LLM error handling
+`Orchestrator._run_dedup_check` catches all `Exception` from the
+stage-2 LLM and degrades to "not a duplicate". Defensive but
+silently masks a misconfigured stage-2 model. Inference: a typed
+error path with logging would make ops triage faster.
+
+## Security-sensitive areas
+
+| Area | What to audit |
+|---|---|
+| `src/runtime/config.py:_interpolate` | Strict mode requires every `${VAR}` to exist, but it does not defend against variable injection if `os.environ` itself is compromised. Standard env-var posture. |
+| `src/runtime/triggers/auth.py` | Bearer token is read from env var at process start; rotation requires restart. `hmac.compare_digest` used. No HMAC-signature transport (PagerDuty / Slack) yet — `auth: bearer` only. |
+| `src/runtime/tools/gateway.py` (HITL gate) | The risk policy is config-driven (`runtime.gateway.policy`) — operators MUST configure `apply_fix`-class tools as `high` for production environments to enforce HITL. The framework defaults to `auto` for unlisted tools. |
+| `src/runtime/tools/gateway.py:_record_pending_resolution` | Verdict dict from operator → `Command(resume=verdict)` → tool args. Trust boundary: the operator is trusted; a malicious approver could pass arbitrary `rationale` text but cannot inject tool args (the gateway re-injects from session-derived state). |
+| `src/runtime/dedup.py` (LLM stage 2) | Operator-supplied `query` text is interpolated into the LLM prompt. Standard prompt-injection surface — the LLM verdict can be steered by adversarial query content. Currently used only for soft routing (`status='duplicate'`); a misclassification doesn't escalate privileges. |
+| `src/runtime/api.py` | NO authentication on `/sessions/*` endpoints. Air-gap deploys live behind corporate network controls. Webhook triggers have bearer auth via the trigger registry. |
+| `src/runtime/intake.py` (similarity retrieval) | `query` text is embedded and matched against historical sessions. Low risk — the retrieved lessons are framing context, not authoritative. |
+| Vector store (FAISS) | Local files. No encryption at rest; relies on filesystem permissions. Ops should chmod `/tmp/asr-faiss/` appropriately. |
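+
+For reviewers auditing the bearer path, the constant-time comparison
+mentioned for `src/runtime/triggers/auth.py` can be illustrated with
+the standard library alone. This is a sketch, not the module's code:
+`bearer_token_ok` and its header parsing are assumptions; only
+`hmac.compare_digest` is taken from the table above.
+
+```python
+import hmac
+
+
+def bearer_token_ok(authorization_header: str | None, expected_token: str) -> bool:
+    """Return True only for a well-formed `Bearer <token>` header that matches."""
+    if not authorization_header:
+        return False
+    scheme, _, presented = authorization_header.partition(" ")
+    if scheme.lower() != "bearer" or not presented:
+        return False
+    # compare_digest takes time independent of where the inputs differ,
+    # which avoids leaking the token prefix through response timing.
+    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
+```
+
+A plain `==` comparison would return the same booleans; the point of
+`compare_digest` is only the timing behaviour.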
+
+## Migration risks
+
+| Migration | Risk |
+|---|---|
+| Schema additive (new column, new table) | Low — `Base.metadata.create_all` at boot handles new tables; new columns get hand-rolled idempotent JSON-walk migrations under `migrations.py`. |
+| Schema destructive (drop column, rename, change type) | High — there is no Alembic. A destructive change requires a one-shot script + a documented downtime window. None planned. |
+| `extra_fields` JSON field reshape | Medium — apps store domain fields here. Renaming a field on the app's `Session` subclass without a `SessionStore` migration breaks load. Mitigation: app authors own their migrations. |
+| FAISS index format change | Low — re-indexing is idempotent (delete the index file; the next save rebuilds). |
+| Bundle format change | Low — `dist/*` is regenerated from source on every PR (HARD-08 gate). Bundle drift is mechanical. |
+| `langgraph` major version bump | High — PR #6 caught a breaking semantic change in `interrupt()` between langgraph 0.x and 1.x. Future major bumps (2.x?) need similar smoke tests; the `_drive_agent_with_resume` helper is the most exposed surface. |
+| `langchain` major version bump | High — `langchain.agents.create_agent` is the agent factory. A signature change there cascades through `make_agent_node`. |
+| Provider model deprecation (e.g. OpenRouter free-tier model removed) | Low — config swap; no code change. The 429 retry helps with transient throttles, not deprecations. |
+
+## Concurrency / race risks
+
+| Risk | Mitigation |
+|---|---|
+| Concurrent session writes (UI + API approval simultaneously) | `SessionLockRegistry` enforces single writer per session; second writer gets `SessionBusy → HTTP 429`. |
+| Concurrent retry on a session in `error` | `_retries_in_flight` set in `Orchestrator` rejects second retry. |
+| Approval race with `ApprovalWatchdog` timeout | `StaleVersionError` → both reload, one wins. Watchdog re-checks before resolving. |
+| LangGraph thread_id collision on retry | `retry_session` bumps `active_thread_id` to `<sid>:retry-N`; original thread stays at terminated checkpoint. |
+| Stale state on HITL resume | PR #6 fix: `make_agent_node` reloads from store at entry. Past pain point — see `docs/DESIGN.md` DEC-010. |
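+
+The approval-versus-watchdog row relies on optimistic versioning. A
+minimal sketch of that pattern, assuming a version counter that is
+checked and bumped on every save, is shown below. `InMemorySessionStore`
+is hypothetical and the `StaleVersionError` class here is a local
+stand-in that only borrows the name; the real `SessionStore` API is not
+reproduced.
+
+```python
+class StaleVersionError(Exception):
+    """Raised when a save is based on an outdated row version."""
+
+
+class InMemorySessionStore:
+    """Hypothetical stand-in for the real store; keeps rows in a dict."""
+
+    def __init__(self) -> None:
+        self._rows: dict[str, dict] = {}
+
+    def create(self, sid: str, status: str) -> None:
+        self._rows[sid] = {"version": 1, "status": status}
+
+    def load(self, sid: str) -> tuple[int, str]:
+        row = self._rows[sid]
+        return row["version"], row["status"]
+
+    def save(self, sid: str, expected_version: int, status: str) -> int:
+        row = self._rows[sid]
+        if row["version"] != expected_version:
+            raise StaleVersionError(
+                f"{sid}: expected v{expected_version}, found v{row['version']}"
+            )
+        row["status"] = status
+        row["version"] += 1
+        return row["version"]
+
+
+store = InMemorySessionStore()
+store.create("s1", "pending_approval")
+
+# Operator and watchdog both read version 1.
+operator_version, _ = store.load("s1")
+watchdog_version, _ = store.load("s1")
+
+store.save("s1", operator_version, "approved")  # operator wins; row is now v2
+
+try:
+    store.save("s1", watchdog_version, "timeout")
+    watchdog_outcome = "resolved"
+except StaleVersionError:
+    # Reload and re-check before resolving, as the table describes.
+    _, current_status = store.load("s1")
+    watchdog_outcome = "skipped" if current_status != "pending_approval" else "retry"
+```
+
+The loser of the race never overwrites the winner: it either skips
+(the approval already landed) or retries against the fresh version.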
+
+## Operational risks
+
+| Risk | Mitigation |
+|---|---|
+| `/tmp` filling up (SQLite + FAISS in `/tmp` per default config) | Operators should override `storage.metadata.url` and `storage.vector.path` in production to a persistent path. |
+| Long-running orchestrator memory growth | `SessionLockRegistry` `TODO(v2)` — slots accumulate; add eviction. |
+| Provider key rotation requires restart | Env vars read at process start. No SIGHUP reload. |
+| Single-process limit | One `OrchestratorService` per host; `runtime.max_concurrent_sessions: 8` cap. Multi-host deploys need a separate orchestrator per host (and a separate metadata DB OR strict per-session locking via a shared lock service — not implemented). |
+| Bundle drift on hand-edited `dist/` | CI catches via "Bundle staleness gate (HARD-08)". |
+| Lockfile drift after `pip install` instead of `uv sync` | Operators MUST use `uv sync --frozen`; CI catches via `uv lock --check`. |
+
+## Documentation drift risks
+
+| Risk | Mitigation |
+|---|---|
+| Docs reference outdated test counts / coverage / ratchet baseline | `docs/00-project-overview.md` snapshots current values; refresh on milestone landings. |
+| `.planning/` (gitignored) used as canonical state | Don't — the canonical state is `docs/DESIGN.md` § 13 and the git history. |
+| `.env` placeholder vs real values mismatch | Operators must populate per-deploy; CI uses dummy values. |
+| Skill prompts reference removed tool args | `scripts/lint_skill_prompts.py` (Phase 21 / SKILL-LINTER-01) catches as a CI gate. |
diff --git a/docs/11-agent-handoff.md b/docs/11-agent-handoff.md
new file mode 100644
index 0000000..54303b2
--- /dev/null
+++ b/docs/11-agent-handoff.md
@@ -0,0 +1,230 @@
+# 11 — Agent handoff
+
+> Designed for AI coding agents picking this project up cold. If
+> you're a human, this works for you too.
+
+## Project summary in 20 lines
+
+ASR is a generic Python multi-agent runtime framework. It wraps
+**LangGraph** (orchestration / checkpointing) and **LangChain**
+(`langchain.agents.create_agent` for the per-agent loop;
+`Chat{OpenAI,Ollama}` and `AzureChatOpenAI` for provider abstraction).
+Tools come from **FastMCP** servers (in-process / stdio / http).
+A risk-rated **HITL gateway** wraps every tool — high-risk calls
+raise `langgraph.types.interrupt(payload)` to pause the graph for
+operator approval; resume via `Command(resume=verdict)`. Agent
+output uses a **markdown contract block** (`## Response / ##
+Confidence / ## Signal`) parsed by a 6-path lenient parser with
+synthesis fallbacks for misbehaving models.
+
+Two reference apps live in `examples/`: `incident_management` (4-skill
+SRE investigation pipeline with ASR memory layers) and `code_review`
+(3-skill PR review pipeline; mocked tools). Apps subclass `Session`
+to add domain fields; the framework stays generic — a CI ratchet
+(`tests/test_genericity_ratchet.py`) keeps it that way.
+
+The deploy target is air-gapped corporate environments. The deploy
+artifact is a single-file bundle under `dist/` (not a wheel) plus a
+handful of YAML configs and `.env`. The bundler script
+(`scripts/build_single_file.py`) flattens `src/runtime` + an example
+app into one `.py` file; CI's "Bundle staleness gate" rebuilds on
+every PR and refuses the merge if `dist/` would change.
+
+`main` is at v1.5; 1265 tests passing; 87% coverage; ruff clean;
+SonarCloud green; concept-leak ratchet at 39. v2.0 (React UI
+replacing the Streamlit prototype) is the next big move.
+
+## Top 20 files to read first
+
+In order — each builds on the previous.
+
+1. **`README.md`** — repo intro + quick start
+2. **`docs/DESIGN.md`** — long-form architecture + decision log
+   (12 numbered DEC-NNN entries) + milestone history (v1.0 → v1.5)
+3. **`docs/02-architecture.md`** — quick-scan summary of the layers
+4. **`pyproject.toml`** — deps, pytest/ruff/pyright/coverage config
+5. **`config/config.yaml.example`** — annotated config template
+6. **`src/runtime/state.py`** — `Session`, `AgentRun`, `ToolCall`,
+   `TokenUsage` pydantic models
+7. **`src/runtime/skill.py`** — `Skill` (YAML-driven agent declaration)
+8. **`src/runtime/orchestrator.py`** — `Orchestrator` class + lifecycle
+   methods (`start_session`, `stream_session`, `resume_session`,
+   `_finalize_session_status_async`, `_is_graph_paused`)
+9. **`src/runtime/service.py`** — `OrchestratorService` long-lived
+   loop wrapper + thread-safe bridge
+10. **`src/runtime/graph.py`** — `build_graph`, `make_agent_node`,
+    `_drive_agent_with_resume`, `_ainvoke_with_retry`,
+    `parse_envelope_from_result` callers
+11. **`src/runtime/agents/turn_output.py`** — markdown envelope
+    parser, 6-path fallback chain
+12. **`src/runtime/tools/gateway.py`** — `wrap_tool` (~830 LOC) —
+    risk-rated tool wrapper with HITL pause/resume
+13. **`src/runtime/llm.py`** — `get_llm` provider abstraction
+14. **`src/runtime/storage/session_store.py`** — CRUD + FAISS
+    write-through + optimistic-version save
+15. **`src/runtime/api.py`** — FastAPI `/sessions/*` REST + SSE +
+    WebSocket + approvals
+16. **`examples/incident_management/state.py`** — example
+    `IncidentState(Session)` subclass
+17. **`examples/incident_management/mcp_server.py`** — example MCP
+    server pattern
+18. **`tests/test_interrupt_detection.py`** — proves the HITL fix
+    end-to-end (read this for the resume contract)
+19. **`scripts/build_single_file.py`** — the bundler (the deploy
+    pipeline)
+20. **`.github/workflows/ci.yml`** — CI gates (lint / type / test /
+    sonar / bundle / skill-lint)
+
+## Commands future agents SHOULD use
+
+| Goal | Command |
+|---|---|
+| Install / sync deps | `uv sync --frozen --extra dev` |
+| Run full test suite | `uv run pytest -x` |
+| Run single test fast | `uv run pytest tests/<file>.py::<test_name> -xvs --no-cov` |
+| Lint | `uv run ruff check src/ tests/` |
+| Type check | `uv run pyright src/runtime` |
+| Coverage gate | `uv run pytest --cov=src/runtime --cov-fail-under=85 -x` |
+| Regenerate single-file bundle | `uv run python scripts/build_single_file.py` |
+| Concept-leak ratchet check | `uv run python scripts/check_genericity.py` |
+| Skill-prompt linter | `uv run python scripts/lint_skill_prompts.py` |
+| Lockfile freshness | `uv lock --check` |
+| Boot CLI | `uv run python -m runtime --config config/incident_management.yaml` |
+| Boot Streamlit UI | `ASR_LOG_LEVEL=INFO uv run streamlit run src/runtime/ui.py --server.port 37777` |
+| Reset local state | `rm /tmp/asr.db /tmp/asr.db-*; rm -rf /tmp/asr-faiss` |
+| Inspect session events | `sqlite3 /tmp/asr.db "SELECT kind, datetime(ts), substr(payload,1,200) FROM session_events WHERE session_id='<sid>' ORDER BY ts;"` |
+| Inspect a session row | `sqlite3 /tmp/asr.db "SELECT id, status, version FROM incidents WHERE id='<sid>';"` |
+| Live integration smoke | `OLLAMA_API_KEY=… OLLAMA_BASE_URL=https://ollama.com uv run pytest tests/test_integration_driver_s1.py -v` |
+| Open a PR | `gh pr create --base main --head <branch> --title "…" --body "…"` |
+| Watch CI | `gh pr checks <pr_number> --watch` |
+| Squash merge | `gh pr merge <pr_number> --squash --delete-branch --subject "…"` |
+
+## Commands future agents SHOULD AVOID
+
+| Avoid | Why | Use instead |
+|---|---|---|
+| `pip install …` | Bypasses uv lockfile; CI's "Lockfile freshness gate" will fail | `uv add <pkg>` then `uv sync` |
+| `pytest …` (bare) | Doesn't pick up `pythonpath` from `pyproject.toml` | `uv run pytest …` |
+| Editing `dist/*` directly | Bundles are generated; hand-edits get clobbered + CI's "Bundle staleness gate" fails | Edit `src/runtime/` or `examples/`, regenerate via `scripts/build_single_file.py` |
+| `git commit` without bundle regen after touching `src/runtime/` or `examples/` | CI's bundle gate fails | Run `scripts/build_single_file.py`, `git add dist/` |
+| `git push --force` to `main` (or any shared branch) | Rewrites history for everyone | Use a feature branch + PR |
+| `git push origin --delete <branch>` for branches you didn't create | Destructive on shared state | Confirm with the owner |
+| Adding a `TODO` to source | Project rule is "fix root cause, not workaround"; the only `TODO(v2)` in the repo is intentional | Open an issue or write the fix |
+| Adding `except Exception: pass` | Phase 18 (HARD-04) explicitly removed all of these | Log + re-raise, or catch a typed exception |
+| Touching schema columns on `IncidentRow` | Requires a migration; v1.5-B (DEC-008) explicitly left the incident-shaped columns alone | Use `extra_fields` JSON for app-specific data |
+| Calling live LLM providers in tests | CI uses dummy keys; live tests are env-gated and skipped | Use `LLMConfig.stub()` + `EnvelopeStubChatModel` |
+| Renaming `incident` → `session` in source code without bumping the ratchet test | `tests/test_genericity_ratchet.py` enforces the count downward only | Update `BASELINE_TOTAL` in the same commit with rationale comment (see history at `tests/test_genericity_ratchet.py:60-86`) |
+| Writing agent-generated `*.md` outside `docs/` and committing | `docs/*` is gitignored except for explicit allowlist | Add to the allowlist in `.gitignore` if it's a real deliverable; otherwise keep it local |
+
+## Architectural rules
+
+These are **load-bearing** — if you're tempted to violate one, stop
+and re-read `docs/DESIGN.md` § 12 (decision log).
+
+1. **The framework stays domain-agnostic.** Apps subclass `Session`
+   for domain data; framework code references `Session` and
+   `extra_fields`, never app-specific fields. The concept-leak
+   ratchet enforces this on `incident` / `severity` / `reporter`
+   tokens.
+2. **One source of truth per concern.** Gate decisions:
+   `policy.should_gate`. Retry policy: `policy.should_retry`.
+   Status finalization: `_finalize_session_status`. Don't reimplement.
+3. **HITL pause is NOT an error.** `GraphInterrupt` and the
+   `__interrupt__` field on the result dict signal a checkpointed
+   pending_approval, not a failure. `_handle_agent_failure` must NOT
+   fire; finalize must NOT run while paused. See PR #6.
+4. **Append-only audit trails.** `agents_run`, `tool_calls`,
+   `session_events` are never updated in place (the gateway's
+   per-row pending→approved transition IS in-place but is the only
+   exception, and it persists via `_record_pending_resolution`).
+5. **The bundle is the deploy unit.** `dist/*` is regenerated, not
+   hand-edited. Every PR touching `src/runtime/` or `examples/`
+   commits a fresh bundle.
+6. **Provider abstraction stays in `src/runtime/llm.py`.** Apps
+   declare provider config; the framework owns the provider class
+   selection (`langchain_openai.ChatOpenAI` vs
+   `langchain_openai.AzureChatOpenAI` vs `langchain_ollama.ChatOllama`).
+7. **Tests use stubs by default.** Live LLM tests are env-gated;
+   the suite must run cleanly in CI without any provider keys.
+8. **No public-internet calls at deploy time.** Air-gap is the
+   target. The `https://ollama.com` hardcoded fallback was
+   explicitly removed in Phase 13 (HARD-05); don't re-introduce.
+
+## Coding conventions
+
+| Convention | Example |
+|---|---|
+| Pydantic v2 BaseModel for every config / state | `src/runtime/state.py:Session` |
+| Async first; sync wrappers as needed | `OrchestratorService.submit_async` is async; `submit_and_wait` wraps for sync callers |
+| Type-hint everything; pyright fail-on-error gate | `src/runtime/graph.py` |
+| Skill prompts as `system.md` not Python strings | `examples/*/skills/<name>/system.md` |
+| Tools registered via `@mcp.tool()` decorator on FastMCP server | `examples/incident_management/mcp_server.py` |
+| Per-line `# pyright: ignore[<rule>] -- <rationale>` for legitimate stub gaps | `src/runtime/orchestrator.py` (multiple) |
+| String constants for envelope keys / status values | Avoid bare strings — use `runtime.state.ToolStatus` Literal or named constants |
+| `_private_helper(*, kw=…)` for keyword-only args inside the framework | `src/runtime/graph.py:make_agent_node` |
+| Test files mirror source: `src/runtime/X.py` → `tests/test_X.py` | Most do; some are topical (`test_interrupt_detection.py` ≠ one source file) |
+| Conventional-commit subjects | `feat(retry): 429 rate-limit retry…`, `fix(hitl): …`, `refactor(v1.5-B): …`, `docs: …`, `build: …`, `chore(config): …` |
+| Atomic commits per logical change; squash-merge into main | git history shows the pattern |
+
+## Common traps
+
+1. **`pytest` (bare) doesn't pick up the `pythonpath`** → `ModuleNotFoundError: runtime`. Use `uv run pytest …`.
+2. **Touching `src/runtime/` or `examples/` without regenerating `dist/`** → CI bundle gate fails. Always run `uv run python scripts/build_single_file.py && git add dist/` before committing.
+3. **Adding or renaming a kwarg on a framework function without checking callers** → the `incident=` rename in v1.5-B broke the example app's `_record_success_run(incident=…)` call. Run `git grep -nE "<func>\\("` before any signature change.
+4. **Approving a HITL session that was created on pre-PR-#6 code** → that session's checkpoint is poisoned (langgraph 1.x semantic mismatch). The Approve button silently no-ops. Tell the user to start a fresh session.
+5. **Live OpenRouter `:free` model rate limits** → first call may 429. The v1.5-D 429 retry (7.5s/15s/22.5s) clears most short-window throttles; persistent 429 means quota exhaustion.
+6. **Azure connection error** → check `.env` `AZURE_ENDPOINT` is a real URL, not a placeholder like `noop`.
+7. **Pyright complains about langchain stubs** → use `# pyright: ignore[<rule>] -- <rationale>` per line; don't disable the gate.
+8. **Streamlit `AssertionError: scope["type"] == "http"` storm under Python 3.14** → cosmetic Starlette compat bug; HTTP traffic still works. Filter logs.
+9. **`StaleVersionError` on HITL resume** → was a real bug pre-PR-#6 (stale `state["session"]`); now mitigated by `make_agent_node` reload-on-entry. If you see it again, check whether you accidentally bypassed the reload.
+10. **Two ToolCall rows for one `apply_fix`** → known cosmetic duplication (gateway colon-form vs harvester `__`-form). Documented as a small follow-up.
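+
+The 7.5s/15s/22.5s schedule in trap 5 is consistent with a linear backoff of 7.5 s per attempt. Below is a minimal sketch under that assumption; `RateLimited`, `retry_delays`, and `call_with_429_retry` are hypothetical names, not the framework's actual API.
+
+```python
+import asyncio
+
+BASE_DELAY_S = 7.5  # assumed step, inferred from the 7.5s/15s/22.5s schedule
+MAX_RETRIES = 3
+
+
+class RateLimited(Exception):
+    """Hypothetical stand-in for a provider's HTTP 429 error."""
+
+
+def retry_delays(base=BASE_DELAY_S, retries=MAX_RETRIES):
+    """Linear schedule: base * 1, base * 2, ..., base * retries."""
+    return [base * n for n in range(1, retries + 1)]
+
+
+async def call_with_429_retry(call, *, sleep=asyncio.sleep):
+    """Await call(); on RateLimited, wait the next delay and try again."""
+    for delay in retry_delays():
+        try:
+            return await call()
+        except RateLimited:
+            await sleep(delay)
+    # Final attempt: a persistent 429 propagates to the caller.
+    return await call()
+```
+
+The three waits add up to 45 s, so a throttle that outlasts them surfaces as an error, in line with the "persistent 429 means quota exhaustion" reading.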
+
+## Current unfinished work
+
+From `docs/00-project-overview.md` § "What's next" and
+`docs/10-known-risks-and-todos.md`:
+
+| Item | Effort | Priority |
+|---|---|---|
+| **v2.0 — React UI** replacing Streamlit; parity-port against `/sessions/*` API | ~1–2 weeks | High |
+| Duplicate ToolCall audit rows (gateway colon vs harvester `__`) | ~30 min | Low (cosmetic) |
+| `ApprovalWatchdog` regression test (covers PR #6 saves) | ~15 min | Medium |
+| `ASR_LOG_LEVEL` env var doc in main README | ~5 min | Low |
+| `src/runtime/locks.py:49` — `TODO(v2)` slot eviction | ~1-2h | Low (relevant for long-running servers) |
+
+**Environment-side (operator, not framework):**
+
+- OpenRouter `workhorse` returns 402 on paid models — out of credits
+- Azure live verification needs a real `AZURE_ENDPOINT` (`.env` placeholder)
+
+## Recommended next tasks
+
+Ordered by value relative to effort:
+
+1. **Update `.planning/STATE.md` + `.planning/ROADMAP.md`** (gitignored,
+   local) to reflect v1.5 fully shipped. ~5 min.
+2. **Land the smaller cleanups together as a single "v1.5 polish" PR**:
+   `ApprovalWatchdog` test + duplicate ToolCall fix + `ASR_LOG_LEVEL`
+   doc. ~1h total. Closes the loop on v1.5.
+3. **Brainstorm v2.0 React UI** — invoke `superpowers:brainstorming`.
+   Stack pick (Next.js / Vite + React / Remix?), state management,
+   API client codegen from `/sessions/*` OpenAPI?
+4. **Scaffold v2.0 React UI** in a new top-level `web/` directory.
+   Don't touch `src/runtime/` until the parity-port surfaces a real
+   missing API.
+5. **Build a multi-agent live driver** that runs intake → triage →
+   resolution against a real provider end-to-end. Catch provider-quirk
+   regressions earlier than the single-agent S1 driver.
+6. **Postgres CI smoke** — one test against a postgres container so
+   the optional checkpointer doesn't drift unnoticed.
+
+## Where DESIGN.md and this handoff differ
+
+`docs/DESIGN.md` is the **prose narrative**: read it once,
+top-to-bottom, to build the mental model. This handoff is the
+**action card**: skim it at the start of each new session to
+remember what to do and what to avoid.
+
+The 12 numbered files in this `docs/` directory (00 through 11) are
+the **per-topic reference**: jump to whichever one matches your
+current question.
diff --git a/docs/adr/0001-current-architecture.md b/docs/adr/0001-current-architecture.md
new file mode 100644
index 0000000..98b2cf4
--- /dev/null
+++ b/docs/adr/0001-current-architecture.md
@@ -0,0 +1,209 @@
+# ADR 0001: Current architecture
+
+**Status:** Accepted (snapshot of `main` as of v1.5, post-PR #11)
+
+**Date:** 2026-05-14
+
+**Context:** This ADR captures the architectural baseline that
+v1.5 ships. It is a synthesis of the twelve numbered decisions in
+`docs/DESIGN.md` § 12 (DEC-001 through DEC-012). Future ADRs
+should be written for new decisions that supersede or refine this
+baseline.
+
+---
+
+## Decision
+
+The framework's architecture composes three external layers
+(LangGraph, LangChain, FastMCP) with a generic runtime + two
+example apps, deployed as a single-file bundle into air-gapped
+corporate environments.
+
+### Layer composition
+
+| Layer | Provided by | Owned by us |
+|---|---|---|
+| Provider clients | `langchain-openai`, `langchain-ollama` | NO |
+| Agent factory (per-skill ReAct loop) | `langchain.agents.create_agent` (which is itself a langgraph subgraph) | NO |
+| Graph orchestration / checkpointing / `interrupt()` | `langgraph` 1.x | NO |
+| MCP tool servers | `fastmcp` | NO |
+| **Framework abstractions** (`Session`, `Skill`, `Orchestrator`, gateway, telemetry, storage, bundling, HITL plumbing) | THIS REPO (`src/runtime/`) | YES |
+| **Apps** (state subclass, MCP servers, skill prompts) | THIS REPO (`examples/`) or external | YES (examples) / external (downstream apps) |
+
+### Decision summary
+
+Reference: each is detailed in `docs/DESIGN.md` § 12.
+
+| ID | Decision | Why |
+|---|---|---|
+| DEC-001 | LangGraph as orchestration engine | Out-of-the-box Pregel-style step boundaries + checkpointing + first-class HITL `interrupt()` |
+| DEC-002 | `langchain.agents.create_agent` as the per-agent loop (Phase 15) | Single tool-loop; AutoStrategy → ToolStrategy fallback; removed the `recursion_limit=25` workaround |
+| DEC-003 | Markdown turn-output contract over `response_format` JSON (Phase 22) | JSON schema brittleness across providers; markdown is what every chat model writes well; parse leniency under our control |
+| DEC-004 | Pure-policy HITL gating (Phase 11) | One source of truth (`should_gate`); auditing what gates is one grep |
+| DEC-005 | Generic `Session` base + `extra_fields` JSON (v1.1) | Apps extend without schema migrations; framework stays domain-agnostic |
+| DEC-006 | Per-agent `skill.model` override (v1.5-C / M8) | Cheap models for cheap agents; one config knob |
+| DEC-007 | Single-file bundle for air-gap deploy (BUNDLER-01) | Copy-only deploy; no `pip install` at deploy time |
+| DEC-008 | Concept-leak ratchet (v1.5-B) | CI-enforced framework genericity; downward-only count |
+| DEC-009 | 429 separate retry regime (v1.5-D) | Free upstream tiers (OpenRouter `…:free`) need 30-60s windows; 5xx default backoff exhausts in 9s |
+| DEC-010 | Inner agent checkpointer + reload-on-entry (PR #6) | langgraph 1.x `__interrupt__` semantics + outer Pregel step-boundary checkpointing → reload defends against stale state |
+| DEC-011 | Two example apps to prove genericity | Without a second app, "is the framework generic?" is unanswerable |
+| DEC-012 | Bundle staleness CI gate (HARD-08) | dist drift = deploy-time bugs; CI rebuilds + diffs on every PR |
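+
+DEC-012's "rebuild + diff" gate amounts to comparing the committed bundle against a fresh build. The sketch below shows that comparison only; `sha256_hex` and `stale_bundle_files` are hypothetical helpers, and how the real CI job invokes `scripts/build_single_file.py` is not shown here.
+
+```python
+import hashlib
+
+
+def sha256_hex(data: bytes) -> str:
+    """Hex digest used to compare bundle contents."""
+    return hashlib.sha256(data).hexdigest()
+
+
+def stale_bundle_files(committed: dict[str, bytes], rebuilt: dict[str, bytes]) -> list[str]:
+    """Return names whose committed content differs from a fresh rebuild."""
+    stale = []
+    for name in sorted(set(committed) | set(rebuilt)):
+        old = committed.get(name)
+        new = rebuilt.get(name)
+        # A file missing on either side counts as drift.
+        if old is None or new is None or sha256_hex(old) != sha256_hex(new):
+            stale.append(name)
+    return stale
+```
+
+A gate built on this would fail the PR whenever the returned list is non-empty.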
+
+---
+
+## Consequences
+
+### Positive
+
+- **Air-gap deployable** — copy-only 7-file payload; no runtime
+  internet dependencies; reproducible installs via `uv.lock`.
+- **Genuinely generic** — two distinct example apps prove the
+  decoupling; CI ratchet keeps it that way.
+- **HITL is first-class** — risk-rated gateway, durable pause via
+  langgraph checkpointer, two approval surfaces (UI + API), watchdog
+  for stale approvals.
+- **Per-step observability** — `EventLog` rows for every
+  meaningful boundary, drives the auto-learning lesson store and
+  any external observability stack.
+- **Provider-agnostic** — Ollama / Azure / OpenAI-compatible via
+  one config knob; per-skill override.
+- **Resilient to provider quirks** — markdown contract + Path 5/6
+  synthesis fallbacks; 429 backoff regime; provider timeout +
+  retry on 5xx.
+
+### Negative
+
+- **Two heavy upstream dependencies** (`langgraph`, `langchain`)
+  with histories of breaking semantic changes (PR #6 caught one;
+  more likely on future major bumps).
+- **Single-process model** — `OrchestratorService` is one asyncio
+  loop on one host. Multi-host / multi-tenant deploys need
+  separate orchestrators per tenant.
+- **No built-in auth on the FastAPI surface** — relies on corporate
+  network controls. Webhook triggers have bearer auth only.
+- **Schema migrations are ad-hoc** — no Alembic. Additive changes
+  use `Base.metadata.create_all`; destructive changes need
+  hand-rolled scripts.
+- **Concept-leak residue** — 39 tokens still on the `incident` /
+  `severity` / `reporter` axis after v1.5-B, mostly schema-coupled
+  columns + legacy `/incidents/*` URL routes that would require
+  destructive migration to remove. Documented in
+  `docs/DESIGN.md` § 12 DEC-008.
+- **Bundle files are large** (~660-700KB each). Code review on
+  `dist/*` is impractical; reviewers focus on `src/runtime/`
+  diffs and trust the bundle gate.
+- **Streamlit UI is a prototype** — slated for replacement by a
+  React UI (v2.0, not started). Adds a transitional cost.
+
+### Neutral
+
+- **No queue / messaging integration shipped** — trigger registry
+  + plugin transport ABC exists, but no SQS/Kafka/NATS in-tree.
+- **No container Dockerfile** — Inference: bare-VM / systemd
+  deploy assumed.
+- **No semver tags** — `pyproject.toml` declares `0.1.0`; the
+  v1.0 → v1.5 milestone labels are documentation-level, not git
+  tags. Squash SHAs in `docs/DESIGN.md` § 13 are the canonical
+  references.
+
+---
+
+## Alternatives considered
+
+### Build a graph engine ourselves
+
+Rejected (DEC-001 implicitly). LangGraph's Pregel + checkpointer +
+interrupt semantics are exactly what HITL needs. Owning the
+orchestration engine would cost us a year of work for a similarly
+shaped result.
+
+### Stay on `langgraph.prebuilt.create_react_agent`
+
+Rejected in Phase 15 (DEC-002). The prebuilt was deprecated; the
+`recursion_limit=25` workaround we needed to avoid infinite loops was
+a symptom of the prebuilt's interaction with our structured-output
+post-pass. `langchain.agents.create_agent` runs a single tool-loop
+with native ToolStrategy fallback, removing the workaround.
+
+### Stay on `response_format=AgentTurnOutput` JSON envelope
+
+Rejected in Phase 22 (DEC-003). `response_format` triggered three
+classes of brittleness: model-specific JSON drift, tool-strategy +
+ReAct END interaction, recursion-limit ceilings. Markdown is the
+native format every chat model writes well; the parse step now
+happens in our code where leniency is in our control.
+
+### Keep `IncidentState` as the only state class
+
+Rejected in v1.1 (DEC-005). Adding a second app (code_review) was
+the forcing function — every "incident-shaped" leak that surfaced
+during code-review's build moved into the framework rather than
+becoming an app workaround. The concept-leak ratchet (DEC-008,
+v1.5-B) keeps this honest.
+
+### Multi-file deploy (zip / tarball / wheel + venv)
+
+Rejected for BUNDLER-01 (DEC-007). Air-gap target is copy-only;
+multi-file `pip install` at deploy time is out of scope. The
+bundler turns the multi-file source tree into the smallest
+possible deploy payload (7 files).
+
+### Use Alembic for schema migrations
+
+Considered, rejected (Inference). Schema changes have been purely
+additive so far. When a destructive change becomes necessary,
+adding Alembic at that point is straightforward. Until then, the
+pydantic + JSON-bag pattern keeps schema changes rare.
+
+### Multi-agent supervisor as the entry point (instead of intake)
+
+Considered (Phase 6 introduced `kind: supervisor`). The
+incident-management example app uses a supervisor for intake
+(rule-based dispatch); other apps use a `responsive` skill at entry
+(`code_review` does). The framework supports both patterns equally.
+
+---
+
+## Open questions to revisit in future ADRs
+
+These are decisions the v1.5 baseline does NOT take a strong
+position on:
+
+1. **Multi-host orchestration.** When does the single-process model
+   stop scaling? Does the answer involve a shared lock service, a
+   queue between orchestrators, or just "shard by app"?
+2. **Authentication on the FastAPI surface.** Air-gap defers this;
+   if v2.0 React UI is hosted on a corporate intranet with SSO,
+   we'll need at least a JWT verification layer. ADR 0002?
+3. **Postgres CI coverage.** The `asr[postgres]` extra ships but
+   no CI test exercises it. A postgres container in CI would
+   close the gap; cost is CI time + workflow complexity.
+4. **Trigger fan-in transports.** The plugin transport ABC is scaffold
+   only: no SQS / Kafka / NATS transport in-tree, no production user
+   yet. When the first arrives, the ABC may need refining.
+5. **React UI architecture.** Stack pick (Next.js? Vite +
+   React Router?), state management (TanStack Query?), API codegen
+   from a generated OpenAPI spec? ADR 0003 territory.
+6. **Lesson-store pruning.** `LessonRow` is append-only; soft delete
+   exists but there's no automatic GC. At what corpus size do
+   intake's relevance lookups slow down enough to need pruning?
+7. **Dual-write inconsistency between `IncidentRow.pending_intervention`
+   and the langgraph checkpointer.** Currently both are written
+   when a gate pauses; the race window between the two writes is
+   tolerated (operator dashboards may briefly disagree). Worth a
+   focused test or a transactional wrapper?
+
+---
+
+## Related documents
+
+- `docs/DESIGN.md` — long-form architecture narrative + decision
+  rationale + milestone history
+- `docs/00-project-overview.md` — what / who / status
+- `docs/02-architecture.md` — quick-scan summary of the layers +
+  data flow
+- `docs/04-main-flows.md` — entry points + failure modes per flow
+- `docs/06-data-model.md` — entities + relationships +
+  persistence assumptions
+- `docs/10-known-risks-and-todos.md` — what's pending
+- `docs/11-agent-handoff.md` — action card for AI agents