diff --git a/CHANGELOG.md b/CHANGELOG.md
index e228aeb..0d30556 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -52,6 +52,7 @@ CHANGELOG entry.
 ### Documentation
 - Post-v1.0 staleness sweep: `ROADMAP.md`, `docs/deployment/programmatic.md`, and `docs/spec/31-llm-backend.md` updated to reflect the v1.0 release (twelfth backend protocol shipped, MCPServerRegistryBackend added to lists, status copy flipped from future-tense to PyPI-install present-tense, spec/31 status line refreshed from "v0.13 era" to "v1.0"). Closes the three stragglers an audit found after the PR #334 v1.0 release sweep.
+- Slimmed `CLAUDE.md` from 88k → 30k chars to clear the Claude Code 40k-char prompt budget. Extracted the per-protocol detail (the wall-of-text "What this is" paragraph and the bullet-per-backend Status block) into a new `docs/protocols-shipped.md` that owns the full reference impls / capabilities / operator overrides / doctor checks / Implementer Contracts / cliff-closed narrative for all twelve shipped backends. `CLAUDE.md` keeps the design ethos + a compact 12-row Status table that points to the detail file. Matches the framework's own "progressive disclosure" principle (rule 6 — load metadata in-prompt; lazy-load content).
 ## [1.0.0] - 2026-06-04
diff --git a/CLAUDE.md b/CLAUDE.md
index 9c8a0f0..cbe881e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -12,7 +12,7 @@ For broader context, read these in order on a fresh session:
 ## What this is
-Atomic Agents is a vault-native AI agent framework: agents live as plain markdown files, the runtime is stateless, and storage is moving toward swappable protocols layer by layer.
**Shipped backend protocols**: MemoryBackend (PR #57); LLMBackend (#87 — Anthropic + OpenAI + Moonshot reference impls); JudgeBackend Protocol (#112 — locked at PR 4 with conformance suite, PolicyJudge + LLMJudgeBackend reference impls, ESCALATE + REVISE state machines, `judges.md` operator config + cascade-aware project floor, operator-driven resolution flow); LockBackend Protocol (#60 — locked at PR 4 with `FilesystemLockBackend` + `RedisLockBackend` reference impls, `scope()` Protocol method, daemon-thread heartbeat with `LockLost` lease-expiry detection, operator override via env vars + constructor kwarg, doctor `check_lock_backend` coherence check — closes the multi-host cliff so atomic-agents runs on Cloud Run / Kubernetes / gizmo without forking); LogBackend Protocol (#61 — locked at PR 4 with `FilesystemLogBackend` + `SQLiteLogBackend` reference impls, parametrized conformance suite across both backends, operator override via `ATOMIC_AGENTS_LOG_BACKEND` env var + constructor kwarg + per-runner kwargs on OutcomeRunner/DreamRunner, doctor `check_log_backend` coherence check with stats probe + URL credential redaction, `LogQuery.agent_name` filter for shared-backend cross-agent isolation — closes the dashboard-perf cliff: operators on Cloud Run / Kubernetes can pin SQLite for indexed query/aggregate/retention); AgentProfileBackend Protocol (#63 — locked at PR 4 with `FilesystemAgentProfileBackend` + `SQLiteAgentProfileBackend` reference impls, parametrized conformance suite across both backends, JSON-based snapshot trio on both backends, `supports_skills` capability dimension, operator override via `ATOMIC_AGENTS_PROFILE_BACKEND=sqlite` + optional `ATOMIC_AGENTS_PROFILE_BACKEND_URL` env vars OR `AtomicAgent(..., profile_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner/delegate.py, doctor `check_agent_profile_backend` coherence check with capability snapshot + agent-count probe + URL credential redaction, Implementer 
contract for registry-backed backends documented in spec/24 — closes the SaaS-shape cliff: SaaS / database-backed / git-backed agent registries are now ONE Protocol implementation away); ToolRegistryBackend Protocol (#64 — locked at PR 4 with `FilesystemToolRegistryBackend` + `SQLiteToolRegistryBackend` reference impls, parametrized conformance suite across both backends, hybrid metadata-in-SQL + handler-bodies-on-disk storage shape on SQLite, `install` / `uninstall` capability flipped True on SQLite with TOCTOU-safe INSERT-first + atomic_write-on-success-only atomicity, multi-process WAL race resolved by `PRAGMA busy_timeout=5000` before WAL pragma, cross-scope isolation enforced at SQL layer (`WHERE agent_scope = ?` on every query), URL factory credential redaction across all 5 `ValueError` sites, operator override via `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND=sqlite` + optional `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND_URL` (`sqlite:///path?agent_scope=`) env vars OR `AtomicAgent(..., tool_registry_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded — per-agent scoping per spec/25 Decision 9), doctor `check_tool_registry_backend` coherence check with capability snapshot + tool-count probe + URL credential redaction, Implementer contract for registry-backed tool backends documented in spec/25 — Protocol seam in place; future PyPI / git / company-internal-HTTP / SaaS-database adapters register via `register_tool_registry_backend(...)` without forking core); **PolicyBackend Protocol (#89 — locked at PR 4 with `FilesystemPolicyBackend` reference impl reading `/policy.md` (markdown + embedded YAML), mtime+size composite-key parse cache (`cache_ttl_s=0` capability declaration — operators observe edits within 0 seconds of mtime change), `agent_name` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal, side-effect-free construction 
(lazy parse on first method call so the 115 existing `AtomicAgent(...)` construction sites stay byte-identical when no `policy.md` exists), parametrized conformance suite across registered backends, `PolicySnapshotForCall` frozen per call entry (per Premise 3 — operator edits mid-call defer to the next call), cost-cap MIN composition in `_check_cost_guardrails` + `MandateCheck` steps 7-9 consume pre-composed effective caps (PR 3a — cost caps enforce immediately), non-cap surfaces (tool allowlist, MCP server allowlist, model selection) consumed at the matching call sites with `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var-gated enforcement (PR 3b shipped in log-only mode; **PR 4 flipped the default to `true` — non-cap surfaces enforce by default; operators wanting log-only set the env to `false` explicitly**), unified `policy_decision` event family with `decision_kind: deny | override` discriminator + `axis: cost_cap | tool_allowlist | mcp_allowlist | model_selection` field + `enforced: bool` so SaaS / Postgres adapters target a frozen schema (Premise 4 — one event answers "was this Policy or Mandate?" 
via `denying_layer`), `model_from_per_call_override` field captures the `agent.call(model=...)` kwarg when Policy supersedes it (#274 — fleet-config-wins precedence is audit-visible to the caller), per-call dedup set bounds tool-allowlist denial emissions to one event per `(tool_name, call)` (#273 — log-only audit shape stays clean when the LLM re-attempts a denied tool every iteration), per-dimension MIN cap math (`daily` and `monthly` independently; cumulative deferred to v1.1 per plan-subagent D1), per-agent overrides under nested `agents:` section with field-level MERGE for caps + UNION+deny-wins for allowlists + REPLACE for model selection, cross-host cap-overrun bound `(replica_count) × (per-call ceiling)` documented for shared-FS deployments (Postgres / SaaS adapters with linearizable state get exact-cap semantics through their own consistency layer), operator override via `ATOMIC_AGENTS_POLICY_BACKEND` env var OR `AtomicAgent(..., policy_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` threading per spec/32 D1 (Policy is fleet-scoped — a delegate inheriting the coordinator's pinned Postgres backend doesn't silently fall back to filesystem-default), `doctor.check_policy_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for policy backends documented in spec/32 §"Implementer contract for policy backends" (7 normative MUSTs covering `agent_name` validation at API boundary, per-agent storage isolation, `cache_ttl_s`-bounded staleness, side-effect-free construction, capability honesty, URL credential redaction, `PolicyDecision` event schema compliance). 
**Closes the cross-agent configuration cliff**: operators with a fleet of agents stop hand-syncing `model.md` / `tools.md` / `mcp.md` across N agents; the single project-root `policy.md` is the audit-trail source of truth, with SaaS / Postgres / org-admin-console adapters one Protocol implementation away.** **MandateBackend Protocol (#124 — locked at PR 4 with `FilesystemMandateBackend` reference impl, parametrized conformance suite across registered backends, `MandateCheck` judge specialist with validation steps 1-9 (existence, source-hash binding, state, tool allowlist, target allowlist via per-agent named `TargetExtractorRegistry`, time window, token-cost projection with stale-baseline defense, external-cost projection via `CostEstimatorRegistry` fail-closed to `mandate_external_cost_unprojectable`, escalation thresholds with ESCALATE-preempts-BLOCK precedence), reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state), crash recovery via `MandateBackend.recover_orphan_reservations` with LockBackend-serialized scan-inside-lock discipline (pessimistic over-report > silent under-bill for orphan reservations from prior crashed runs), post-action verification event family (`mandate_action_verified` / `_diverged` / `_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit), suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats), `mandates.md` operator-authored markdown + embedded YAML parser + `judges.md ## Mandates` operator config with cascade-aware project floor, structural write protection (`mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `persona/*.md`), operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` constructor 
kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded — per-agent scoping per spec/29 + spec/15 delegate isolation), doctor `check_mandate_backend` coherence check, Implementer contract for mandate backends documented in spec/29 — closes the durable-authorization cliff: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart, with operator-facing audit signal when an action's executed target diverged from authorization at proposal time).** **PersonaBackend Protocol (#62 — locked at PR 4 with `tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`, `FilesystemPersonaBackend(personas_root)` reference impl at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/`; `list_agents()` skips dot-prefixed entries so personas don't surface as agents), `persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal, group-atomic `save_persona` with `mkdir(exist_ok=False)` for race-free fresh-create + swap-and-delete for `overwrite=True` (20-iteration retry bound on macOS APFS `ENOTEMPTY`), snapshot trio capability flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//` (D-PP-10 — geometric cross-persona isolation: `rm -rf //` removes the persona AND its full history cleanly) + `snap__<12hex>` snapshot IDs matching AgentProfile (D-PP-11 — 48-bit entropy + cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard), `/persona.link.md` (YAML-in-code-block with `kind: shared` + `persona_id` per D-ER-4) is the ownership trigger driving AgentProfileBackend 
composition via `external_persona_ref(agent_id) -> str | None` (D-PP-3 — supersedes D-ER-1's earlier boolean for cleaner bootstrap-path resolution) so `load_profile` repopulates persona fields + re-derives `agent_mode` (D-PP-4), `save_profile` ignores persona fields when externally owned (D6, mirrors spec/24 Decision 6's `agent_mode` pattern), `snapshot()` + `restore()` drop persona fields with one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup (D-PP-13 migration-window event), `PersonaOwnershipConflict` raised on filesystem-backend when both `persona.link.md` and `persona/IDENTITY.md` coexist (D2a + D-PP-8 — filesystem-only loud refusal; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity), SQLite v1→v2 schema migration adds `agents.persona_id` column with forward-only race-loser handling, D-PP-1 sentinel sweep teaches `load_profile/list_agents/exists/list_skills/load_skill_body` about the shared-persona layout (D-PP-12 closed the sweep in PR 3), operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` explicit-only threading per D-ER-2 (mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context), `atomic-agents persona list / show / snapshot / list-snapshots / restore / clone` CLI (zero LLM calls) catches `PersonaError` subclasses + `OSError` + `PermissionError` cleanly, doctor `check_persona_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for persona backends documented in spec/33 §"Implementer contract for 
persona backends" (8 normative MUSTs), D5 retires spec/24's `TemplateProfileBackend` reservation — `PersonaCapabilities.supports_templates` is the canonical home for a future persona-template marketplace surface — **closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 agents with consistent identity, snapshot/restore lifecycle, and operator-editable markdown — home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk).** **CorpusBackend Protocol** (#65, locked at PR 4 with `tests/test_corpus_protocol_conformance.py` parametrized across registered backends + `tests/test_corpus_filesystem_backend.py` + `tests/test_corpus_sqlite_backend.py` + `tests/test_corpus_registry.py` + `tests/test_corpus_composition.py` + `tests/test_corpus_wiring.py` + `tests/test_corpus_migration_regression.py` + `tests/test_corpus_doctor.py`, `FilesystemCorpusBackend(agent_root)` reference impl reading `/wiki/` (distilled knowledge per the Karpathy style) + `/raw/` (operator-ingested source documents) with per-page `_io.atomic_write` safety + `render_index_summary(corpus)` Protocol method that returns the routing INDEX the agent loads at step [7] of the canonical load order per spec/04, `SQLiteCorpusBackend` with FTS5 (stdlib `sqlite3`, no optional extra; hybrid storage shape with metadata in SQL + bodies on disk matching ToolRegistryBackend precedent; WAL journal mode + `PRAGMA busy_timeout=5000` before WAL pragma mirroring the multi-process race fix from #64; FTS5 virtual table for O(log N) indexed full-text query on page bodies + frontmatter titles; cross-agent isolation enforced at the SQL layer via `WHERE agent_scope = ? 
AND corpus = ?` double discriminator; `BEGIN IMMEDIATE` transaction discipline wrapping the read-validate-UPSERT-FTS sequence in `write_page`; INSERT-first + atomic_write-on-success-only atomicity for hybrid storage half-failure recovery; idempotent `INSERT OR IGNORE` cold-start schema init for multi-replica deployments), page name charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / leading-dot refusal, side-effect-free construction (empty or missing `wiki/` + `raw/` yields zero registrations so all 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no corpus is configured; IRON RULE byte-identity regression suite at `tests/test_corpus_migration_regression.py` pins the contract across 5 explicit assertions covering the wiki INDEX read path and bundle rendering), parametrized conformance suite across both backends pins the Protocol contract so future `PgvectorCorpusBackend` + Postgres adapters register via `register_corpus_backend(...)` without forking core (the semantic-search seam is deferred to the coordinated #258 Postgres-adapter family release so semantic-search coverage stays symmetric across MemoryBackend + CorpusBackend; ROADMAP §"Semantic memory retrieval" frames this as the Letta-gap closer), call-site migration: `agent.py:_load_indexes()` routes `wiki/INDEX.md` reads through `corpus_backend.render_index_summary("wiki")` when registered (per spec/04 step [7]; legacy direct-read path catches `OSError` + `UnicodeDecodeError` with logged warning marker for soft-degrade symmetry), `bundle.py:_render_memory_breakpoint` gains a `corpus_backend: CorpusBackend | None = None` parameter threaded three levels through `render_bundle`, with a shared `_render_wiki_index_section(label, path, content)` helper producing byte-identical output between Protocol path and legacy fallback (IRON RULE assertion 4), `bundle.py:_source_paths` migration deferred to v1.1 (filesystem-only function; pinned by the deferral 
test and tracked at #314), `CorpusBackend` becomes the source of truth for `wiki/` and `raw/` per spec/34 while `MemoryBackend` retains exclusive ownership of `memory/` and `journal/` (spec/24 Decision 7 addendum), operator override via `ATOMIC_AGENTS_CORPUS_BACKEND` + optional `ATOMIC_AGENTS_CORPUS_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.corpus.db` with `agent_scope=quote_plus(agent_root.name)` so single-host operators get a working SQLite default by flipping one env var) OR `AtomicAgent(..., corpus_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner (threads at `outcome.py:255`) / EvalRunner (at `eval.py:363`) / DreamRunner (stores as `self._corpus_backend` for API parity; no internal `AtomicAgent` construction site in v1), `delegate.py` explicit-only threading via `_corpus_backend_was_explicit` flag mirroring PersonaBackend D-ER-2 at `agent.py:431` (default-resolved backends do not leak the coordinator's `agent_root` to delegates because corpus is per-agent semantic context, distinct from fleet-scoped Policy + AgentProfile which always thread), `doctor.check_corpus_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + page-count performance cliff WARN when `stats().page_count` exceeds 1000 pages on `supports_full_text_search=False` (the WARN hint names `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` as the remedy, mirroring the LogBackend doctor precedent) + URL credential redaction across operator-facing error paths, `atomic-agents corpus` CLI (`list`/`show`/`query`/`version`/`restore` subcommands, zero LLM calls, env-var-aware), Implementer contract for corpus backends documented in spec/34 §"Implementer contract for corpus backends" (9 normative MUSTs covering page name charset validation at API boundary, side-effect-free construction, capability honesty including `embedding_provider=None` invariant, `query()` capability precedence rule, `write_page()` 4-case behavior table, URL credential redaction 
across operator-facing error paths, cross-corpus isolation at storage layer, snapshot id determinism + cross-page isolation, `backend_id` stability + `close()` idempotency). **Closes the GB-scale wiki cliff**: operators with a 10K-page wiki or hundreds of MB of raw documents stop waiting seconds per keyword grep over an unindexed filesystem; `SQLiteCorpusBackend` with FTS5 delivers O(log N) indexed full-text search at stdlib cost (no Postgres operator burden); future `PgvectorCorpusBackend` arrives via the coordinated #258 release for symmetric semantic retrieval across both substrates. Same agent definitions, same `agent.call()` flow, same audit trail, different corpus substrate. **MCPServerRegistryBackend Protocol** (#201, **locked at PR 5 of 5 (#201 PR 5, squash hash TBD after merge)** with `tests/test_mcp_server_registry_conformance.py` parametrized across both backends + `tests/test_mcp_server_registry_http_backend.py`, `FilesystemMCPServerRegistryBackend(agent_root, read_paths)` reference impl reading `/mcp.md` + optional read_paths for shared catalogs, `HTTPMCPServerRegistryBackend(catalog_url, agent_scope)` reference impl with tier-1/2/3 capability negotiation (D1-D4: OPTIONS probe for tier negotiation, `GET /capabilities` for structured capability body, tier-1 = read-only, tier-2 = read + install/uninstall, tier-3 = read + install/uninstall + audit), Protocol surface: `list_mcp_servers` / `load_mcp_server` / `load_all_mcp_servers` / `validate_mcp_server` / `install` / `uninstall` / `capabilities` / `refresh_capabilities` / `close`, key decisions: D1 (filesystem read-only; catalog server owns transactionality for HTTP), D2 (per-agent scoping via `agent_scope` query param on HTTP), D3 (MCP servers are processes; ToolRegistry is functions. 
Separate Protocols per spec/25 Decision 3), D4 (tier negotiation: OPTIONS then capabilities endpoint), D5 (`lock_backend` kwarg on filesystem for `.mcp_registry.lock` file distinct from agent main `.lock`), D6 (pre-probe conservative False/False capability default; HTTP dynamic per tier; tier-1 fallback stays False/False), D7 (env-var references resolve client-side at load time; install path must emit unresolved `$VAR` form), D8 (409 collision maps to `MCPServerAlreadyInstalled`; 405 triggers mid-session tier regression handler with re-probe + cache invalidation), D9 (URL credential redaction via `_safe_catalog_url` in ALL error paths), conformance suite covers 10 MUSTs (MUST 1 name charset, MUST 2 side-effect-free construction, MUST 3 capability honesty, MUST 4 credential redaction, MUST 5 per-agent scoping, MUST 6 backend_id stability + close idempotency, MUST 7 transient-vs-permanent failure honesty, MUST 8 env-var resolution at load time, MUST 9 install/uninstall atomicity + idempotency, MUST 10 load_all consistency), capability flag evolution: PR 1-4 static False/False on HTTP (unconditional NIE on write paths) | PR 5 dynamic True/True on tier-2+ probed backends (install/uninstall now live), 405 mid-session tier regression handler: re-probes then raises NotImplementedError with tier-change message + updates cache; if re-probe fails raises MCPRegistryUnavailable with "Capability cache may be stale" message, test count ~3,319-3,325 at PR 5 (delta +12 to +18 vs post-PR-4 3,307). **Closes the v1.0 Protocol surface**: operators with a managed MCP catalog or a private HTTP catalog registry can now install/uninstall MCP servers from the same `agent.call()` flow as home-user filesystem operators. **Twelve backend protocols shipped.** A person at home runs filesystem-everything with one agent. An organization runs the same agents over Postgres, behind an HTTP service, with a fleet of orchestrated roles. **Same agent definitions, same call() flow, same audit trail. 
Different backends.**
+Atomic Agents is a vault-native AI agent framework: agents live as plain markdown files, the runtime is stateless, and storage is moving toward swappable protocols layer by layer. **Twelve backend protocols shipped through v1.0** — see `docs/protocols-shipped.md` for the per-protocol summary (reference impls, capabilities, operator overrides, doctor checks, Implementer Contracts, and what cliff each closes).
 The spec is the central artifact. The Python package is one conforming reference implementation. Anyone can build agents to the spec without using this code — and eventually, alternate implementations will.
@@ -292,7 +292,8 @@ If the project ever needs to optimize differently, `docs/methodology.md` is the
 | Doc | Purpose |
 |-----|---------|
 | `docs/architecture.md` | Mental model in diagrams. Read first. |
-| `docs/spec/01-...36-mcp-server-registry-backend.md` | Locked spec (36 docs today, 32 locked + 3 drafts at spec/26 (cascade bundle), spec/30 (responsibility audit), and spec/35 (init wizard)). The product. |
+| `docs/protocols-shipped.md` | Per-protocol summary of the twelve shipped backends — reference impls, capabilities, operator overrides, doctor checks, Implementer Contracts, and the cliff each closes. |
+| `docs/spec/01-...36-mcp-server-registry-backend.md` | Locked spec (35 docs today, 32 locked + 3 drafts at spec/26 (cascade bundle), spec/30 (responsibility audit), and spec/35 (init wizard)). The product. |
 | `docs/implementation/` | Build guides per runtime (cron, Claude skill, dashboard) |
 | `docs/deployment/versioning.md`, `upgrading.md` | SemVer + operator runbook |
 | `docs/deployment/release-runbook.md` | Maintainer `/ship` runbook: two-mode workflow + manual surface check |
@@ -341,18 +342,22 @@ These are not forbidden forever — they're explicitly deferred with rationale.
 ## Status
-**v1.0.0, stable, PUBLIC.** Core runtime stable.
Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: ~3,319-3,325 tests collected, 2026-06-04). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Twelve backend protocols shipped**:
-
-- **MemoryBackend** (PR #57) — filesystem reference impl + conformance suite.
-- **LLMBackend** (#87) — Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three.
-- **JudgeBackend Protocol** (#112, **locked at PR 4** with `tests/test_judge_protocol_conformance.py`) — PolicyJudge (rule engine) + LLMJudgeBackend reference impls; ESCALATE + REVISE state machines; `judges.md` operator config with cascade-aware project floor; operator-driven resolution flow (Approved / Denied / Redacted / Revised / Auto-decided); body-integrity check + O_EXCL sidecar de-dup + CAS-safe auto-decide. **PR 5a (unreleased):** `escalation.fallback_on_timeout` widens to per-class dict form; auto-decide resolves policy from PENDING frontmatter `action_class`. **PR 5b (unreleased):** strict JSON-Schema validation of amended `tool_arguments` via the opt-in `[validation]` extra (`validation: strict` in `judges.md`); default remains `weakened` (PR 3c behavior), so operators upgrading without flipping the field see no behavior change. Concludes the #112 arc-with-amendments. Dispatch opt-in via `judges.md` in the agent root or `AGENT_JUDGE_ENABLED=1` — existing deployments see no judge invocation by default.
-- **LockBackend Protocol** (#60, **locked at PR 4** with `tests/test_lock_protocol_conformance.py` parametrized across both backends) — `FilesystemLockBackend` (POSIX `fcntl.flock` advisory; preserves the legacy `/.lock` on-disk artifact byte-for-byte) + `RedisLockBackend` (single-instance Redis advisory lock + atomic Lua release/renew + daemon heartbeat at TTL/3 + `LockLost` lease-expiry detection) reference impls. `scope(sub_path)` Protocol method lets operators pass ONE backend; framework re-scopes for dream + memory paths internally. Operator override via `ATOMIC_AGENTS_LOCK_BACKEND` + `ATOMIC_AGENTS_LOCK_BACKEND_URL` env vars (deployment path) OR `AtomicAgent(..., lock_backend=...)` constructor kwarg (programmatic path — always wins). `doctor.check_lock_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + credential-redacted URL output. `_locks.AgentLock` preserved as a deprecation shim (sunset planned for v1.1 (deferred from v1.0 per #201 PR 5 release decision)). **Closes the multi-host cliff** that motivated the entire arc: atomic-agents now runs on Cloud Run / Kubernetes / gizmo without forking the framework.
-- **LogBackend Protocol** (#61, **locked at PR 4** with `tests/test_log_protocol_conformance.py` parametrized across both backends) — `FilesystemLogBackend` (JSONL-on-disk; preserves the legacy `/log/YYYY-MM/YYYY-MM-DD.jsonl` artifact byte-for-byte via `_io.atomic_append_jsonl`) + `SQLiteLogBackend` (stdlib `sqlite3`, no optional extra; six indexes covering dashboard + cost-guardrail query patterns; WAL journal mode + per-thread connections for multi-process append safety on local filesystems; aggregation pushdown via SQL `GROUP BY` for canonical columns + SQLite JSON1 `json_extract` for primitive-specific `extra`-field group_bys with alphanumeric-identifier SQL injection guard; index-driven `delete_older_than`; schema version tracking with idempotent `INSERT OR IGNORE` cold-start init for multi-replica deployments).
Operator override via `ATOMIC_AGENTS_LOG_BACKEND` + optional `ATOMIC_AGENTS_LOG_BACKEND_URL` env vars OR `AtomicAgent(..., log_backend=...)` / `OutcomeRunner(..., log_backend=...)` / `DreamRunner(..., log_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents). `LogQuery.agent_name` filter (added in PR 3 review-pass per Step 11 P0 #1) for shared-backend cross-agent isolation with lenient match for legacy records (records without `agent_name` match any filter — filesystem per-agent-dir scoping is the natural isolation primitive). `doctor.check_log_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + stats probe (records_today / records_this_month) + URL-credential redaction. Implementer contract for queryable backends documented in spec/22 §"Implementer contract for queryable backends" — future Postgres / Datadog / Loki / Cloud Logging adapters mirror the SQLite reference's shape. **Closes the dashboard-perf cliff** + remote-shipping requirement: operators on Cloud Run / Kubernetes with N replicas can pin SQLite for O(log N) indexed queries + indexed retention; the same Protocol seam admits future Datadog / Loki / Postgres-with-pgvector backends without forking the framework. 
-- **AgentProfileBackend Protocol** (#63, **locked at PR 4** with `tests/test_profile_protocol_conformance.py` parametrized across both backends — 46 tests × 2 backends = ~92 invocations) — `FilesystemAgentProfileBackend` (walks `/persona/IDENTITY.md|SOUL.md|USER.md` + `/{model,tools,judges,roster,mcp,goal}.md` + `/skills//SKILL.md` via the existing parsers; preserves byte-for-byte on-disk artifacts via `_io.atomic_write`; cascade-aware via `_cascade.detect_cascade`; JSON-based snapshot trio at `/.snapshots///{profile,metadata}.json` with `_validate_snapshot_id` path-traversal refusal + `relative_to(snapshots_root)` path-scope check + `metadata.agent_id` cross-check) + `SQLiteAgentProfileBackend` (stdlib `sqlite3`, no optional extra; JSON blob + indexed scalars approach — `agents(name PK, agent_mode indexed, profile_json, updated_at)` + `profile_snapshots(snapshot_id PK, agent_id+created_at composite indexed, label, profile_json)` + `meta(key PK, value)` with schema_version tracking via idempotent `INSERT OR IGNORE` cold-start init; `threading.local` connection pool + WAL journal mode + `synchronous=NORMAL` for multi-process append safety on local filesystems; cross-agent snapshot isolation enforced via `WHERE snapshot_id = ? AND agent_id = ?` AND-clause). `supports_skills` capability dimension — filesystem=True (walks skill dirs), SQLite=False (skills stay filesystem-only in v1; future `save_skill` Protocol method lands when SaaS UI editing requires DB-backed skill bodies). 48-bit snapshot id random tail (Step 11 adversarial F-8) makes same-second collision at 4K snapshots/sec ~6e-8. 
Operator override via `ATOMIC_AGENTS_PROFILE_BACKEND` + optional `ATOMIC_AGENTS_PROFILE_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.profile.db` so single-host operators get a working SQLite default by flipping ONE env var) OR `AtomicAgent(..., profile_backend=...)` / `OutcomeRunner(..., profile_backend=...)` / `EvalRunner(..., profile_backend=...)` / `DreamRunner(..., profile_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents and `delegate.py`). `AgentProfile` carries typed shadow + raw text for every config file (spec/24 Decision 1 — `mcp_md_raw` preserves `$VAR` env refs verbatim so save paths never bake resolved secrets into on-disk state). `save_profile` re-derives `agent_mode` from `persona_identity` on every write (spec/24 Decision 6 — single source of truth). `doctor.check_agent_profile_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot (incl. `supports_skills` disclosure) + agent-count probe + URL-credential redaction. Implementer contract for registry-backed backends documented in spec/24 §"Implementer contract for registry-backed backends" (8 normative MUSTs covering path-traversal refusal at API boundary, cross-agent snapshot isolation at storage layer, agent_mode re-derivation discipline, raw-text round-trip preservation, idempotent schema init across processes, snapshot id entropy budget, thread-life-tied connection management, supports_skills capability honesty) — future Postgres / git / SaaS-database adapters mirror the SQLite + filesystem references' shapes. **Closes the SaaS-shape cliff**: SaaS / database-backed / git-backed agent registries are now ONE Protocol implementation away from the framework's existing operator-config surface. Same agent definitions, same `agent.call()` flow, same audit trail — different substrate. 
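The same-second collision figure quoted for the 48-bit snapshot id tail can be checked with a few lines of arithmetic. This is a worked sketch, not the framework's code: it assumes 4,000 ids share one timestamp second and that the tail is uniformly random. Which formula produced the quoted ~6e-8 is not stated; the looser `n^2 / N` bound reproduces it, while the birthday approximation gives about half that. The `secrets.token_hex(6)` call is the one the Persona backend is described as using for its own 48-bit tail, assumed here to apply equally; the full id format is not reconstructed.

```python
import math
import secrets

SPACE = 2 ** 48      # number of distinct 48-bit random tails
PER_SECOND = 4_000   # assumed ids minted within one timestamp second

# Birthday approximation: P(at least one collision) = 1 - exp(-pairs / N).
pairs = PER_SECOND * (PER_SECOND - 1) / 2
birthday = -math.expm1(-pairs / SPACE)

# Looser upper bound n^2 / N, which matches the quoted order of magnitude.
loose_bound = PER_SECOND ** 2 / SPACE

# 6 random bytes -> 12 hex characters -> 48 bits of entropy.
tail = secrets.token_hex(6)
```

With these inputs `birthday` comes out near 2.8e-8 and `loose_bound` near 5.7e-8, so the documented figure reads as a conservative bound rather than an exact probability.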
-- **ToolRegistryBackend Protocol** (#64, **locked at PR 4** with `tests/test_tool_registry_protocol_conformance.py` parametrized across both backends — 43 conformance test functions running on filesystem + SQLite, 18 skips on capability gates) — `FilesystemToolRegistryBackend(agent_root)` (walks `/tools/.md` for descriptors + `/tools/.py` for handler modules via `importlib.util.spec_from_file_location`; refuses path-traversal in `name` at API boundary; refuses control characters; 256 KB descriptor size cap defending against YAML alias-bomb DoS — PR 1 Step 11 REPRODUCED at 33 GB RSS pre-fix; treats `chmod-000 tools/` as empty rather than `PermissionError`-crashing every agent construction — PR 2 Step 11 P1 REPRODUCED; `validate()` is static-only — descriptor parse + handler import + signature check, NO handler execution) + `SQLiteToolRegistryBackend(db_path, agent_scope, *, handlers_root=None)` (stdlib `sqlite3`, no optional extra; hybrid storage shape — SQLite stores metadata only (descriptor JSON + handler path + version + classification + scope + timestamps), handler **bodies** live on disk as `.py` files under `//.py` and load via the same `importlib.util.spec_from_file_location` path the filesystem reference uses; base64-exec'd-source design was rejected at the plan-subagent stage because it silently breaks closures + module-level imports + `session = requests.Session()` patterns; schema `tools(agent_scope, name, descriptor_json, handler_path, version, classification, created_at, updated_at, PRIMARY KEY (agent_scope, name))` — composite PK so two scopes can both have a tool named the same; `meta(key PK, value)` schema-version with idempotent `INSERT OR IGNORE` cold-start race fix; `PRAGMA busy_timeout=5000` BEFORE `PRAGMA journal_mode=WAL` resolves the multi-process WAL race REPRODUCED 3/5 pre-fix in PR 3 Step 11 — same shape as the pre-existing `test_log_sqlite_backend.py::test_concurrent_appends_from_threads` flake one-line follow-up queued in spec/22 
§"Known gaps"; `threading.local` connection pool + WAL journal mode + `synchronous=NORMAL` for multi-process append safety on local filesystems; cross-scope isolation enforced via `WHERE agent_scope = ?` on every query; URL factory `make_sqlite_tool_registry_backend_from_url` honors `sqlite:///path?agent_scope=` and refuses non-sqlite scheme / netloc / fragments / duplicate query params / unknown query params — credential redaction across all 5 `ValueError` sites via `_redact_url` helper resolves the PR 3 Step 11 P1 REPRODUCED postgres-URL credential leak; `:memory:` mode is single-threaded test-only — `check_same_thread=True` + per-instance `tempfile.mkdtemp()` for `handlers_root` honoring the non-persistent promise; `handlers_root` refuses `<= 1`-component paths defending against root-write on misconfigured Linux). `install()` is TOCTOU-safe via **INSERT-first + atomic_write-on-success-only** ordering (PR 3 Step 11 REPRODUCED 50/50 pre-fix — original handler-atomic_write-first order caused concurrent installs to destroy the winner's handler file via the loser's rollback `unlink()`); losers see `rowcount=0` and raise `ToolAlreadyInstalled` WITHOUT touching disk. `install()` rejects non-callable handler at install time (PR 3 Step 11 testing CRITICAL — previously only `validate()` caught it; filesystem inherits the strengthened check). `install()` rejects non-None `version` when `supports_versioning=False` (plan-subagent Risk L — capability honesty). 
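The INSERT-first, write-on-success-only ordering described for `install()` can be sketched with the standard library. This is a simplified illustration under stated assumptions: the three-column table, the `handlers_root/scope/name.py` layout, and the `install` signature are hypothetical stand-ins, and the real backend stores more columns and performs more validation.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


class ToolAlreadyInstalled(Exception):
    """Raised when (agent_scope, name) is already claimed."""


def install(conn: sqlite3.Connection, handlers_root: Path,
            scope: str, name: str, source: str) -> Path:
    # Hypothetical on-disk layout, chosen only for this example.
    handler_path = handlers_root / scope / f"{name}.py"
    # Step 1: claim the row first. The composite primary key makes the
    # INSERT the single arbiter between concurrent installers.
    cur = conn.execute(
        "INSERT OR IGNORE INTO tools(agent_scope, name, handler_path) "
        "VALUES (?, ?, ?)",
        (scope, name, str(handler_path)),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Loser: raise without touching disk, so the winner's file survives.
        raise ToolAlreadyInstalled(f"{scope}/{name}")
    # Step 2: only the winner writes, via temp file plus atomic rename.
    handler_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=handler_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(source)
    os.replace(tmp, handler_path)
    return handler_path


root = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tools(agent_scope TEXT, name TEXT, handler_path TEXT, "
    "PRIMARY KEY (agent_scope, name))"
)
first = install(conn, root, "agent-a", "search", "def run():\n    return 'v1'\n")
try:
    install(conn, root, "agent-a", "search", "def run():\n    return 'v2'\n")
    second_rejected = False
except ToolAlreadyInstalled:
    second_rejected = True
other_scope = install(conn, root, "agent-b", "search", "def run():\n    return 'b'\n")
```

Because the losing call never reaches the file-writing step, it has nothing to roll back, which is the property the reversed ordering lacked. The sketch does not cover what happens if the file write fails after the row is committed.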
Operator override via `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND` + optional `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.tools.db` with `agent_scope=` so single-host operators get a working SQLite default by flipping ONE env var) OR `AtomicAgent(..., tool_registry_backend=...)` / `OutcomeRunner(..., tool_registry_backend=...)` / `EvalRunner(..., tool_registry_backend=...)` / `DreamRunner(..., tool_registry_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents — `delegate.py` deliberately does NOT thread because tool registry is per-agent scoped per spec/25 Decision 9, distinct from the fleet-scoped `profile_backend` which IS threaded). Backend tools register into `agent.tool_registry` AFTER operator-supplied `tools=ToolRegistry()` kwarg with `allow_overwrite=False` so collisions surface loudly as `ToolNameCollision`; **empty / missing `/tools/` yields zero registrations** — all 115 `AtomicAgent(...)` construction sites in the test suite see byte-identical pre-#64 behavior. `doctor.check_tool_registry_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + tool-count probe + URL-credential redaction. Implementer contract for registry-backed tool backends documented in spec/25 §"Implementer contract for registry-backed tool backends" (8 normative MUSTs covering path-traversal refusal at API boundary, cross-scope isolation at storage layer, atomicity on install via INSERT-first + atomic_write-on-success-only, two-tier descriptor round-trip — raw-text-preserving for filesystem-shape backends, lossy-parse-documented for structured-storage backends, idempotent schema init + busy_timeout before WAL pragma, capability honesty, trust-model framing for shared-catalog backends, connection / handler lifecycle). 
Protocol seam in place; two reference impls (filesystem + SQLite) shipped; 43 conformance test functions across both backends pin the contract. Future PyPI / git / company-internal-HTTP / SaaS-database adapters slot in via `register_tool_registry_backend(...)` without forking core — same agent definitions, same `agent.call()` flow, same audit trail, different tool catalog. -- **PolicyBackend Protocol** (#89, **locked at PR 4** with `tests/test_policy_protocol_conformance.py` parametrized across registered backends + `tests/test_policy_filesystem_backend.py` + `tests/test_policy_integration.py` + `tests/test_policy_cost_cap_consumption.py` + `tests/test_policy_noncap_log_only.py` + `tests/test_policy_noncap_integration.py`) — `FilesystemPolicyBackend()` reference impl: markdown + embedded YAML at `/policy.md` (per Premise 5 + plan-eng-review D4 — fleet-wide single source of truth, no per-agent `policy.md`); mtime+size composite cache key catches same-second edits on 1s-granularity filesystems via the size proxy; `cache_ttl_s=0` capability declaration — operators observe edits within 0 seconds of mtime change (the framework-side mtime stat is the staleness contract; SaaS / Postgres backends declare their real internal TTL); `agent_name` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal; side-effect-free construction (lazy parse on first method call so the 115 existing `AtomicAgent(...)` construction sites stay byte-identical when no `policy.md` exists). Only reference impl in v1; future Postgres / SaaS / org-admin-console adapters register via `register_policy_backend(...)` per /office-hours 2026-05-19 D2 (full Protocol seam from day 1). `PolicySnapshotForCall` frozen at `agent.call()` entry per Premise 3 — every consumption site reads the SAME snapshot for the duration of the call; operator edits to `policy.md` mid-call defer to the next `agent.call()`. 
Cost-cap MIN composition in `_check_cost_guardrails` per plan-eng-review D2 (`effective_daily = MIN(policy.daily, model_md.daily)`; `effective_monthly = MIN(...)`; per-call `cost_cap` ceiling bounds same-dimension cap arithmetic); `MandateCheck` steps 7-9 consume pre-composed effective caps so Policy and Mandate cost-cap checks share the same arithmetic (PR 3a — cost caps enforce immediately and ignore the env-var flag). Non-cap surfaces (tool allowlist, MCP server allowlist, model selection) consumed at the three matching call sites with `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var-gated enforcement — PR 3b shipped in log-only mode (flag default `false`); **PR 4 flipped the default to `true` so non-cap surfaces enforce by default; operators wanting log-only set `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP=false` explicitly**. Tool dispatch: blocked tool yields a synthesized `policy_blocked` `ToolCallResult` mirroring the judge_blocked shape so the LLM sees a refusal on the next turn. MCP discovery: denied servers filtered BEFORE `MCPClientPool` construction so the framework doesn't pay the subprocess startup cost. Model selection: Policy's `get_effective_model()` return replaces the pre-Policy effective model in enforce mode (the per-call `model_override=` kwarg if supplied, else the cost-cap fallback model if a fallback fired, else `model.md`'s default; model selection is NOT MIN math, it's selection-precedence). Per #274, when Policy supersedes a per-call kwarg the audit event carries `model_from_per_call_override` so the caller can detect it; when Policy agrees with the kwarg no emission fires. Unified `policy_decision` event family with `decision_kind: deny | override` discriminator + `axis: cost_cap | tool_allowlist | mcp_allowlist | model_selection` + `enforced: bool` so SaaS / Postgres adapters target a frozen schema (Premise 4 — one event family answers "was this Policy or Mandate?" 
via `denying_layer`; cost-cap denials with `cap_action ∈ {alert, fallback}` emit `enforced=False` so the audit log truthfully reflects whether money was actually spent — operators reading `LogQuery(primitive="policy_decision", enforced=True)` for billing-incident attribution see only actually-blocked events). `PolicyDecision.model_from_per_call_override` (#274) captures the `agent.call(model=...)` kwarg when Policy supersedes it so the caller can detect the silent override; fleet-config-wins precedence documented in `AtomicAgent.call()` docstring + spec/32 §"Composition math". Per-call dedup set bounds tool-allowlist denial emissions to one event per `(tool_name, call)` (#273 — log-only mode operators observed N events per denied tool per call because the LLM does not see refusals and re-attempts every iteration; in enforce mode the synthesized `policy_blocked` ToolCallResult naturally bounds re-attempts via LLM feedback, but the dedup keeps the audit shape uniform across both modes). `policy.md` parser handles fleet-default `cost_caps` / `tools.{allow,deny}` / `mcp_servers.{allow,deny}` / `model` fields at top level + nested `agents: { : { ... } }` per-agent overrides with field-level MERGE for caps + UNION+deny-wins for allowlists + REPLACE for model selection (plan-subagent F7); per-dimension MIN cap math (`daily` and `monthly` independently; cumulative deferred to v1.1 per plan-subagent D1). Cross-host cap-overrun bound `(replica_count) × (per-call ceiling)` documented in spec/32 §"Cross-host bound" for shared-FS deployments (Postgres / SaaS adapters with linearizable state get exact-cap semantics through their own consistency layer). 
Operator override via `ATOMIC_AGENTS_POLICY_BACKEND` env var OR `AtomicAgent(..., policy_backend=...)` / `OutcomeRunner(..., policy_backend=...)` / `EvalRunner(..., policy_backend=...)` / `DreamRunner(..., policy_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents) + `delegate.py` threading per spec/32 D1 (Policy is fleet-scoped — a delegate inheriting the coordinator's pinned Postgres backend doesn't silently fall back to the filesystem default and bypass the operator's fleet cap; distinct from `mandate_backend` which is per-agent scoped and deliberately NOT threaded). `doctor.check_policy_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction. Implementer contract for policy backends documented in spec/32 §"Implementer contract for policy backends" (7 normative MUSTs covering path-traversal refusal at API boundary, per-agent storage isolation, `cache_ttl_s`-bounded staleness, side-effect-free construction, capability honesty, URL credential redaction in factory `ValueError` sites, `PolicyDecision` event schema compliance). **Closes the cross-agent configuration cliff**: operators with a fleet of agents stop hand-syncing `model.md` / `tools.md` / `mcp.md` across N agents; a single project-root `policy.md` is the audit-trail source of truth, fleet-default + per-agent overrides compose with most-restrictive-wins semantics, and SaaS / Postgres / org-admin-console adapters are ONE Protocol implementation away from the framework's existing operator-config surface. Same agent definitions, same `agent.call()` flow, same audit trail, different fleet-config substrate. 
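The per-dimension MIN composition and the cross-host overrun bound described above reduce to simple arithmetic. The sketch below is illustrative only: `effective_cap` and `cross_host_overrun_bound` are hypothetical helpers, the dollar figures are made up, and treating `None` as "no cap configured" is an assumption rather than documented behavior.

```python
from typing import Optional


def effective_cap(policy_cap: Optional[float],
                  model_cap: Optional[float]) -> Optional[float]:
    """MIN of the caps that are set; None means no cap (assumption)."""
    caps = [c for c in (policy_cap, model_cap) if c is not None]
    return min(caps) if caps else None


def cross_host_overrun_bound(replica_count: int, per_call_ceiling: float) -> float:
    """Worst-case overrun on shared-FS deployments: replicas x per-call ceiling."""
    return replica_count * per_call_ceiling


# Illustrative values: fleet policy versus one agent's model.md.
policy = {"daily": 5.0, "monthly": 100.0}
model_md = {"daily": 8.0, "monthly": 60.0}

# Each dimension is composed independently, so the winner can differ per dimension.
effective = {
    dim: effective_cap(policy.get(dim), model_md.get(dim))
    for dim in ("daily", "monthly")
}
overrun = cross_host_overrun_bound(replica_count=3, per_call_ceiling=0.5)
```

Here the policy wins the daily dimension and `model.md` wins the monthly one, which is the most-restrictive-wins behavior the text describes. Model selection is deliberately left out because it follows selection precedence, not MIN math.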
-- **MandateBackend Protocol** (#124, **locked at PR 4** with `tests/test_mandate_protocol_conformance.py` parametrized across registered backends + `tests/test_mandate_check.py` + `tests/test_mandate_reservations.py` + `tests/test_mandate_filesystem_backend.py` + `tests/test_mandate_integration.py`) — `FilesystemMandateBackend(scope_root)` reference impl: markdown + embedded YAML descriptors at `/mandates.md` (project scope) or `//mandates.md` (agent scope); state at `/.judge-state/mandates.json` via `_io.atomic_write`; refuses path-traversal in `mandate_id` at API boundary; source-hash recomputation on every `load_mandate`; derived-EXPIRED state computed at load time. Only reference impl in v1; future SaaS / mobile / Slack-bot adapters register via `register_mandate_backend(...)` per /office-hours 2026-05-17 Option 2 decision (build the seam upfront, don't retrofit later). `MandateCheck` judge specialist (~730 LOC) implements validation steps 1-9: existence + source-hash binding + state + tool allowlist + target allowlist via per-agent named `TargetExtractorRegistry` (7 built-in heuristic extractors pre-registered at agent construction; MCP tools prefix extracted target with `mcp::`) + time window + token-cost projection with stale-baseline defense (if most-recent matching event's `ts` is before current iteration's start, fall back to `expected_cost_per_call_usd` so stale-baseline drift doesn't compound across multi-iteration runs) + external-cost projection via `CostEstimatorRegistry` fail-closed to spec-stable `mandate_external_cost_unprojectable` BLOCK reason + escalation thresholds with ESCALATE-preempts-BLOCK precedence. 
Reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state; `compute_outstanding(log_backend, scope, mandate_id)` four-clause definition — created AND NOT committed/rolled_back/expired/committed_on_recovery AND no cost event with matching `proposal_id` AND age < ttl_s — closes the cost-event-landed-without-_committed window; cost events for mandate-citing actions carry `mandate_id` + `proposal_id` so cumulative budget defense `_sum_prior_token_cost` matches against the right ledger). Crash recovery via `MandateBackend.recover_orphan_reservations(log_backend, scope, *, lock_backend=None)` with `LockBackend.acquire(scope='mandate-recovery:')` scan-inside-lock discipline (pessimistic over-report > silent under-bill — token orphans emit `mandate_reservation_committed_on_recovery`; external orphans emit BOTH `_committed_on_recovery` AND `mandate_reservation_external_unverified` so operators verify in Stripe / vendor via the `atomic-agents mandate reconcile --action {committed|rolled_back}` CLI). Post-action verification event family (`mandate_action_verified` / `mandate_action_diverged` / `mandate_action_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit; operator-facing audit signal, NOT a refund mechanism in v1). Suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats; persisted on-disk in `MandateBackend.read_state` shape under `throttles` key — in-memory-only forbidden because crash-restart loop would defeat the prompt-injection defense). 
`mandates.md` parser + `judges.md ## Mandates` operator config with cascade-aware project floor (floor-wins where stricter for safety: longer throttle, "block" beats "escalate") + constraint enforceability discipline (mandates without enforceable constraints AND without `unconstrained: true` + non-empty justification are rejected at load time). Structural write protection: `mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `model.md` / `persona/IDENTITY.md` / `persona/SOUL.md` / `persona/USER.md` — even a malicious actor with a write-capable tool cannot grant itself authority; the WritePolicy is the authoritative protection, the `## Only operators grant mandates` discipline is the behavioral story. Operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` / `OutcomeRunner(..., mandate_backend=...)` / `EvalRunner(..., mandate_backend=...)` / `DreamRunner(..., mandate_backend=...)` constructor kwargs (programmatic path always wins; threads through to internal sub-agents; `delegate.py` deliberately NOT threaded — per-agent scoping per spec/29 + spec/15 delegate isolation). `doctor.check_mandate_backend` validates operator-config coherence. Implementer contract for mandate backends documented in spec/29 §"Implementer contract for mandate backends" (8 normative MUSTs covering path-traversal refusal at API boundary, per-scope isolation enforced at storage layer, state persistence via `read_state` / `write_state` Protocol methods (NOT filesystem-path contract), source-hash recomputation per load, lifecycle event emission via `LogBackend.append(record)`, reservation event discriminator shape, pessimistic crash recovery semantics, capability honesty). Operator CLI surface ships with the impl: `atomic-agents mandate list` / `show` / `usage` / `reconcile`. 
**Closes the durable-authorization cliff**: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart; post-hoc divergence audits surface when an action's executed target differed from authorization at proposal time; mandate revocation is operator-editable in `mandates.md` with immediate effect on the next agent run. Same agent definitions, same `agent.call()` flow, same audit trail — durable revocable scoped authority for actors that need to handle real money + real external side effects without re-authorization per turn. **The Mandate primitive is orthogonal to the v1.0 Protocol queue** (Corpus / MCPServerRegistry remain after PersonaBackend locked at #62 PR 4; Mandate primitive ships its OWN `MandateBackend` seam from day 1). -- **PersonaBackend Protocol** (#62, **locked at PR 4** with `tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`) — `FilesystemPersonaBackend(personas_root)` reference impl: persona records at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/` so `list_agents()` skips dot-prefixed entries and personas don't surface as agents). Only reference impl in v1; future Postgres / SaaS / git adapters register via `register_persona_backend(...)` per the established Protocol-pattern seam. `persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal. Side-effect-free construction (lazy walk on first method call so the 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no `persona.link.md` exists). 
Group-atomic `save_persona`: `mkdir(exist_ok=False)` claims the persona dir exclusively before any file write for race-free fresh-create (`overwrite=False` losers raise `PersonaExists` WITHOUT touching disk); `overwrite=True` uses swap-and-delete via a sibling temp directory with a 20-iteration retry bound sized for 16-thread contention on macOS APFS `ENOTEMPTY` semantics; PR 1 Round 3 closed an orphan-backup leak via best-effort `shutil.rmtree(backup, ignore_errors=True)`. Snapshot trio (`snapshot` / `restore` / `list_snapshots`) flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//{IDENTITY,SOUL,USER}.md + metadata.json` (D-PP-10 — geometric cross-persona isolation: a snapshot record always resides under its parent persona's directory, so `rm -rf //` removes the persona AND its full history cleanly without an explicit `persona_id` cross-check on the snapshot record). `snap__<12hex>` snapshot ID format with 48-bit `secrets.token_hex(6)` random tail matches AgentProfile spec/24 Implementer Contract #8 (D-PP-11 — cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard; same-second collision probability at 4K snapshots/sec is ~6e-8). `_save_persona_group_atomic` merges backup `.snapshots/` entry-by-entry on `overwrite=True` so a concurrent `snapshot()` racing the persona-dir replace cannot destroy snapshot history (PR 3 Round 1 P1 adversarial — the original single-directory-rename approach lost the full snapshot history under contention). `list_snapshots` defense-in-depth symlink-escape guard via `entry.resolve().relative_to(snapshots_root.resolve())` (PR 3 Round 1 P2 adversarial — matches `restore()`'s confinement check). 
URL factory `make_filesystem_persona_backend_from_url("filesystem:///path")` handles `filesystem:///absolute/path` URLs and refuses non-filesystem schemes, netloc, fragments, duplicate / unknown query params, and relative paths; credentials redacted from all `ValueError` sites via `_redact_url`. **Composition with AgentProfileBackend (D1 + D3 + D6 + D-PP-13).** `/persona.link.md` is the ownership trigger (YAML in a code block with two scalar fields: `kind: shared` + `persona_id: customer-support-v3` per D-ER-4 — the colon-prefixed single-scalar `shared:customer-support-v3` was rejected at /plan-eng-review because the colon violates D4's `persona_id` charset). `AgentProfileBackend.external_persona_ref(agent_id) -> str | None` (D-PP-3 — supersedes D-ER-1's original boolean signature because the architecturally-right Optional[str] returns the persona_id the framework needs in one Protocol call) gives the bootstrap path the persona_id to look up without importing PersonaBackend. `AgentProfileBackend.load_profile()` repopulates persona fields via `persona_backend.load_persona(persona_id)` and re-derives `agent_mode` from the loaded persona text (D-PP-4 — `agent_mode` is derived from `persona_identity` and would otherwise be stale because the persona fields are empty at `load_profile` return time when externally owned). `save_profile()` ignores `profile.persona_identity / soul / user` when externally owned (D6 — mirrors spec/24 Decision 6's `agent_mode` ignore-on-save pattern; writes go through `persona_backend.save_persona()` only). `snapshot()` drops persona fields when externally owned (persona has its own snapshot history via PersonaBackend). 
`restore()` drops snapshot's persona fields when restoring a pre-PersonaBackend snapshot (carrying full persona text) into an agent that is NOW externally owned; the framework emits a one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup with `threading.Lock`-guarded check-and-add (D-PP-13 migration-window event; the lock-guarded check restores the "exactly once per `(agent_id, snapshot_id)` per process" promise after PR 3 Round 1 P2 adversarial caught the under-lock-or-CAS race). `/persona.link.md` AND `/persona/IDENTITY.md` both present raises `PersonaOwnershipConflict` at filesystem-backend `load_profile()` (D2a + D-PP-8 — filesystem-only loud refusal because two files on disk is a visible operator mistake the framework must surface; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity). SQLite v1→v2 schema migration adds the `agents.persona_id` column via forward-only upgrade routine with explicit race-loser handling (catches `sqlite3.OperationalError "duplicate column name"` then re-reads `schema_version`; the original D1a wording's `INSERT OR IGNORE` pattern was the wrong shape — D-PP-2 corrected to UPDATE+ALTER per Python's `sqlite3` implicit-commit-before-DDL semantics). D-PP-1 sentinel sweep (`_is_agent_dir(agent_root)` predicate admits either `persona/IDENTITY.md` OR `persona.link.md`) updated at `load_profile`, `list_agents`, `exists`, AND extended to `list_skills` + `load_skill_body` in PR 3 (D-PP-12 — externally-owned agents now succeed at skill operations end-to-end; the two missed call sites were a shipped bug from PR 2). **Operator surface.** `atomic-agents persona list / show / snapshot --label "..." 
/ list-snapshots / restore / clone` CLI exposes the full PersonaBackend lifecycle with zero LLM calls; catches `PersonaError` subclasses (including `PersonaNotFound`, `PersonaCorrupted`, `PersonaLinkInvalid`, `PersonaOwnershipConflict`, `PersonaSnapshotNotFound`) + `OSError` + `PermissionError` cleanly with `Error: ` on stderr + exit 1 (PR 3 Round 2 adversarial; previously bare `PersonaError` only). Default backend resolves to `FilesystemPersonaBackend(/.personas)`. Operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` / `OutcomeRunner(..., persona_backend=...)` / `EvalRunner(..., persona_backend=...)` / `DreamRunner(..., persona_backend=...)` constructor kwargs (programmatic path always wins; threads through to internal sub-agents). `delegate.py` threads `persona_backend` ONLY when the operator supplied it explicitly via the constructor kwarg (D-ER-2 — mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context; distinct from fleet-scoped Policy + AgentProfile which always thread, matching the Mandate precedent that per-agent isolation is the right shape for delegate-relationship semantics). `doctor.check_persona_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction. 
Implementer contract for persona backends documented in spec/33 §"Implementer contract for persona backends" (8 normative MUSTs covering `persona_id` charset validation at API boundary, side-effect-free construction, capability honesty, URL credential redaction in factory `ValueError` sites, group-atomic save with the 20-iteration retry bound + last-writer-wins semantics, snapshot id determinism + cross-persona isolation, `backend_id` property stability, and `snap__<12hex>` snapshot ID format with `metadata.json` schema). D5 retires spec/24's `TemplateProfileBackend` reservation entirely — `PersonaCapabilities.supports_templates` is the canonical home; a future persona-template marketplace (`pip install atomic-personas-starters` or a curated GitHub registry) is a v1.1+ distribution surface that the Protocol seam already accommodates without a forking change. **Closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 regional agents with consistent identity, versioning, snapshot/restore lifecycle, and operator-editable markdown. Home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk; PersonaBackend reads activate only when an operator explicitly creates a `persona.link.md` shared-reference. Same agent definitions, same `agent.call()` flow, same audit trail, different persona substrate. +**v1.0.0, stable, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: ~3,319-3,325 tests collected, 2026-06-04). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. 
**Twelve backend protocols shipped** — see `docs/protocols-shipped.md` for the per-protocol summary (reference impls, capabilities, operator overrides, doctor checks, Implementer Contracts, and what cliff each closes):
+
+| # | Protocol | Issue / Lock | Reference impls |
+|---|----------|--------------|-----------------|
+| 1 | MemoryBackend | #57 | Filesystem |
+| 2 | LLMBackend | #87 | Anthropic + OpenAI + Moonshot |
+| 3 | JudgeBackend | #112 PR 4 | PolicyJudge + LLMJudgeBackend |
+| 4 | LockBackend | #60 PR 4 | Filesystem + Redis |
+| 5 | LogBackend | #61 PR 4 | Filesystem + SQLite |
+| 6 | AgentProfileBackend | #63 PR 4 | Filesystem + SQLite |
+| 7 | ToolRegistryBackend | #64 PR 4 | Filesystem + SQLite |
+| 8 | PolicyBackend | #89 PR 4 | Filesystem |
+| 9 | MandateBackend | #124 PR 4 | Filesystem |
+| 10 | PersonaBackend | #62 PR 4 | Filesystem |
+| 11 | CorpusBackend | #65 PR 4 | Filesystem + SQLite (FTS5) |
+| 12 | MCPServerRegistryBackend | #201 PR 5 | Filesystem + HTTP (tier-1/2/3) |
 MCP client support shipped (PRs #55 + #56). All twelve backend protocols shipped; v1.0.0 released 2026-06-04. Single-developer project; reference implementation that anyone can use, fork, or extend.
diff --git a/docs/protocols-shipped.md b/docs/protocols-shipped.md
new file mode 100644
index 0000000..c256868
--- /dev/null
+++ b/docs/protocols-shipped.md
@@ -0,0 +1,309 @@
+# Backend protocols shipped
+
+Twelve backend protocols are locked for v1.0. Each section captures the reference implementations shipped, the operator override surface, the doctor coherence check, the Implementer Contract location, and the architectural cliff the protocol closes.
+
+This file is the canonical reference for what the framework's storage seam looks like today. CLAUDE.md links here instead of inlining the detail so the session prompt stays under its char budget.
+
+For the Protocol-pattern template every backend follows, read `docs/spec/20-memory-backend.md` + PR #57.
+ +--- + +## MemoryBackend (#57) + +Filesystem reference impl + conformance suite. The Protocol-pattern template every later backend follows. + +--- + +## LLMBackend (#87) + +Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three. + +--- + +## JudgeBackend (#112, locked at PR 4) + +`tests/test_judge_protocol_conformance.py` parametrizes across registered backends. PolicyJudge (rule engine) + LLMJudgeBackend reference impls; ESCALATE + REVISE state machines; `judges.md` operator config with cascade-aware project floor; operator-driven resolution flow (Approved / Denied / Redacted / Revised / Auto-decided); body-integrity check + O_EXCL sidecar de-dup + CAS-safe auto-decide. + +**PR 5a (unreleased):** `escalation.fallback_on_timeout` widens to per-class dict form; auto-decide resolves policy from PENDING frontmatter `action_class`. **PR 5b (unreleased):** strict JSON-Schema validation of amended `tool_arguments` via the opt-in `[validation]` extra (`validation: strict` in `judges.md`); default remains `weakened` (PR 3c behavior), so operators upgrading without flipping the field see no behavior change. Concludes the #112 arc-with-amendments. + +Dispatch opt-in via `judges.md` in the agent root or `AGENT_JUDGE_ENABLED=1` — existing deployments see no judge invocation by default. + +--- + +## LockBackend (#60, locked at PR 4) + +`tests/test_lock_protocol_conformance.py` parametrized across both backends. + +`FilesystemLockBackend` (POSIX `fcntl.flock` advisory; preserves the legacy `/.lock` on-disk artifact byte-for-byte) + `RedisLockBackend` (single-instance Redis advisory lock + atomic Lua release/renew + daemon heartbeat at TTL/3 + `LockLost` lease-expiry detection) reference impls. + +`scope(sub_path)` Protocol method lets operators pass ONE backend; framework re-scopes for dream + memory paths internally. 
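The re-scoping idea behind `scope(sub_path)` can be sketched as follows. This is a minimal illustration only: `InMemoryLockBackend`, its `prefix` field, and the `acquire` / `release` signatures are hypothetical stand-ins, not the framework's Protocol.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class InMemoryLockBackend:
    """Hypothetical stand-in; NOT the framework's LockBackend Protocol."""

    prefix: str = ""
    # Shared by every scoped view derived from the same root backend.
    _held: set = field(default_factory=set)
    _guard: threading.Lock = field(default_factory=threading.Lock)

    def scope(self, sub_path: str) -> "InMemoryLockBackend":
        # Return a view with a narrower key prefix that shares the same state.
        new_prefix = f"{self.prefix}/{sub_path}" if self.prefix else sub_path
        return InMemoryLockBackend(new_prefix, self._held, self._guard)

    def acquire(self, name: str) -> bool:
        key = f"{self.prefix}/{name}"
        with self._guard:
            if key in self._held:
                return False  # someone else already holds this key
            self._held.add(key)
            return True

    def release(self, name: str) -> None:
        with self._guard:
            self._held.discard(f"{self.prefix}/{name}")
```

Under this sketch an operator hands over one root backend, and callers derive `root.scope("dream")` and `root.scope("memory")` views whose keys cannot collide.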
+ +Operator override via `ATOMIC_AGENTS_LOCK_BACKEND` + `ATOMIC_AGENTS_LOCK_BACKEND_URL` env vars (deployment path) OR `AtomicAgent(..., lock_backend=...)` constructor kwarg (programmatic path — always wins). `doctor.check_lock_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + credential-redacted URL output. + +`_locks.AgentLock` preserved as a deprecation shim (sunset planned for v1.1; deferred from v1.0 per #201 PR 5 release decision). + +**Closes the multi-host cliff** that motivated the entire arc: atomic-agents now runs on Cloud Run / Kubernetes / gizmo without forking the framework. + +--- + +## LogBackend (#61, locked at PR 4) + +`tests/test_log_protocol_conformance.py` parametrized across both backends. + +`FilesystemLogBackend` (JSONL-on-disk; preserves the legacy `/log/YYYY-MM/YYYY-MM-DD.jsonl` artifact byte-for-byte via `_io.atomic_append_jsonl`) + `SQLiteLogBackend` (stdlib `sqlite3`, no optional extra; six indexes covering dashboard + cost-guardrail query patterns; WAL journal mode + per-thread connections for multi-process append safety on local filesystems; aggregation pushdown via SQL `GROUP BY` for canonical columns + SQLite JSON1 `json_extract` for primitive-specific `extra`-field group_bys with alphanumeric-identifier SQL injection guard; index-driven `delete_older_than`; schema version tracking with idempotent `INSERT OR IGNORE` cold-start init for multi-replica deployments). + +Operator override via `ATOMIC_AGENTS_LOG_BACKEND` + optional `ATOMIC_AGENTS_LOG_BACKEND_URL` env vars OR `AtomicAgent(..., log_backend=...)` / `OutcomeRunner(..., log_backend=...)` / `DreamRunner(..., log_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents). 
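The override precedence described above (constructor kwarg first, then env vars) can be sketched like this. The function name, the tuple return shape, and the filesystem fallback are assumptions made for illustration; only the two env var names come from the text.

```python
import os
from typing import Mapping, Optional


def resolve_log_backend(
    explicit: Optional[object] = None,
    env: Optional[Mapping[str, str]] = None,
) -> object:
    """Hypothetical resolver: kwarg first, then env vars, then a default."""
    if explicit is not None:
        return explicit  # programmatic path always wins
    env = os.environ if env is None else env
    kind = env.get("ATOMIC_AGENTS_LOG_BACKEND", "").strip().lower()
    url = env.get("ATOMIC_AGENTS_LOG_BACKEND_URL")  # optional
    if not kind:
        return ("filesystem", None)  # assumed default for this sketch
    return (kind, url)
```

Passing `env` explicitly keeps the resolver testable without mutating the process environment.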
+ +`LogQuery.agent_name` filter (added in PR 3 review-pass per Step 11 P0 #1) for shared-backend cross-agent isolation with lenient match for legacy records (records without `agent_name` match any filter — filesystem per-agent-dir scoping is the natural isolation primitive). + +`doctor.check_log_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + stats probe (records_today / records_this_month) + URL-credential redaction. + +Implementer contract for queryable backends documented in `docs/spec/22-log-backend.md` §"Implementer contract for queryable backends" — future Postgres / Datadog / Loki / Cloud Logging adapters mirror the SQLite reference's shape. + +**Closes the dashboard-perf cliff** + remote-shipping requirement: operators on Cloud Run / Kubernetes with N replicas can pin SQLite for O(log N) indexed queries + indexed retention; the same Protocol seam admits future Datadog / Loki / Postgres-with-pgvector backends without forking the framework. + +--- + +## AgentProfileBackend (#63, locked at PR 4) + +`tests/test_profile_protocol_conformance.py` parametrized across both backends — 46 tests × 2 backends = ~92 invocations. + +`FilesystemAgentProfileBackend` walks `/persona/IDENTITY.md|SOUL.md|USER.md` + `/{model,tools,judges,roster,mcp,goal}.md` + `/skills//SKILL.md` via the existing parsers; preserves byte-for-byte on-disk artifacts via `_io.atomic_write`; cascade-aware via `_cascade.detect_cascade`; JSON-based snapshot trio at `/.snapshots///{profile,metadata}.json` with `_validate_snapshot_id` path-traversal refusal + `relative_to(snapshots_root)` path-scope check + `metadata.agent_id` cross-check. 
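The two-layer snapshot path defense mentioned above (identifier validation, then a `relative_to` scope check) might look roughly like the sketch below. The function name and the `_SAFE_ID` charset are assumptions; the real `_validate_snapshot_id` may accept a different format.

```python
import re
from pathlib import Path

# Assumed charset for illustration; the real snapshot id format may differ.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def resolve_snapshot_dir(snapshots_root: Path, agent_id: str, snapshot_id: str) -> Path:
    """Return the snapshot directory, or raise ValueError if it is unsafe."""
    # Layer 1: refuse separators, dots, and empty components outright.
    for part in (agent_id, snapshot_id):
        if not _SAFE_ID.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    # Layer 2: resolve symlinks and confirm the result stays under the root.
    root = snapshots_root.resolve()
    candidate = (root / agent_id / snapshot_id).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError("snapshot path escapes snapshots root") from None
    return candidate
```

The second layer matters even when the first passes, because a symlink planted inside the snapshots tree can point outside it.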
+ +`SQLiteAgentProfileBackend` (stdlib `sqlite3`, no optional extra; JSON blob + indexed scalars approach — `agents(name PK, agent_mode indexed, profile_json, updated_at)` + `profile_snapshots(snapshot_id PK, agent_id+created_at composite indexed, label, profile_json)` + `meta(key PK, value)` with schema_version tracking via idempotent `INSERT OR IGNORE` cold-start init; `threading.local` connection pool + WAL journal mode + `synchronous=NORMAL` for multi-process append safety on local filesystems; cross-agent snapshot isolation enforced via `WHERE snapshot_id = ? AND agent_id = ?` AND-clause). + +`supports_skills` capability dimension — filesystem=True (walks skill dirs), SQLite=False (skills stay filesystem-only in v1; future `save_skill` Protocol method lands when SaaS UI editing requires DB-backed skill bodies). + +48-bit snapshot id random tail (Step 11 adversarial F-8) makes same-second collision at 4K snapshots/sec ~6e-8. + +Operator override via `ATOMIC_AGENTS_PROFILE_BACKEND` + optional `ATOMIC_AGENTS_PROFILE_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.profile.db` so single-host operators get a working SQLite default by flipping ONE env var) OR `AtomicAgent(..., profile_backend=...)` / `OutcomeRunner(..., profile_backend=...)` / `EvalRunner(..., profile_backend=...)` / `DreamRunner(..., profile_backend=...)` constructor kwargs (programmatic path — always wins; threads through to internal sub-agents and `delegate.py`). + +`AgentProfile` carries typed shadow + raw text for every config file (spec/24 Decision 1 — `mcp_md_raw` preserves `$VAR` env refs verbatim so save paths never bake resolved secrets into on-disk state). `save_profile` re-derives `agent_mode` from `persona_identity` on every write (spec/24 Decision 6 — single source of truth). + +`doctor.check_agent_profile_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot (incl. 
`supports_skills` disclosure) + agent-count probe + URL-credential redaction. + +Implementer contract for registry-backed backends documented in `docs/spec/24-agent-profile-backend.md` §"Implementer contract for registry-backed backends" (8 normative MUSTs covering path-traversal refusal at API boundary, cross-agent snapshot isolation at storage layer, agent_mode re-derivation discipline, raw-text round-trip preservation, idempotent schema init across processes, snapshot id entropy budget, thread-life-tied connection management, supports_skills capability honesty) — future Postgres / git / SaaS-database adapters mirror the SQLite + filesystem references' shapes. + +**Closes the SaaS-shape cliff**: SaaS / database-backed / git-backed agent registries are now ONE Protocol implementation away from the framework's existing operator-config surface. Same agent definitions, same `agent.call()` flow, same audit trail — different substrate. + +--- + +## ToolRegistryBackend (#64, locked at PR 4) + +`tests/test_tool_registry_protocol_conformance.py` parametrized across both backends — 43 conformance test functions running on filesystem + SQLite, 18 skips on capability gates. + +`FilesystemToolRegistryBackend(agent_root)` walks `/tools/.md` for descriptors + `/tools/.py` for handler modules via `importlib.util.spec_from_file_location`; refuses path-traversal in `name` at API boundary; refuses control characters; 256 KB descriptor size cap defending against YAML alias-bomb DoS (PR 1 Step 11 REPRODUCED at 33 GB RSS pre-fix); treats `chmod-000 tools/` as empty rather than `PermissionError`-crashing every agent construction (PR 2 Step 11 P1 REPRODUCED); `validate()` is static-only — descriptor parse + handler import + signature check, NO handler execution. 
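A static-only check of the kind described above can be sketched with the standard library. The function names and the `handler` attribute name are assumptions for illustration; only the 256 KB cap and the use of `importlib.util.spec_from_file_location` come from the text.

```python
import importlib.util
import inspect
from pathlib import Path

MAX_DESCRIPTOR_BYTES = 256 * 1024  # size cap named in the text


def check_descriptor_size(path: Path) -> None:
    # Check the size on disk BEFORE reading or parsing anything.
    if path.stat().st_size > MAX_DESCRIPTOR_BYTES:
        raise ValueError(f"descriptor exceeds {MAX_DESCRIPTOR_BYTES} bytes")


def load_handler(path: Path, func_name: str = "handler"):
    """Import a handler module from a file and return its callable, uncalled."""
    spec = importlib.util.spec_from_file_location(f"_tool_{path.stem}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs module-level code, not the handler
    func = getattr(module, func_name, None)
    if not callable(func):
        raise TypeError(f"{func_name!r} is missing or not callable")
    inspect.signature(func)  # signature inspection only; no invocation
    return func
```

Note that importing a module still executes its top-level statements, so "static" here means the handler function itself is never called.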
+ +`SQLiteToolRegistryBackend(db_path, agent_scope, *, handlers_root=None)` (stdlib `sqlite3`, no optional extra; hybrid storage shape — SQLite stores metadata only (descriptor JSON + handler path + version + classification + scope + timestamps), handler **bodies** live on disk as `.py` files under `//.py` and load via the same `importlib.util.spec_from_file_location` path the filesystem reference uses; base64-exec'd-source design was rejected at the plan-subagent stage because it silently breaks closures + module-level imports + `session = requests.Session()` patterns; schema `tools(agent_scope, name, descriptor_json, handler_path, version, classification, created_at, updated_at, PRIMARY KEY (agent_scope, name))` — composite PK so two scopes can both have a tool named the same; `meta(key PK, value)` schema-version with idempotent `INSERT OR IGNORE` cold-start race fix; `PRAGMA busy_timeout=5000` BEFORE `PRAGMA journal_mode=WAL` resolves the multi-process WAL race REPRODUCED 3/5 pre-fix in PR 3 Step 11; `threading.local` connection pool + WAL journal mode + `synchronous=NORMAL` for multi-process append safety on local filesystems; cross-scope isolation enforced via `WHERE agent_scope = ?` on every query; URL factory `make_sqlite_tool_registry_backend_from_url` honors `sqlite:///path?agent_scope=` and refuses non-sqlite scheme / netloc / fragments / duplicate query params / unknown query params — credential redaction across all 5 `ValueError` sites via `_redact_url` helper resolves the PR 3 Step 11 P1 REPRODUCED postgres-URL credential leak; `:memory:` mode is single-threaded test-only — `check_same_thread=True` + per-instance `tempfile.mkdtemp()` for `handlers_root` honoring the non-persistent promise; `handlers_root` refuses `<= 1`-component paths defending against root-write on misconfigured Linux). 
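The pragma ordering and idempotent cold-start init described above can be sketched with stdlib `sqlite3`. The `open_registry` name, the trimmed column list, and the `SCHEMA_VERSION` value are illustrative assumptions, not the shipped schema.

```python
import sqlite3

SCHEMA_VERSION = "1"  # illustrative value


def open_registry(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # Set the busy timeout first so the WAL switch waits for a competing
    # process instead of failing immediately with "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tools (
               agent_scope TEXT NOT NULL,
               name TEXT NOT NULL,
               descriptor_json TEXT NOT NULL,
               handler_path TEXT NOT NULL,
               PRIMARY KEY (agent_scope, name)
           )"""
    )
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    # INSERT OR IGNORE keeps concurrent cold starts idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO meta (key, value) VALUES ('schema_version', ?)",
        (SCHEMA_VERSION,),
    )
    conn.commit()
    return conn
```

The composite primary key is what lets two scopes each own a tool with the same name.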
+ +`install()` is TOCTOU-safe via **INSERT-first + atomic_write-on-success-only** ordering (PR 3 Step 11 REPRODUCED 50/50 pre-fix — original handler-atomic_write-first order caused concurrent installs to destroy the winner's handler file via the loser's rollback `unlink()`); losers see `rowcount=0` and raise `ToolAlreadyInstalled` WITHOUT touching disk. `install()` rejects non-callable handler at install time. `install()` rejects non-None `version` when `supports_versioning=False` (capability honesty). + +Operator override via `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND` + optional `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.tools.db` with `agent_scope=` so single-host operators get a working SQLite default by flipping ONE env var) OR `AtomicAgent(..., tool_registry_backend=...)` / per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (programmatic path — always wins; threads through to internal sub-agents — `delegate.py` deliberately does NOT thread because tool registry is per-agent scoped per spec/25 Decision 9, distinct from the fleet-scoped `profile_backend` which IS threaded). + +Backend tools register into `agent.tool_registry` AFTER operator-supplied `tools=ToolRegistry()` kwarg with `allow_overwrite=False` so collisions surface loudly as `ToolNameCollision`; **empty / missing `/tools/` yields zero registrations** — all 115 `AtomicAgent(...)` construction sites in the test suite see byte-identical pre-#64 behavior. + +`doctor.check_tool_registry_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + tool-count probe + URL-credential redaction. 
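The INSERT-first ordering described above for `install()` can be sketched as follows. This is a simplified illustration under stated assumptions: the function signature, the trimmed table shape, and the local `ToolAlreadyInstalled` class are stand-ins, and the sketch omits row cleanup when the file write itself fails.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


class ToolAlreadyInstalled(Exception):
    """Local stand-in for the framework's exception of the same name."""


def install(conn: sqlite3.Connection, handlers_root: Path,
            scope: str, name: str, source: str) -> Path:
    target = handlers_root / scope / f"{name}.py"
    # Step 1: claim the (scope, name) row. The primary key arbitrates the race.
    cur = conn.execute(
        "INSERT OR IGNORE INTO tools (agent_scope, name, handler_path) VALUES (?, ?, ?)",
        (scope, name, str(target)),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Loser: raise without touching disk, so the winner's file is safe.
        raise ToolAlreadyInstalled(f"{scope}/{name}")
    # Step 2: only the winner writes, via temp file plus atomic rename.
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(source)
    os.replace(tmp, target)
    return target
```

Because the loser never creates or deletes a file, there is no rollback `unlink()` that could remove the winner's handler.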
+ +Implementer contract for registry-backed tool backends documented in `docs/spec/25-tool-registry-backend.md` §"Implementer contract for registry-backed tool backends" (8 normative MUSTs covering path-traversal refusal at API boundary, cross-scope isolation at storage layer, atomicity on install via INSERT-first + atomic_write-on-success-only, two-tier descriptor round-trip — raw-text-preserving for filesystem-shape backends, lossy-parse-documented for structured-storage backends, idempotent schema init + busy_timeout before WAL pragma, capability honesty, trust-model framing for shared-catalog backends, connection / handler lifecycle). + +Protocol seam in place; two reference impls (filesystem + SQLite) shipped; 43 conformance test functions across both backends pin the contract. Future PyPI / git / company-internal-HTTP / SaaS-database adapters slot in via `register_tool_registry_backend(...)` without forking core — same agent definitions, same `agent.call()` flow, same audit trail, different tool catalog. + +--- + +## PolicyBackend (#89, locked at PR 4) + +`tests/test_policy_protocol_conformance.py` parametrized across registered backends + `tests/test_policy_filesystem_backend.py` + `tests/test_policy_integration.py` + `tests/test_policy_cost_cap_consumption.py` + `tests/test_policy_noncap_log_only.py` + `tests/test_policy_noncap_integration.py`. 
+ +`FilesystemPolicyBackend()` reference impl: markdown + embedded YAML at `/policy.md` (per Premise 5 + plan-eng-review D4 — fleet-wide single source of truth, no per-agent `policy.md`); mtime+size composite cache key catches same-second edits on 1s-granularity filesystems via the size proxy; `cache_ttl_s=0` capability declaration — operators observe edits within 0 seconds of mtime change (the framework-side mtime stat is the staleness contract; SaaS / Postgres backends declare their real internal TTL); `agent_name` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal; side-effect-free construction (lazy parse on first method call so the 115 existing `AtomicAgent(...)` construction sites stay byte-identical when no `policy.md` exists). + +Only reference impl in v1; future Postgres / SaaS / org-admin-console adapters register via `register_policy_backend(...)` per /office-hours 2026-05-19 D2 (full Protocol seam from day 1). + +`PolicySnapshotForCall` frozen at `agent.call()` entry per Premise 3 — every consumption site reads the SAME snapshot for the duration of the call; operator edits to `policy.md` mid-call defer to the next `agent.call()`. + +Cost-cap MIN composition in `_check_cost_guardrails` (`effective_daily = MIN(policy.daily, model_md.daily)`; `effective_monthly = MIN(...)`; per-call `cost_cap` ceiling bounds same-dimension cap arithmetic); `MandateCheck` steps 7-9 consume pre-composed effective caps so Policy and Mandate cost-cap checks share the same arithmetic (PR 3a — cost caps enforce immediately and ignore the env-var flag). 
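The per-dimension MIN composition above can be written out as a small sketch. The function names and the convention that `None` means "no cap set" are assumptions for illustration; the real `_check_cost_guardrails` may represent missing caps differently.

```python
from typing import Optional


def effective_cap(policy_cap: Optional[float], model_cap: Optional[float]) -> Optional[float]:
    """MIN of the caps that are set; None means no cap on this dimension."""
    caps = [c for c in (policy_cap, model_cap) if c is not None]
    return min(caps) if caps else None


def effective_caps(policy: dict, model_md: dict) -> dict:
    # Each dimension is composed independently of the other.
    return {
        dim: effective_cap(policy.get(dim), model_md.get(dim))
        for dim in ("daily", "monthly")
    }
```

For example, a policy of `{"daily": 5.0, "monthly": 100.0}` composed with a `model.md` of `{"daily": 8.0, "monthly": 60.0}` gives a daily cap from the policy and a monthly cap from `model.md`.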
+ +Non-cap surfaces (tool allowlist, MCP server allowlist, model selection) consumed at the three matching call sites with `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var-gated enforcement — PR 3b shipped in log-only mode (flag default `false`); **PR 4 flipped the default to `true` so non-cap surfaces enforce by default; operators wanting log-only set `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP=false` explicitly**. + +Tool dispatch: blocked tool yields a synthesized `policy_blocked` `ToolCallResult` mirroring the judge_blocked shape so the LLM sees a refusal on the next turn. MCP discovery: denied servers filtered BEFORE `MCPClientPool` construction so the framework doesn't pay the subprocess startup cost. Model selection: Policy's `get_effective_model()` return replaces the pre-Policy effective model in enforce mode. + +Unified `policy_decision` event family with `decision_kind: deny | override` discriminator + `axis: cost_cap | tool_allowlist | mcp_allowlist | model_selection` + `enforced: bool` so SaaS / Postgres adapters target a frozen schema (Premise 4 — one event family answers "was this Policy or Mandate?" via `denying_layer`; cost-cap denials with `cap_action ∈ {alert, fallback}` emit `enforced=False`). + +`PolicyDecision.model_from_per_call_override` (#274) captures the `agent.call(model=...)` kwarg when Policy supersedes it so the caller can detect the silent override; fleet-config-wins precedence documented in `AtomicAgent.call()` docstring + spec/32 §"Composition math". Per-call dedup set bounds tool-allowlist denial emissions to one event per `(tool_name, call)` (#273). + +`policy.md` parser handles fleet-default `cost_caps` / `tools.{allow,deny}` / `mcp_servers.{allow,deny}` / `model` fields at top level + nested `agents: { : { ... } }` per-agent overrides with field-level MERGE for caps + UNION+deny-wins for allowlists + REPLACE for model selection; per-dimension MIN cap math (`daily` and `monthly` independently; cumulative deferred to v1.1). 
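The three composition rules named above (field-level MERGE for caps, UNION plus deny-wins for allowlists, REPLACE for model) can be illustrated with plain dicts. This is a sketch under assumptions: `merge_policy` is a hypothetical name, the dict shape only loosely mirrors `policy.md`, and "MERGE" is read here as per-agent fields overriding fleet fields one key at a time.

```python
def merge_policy(fleet: dict, agent: dict) -> dict:
    """Compose fleet defaults with one agent's overrides (illustrative only)."""
    # MERGE: per-agent cap fields override fleet fields key by key (assumed).
    caps = {**fleet.get("cost_caps", {}), **agent.get("cost_caps", {})}

    # UNION + deny-wins: combine both lists, then remove anything denied.
    fleet_tools = fleet.get("tools", {})
    agent_tools = agent.get("tools", {})
    allow = set(fleet_tools.get("allow", [])) | set(agent_tools.get("allow", []))
    deny = set(fleet_tools.get("deny", [])) | set(agent_tools.get("deny", []))

    return {
        "cost_caps": caps,
        "tools": {"allow": sorted(allow - deny), "deny": sorted(deny)},
        # REPLACE: the agent's model wins outright when present.
        "model": agent.get("model", fleet.get("model")),
    }
```

A fleet-level deny therefore cannot be undone by a per-agent allow, which is the point of deny-wins.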
+ +Cross-host cap-overrun bound `(replica_count) × (per-call ceiling)` documented in `docs/spec/32-policy-backend.md` §"Cross-host bound" for shared-FS deployments (Postgres / SaaS adapters with linearizable state get exact-cap semantics through their own consistency layer). + +Operator override via `ATOMIC_AGENTS_POLICY_BACKEND` env var OR `AtomicAgent(..., policy_backend=...)` / per-runner kwargs (programmatic path — always wins; threads through to internal sub-agents) + `delegate.py` threading per spec/32 D1 (Policy is fleet-scoped — a delegate inheriting the coordinator's pinned Postgres backend doesn't silently fall back to the filesystem default and bypass the operator's fleet cap; distinct from `mandate_backend` which is per-agent scoped and deliberately NOT threaded). + +`doctor.check_policy_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction. + +Implementer contract for policy backends documented in `docs/spec/32-policy-backend.md` §"Implementer contract for policy backends" (7 normative MUSTs covering `agent_name` validation at API boundary, per-agent storage isolation, `cache_ttl_s`-bounded staleness, side-effect-free construction, capability honesty, URL credential redaction in factory `ValueError` sites, `PolicyDecision` event schema compliance). + +**Closes the cross-agent configuration cliff**: operators with a fleet of agents stop hand-syncing `model.md` / `tools.md` / `mcp.md` across N agents; a single project-root `policy.md` is the audit-trail source of truth, fleet-default + per-agent overrides compose with most-restrictive-wins semantics, and SaaS / Postgres / org-admin-console adapters are ONE Protocol implementation away. 
+ +--- + +## MandateBackend (#124, locked at PR 4) + +`tests/test_mandate_protocol_conformance.py` parametrized across registered backends + `tests/test_mandate_check.py` + `tests/test_mandate_reservations.py` + `tests/test_mandate_filesystem_backend.py` + `tests/test_mandate_integration.py`. + +`FilesystemMandateBackend(scope_root)` reference impl: markdown + embedded YAML descriptors at `/mandates.md` (project scope) or `//mandates.md` (agent scope); state at `/.judge-state/mandates.json` via `_io.atomic_write`; refuses path-traversal in `mandate_id` at API boundary; source-hash recomputation on every `load_mandate`; derived-EXPIRED state computed at load time. + +Only reference impl in v1; future SaaS / mobile / Slack-bot adapters register via `register_mandate_backend(...)` per /office-hours 2026-05-17 Option 2 decision (build the seam upfront, don't retrofit later). + +`MandateCheck` judge specialist (~730 LOC) implements validation steps 1-9: existence + source-hash binding + state + tool allowlist + target allowlist via per-agent named `TargetExtractorRegistry` (7 built-in heuristic extractors pre-registered at agent construction; MCP tools prefix extracted target with `mcp::`) + time window + token-cost projection with stale-baseline defense (if most-recent matching event's `ts` is before current iteration's start, fall back to `expected_cost_per_call_usd` so stale-baseline drift doesn't compound across multi-iteration runs) + external-cost projection via `CostEstimatorRegistry` fail-closed to spec-stable `mandate_external_cost_unprojectable` BLOCK reason + escalation thresholds with ESCALATE-preempts-BLOCK precedence. 
+ +Reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state; `compute_outstanding(log_backend, scope, mandate_id)` four-clause definition — created AND NOT committed/rolled_back/expired/committed_on_recovery AND no cost event with matching `proposal_id` AND age < ttl_s — closes the cost-event-landed-without-_committed window; cost events for mandate-citing actions carry `mandate_id` + `proposal_id` so cumulative budget defense `_sum_prior_token_cost` matches against the right ledger). + +Crash recovery via `MandateBackend.recover_orphan_reservations(log_backend, scope, *, lock_backend=None)` with `LockBackend.acquire(scope='mandate-recovery:')` scan-inside-lock discipline (pessimistic over-report > silent under-bill — token orphans emit `mandate_reservation_committed_on_recovery`; external orphans emit BOTH `_committed_on_recovery` AND `mandate_reservation_external_unverified` so operators verify in Stripe / vendor via the `atomic-agents mandate reconcile --action {committed|rolled_back}` CLI). + +Post-action verification event family (`mandate_action_verified` / `mandate_action_diverged` / `mandate_action_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit; operator-facing audit signal, NOT a refund mechanism in v1). + +Suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats; persisted on-disk in `MandateBackend.read_state` shape under `throttles` key — in-memory-only forbidden because crash-restart loop would defeat the prompt-injection defense). 
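The four-clause "outstanding reservation" definition given earlier in this section can be sketched over a flat event list. The event dict shape (`kind`, `proposal_id`, `ts`, `amount_usd`) and the `outstanding_usd` name are hypothetical; the real `compute_outstanding` reads from a `LogBackend` and takes a different signature.

```python
from typing import Iterable

# Lifecycle states that close a reservation (clause 2).
TERMINAL = {"committed", "rolled_back", "expired", "committed_on_recovery"}


def outstanding_usd(events: Iterable[dict], mandate_id: str, now: float, ttl_s: float) -> float:
    """Sum reserved amounts that are still outstanding for one mandate."""
    created, closed, costed = {}, set(), set()
    for ev in events:
        if ev.get("mandate_id") != mandate_id:
            continue
        pid = ev["proposal_id"]
        if ev["kind"] == "created":
            created[pid] = ev          # clause 1: reservation was created
        elif ev["kind"] in TERMINAL:
            closed.add(pid)            # clause 2: reached a terminal state
        elif ev["kind"] == "cost":
            costed.add(pid)            # clause 3: cost event already landed
    return sum(
        ev["amount_usd"]
        for pid, ev in created.items()
        if pid not in closed and pid not in costed and now - ev["ts"] < ttl_s  # clause 4
    )
```

Clause 3 is what prevents double counting when a cost event lands but the process dies before the commit event is written.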
+ +`mandates.md` parser + `judges.md ## Mandates` operator config with cascade-aware project floor (floor-wins where stricter for safety: longer throttle, "block" beats "escalate") + constraint enforceability discipline (mandates without enforceable constraints AND without `unconstrained: true` + non-empty justification are rejected at load time). + +Structural write protection: `mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `model.md` / `persona/IDENTITY.md` / `persona/SOUL.md` / `persona/USER.md` — even a malicious actor with a write-capable tool cannot grant itself authority; the WritePolicy is the authoritative protection, the `## Only operators grant mandates` discipline is the behavioral story. + +Operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` / per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (programmatic path always wins; threads through to internal sub-agents; `delegate.py` deliberately NOT threaded — per-agent scoping per spec/29 + spec/15 delegate isolation). + +`doctor.check_mandate_backend` validates operator-config coherence. + +Implementer contract for mandate backends documented in `docs/spec/29-mandate-backend.md` §"Implementer contract for mandate backends" (8 normative MUSTs covering path-traversal refusal at API boundary, per-scope isolation enforced at storage layer, state persistence via `read_state` / `write_state` Protocol methods (NOT filesystem-path contract), source-hash recomputation per load, lifecycle event emission via `LogBackend.append(record)`, reservation event discriminator shape, pessimistic crash recovery semantics, capability honesty). + +Operator CLI surface ships with the impl: `atomic-agents mandate list` / `show` / `usage` / `reconcile`. 
+ +**Closes the durable-authorization cliff**: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart; post-hoc divergence audits surface when an action's executed target differed from authorization at proposal time; mandate revocation is operator-editable in `mandates.md` with immediate effect on the next agent run. + +The Mandate primitive is orthogonal to the v1.0 Protocol queue (Corpus / MCPServerRegistry remained after PersonaBackend locked at #62 PR 4; Mandate primitive ships its OWN `MandateBackend` seam from day 1). + +--- + +## PersonaBackend (#62, locked at PR 4) + +`tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`. + +`FilesystemPersonaBackend(personas_root)` reference impl: persona records at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/` so `list_agents()` skips dot-prefixed entries and personas don't surface as agents). + +Only reference impl in v1; future Postgres / SaaS / git adapters register via `register_persona_backend(...)` per the established Protocol-pattern seam. + +`persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal. Side-effect-free construction (lazy walk on first method call so the 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no `persona.link.md` exists). 
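The `persona_id` boundary check described above can be sketched with a single regular expression plus a leading-dot test. The function name and error messages are assumptions; only the charset and the refusal categories come from the text.

```python
import re

# Charset quoted in the text. fullmatch() also rejects a trailing newline,
# which a pattern anchored only with "$" would let through.
_ID_RE = re.compile(r"[a-zA-Z0-9_.+@-]+")


def validate_persona_id(persona_id: str) -> str:
    """Return the id unchanged if it is safe, else raise ValueError."""
    if not isinstance(persona_id, str) or not _ID_RE.fullmatch(persona_id):
        raise ValueError("persona_id contains characters outside the allowed charset")
    # "." is in the charset, so ".." and hidden names need a separate refusal.
    if persona_id.startswith("."):
        raise ValueError("persona_id must not start with a dot")
    return persona_id
```

Since `/`, `\`, control characters, and `:` all fall outside the charset, the one pattern covers path separators and newlines, and the dot check covers `..`.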
+ +Group-atomic `save_persona`: `mkdir(exist_ok=False)` claims the persona dir exclusively before any file write for race-free fresh-create (`overwrite=False` losers raise `PersonaExists` WITHOUT touching disk); `overwrite=True` uses swap-and-delete via a sibling temp directory with a 20-iteration retry bound sized for 16-thread contention on macOS APFS `ENOTEMPTY` semantics; PR 1 Round 3 closed an orphan-backup leak via best-effort `shutil.rmtree(backup, ignore_errors=True)`. + +Snapshot trio (`snapshot` / `restore` / `list_snapshots`) flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//{IDENTITY,SOUL,USER}.md + metadata.json` (D-PP-10 — geometric cross-persona isolation: a snapshot record always resides under its parent persona's directory, so `rm -rf //` removes the persona AND its full history cleanly without an explicit `persona_id` cross-check on the snapshot record). + +`snap__<12hex>` snapshot ID format with 48-bit `secrets.token_hex(6)` random tail matches AgentProfile spec/24 Implementer Contract #8 (D-PP-11 — cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard; same-second collision probability at 4K snapshots/sec is ~6e-8). + +`_save_persona_group_atomic` merges backup `.snapshots/` entry-by-entry on `overwrite=True` so a concurrent `snapshot()` racing the persona-dir replace cannot destroy snapshot history (PR 3 Round 1 P1 adversarial — the original single-directory-rename approach lost the full snapshot history under contention). `list_snapshots` defense-in-depth symlink-escape guard via `entry.resolve().relative_to(snapshots_root.resolve())` (PR 3 Round 1 P2 adversarial — matches `restore()`'s confinement check). 
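The exclusive-claim step described above for fresh creates can be sketched with `Path.mkdir(exist_ok=False)`. The `create_persona` name, the `files` mapping, and the local `PersonaExists` class are stand-ins for illustration; the sketch covers only the `overwrite=False` path, not the swap-and-delete path.

```python
from pathlib import Path


class PersonaExists(Exception):
    """Local stand-in for the framework's exception of the same name."""


def create_persona(personas_root: Path, persona_id: str, files: dict) -> Path:
    target = personas_root / persona_id
    personas_root.mkdir(parents=True, exist_ok=True)
    try:
        # Directory creation is atomic, so exactly one concurrent caller
        # gets past this line for a given persona_id.
        target.mkdir(exist_ok=False)
    except FileExistsError:
        # Loser: nothing has been written, so there is nothing to undo.
        raise PersonaExists(persona_id) from None
    for name, text in files.items():
        (target / name).write_text(text, encoding="utf-8")
    return target
```

Claiming the directory before writing any file is what makes the loser path free of disk side effects.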
+ +URL factory `make_filesystem_persona_backend_from_url("filesystem:///path")` handles `filesystem:///absolute/path` URLs and refuses non-filesystem schemes, netloc, fragments, duplicate / unknown query params, and relative paths; credentials redacted from all `ValueError` sites via `_redact_url`. + +### Composition with AgentProfileBackend (D1 + D3 + D6 + D-PP-13) + +`/persona.link.md` is the ownership trigger (YAML in a code block with two scalar fields: `kind: shared` + `persona_id: customer-support-v3` per D-ER-4 — the colon-prefixed single-scalar `shared:customer-support-v3` was rejected at /plan-eng-review because the colon violates D4's `persona_id` charset). + +`AgentProfileBackend.external_persona_ref(agent_id) -> str | None` (D-PP-3 — supersedes D-ER-1's original boolean signature because the architecturally-right Optional[str] returns the persona_id the framework needs in one Protocol call) gives the bootstrap path the persona_id to look up without importing PersonaBackend. + +`AgentProfileBackend.load_profile()` repopulates persona fields via `persona_backend.load_persona(persona_id)` and re-derives `agent_mode` from the loaded persona text (D-PP-4 — `agent_mode` is derived from `persona_identity` and would otherwise be stale because the persona fields are empty at `load_profile` return time when externally owned). + +`save_profile()` ignores `profile.persona_identity / soul / user` when externally owned (D6 — mirrors spec/24 Decision 6's `agent_mode` ignore-on-save pattern; writes go through `persona_backend.save_persona()` only). `snapshot()` drops persona fields when externally owned (persona has its own snapshot history via PersonaBackend). 
+ +`restore()` drops snapshot's persona fields when restoring a pre-PersonaBackend snapshot (carrying full persona text) into an agent that is NOW externally owned; the framework emits a one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup with `threading.Lock`-guarded check-and-add (D-PP-13 migration-window event). + +`/persona.link.md` AND `/persona/IDENTITY.md` both present raises `PersonaOwnershipConflict` at filesystem-backend `load_profile()` (D2a + D-PP-8 — filesystem-only loud refusal because two files on disk is a visible operator mistake the framework must surface; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity). + +SQLite v1→v2 schema migration adds the `agents.persona_id` column via forward-only upgrade routine with explicit race-loser handling (catches `sqlite3.OperationalError "duplicate column name"` then re-reads `schema_version`). + +D-PP-1 sentinel sweep (`_is_agent_dir(agent_root)` predicate admits either `persona/IDENTITY.md` OR `persona.link.md`) updated at `load_profile`, `list_agents`, `exists`, AND extended to `list_skills` + `load_skill_body` in PR 3 (D-PP-12 — externally-owned agents now succeed at skill operations end-to-end). + +### Operator surface + +`atomic-agents persona list / show / snapshot --label "..." / list-snapshots / restore / clone` CLI exposes the full PersonaBackend lifecycle with zero LLM calls; catches `PersonaError` subclasses (including `PersonaNotFound`, `PersonaCorrupted`, `PersonaLinkInvalid`, `PersonaOwnershipConflict`, `PersonaSnapshotNotFound`) + `OSError` + `PermissionError` cleanly with `Error: ` on stderr + exit 1. + +Default backend resolves to `FilesystemPersonaBackend(/.personas)`. 
Operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` / per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (programmatic path always wins; threads through to internal sub-agents). + +`delegate.py` threads `persona_backend` ONLY when the operator supplied it explicitly via the constructor kwarg (D-ER-2 — mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context). + +`doctor.check_persona_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction. + +Implementer contract for persona backends documented in `docs/spec/33-persona-backend.md` §"Implementer contract for persona backends" (8 normative MUSTs covering `persona_id` charset validation at API boundary, side-effect-free construction, capability honesty, URL credential redaction in factory `ValueError` sites, group-atomic save with the 20-iteration retry bound + last-writer-wins semantics, snapshot id determinism + cross-persona isolation, `backend_id` property stability, and `snap__<12hex>` snapshot ID format with `metadata.json` schema). + +D5 retires spec/24's `TemplateProfileBackend` reservation entirely — `PersonaCapabilities.supports_templates` is the canonical home; a future persona-template marketplace (`pip install atomic-personas-starters` or a curated GitHub registry) is a v1.1+ distribution surface that the Protocol seam already accommodates without a forking change. + +**Closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 regional agents with consistent identity, versioning, snapshot/restore lifecycle, and operator-editable markdown. 
Home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk; PersonaBackend reads activate only when an operator explicitly creates a `persona.link.md` shared-reference. + +--- + +## CorpusBackend (#65, locked at PR 4) + +`tests/test_corpus_protocol_conformance.py` parametrized across registered backends + `tests/test_corpus_filesystem_backend.py` + `tests/test_corpus_sqlite_backend.py` + `tests/test_corpus_registry.py` + `tests/test_corpus_composition.py` + `tests/test_corpus_wiring.py` + `tests/test_corpus_migration_regression.py` + `tests/test_corpus_doctor.py`. + +`FilesystemCorpusBackend(agent_root)` reference impl reading `/wiki/` (distilled knowledge per the Karpathy style) + `/raw/` (operator-ingested source documents) with per-page `_io.atomic_write` safety + `render_index_summary(corpus)` Protocol method that returns the routing INDEX the agent loads at step [7] of the canonical load order per spec/04. + +`SQLiteCorpusBackend` with FTS5 (stdlib `sqlite3`, no optional extra; hybrid storage shape with metadata in SQL + bodies on disk matching ToolRegistryBackend precedent; WAL journal mode + `PRAGMA busy_timeout=5000` before WAL pragma mirroring the multi-process race fix from #64; FTS5 virtual table for O(log N) indexed full-text query on page bodies + frontmatter titles; cross-agent isolation enforced at the SQL layer via `WHERE agent_scope = ? AND corpus = ?` double discriminator; `BEGIN IMMEDIATE` transaction discipline wrapping the read-validate-UPSERT-FTS sequence in `write_page`; INSERT-first + atomic_write-on-success-only atomicity for hybrid storage half-failure recovery; idempotent `INSERT OR IGNORE` cold-start schema init for multi-replica deployments). + +Page name charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / leading-dot refusal. 
Side-effect-free construction (empty or missing `wiki/` + `raw/` yields zero registrations so all 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no corpus is configured; IRON RULE byte-identity regression suite at `tests/test_corpus_migration_regression.py` pins the contract across 5 explicit assertions covering the wiki INDEX read path and bundle rendering). + +Parametrized conformance suite across both backends pins the Protocol contract so future `PgvectorCorpusBackend` + Postgres adapters register via `register_corpus_backend(...)` without forking core (the semantic-search seam is deferred to the coordinated #258 Postgres-adapter family release so semantic-search coverage stays symmetric across MemoryBackend + CorpusBackend). + +Call-site migration: `agent.py:_load_indexes()` routes `wiki/INDEX.md` reads through `corpus_backend.render_index_summary("wiki")` when registered (per spec/04 step [7]; legacy direct-read path catches `OSError` + `UnicodeDecodeError` with logged warning marker for soft-degrade symmetry). `bundle.py:_render_memory_breakpoint` gains a `corpus_backend: CorpusBackend | None = None` parameter threaded three levels through `render_bundle`, with a shared `_render_wiki_index_section(label, path, content)` helper producing byte-identical output between Protocol path and legacy fallback (IRON RULE assertion 4). `bundle.py:_source_paths` migration deferred to v1.1 (filesystem-only function; pinned by the deferral test and tracked at #314). + +`CorpusBackend` becomes the source of truth for `wiki/` and `raw/` per spec/34 while `MemoryBackend` retains exclusive ownership of `memory/` and `journal/` (spec/24 Decision 7 addendum). 
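
The SQLite discipline described above for `SQLiteCorpusBackend` (busy timeout set before WAL, `BEGIN IMMEDIATE` around the write, and the `agent_scope` + `corpus` double discriminator on every read) can be sketched with stdlib `sqlite3` alone. This is a minimal illustration under stated assumptions, not the shipped implementation: the `pages` table, its columns, and the helper names `open_corpus_db`, `write_page`, and `query_pages` are hypothetical, and a plain `LIKE` filter stands in for the FTS5 index and the on-disk body storage.

```python
import sqlite3


def open_corpus_db(path: str) -> sqlite3.Connection:
    """Open a connection using the pragma ordering described in the text."""
    # isolation_level=None selects autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT statements are the only transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    # Set the busy timeout first so a concurrent opener waits instead of
    # failing while another process switches the journal mode.
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: idempotent so several replicas can cold-start.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " agent_scope TEXT NOT NULL,"
        " corpus TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " PRIMARY KEY (agent_scope, corpus, name))"
    )
    return conn


def write_page(conn, agent_scope: str, corpus: str, name: str, body: str) -> None:
    """Upsert one page inside a BEGIN IMMEDIATE transaction."""
    # BEGIN IMMEDIATE takes the write lock up front, so the upsert cannot
    # interleave with another writer's read-validate-write sequence.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO pages (agent_scope, corpus, name, body)"
            " VALUES (?, ?, ?, ?)"
            " ON CONFLICT(agent_scope, corpus, name)"
            " DO UPDATE SET body = excluded.body",
            (agent_scope, corpus, name, body),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def query_pages(conn, agent_scope: str, corpus: str, needle: str) -> list:
    """Return matching page names, always filtered by both discriminators."""
    rows = conn.execute(
        "SELECT name FROM pages"
        " WHERE agent_scope = ? AND corpus = ? AND body LIKE ?"
        " ORDER BY name",
        (agent_scope, corpus, f"%{needle}%"),
    )
    return [row[0] for row in rows]
```

Because every query carries both discriminators, two agents sharing one database file never see each other's pages, even when page names collide.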
+ +Operator override via `ATOMIC_AGENTS_CORPUS_BACKEND` + optional `ATOMIC_AGENTS_CORPUS_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.corpus.db` with `agent_scope=quote_plus(agent_root.name)` so single-host operators get a working SQLite default by flipping one env var) OR `AtomicAgent(..., corpus_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner (threads at `outcome.py:255`) / EvalRunner (at `eval.py:363`) / DreamRunner (stores as `self._corpus_backend` for API parity; no internal `AtomicAgent` construction site in v1). + +`delegate.py` explicit-only threading via `_corpus_backend_was_explicit` flag mirroring PersonaBackend D-ER-2 at `agent.py:431` (default-resolved backends do not leak the coordinator's `agent_root` to delegates because corpus is per-agent semantic context, distinct from fleet-scoped Policy + AgentProfile which always thread). + +`doctor.check_corpus_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + page-count performance cliff WARN when `stats().page_count` exceeds 1000 pages on `supports_full_text_search=False` (the WARN hint names `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` as the remedy, mirroring the LogBackend doctor precedent) + URL credential redaction across operator-facing error paths. + +`atomic-agents corpus` CLI (`list`/`show`/`query`/`version`/`restore` subcommands, zero LLM calls, env-var-aware). + +Implementer contract for corpus backends documented in `docs/spec/34-corpus-backend.md` §"Implementer contract for corpus backends" (9 normative MUSTs covering page name charset validation at API boundary, side-effect-free construction, capability honesty including `embedding_provider=None` invariant, `query()` capability precedence rule, `write_page()` 4-case behavior table, URL credential redaction across operator-facing error paths, cross-corpus isolation at storage layer, snapshot id determinism + cross-page isolation, `backend_id` stability + `close()` idempotency). 
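
The first MUST in the contract above, page-name validation at the API boundary, might look roughly like the sketch below. The charset is the one stated in this section (`[a-zA-Z0-9_.+@-]+`); the `InvalidPageName` exception and the `validate_page_name` helper are hypothetical names used only for illustration, and the real backends may structure the check differently.

```python
import re

# Charset quoted in the section; fullmatch() is used so a trailing newline
# or any other stray character cannot slip past the pattern.
_PAGE_NAME_RE = re.compile(r"[a-zA-Z0-9_.+@-]+")


class InvalidPageName(ValueError):
    """Hypothetical error type for names refused at the API boundary."""


def validate_page_name(name: str) -> str:
    """Return the name unchanged if it is acceptable, else raise."""
    if not isinstance(name, str) or not _PAGE_NAME_RE.fullmatch(name):
        # Path separators and control characters are outside the charset,
        # so traversal attempts such as "../x" or "a/b" are refused here.
        raise InvalidPageName(f"page name has disallowed characters: {name!r}")
    if name.startswith("."):
        # "." is inside the charset, so leading-dot names (including "."
        # and "..") need their own explicit refusal.
        raise InvalidPageName(f"page name must not start with a dot: {name!r}")
    return name
```

Validating once at the boundary means a storage layer built on this check never receives a name it would have to sanitize.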
+ +**Closes the GB-scale wiki cliff**: operators with a 10K-page wiki or hundreds of MB of raw documents stop waiting seconds per keyword grep over an unindexed filesystem; `SQLiteCorpusBackend` with FTS5 delivers O(log N) indexed full-text search at stdlib cost (no Postgres operator burden); future `PgvectorCorpusBackend` arrives via the coordinated #258 release for symmetric semantic retrieval across both substrates. Same agent definitions, same `agent.call()` flow, same audit trail, different corpus substrate. + +--- + +## MCPServerRegistryBackend (#201, locked at PR 5 of 5) + +`tests/test_mcp_server_registry_conformance.py` parametrized across both backends + `tests/test_mcp_server_registry_http_backend.py`. + +`FilesystemMCPServerRegistryBackend(agent_root, read_paths)` reference impl reading `/mcp.md` + optional `read_paths` for shared catalogs. + +`HTTPMCPServerRegistryBackend(catalog_url, agent_scope)` reference impl with tier-1/2/3 capability negotiation (OPTIONS probe for tier negotiation, `GET /capabilities` for structured capability body, tier-1 = read-only, tier-2 = read + install/uninstall, tier-3 = read + install/uninstall + audit). + +Protocol surface: `list_mcp_servers` / `load_mcp_server` / `load_all_mcp_servers` / `validate_mcp_server` / `install` / `uninstall` / `capabilities` / `refresh_capabilities` / `close`. + +### Key decisions + +- **D1**: filesystem read-only; catalog server owns transactionality for HTTP. +- **D2**: per-agent scoping via `agent_scope` query param on HTTP. +- **D3**: MCP servers are processes; ToolRegistry is functions. Separate Protocols per spec/25 Decision 3. +- **D4**: tier negotiation — OPTIONS then capabilities endpoint. +- **D5**: `lock_backend` kwarg on filesystem for `.mcp_registry.lock` file distinct from agent main `.lock`. +- **D6**: pre-probe conservative False/False capability default; HTTP dynamic per tier; tier-1 fallback stays False/False. 
+- **D7**: env-var references resolve client-side at load time; install path must emit unresolved `$VAR` form. +- **D8**: 409 collision maps to `MCPServerAlreadyInstalled`; 405 triggers mid-session tier regression handler with re-probe + cache invalidation. +- **D9**: URL credential redaction via `_safe_catalog_url` in ALL error paths. + +Conformance suite covers 10 MUSTs (name charset, side-effect-free construction, capability honesty, credential redaction, per-agent scoping, `backend_id` stability + `close()` idempotency, transient-vs-permanent failure honesty, env-var resolution at load time, install/uninstall atomicity + idempotency, `load_all_mcp_servers` consistency). + +Capability flag evolution: PR 1-4 static False/False on HTTP (unconditional `NotImplementedError` on write paths); PR 5 dynamic True/True on tier-2+ probed backends (install/uninstall now live). + +405 mid-session tier regression handler: re-probes, updates the cache, and raises `NotImplementedError` with a tier-change message; if the re-probe fails, raises `MCPRegistryUnavailable` with a "Capability cache may be stale" message. + +Test count ~3,319-3,325 at PR 5 (delta +12 to +18 vs post-PR-4 3,307). + +**Closes the v1.0 Protocol surface**: operators with a managed MCP catalog or a private HTTP catalog registry can now install/uninstall MCP servers from the same `agent.call()` flow as home-user filesystem operators. + +--- + +## Why twelve protocols, summarized + +A person at home runs filesystem-everything with one agent. An organization runs the same agents over Postgres, behind an HTTP service, with a fleet of orchestrated roles. **Same agent definitions, same `call()` flow, same audit trail. Different backends.** + +That property is the moat. Each Protocol is one Implementer Contract away from a new substrate, and every reference impl follows the same shape established by `docs/spec/20-memory-backend.md` + PR #57. + +Going forward: **the elegance is the product.** Protect it.