Merged
16 changes: 16 additions & 0 deletions .gitignore
@@ -58,10 +58,26 @@ ASR.md
# - DESIGN.md: consolidated architecture + decision log.
# - AIRGAP_INSTALL.md: Phase 14 (HARD-02) air-gap install path.
# - DEVELOPMENT.md: Phase 16 (BUNDLER-01) contributor workflow.
# - 00-…-11-…: brownfield documentation set (per-topic).
# - adr/*.md: Architecture Decision Records.
docs/*
!docs/DESIGN.md
!docs/AIRGAP_INSTALL.md
!docs/DEVELOPMENT.md
!docs/00-project-overview.md
!docs/01-local-setup.md
!docs/02-architecture.md
!docs/03-code-map.md
!docs/04-main-flows.md
!docs/05-configuration.md
!docs/06-data-model.md
!docs/07-integrations.md
!docs/08-testing.md
!docs/09-build-deploy-release.md
!docs/10-known-risks-and-todos.md
!docs/11-agent-handoff.md
!docs/adr/
!docs/adr/*.md
REVIEW_*.md
review_*.md
.planning/
199 changes: 199 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,199 @@
# CLAUDE.md — project context for AI agents

> Loaded automatically by Claude Code (and equivalent agents) for
> every session in this repo. Companion to
> [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md), which has
> the longer "action card" format with explanations.

## What this project is

Generic Python multi-agent runtime framework on **LangGraph**
(orchestration) + **LangChain** (provider + agent factory) +
**FastMCP** (tools). Single-file deploy bundle for air-gapped
corporate environments. Two reference apps in `examples/`:
`incident_management` (flagship) and `code_review` (proves the
framework is generic).

`main` is at v1.5 (see [`docs/DESIGN.md`](docs/DESIGN.md) § 13 for
milestone history).

## Read these first

In order:
1. [`docs/DESIGN.md`](docs/DESIGN.md) — long-form architecture +
12 numbered DEC-NNN decisions + milestone history
2. [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) — top
20 files to read, command allowlist / denylist, common traps
3. [`docs/02-architecture.md`](docs/02-architecture.md) —
quick-scan layered diagram
4. [`docs/04-main-flows.md`](docs/04-main-flows.md) — entry points
+ failure modes per flow

## Always-on commands

```bash
# install / sync deps (uses uv.lock)
uv sync --frozen --extra dev

# tests (full)
uv run pytest -x

# tests (single file, fast)
uv run pytest tests/<file>.py -xvs --no-cov

# lint + type check + ratchets
uv run ruff check src/ tests/
uv run pyright src/runtime
uv run python scripts/check_genericity.py
uv run python scripts/lint_skill_prompts.py

# regenerate single-file bundle (REQUIRED after touching src/runtime/ or examples/)
uv run python scripts/build_single_file.py

# coverage gate
uv run pytest --cov=src/runtime --cov-fail-under=85 -x
```

## DO

- Use `uv run pytest …` (NOT bare `pytest`) — pythonpath is in
`pyproject.toml`.
- Regenerate `dist/*` after ANY change to `src/runtime/` or
`examples/`. CI's "Bundle staleness gate (HARD-08)" fails
otherwise.
- Run `uv lock` and commit `uv.lock` if you change `pyproject.toml`.
CI's "Lockfile freshness gate (HARD-02)" fails otherwise.
- Work on a feature branch, open a PR, squash-merge.
Conventional-commit subjects: `feat(area): …`, `fix(area): …`,
`refactor(area): …`, `docs: …`, `build: …`, `chore(area): …`.
- Use `extra_fields` JSON for app-specific fields. Do NOT add
app-specific columns to `IncidentRow`.
- Use stub LLMs (`LLMConfig.stub()` + `EnvelopeStubChatModel` from
`tests/_envelope_helpers.py`) in tests. Live LLM tests are
env-gated.
- Re-read [`docs/DESIGN.md`](docs/DESIGN.md) § 12 (decision log)
before any architectural change.
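
The `extra_fields` rule above can be illustrated with a minimal sketch. This is not the project's `IncidentRow`: `SessionRecord`, `set_extra`, `get_extra`, and the field names are hypothetical stand-ins, and plain `json` replaces SQLAlchemy to keep the example self-contained.

```python
import json
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical stand-in for a stored row; NOT the real IncidentRow."""

    id: str
    status: str
    # App-specific data lives in one JSON text column, not in new columns.
    extra_fields: str = "{}"

    def set_extra(self, key: str, value: object) -> None:
        # Decode, update, re-encode; the table schema never changes.
        data = json.loads(self.extra_fields)
        data[key] = value
        self.extra_fields = json.dumps(data, sort_keys=True)

    def get_extra(self, key: str, default: object = None) -> object:
        return json.loads(self.extra_fields).get(key, default)


row = SessionRecord(id="s-1", status="open")
row.set_extra("pr_number", 42)      # assumed code_review-style field
row.set_extra("severity", "high")   # assumed incident-style field
```

The point of the design is that a new app can attach its own fields without a schema migration; the trade-off is that those fields are not individually indexed or typed by the database.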

## DO NOT

- Do NOT `pip install …` — bypasses uv lockfile. Use `uv add` +
`uv sync`.
- Do NOT edit `dist/*` directly — they're generated.
- Do NOT add `TODO`/`FIXME`/`HACK` comments — fix root cause or
open an issue. The only intentional `TODO(v2)` is in
`src/runtime/locks.py:49` (slot eviction; documented).
- Do NOT add `except Exception: pass` — Phase 18 / HARD-04
removed all of these. Log + re-raise or catch a typed exception.
- Do NOT touch SQLAlchemy column names on `IncidentRow` —
destructive migration. Add to `extra_fields` instead.
- Do NOT commit anything in `.planning/` — gitignored;
local-only working state for the GSD planning workflow.
- Do NOT commit agent-generated `*.md` outside `docs/` unless
the user explicitly asks for them to be shipped. `docs/*` is gitignored
except for the explicit allowlist in `.gitignore`.
- Do NOT call live LLM providers in CI tests — keys are dummy in
`.github/workflows/ci.yml`.
- Do NOT introduce a public-internet runtime dependency in
`src/runtime/`. Air-gap is the deploy target. The hardcoded
`https://ollama.com` fallback was explicitly removed in
Phase 13 (HARD-05); don't re-introduce.
- Do NOT force-push or rewrite history on `main` (or any branch
with collaborators). PRs only.
- Do NOT skip the bundle regeneration step ("I'll do it before
PR" leads to CI failures and time wasted on rebases).
- Do NOT bypass the concept-leak ratchet by raising
`BASELINE_TOTAL` without a rationale entry. Lowering it is
encouraged; raising requires architectural justification in the
commit message.
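
As an illustration of the "log + re-raise or catch a typed exception" rule above, here is a minimal sketch of what replaces `except Exception: pass`. `ConfigError` and `load_config` are hypothetical names invented for this example; they are not taken from `src/runtime/`.

```python
import json
import logging

logger = logging.getLogger("example")


class ConfigError(Exception):
    """Hypothetical typed error for this sketch."""


def load_config(text: str) -> dict:
    # Catch only the failure we expect, log it, and re-raise as a typed error.
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        logger.error("config is not valid JSON: %s", exc)
        raise ConfigError("invalid config JSON") from exc
    if not isinstance(data, dict):
        raise ConfigError("config must be a JSON object")
    return data
```

Chaining with `from exc` keeps the original traceback attached, so the caller sees both the typed error and its root cause.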

## Architectural rules (load-bearing)

See [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) §
"Architectural rules" for the 8 rules. Quick recap:

1. Framework stays domain-agnostic
2. One source of truth per concern (`should_gate`, `should_retry`,
`_finalize_session_status`)
3. HITL pause is NOT an error
4. Append-only audit trails
5. The bundle is the deploy unit
6. Provider abstraction stays in `runtime.llm`
7. Tests use stubs by default
8. No public-internet runtime calls in air-gap path

## Common traps (skim before debugging)

- `pytest` (bare) → `ModuleNotFoundError: runtime`. Use `uv run pytest …`.
- Touching `src/` without regenerating `dist/` → CI bundle gate fails.
- Approving a HITL session created on pre-PR-#6 code → silent no-op.
Tell the user to start a fresh session.
- Live OpenRouter `:free` model 429s on first call → retry usually
works (v1.5-D 429 backoff is 7.5s/15s/22.5s).
- Streamlit `AssertionError: scope["type"] == "http"` storm under
Python 3.14 → cosmetic Starlette compat bug; HTTP traffic still
works.
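
The 7.5s/15s/22.5s schedule mentioned above is consistent with a linear backoff (base delay times attempt number). The sketch below shows that pattern under that assumption; it is not the project's retry code, and `RateLimited`, `backoff_delays`, and `call_with_retry` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def backoff_delays(base: float = 7.5, retries: int = 3) -> list[float]:
    # Linear schedule: base * 1, base * 2, base * 3.
    return [base * attempt for attempt in range(1, retries + 1)]


def call_with_retry(
    fn: Callable[[], T],
    delays: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for delay in delays:
        try:
            return fn()
        except RateLimited:
            sleep(delay)  # wait, then try again
    return fn()  # final attempt; a RateLimited here propagates to the caller
```

Passing `sleep` as a parameter lets a test substitute a recorder for `time.sleep`, so the schedule can be checked without waiting.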

## Repo conventions

- **Branches:** `feat/`, `fix/`, `refactor/`, `docs/`, `chore/`,
`build/`. Squash-merge into `main`.
- **Commits:** Conventional Commits style. Verbose body with the
"why" + key file references when non-trivial.
- **PRs:** Use `gh pr create` with title + body; CI runs lint /
type / test / sonar / bundle / skill-lint. Squash-merge with
`gh pr merge <n> --squash --delete-branch --subject "…"`.
- **Tests:** `tests/test_*.py`. Async tests need no decorator
(`asyncio_mode=auto`). Stub LLMs from `tests/_envelope_helpers.py`.
- **Coverage:** ≥ 85% on `src/runtime/`. UI / `__main__` /
postgres saver / plugin transport are excluded
(`pyproject.toml:[tool.coverage.run].omit`).
- **Type-checker:** pyright fail-on-error (Phase 19 / HARD-03);
use `# pyright: ignore[<rule>] -- <rationale>` for legitimate
stub gaps.
- **Skill prompts:** `examples/<app>/skills/<name>/{config.yaml, system.md}`.
Must include the markdown turn-output contract block (see
`_common/output.md`).
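
To show what "async tests need no decorator" looks like in practice, here is a minimal sketch of a stub-backed async test. `StubChatModel` is a hypothetical stand-in written for this example; it is not the project's `EnvelopeStubChatModel`, whose interface is not shown here.

```python
import asyncio


class StubChatModel:
    """Hypothetical stub; NOT the project's EnvelopeStubChatModel."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls: list[str] = []

    async def ainvoke(self, prompt: str) -> str:
        self.calls.append(prompt)
        await asyncio.sleep(0)  # yield to the event loop once
        return self.reply


# With pytest-asyncio's `asyncio_mode = auto`, a plain `async def test_*`
# is collected and awaited; no `@pytest.mark.asyncio` decorator is needed.
async def test_stub_returns_canned_reply() -> None:
    model = StubChatModel(reply="triage complete")
    assert await model.ainvoke("summarize the incident") == "triage complete"
    assert model.calls == ["summarize the incident"]
```

Outside pytest, the same coroutine can be run with `asyncio.run(test_stub_returns_canned_reply())`.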

## Worktree workflow

This repo is set up for parallel-agent worktrees under
`.claude/worktrees/`. If you're given the EnterWorktree tool:

- Use it BEFORE making any code changes — keeps the user's main
checkout clean.
- After CI passes and the PR merges, call ExitWorktree with
`action=remove, discard_changes=true` (the squash commit is on
`main`; the original SHAs are dropped, but the content is preserved).

If you're not given EnterWorktree, work in the main checkout but
let the user know.

## Current state snapshot (as of last update)

- Tests: 1265 passing, 8 skipped
- Coverage: 87.04%
- Concept-leak ratchet: 39 (down from 156 pre-v1.5-B)
- Ruff: clean
- SonarCloud quality gate: green
- Latest milestone: v1.5 (markdown turn output + HITL fix +
generic-noun pass + per-agent LLM + 429 retry)
- Next big move: v2.0 React UI (Streamlit retirement)

## Where to find what

| You want to … | Read |
|---|---|
| Understand the architecture | [`docs/DESIGN.md`](docs/DESIGN.md), [`docs/02-architecture.md`](docs/02-architecture.md) |
| Local setup | [`docs/01-local-setup.md`](docs/01-local-setup.md) |
| Find a file by purpose | [`docs/03-code-map.md`](docs/03-code-map.md) |
| Understand a flow end-to-end | [`docs/04-main-flows.md`](docs/04-main-flows.md) |
| Configure deployment | [`docs/05-configuration.md`](docs/05-configuration.md) |
| Inspect storage / data | [`docs/06-data-model.md`](docs/06-data-model.md) |
| External integrations | [`docs/07-integrations.md`](docs/07-integrations.md) |
| Run / write tests | [`docs/08-testing.md`](docs/08-testing.md) |
| Build / deploy / release | [`docs/09-build-deploy-release.md`](docs/09-build-deploy-release.md) |
| Risk / debt inventory | [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md) |
| Action card for AI agents | [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) |
| Architectural baseline | [`docs/adr/0001-current-architecture.md`](docs/adr/0001-current-architecture.md) |
| Dev workflow (regenerate dist, add module) | [`docs/DEVELOPMENT.md`](docs/DEVELOPMENT.md) |
| Air-gap install | [`docs/AIRGAP_INSTALL.md`](docs/AIRGAP_INSTALL.md) |
10 changes: 10 additions & 0 deletions README.md
@@ -1,5 +1,15 @@
# ASR — Multi-Agent Runtime Framework

[![Python](https://img.shields.io/badge/python-3.11%2B-blue?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![LangGraph](https://img.shields.io/badge/LangGraph-1.x-orange?style=for-the-badge)](https://github.com/langchain-ai/langgraph)
[![FastMCP](https://img.shields.io/badge/FastMCP-2.x-purple?style=for-the-badge)](https://github.com/jlowin/fastmcp)
[![CI](https://img.shields.io/github/actions/workflow/status/RandomCodeSpace/asr/ci.yml?branch=main&style=for-the-badge&logo=github)](https://github.com/RandomCodeSpace/asr/actions/workflows/ci.yml)
[![Quality Gate](https://img.shields.io/sonar/quality_gate/RandomCodeSpace_asr?server=https%3A%2F%2Fsonarcloud.io&style=for-the-badge&logo=sonarcloud)](https://sonarcloud.io/project/overview?id=RandomCodeSpace_asr)
[![Coverage](https://img.shields.io/sonar/coverage/RandomCodeSpace_asr?server=https%3A%2F%2Fsonarcloud.io&style=for-the-badge&logo=sonarcloud)](https://sonarcloud.io/component_measures?id=RandomCodeSpace_asr&metric=coverage)
[![Tests](https://img.shields.io/badge/tests-1265%20passing-brightgreen?style=for-the-badge)](https://github.com/RandomCodeSpace/asr/actions)
[![Ruff](https://img.shields.io/badge/lint-ruff-261230?style=for-the-badge&logo=ruff)](https://github.com/astral-sh/ruff)
[![Pyright](https://img.shields.io/badge/types-pyright-yellow?style=for-the-badge)](https://github.com/microsoft/pyright)

Python multi-agent runtime built on **LangGraph** (orchestration) +
**FastMCP** (tool dispatch), with HITL gate, markdown turn-output
contract, and a single-file deploy bundle for air-gapped corporate
97 changes: 97 additions & 0 deletions docs/00-project-overview.md
@@ -0,0 +1,97 @@
# 00 — Project overview

## What it does

ASR is a generic Python multi-agent runtime framework that wraps
**LangGraph** (orchestration), **LangChain** (LLM provider
abstraction + agent factory), and **FastMCP** (tool dispatch). It
adds a risk-rated HITL gateway, a markdown turn-output contract,
per-step telemetry, an auto-learning lesson store, and a single-file
deploy bundle for air-gapped corporate targets.

Two reference apps live in the same repo to prove the runtime is
genuinely generic:

- **`examples/incident_management/`** — 4-skill investigation pipeline
(intake → triage → deep_investigator → resolution) with ASR memory
layers (L2 Knowledge Graph, L5 Release Context, L7 Playbook Store).
- **`examples/code_review/`** — 3-skill PR review pipeline (intake
→ analyzer → recommender). Built specifically to surface every
framework leak that would have made the runtime
incident-shaped — those leaks were lifted into the framework.

References: [`docs/DESIGN.md`](DESIGN.md), [`pyproject.toml`](../pyproject.toml).

## Target users

- **Operators** of internal SRE / on-call automation in regulated /
air-gapped corporate environments. The deployment story is a
copy-only 7-file payload (no `pip install` at deploy time, no
runtime CDN/internet calls).
- **Application authors** building domain-specific agent apps on top
of the framework. Add a folder under `examples/<your_app>/` with a
`Session` subclass, MCP servers, and skill prompts.
- **Framework contributors** working on the `src/runtime/` layer.

## Core features

| Feature | Implemented in |
|---|---|
| LangGraph-driven multi-agent dispatch | `src/runtime/graph.py`, `src/runtime/agents/*.py` |
| LangChain-driven LLM provider abstraction (Ollama, Azure OpenAI, OpenAI-compat) | `src/runtime/llm.py` |
| FastMCP tool servers (in-process / stdio / http) | `src/runtime/mcp_loader.py` |
| Risk-rated HITL gateway with `interrupt()` / `Command(resume=…)` | `src/runtime/tools/gateway.py` |
| Markdown turn-output contract + 6-path parser + permissive fallback | `src/runtime/agents/turn_output.py` |
| Per-step telemetry events (agent_started, tool_invoked, gate_fired, etc.) | `src/runtime/storage/event_log.py` |
| Auto-learning lesson store + nightly refresher | `src/runtime/learning/extractor.py`, `src/runtime/learning/scheduler.py` |
| Two-stage dedup (embedding + LLM) | `src/runtime/dedup.py` |
| Optimistic-concurrency `SessionStore` over SQLAlchemy | `src/runtime/storage/session_store.py` |
| Read-only similarity store | `src/runtime/storage/history_store.py` |
| Trigger registry (api / webhook / schedule / plugin) | `src/runtime/triggers/` |
| Single-file deploy bundle (`dist/`) | `scripts/build_single_file.py` |
| Streamlit UI shell | `src/runtime/ui.py`, `ui/streamlit_app.py` |
| FastAPI surface (`/sessions/*`, SSE/WebSocket, approvals) | `src/runtime/api.py` |
| Concept-leak ratchet (CI-enforced framework genericity) | `tests/test_genericity_ratchet.py`, `scripts/check_genericity.py` |
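
The concept-leak ratchet in the last row can be pictured as a counter that is only allowed to go down. The sketch below is an assumption about the general shape of such a check, not the contents of `scripts/check_genericity.py`; `DOMAIN_TERMS`, `count_leaks`, `check_ratchet`, and the baseline value are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical values for this sketch; the real terms and baseline live in the repo.
DOMAIN_TERMS = ("incident", "severity")
BASELINE_TOTAL = 2


def count_leaks(sources: Iterable[str], terms: Iterable[str] = DOMAIN_TERMS) -> int:
    # Count whole-word, case-insensitive occurrences of domain terms.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return sum(len(pattern.findall(text)) for text in sources)


def check_ratchet(total: int, baseline: int = BASELINE_TOTAL) -> str:
    if total > baseline:
        raise AssertionError(f"concept leaks rose: {total} > baseline {baseline}")
    if total < baseline:
        return f"leaks fell to {total}; lower the baseline to lock in the gain"
    return "ok"
```

A check like this fails CI when the count rises above the baseline and nudges contributors to lower the baseline whenever the count drops, which is how a ratchet only ever tightens.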

## Current status

`main` is at v1.5 (last squash commit `b97ddb3`). All milestones
shipped:

| Milestone | Title | PR |
|---|---|---|
| v1.0 | Prompt-vs-Code Remediation | #1 |
| v1.1 | Framework De-coupling (generic runtime) | #2 |
| v1.2 + v1.3 + v1.4 | FOC + HARD + telemetry + auto-learning + React-ready API | bundled into #5 |
| v1.5-A | Markdown turn output + HITL fix on langgraph 1.x | #6 / #7 |
| v1.5-B | Generic-noun pass (concept-leak ratchet 156 → 39) | #8 |
| v1.5-C | Per-agent LLM proof point | #9 |
| v1.5-D | 429 rate-limit retry + multi-provider integration driver | #10 |

**1265 tests passing**, **87% coverage**, **ratchet at 39**, ruff
clean, SonarCloud quality gate green. See [`docs/DESIGN.md` § 13](DESIGN.md#13-milestone-history)
for the full history.

## Production-ready vs experimental

| Surface | Status | Notes |
|---|---|---|
| Framework runtime (`src/runtime/`) | **production** | Used in air-gapped corporate environments |
| `incident_management` example | **production** | Flagship use case |
| `code_review` example | **demo / proof-of-genericity** | Tools are mocks (no real GitHub/GitLab fetch); see `examples/code_review/README.md` |
| Streamlit UI | **prototype** | Stable but slated for replacement by React in v2.0 |
| FastAPI surface | **production-ready** | v1.4 added generic `/sessions/*` REST + SSE/WebSocket + CORS + structured error envelope |
| Postgres checkpointer | **optional / opt-in** | Default is SQLite; to enable, run `pip install asr[postgres]` (`pyproject.toml:39`) |
| Trigger registry — webhook / schedule | **functional, lightly exercised** | Used by the example apps; no large-scale fan-in tested |
| Trigger registry — plugin transport | **stub** | `src/runtime/triggers/transports/plugin.py`; inferred to be a scaffold for future SQS/Kafka/NATS transports |
| ASR memory layers (incident_management) | **read-only** | Mutation paths (write-back) deferred per `examples/incident_management/README.md` |
| Auto-learning lesson refresher | **production** | Nightly APScheduler job, gated on config |

## What's next

- **v2.0 — React UI**, replacing the Streamlit prototype, parity-port
against the v1.4 `/sessions/*` API surface. The long pole.
- Smaller cleanups: duplicate `ToolCall` audit rows
(gateway colon-form vs harvester `__`-form), `ApprovalWatchdog`
regression test, `ASR_LOG_LEVEL` doc, `src/runtime/locks.py:49`
TODO. See [`docs/10-known-risks-and-todos.md`](10-known-risks-and-todos.md).