Audience. New contributors and operators who need one document covering what the framework is, how it composes, and why the non-obvious decisions are the way they are.
Scope. Architecture, core abstractions, runtime model, storage, deployment, and a decision log. Operational how-tos live in
docs/DEVELOPMENT.md(dev workflow) anddocs/AIRGAP_INSTALL.md(corporate-mirror install).
ASR is a generic Python multi-agent runtime that wraps LangGraph for orchestration and FastMCP for tool dispatch, adds a HITL gateway and a markdown turn-output contract on top, and ships as a single-file bundle into air-gapped corporate environments.
Two reference apps live in the same repo to prove the runtime is genuinely generic:
examples/incident_management/— 4-skill investigation pipeline (intake → triage → deep_investigator → resolution) with ASR memory layers (L2 Knowledge Graph, L5 Release Context, L7 Playbook Store) and a remediation workflow that pauses on high-risk actions.examples/code_review/— 3-skill PR review pipeline (intake → analyzer → recommender). Built specifically to surface every framework leak that would have made the runtime incident-shaped — those leaks were lifted into the framework rather than worked around.
What the framework owns: session lifecycle, agent dispatch, tool gateway, HITL pause/resume, telemetry, storage, deployment bundling.
What an app owns: domain Session subclass, MCP servers, skill
prompts + per-skill YAML, App*Config for cross-cutting domain
knobs (severity aliases, escalation roster, similarity thresholds).
Layers from bottom to top:
+------------------------------------------------------------+
| App layer (examples/incident_management, examples/code_review)
| - state.py, config.py, skills/, mcp_server.py, ui.py |
+------------------------------------------------------------+
| Framework — runtime/ |
| - Session, Skill, AgentRun, ToolCall, AgentTurnOutput |
| - Orchestrator, OrchestratorService |
| - Gateway (wrap_tool), policies, ToolRegistry |
| - SessionStore, HistoryStore, EventLog |
| - graph.py: build_graph + make_agent_node |
| - llm.py: provider abstraction |
| - ui.py: Streamlit shell |
| - api.py: FastAPI surface |
+------------------------------------------------------------+
| LangGraph 1.x (orchestration / state / checkpointing) |
| LangChain 1.x (chat models, agents.create_agent, tools) |
| FastMCP (in-process / stdio / http MCP servers) |
+------------------------------------------------------------+
| Providers: Ollama Cloud · OpenRouter · Azure OpenAI · … |
+------------------------------------------------------------+
Control flow for one session (steady state):
UI / API ──start_session──▶ OrchestratorService ──▶ Orchestrator
│
▼
build_graph (langgraph StateGraph)
│
per-agent step ▼
┌───────────────────────────────────┐
│ make_agent_node │
│ - reload session from store │
│ - emit agent_started event │
│ - wrap_tool(s) with gateway │
│ - create_agent (langchain/langgraph)
│ - _drive_agent_with_resume │
│ loop: ainvoke / handle pause │
│ - parse_envelope_from_result │
│ - record AgentRun │
│ - decide route from signal │
└───────────────────────────────────┘
│
gate node? (low confidence) ──▼
terminal tool? ──▶ status set by tool
else ──▶ default_terminal_status
│
▼
finalize_session_status_async
The framework's unit of work. All apps subclass it; the framework itself only reads/writes the fields declared on the base class.
class Session(BaseModel):
id: str
status: str
created_at: str
updated_at: str
deleted_at: str | None
agents_run: list[AgentRun]
tool_calls: list[ToolCall]
findings: dict[str, Any]
token_usage: TokenUsage
pending_intervention: dict | None
user_inputs: list[str]
parent_session_id: str | None # dedup linkage
dedup_rationale: str | None
extra_fields: dict[str, Any] # bag for app-specific domain data
version: int # optimistic concurrency
turn_confidence_hint: float | None # transient (excluded from persistence)Apps add domain fields on a subclass:
class IncidentState(Session):
query: str
environment: str
reporter: Reporter
severity: str
summary: str
resolution: str | NoneFields the row schema doesn't have a column for round-trip via the
extra_fields JSON bag — see § 8 Storage.
YAML-driven configuration unit:
name: triage
description: Hypothesis-loop triage agent
kind: responsive # responsive | supervisor | monitor
model: gpt_oss_cheap # optional per-agent override
tools:
local_inc: [submit_hypothesis, update_incident]
local_observability: [get_logs, get_metrics, ...]
routes:
- when: success
next: deep_investigator
- when: needs_input
next: __end__
gate: confidence
- when: default
next: deep_investigator
system_prompt: |
...Three kinds:
responsive— ReAct LLM agent (the default; useslangchain.agents.create_agent).supervisor— non-LLM rule-based dispatcher (or LLM-dispatched viadispatch_strategy: llm); used by intake to pre-filter.monitor— out-of-band runner (e.g.MonitorRunner); not a graph node.
Append-only audit rows:
class AgentRun(BaseModel):
agent: str
started_at: str
ended_at: str
summary: str # final_text, or "agent failed: <exc>"
token_usage: TokenUsage
confidence: float | None
confidence_rationale: str | None
signal: str | None
class ToolCall(BaseModel):
agent: str
tool: str
args: dict
result: dict | str | list | int | float | bool | None
ts: str
risk: ToolRisk | None # low | medium | high
status: ToolStatus # executed | executed_with_notify
# | pending_approval | approved
# | rejected | timeout
approver: str | None
approved_at: str | None
approval_rationale: str | NoneOrchestrator(src/runtime/orchestrator.py) — owns the compiled langgraph, theSessionStore, the per-session async lock registry, and the synchronous lifecycle methods (start_session,stream_session,resume_session,retry_session).OrchestratorService(src/runtime/service.py) — long-lived asyncio loop wrapper aroundOrchestrator. Owns the loop thread, registers in-flight sessions, exposes a thread-safesubmit_async/submit_and_waitbridge so the Streamlit UI thread and the FastAPI request handlers can both schedule work without fighting over the same FastMCP / SQLAlchemy transports.
Every BaseTool an agent sees is wrapped by the gateway. The
wrapper:
- Injects session-derived args (e.g.
environmentfrom the session row) before the LLM-visible arg surface, so the LLM physically cannot fabricate them. - Consults the risk policy:
low→ run, emittool_invokedwithstatus=executed.medium→ run, append aexecuted_with_notifyaudit row.high→ calllanggraph.types.interrupt(payload), append apending_approvalrow, save to DB, pause the graph.
- After resume:
- On
approve→ run the inner tool, update the pending row toapproved, save. - On
reject/timeout→ return a marker dict, update the pending row to the matching status, save.
- On
(src/runtime/agents/turn_output.py)
The structured output every agent must produce per turn:
class AgentTurnOutput(BaseModel):
content: str
confidence: float # [0.0, 1.0], reconciled
confidence_rationale: str
signal: str | None # success | failed | needs_input | NoneHow the envelope is sourced — see § 6 Markdown turn output.
States a session walks through:
new ─▶ in_progress ─▶ <terminal>
resolved | escalated | needs_review |
awaiting_input | error | stopped | duplicate
new— row created, graph not yet entered.in_progress— at least one agent has run; non-terminal.- Terminal states are set by:
- Terminal tool calls (e.g.
mark_resolved→resolved,mark_escalated→escalated); the tool registry maps tool names to status transitions. default_terminal_status(needs_reviewfor incident management) when the graph completes without a terminal tool._handle_agent_failure→erroron agent exceptions.stop_session()→stoppedon explicit cancellation.dedup_check→duplicate(withparent_session_id) when stage-2 LLM dedup confirms a match against a prior closed session.
- Terminal tool calls (e.g.
(src/runtime/graph.py:_build_agent_nodes)
For every skill in cfg.orchestrator.skills:
llm = get_llm(cfg.llm, skill.model, role=agent_name, ...)
node = make_agent_node(skill=skill, llm=llm, tools=run_tools, ...)
sg.add_node(agent_name, node)skill.model is the per-agent override; falls through to
cfg.llm.default when None. This is what lets intake run on
Ollama while triage / DI / resolution run on OpenRouter — see the
v1.5-C decision below.
skill.routes is a list of (when, next, gate?) rules. The
runtime evaluates them after each agent step:
routes:
- when: success # signal value
next: deep_investigator
- when: needs_input
next: __end__
gate: confidence # route through gate node first
- when: default # fallback
next: triageThe framework's gate node fires when the upstream agent's confidence
is below framework.confidence_threshold (default 0.75). The gate
emits a pending_intervention and the session moves to
awaiting_input until the operator supplies a resume_with_input
verdict. Agents emit signals via the signal arg of typed-terminal
or patch tools.
Three independent paths:
- Tool-driven — an agent calls a tool the registry recognises
as terminal (
local_inc:mark_resolved,…:mark_escalated). The tool setsinc.statusdirectly. - Inferred —
_finalize_session_statuswalkstool_callsmatching againstcfg.orchestrator.terminal_toolsrules. - Default — falls through to
cfg.orchestrator.default_terminal_statuswhen no rule fires AND the graph wasn't paused on a HITL gate.
The pause-aware guard
(Orchestrator._is_graph_paused) is what keeps a paused HITL
session from being coerced to default_terminal_status while the
operator is still deciding.
+----------------------------------------------------------+
| Skill (YAML) model: gpt_oss_cheap |
+----------------------------------------------------------+
| runtime.llm.get_llm resolves name → cfg.models[name] |
| → ProviderConfig → BaseChatModel |
+----------------------------------------------------------+
| LangChain provider class |
| - ChatOpenAI openai_compat (OpenRouter) |
| - ChatOllama ollama (Ollama Cloud + local) |
| - AzureChatOpenAI azure_openai |
+----------------------------------------------------------+
| Driven by langchain.agents.create_agent (langgraph subgraph) |
+----------------------------------------------------------+
config/config.yaml declares providers + named models:
llm:
default: workhorse
providers:
ollama_cloud:
kind: ollama
base_url: https://ollama.com
api_key: ${OLLAMA_API_KEY}
azure:
kind: azure_openai
endpoint: ${AZURE_ENDPOINT}
api_version: 2024-08-01-preview
api_key: ${AZURE_OPENAI_KEY}
openrouter:
kind: openai_compat
base_url: https://openrouter.ai/api/v1
api_key: ${OPENROUTER_API_KEY}
models:
workhorse:
provider: openrouter
model: inclusionai/ring-2.6-1t:free
gpt_oss:
provider: ollama_cloud
model: gpt-oss:20b
smart:
provider: azure
model: gpt-4o
deployment: gpt-4o_ainvoke_with_retry (src/runtime/graph.py) splits transient
errors into two classes:
| Class | Markers | Backoff | Total |
|---|---|---|---|
| 5xx + connection | internal server error, status code: 5xx, connection reset, remoteprotocolerror, incomplete chunked read |
1.5s × attempt | ~9s |
| 429 / rate-limit | status code: 429, error code: 429, 429, 429 , ratelimiterror, rate limit, rate-limited, too many requests |
7.5s × attempt | ~45s |
Non-429 4xx (auth, validation) propagates immediately so quota / schema problems fail fast.
tests/test_integration_driver_s1.py parametrises three legs
(local, workhorse, azure); each skips independently if its
keys are absent. Run with OLLAMA_API_KEY + OLLAMA_BASE_URL,
OPENROUTER_API_KEY, and/or AZURE_OPENAI_KEY + AZURE_ENDPOINT
exported.
Pre-Phase-22 the framework forced agents through
response_format=AgentTurnOutput (a JSON schema). Multiple problems:
- gpt-oss / Ollama models drifted on JSON schema adherence.
- LangGraph's
with_structured_outputsecond pass interacted badly with the React END signal underrecursion_limit=25. - Adding tools to the schema confused some providers' tool dispatch.
Phase 22 dropped response_format and made the agent close its turn
with a markdown contract block. Markdown is the format every chat
model writes well; the parse step happens in the framework where
leniency is in our control.
Every skill prompt ends with:
## Output contract — REQUIRED
Every final reply MUST end with these three sections, in order, each
preceded by a level-2 markdown header:
## Response
<body>
## Confidence
<0.0-1.0 float> -- <one-line rationale>
## Signal
<success|failed|needs_input|none>
**CRITICAL — final-reply rule:** the markdown envelope is mandatory;
the framework hard-fails if it is missing.
parse_envelope_from_result walks 6 paths and falls through to a
hard fail:
| Path | Source | When it fires |
|---|---|---|
| 1 | result["structured_response"] |
Pre-Phase-22 stub fixtures and explicit-schema callers |
| 2 | JSON-decode last AIMessage content | Models that still emit valid JSON |
| 4 | _parse_confidence_line over the ## Confidence body |
Markdown-primary path; the production happy path |
| 5 | Typed-terminal-tool args (confidence, confidence_rationale, resolution_summary) |
Models that treat a terminal tool call as completion |
| 6 | Permissive: any tool was called → synthesise a 0.30-confidence placeholder | Last-ditch fallback so the session reaches a terminal status instead of hard-failing |
| 7 | raise EnvelopeMissingError |
Truly nothing parseable |
(Path 3 was the original location for what became Path 4; the numbering is preserved in code comments to keep historical commits diff-friendly.)
- gpt-oss prefers EN DASH (
–,–) over EM DASH (—,—); the dash separator accepts the full Unicode Pd block. - gpt-oss sometimes emits an empty closing AIMessage after a tool call; Path 5 / Path 6 cover that.
- The skill prompts carry an explicit
**CRITICAL — final-reply rule:**paragraph because gpt-oss initially treated the first tool result as completion.
The procedural confidence-line parser
(_parse_confidence_line) replaces an earlier regex that Sonar's
S5852 (regex DoS) flagged; the procedural form has no backtracking
surface to attack.
Tools are policy-gated per
cfg.runtime.gateway.policy:
runtime:
gateway:
policy:
apply_fix: high # gate
restart_service: medium # notify-only audit
get_logs: low # default; no row writtenApps configure cfg.orchestrator.gate_policy for cross-cutting
behaviour:
gate_policy:
threshold: 0.75
gated_environments: [production]
gated_risk_actions: [approve]
resolution_trigger_tools: ['local_remediation:apply_*']langgraph 1.x changed the interrupt() contract: a tool calling
interrupt() no longer raises GraphInterrupt to the caller —
agent.ainvoke() returns a normal result with
result["__interrupt__"] populated. The framework's wrapper had to
catch up:
_drive_agent_with_resume(src/runtime/graph.py) detects an inner pause viaagent_executor.aget_state(inner_cfg).nextbeing non-empty, calls outerinterrupt()to fetch the verdict, and forwards viaagent_executor.ainvoke(Command(resume=verdict), config=inner_cfg).- The inner
create_agentnow receives the orchestrator's checkpointer + a deterministic per-invocation thread id (f"{inc_id}:agent:{skill.name}:turn{len(agents_run)}"). Without these,Command(resume=…)raises and the gated tool gets silently skipped. make_agent_nodereloads fromstore.load(inc_id)at entry — defends against stalestate["session"]snapshots from outer Pregel checkpoints (which capture state at step boundaries, not mid-step).gateway.wrap_toolcallsstore.saveafter every status transition (rejected / timeout / approved) so the audit row in the DB matches the operator's actual decision.Orchestrator._is_graph_pausedguards_finalize_session_status_asyncinstream_session/retry_session/ the API approval handler — a HITL pause must not be coerced intodefault_terminal_status.
These five fixes shipped together as PR #6; before them, clicking Approve would do nothing because the framework had already moved past the pause point.
Two ways to resolve a pending_approval:
- UI —
_render_pending_approvals_blockshows the Approve / Reject buttons and rationale field; click drivesCommand(resume={"decision": "approve", ...})viaOrchestratorService.submit_and_wait. - API —
POST /sessions/{sid}/approvals/{tcid}does the same resume, scoped under the per-session lock so two concurrent approvals on the same thread can't race.
(src/runtime/tools/approval_watchdog.py)
Background task that scans pending_approval rows older than
framework.approval_timeout and resolves them with verdict=timeout,
freeing operators from manual intervention on stale rows. Triggered
by the lifespan startup hook.
CRUD for the row schema. Owns:
_next_id— monotonic per-day sequence; respectsstate_cls.id_format(seq=…, prefix=…)so each app picks its own ID namespace (INC-…,CR-…, etc.).save— optimistic-version update. Bumpsversion; raisesStaleVersionErroron mismatch so the caller can reload + retry._row_to_incident/_incident_to_row_dict— round-trip betweenIncidentRow(SQLAlchemy) and the app'sSessionsubclass. Fields the row schema has columns for go to typed fields; everything else lands inextra_fieldsJSON.- Vector write-through —
_persist_vector/_add_vector/_refresh_vectorkeep a FAISS index aligned with the row table.
The persistent schema. Fields are deliberately broad enough to host
the example apps' typed fields (severity, reporter_id,
reporter_team, summary, tags, parent_session_id,
dedup_rationale, extra_fields JSON) without forcing every app
to declare them. An app's Session subclass declares whichever
typed fields it cares about; the rest stay in extra_fields.
The severity / reporter_* columns ARE incident-shaped — the
v1.5-B generic-noun pass left them in place because renaming would
require a schema migration. Apps that don't model severity or a
human submitter ignore those columns; the round-trip silently
omits them.
Read-only similarity search over the same engine + vector store.
Used by intake's similarity retrieval (lookup_similar_incidents).
Filter dimensions are pluggable — apps construct a
HistoryStore(filter_resolver=…) matching their own row shape.
(src/runtime/checkpointer.py)
Separate from the SessionStore. SQLite default (sqlite:////tmp/asr.db),
Postgres optional via runtime.checkpointer_postgres. Holds langgraph
Pregel state + pending interrupts. The HITL approve / reject path
relies on this checkpointer being durable.
Append-only session_events table. Records:
agent_started,agent_finished,confidence_emitted,route_decidedtool_invoked(every wrapped tool call, with latency + result_kind)gate_fired(HITL gate decisions)status_changed(terminal-status transitions with cause)lesson_extracted(M5/M6 auto-learning)
Per-step events feed any external observability stack and the auto-learning pipeline.
The incident-management app ships an ASR (Automated Site Reliability) memory bundle hydrated by the supervisor at intake:
| Layer | What | Backend |
|---|---|---|
| L2 | Knowledge Graph — services, owners, runbooks, dependencies | examples/incident_management/asr/kg_store.py (filesystem JSON) |
| L5 | Release Context — recent deploys per service | release_store.py (filesystem JSON) |
| L7 | Playbooks — known-good remediation steps per failure mode | playbook_store.py (filesystem JSON) |
hydrate_and_gate (in the example's MCP server) walks the user's
query, extracts mentioned components, and returns a MemoryLayerState
bundle that the triage agent reads as additional context.
This is app-level, not framework — the runtime stays memory-
agnostic. A different app can ship its own L1/L2/L3 memory layers
without touching runtime/.
The deployment env is corporate / air-gapped: no public-internet
runtime calls, no CDN fetches, no pip install at deploy time.
scripts/build_single_file.py flattens the runtime + each app into
self-contained .py files under dist/:
| File | Contents |
|---|---|
dist/app.py |
framework only — no example code |
dist/apps/incident-management.py |
framework + incident_management example |
dist/apps/code-review.py |
framework + code_review example |
dist/ui.py |
Streamlit shell |
CI gate Bundle staleness gate (HARD-08) rebuilds the bundles
from src/ and fails the build if they don't match the committed
dist/* — this keeps the deploy bundles "repaired by construction"
on every merge.
Copy onto the target host:
app.py (renamed from dist/apps/<app>.py)
ui.py (dist/ui.py)
config/config.yaml (framework: LLM, MCP, storage)
config/<app>.yaml (app: severity aliases, escalation roster, …)
config/skills/ (skill prompts, optional override)
.env (provider keys)
Boot:
python -m runtime --config config/<app>.yaml
streamlit run ui.py --server.port 37777uv.lock pins direct + transitive deps with sha256 hashes. CI
installs from the lock with uv sync --frozen; an internal
package mirror is sufficient for a fully offline build. See
docs/AIRGAP_INSTALL.md.
Every meaningful boundary emits an EventLog row keyed by
session_id. The four agent-boundary events
(agent_started → confidence_emitted → route_decided → agent_finished) fire in order; tool_invoked and gate_fired
fire at the gateway boundary.
src/runtime/learning/extractor.py runs at session finalize and
distills outcome + winning hypothesis + applied fix into a
Lesson row. The intake supervisor reads recent lessons via
LessonStore.find_relevant(query, …) to prime the next session.
src/runtime/learning/scheduler.py runs an APScheduler job
nightly (configurable) that walks recent sessions and extracts
lessons missed at finalize time (e.g. sessions resolved manually
in the UI long after the agent's run).
Compact rationale for the non-obvious calls. Each entry is a single "why".
When. From the start.
Why. Out-of-the-box Pregel-style step boundaries +
checkpointing + first-class HITL interrupt() semantics. We don't
maintain a graph engine ourselves; we just wrap it.
When. v1.3 hardening, after langgraph.prebuilt.create_react_agent
was deprecated.
Why. Single tool-loop with native ToolStrategy fallback, removes
the recursion_limit=25 workaround we previously needed.
When. v1.5-A.
Why. JSON-schema-shaped output via response_format triggered a
class of brittleness across providers (model-specific JSON drift,
tool-strategy + React END interaction, recursion_limit ceilings).
Markdown is the native format every chat model writes well; the parse
step happens in the framework where leniency is in our control. Path
5 / Path 6 fallbacks cover models that occasionally drop the
contract.
When. v1.2.
Why. The gate decision (high-risk tool? gated env? low
confidence?) was previously scattered across the gateway, the
orchestrator, and the skill prompts. Phase 11 moved it into a single
pure function should_gate(session, tool_call, confidence, cfg) so
auditing what gates is a one-grep operation.
When. v1.1.
Why. Pre-v1.1 the framework had IncidentState baked in. Adding
a second app (code_review) was the forcing function — every
"incident-shaped" leak that surfaced moved into the framework as
Session.extra_fields (the JSON bag) or the row schema's existing
typed columns. Apps now subclass Session and write whatever fields
they need; the framework stays domain-agnostic.
When. v1.5-C.
Why. The intake supervisor can run on a fast / cheap model
while the deep-investigator agent needs a smarter (more expensive)
one. _build_agent_nodes resolves get_llm(cfg.llm, skill.model, role=agent_name) per skill; falls back to cfg.llm.default when
model is None.
When. v1.3.
Why. Corporate deploy env is copy-only. A multi-file
pip install step is out of scope. The bundler turns the
multi-file source tree into the smallest possible deploy payload
(7 files total).
When. v1.5-B.
Why. The decoupling work (DEC-005) wasn't binary — incident /
severity / reporter tokens kept creeping into src/runtime/ via
local variables, docstrings, and helper names. The ratchet test
counts those tokens and fails the build if the count grows. v1.5-B
took it from 156 down to 39 (the residual 39 are
schema-coupled / public-API / intentional example-app callouts).
When. v1.5-D.
Why. Free / shared upstream tiers (e.g. OpenRouter …:free)
throttle on 30-60s windows; the 5xx default backoff (1.5s/3s/4.5s)
exhausted retries before the window cleared. Now 429 retries on
7.5s/15s/22.5s (~45s total).
When. v1.5-A.
Why. Outer Pregel checkpoints at step boundaries, not mid-step.
On resume, state["session"] reflects the prior step's output, NOT
the gateway's pending_approval row + version bump that happened
mid-step. Without make_agent_node reloading from store at entry,
the gateway sees no pending row, double-appends, and store.save
raises StaleVersionError. The reload + the inner checkpointer
together are what make Approve / Reject actually drive the gated
tool to completion.
When. v1.1 (incident_management lifted), Phase 8 (code_review added). Why. Without a second app, "is the framework generic?" is unanswerable. The code_review app was built specifically to surface every incident-shaped assumption that hadn't been lifted yet — id format, row schema, build pipeline, intra-bundle imports. Each leak became a framework PR rather than an app workaround.
When. v1.3.
Why. dist/ files drift if a contributor updates src/runtime/
or examples/ without re-running the bundler. The drift turns into
a deploy-time bug ("works in dev, broken in prod"). The CI gate
rebuilds the bundles from source on every PR and refuses the merge
if they differ from the committed dist/*.
| Milestone | Title | PR | Squash SHA | Headline change |
|---|---|---|---|---|
| v1.0 | Prompt-vs-Code Remediation | #1 | 02378dd |
Code becomes the authority — skill prompts no longer carry policy logic |
| v1.1 | Framework De-coupling | #2 | 0ff8914 |
Generic runtime, ASR as use case |
| v1.2 | Framework Owns Flow Control | bundled into #5 | 9018371 |
FOC-01..06 — gate / retry / signal / dedup all framework-owned |
| v1.3 | Hardening + Real-LLM Compatibility | bundled into #5 | 9018371 |
HARD-01..09 + LLM-COMPAT-01 + BUNDLER-01 + SKILL-LINTER-01 |
| v1.4 | Per-step telemetry + auto-learning intake + React-ready API | #5 | 9018371 |
M1..M9 telemetry + LessonStore + generic /sessions/* + SSE + WebSocket + CORS + structured error envelope |
| v1.5-A | Markdown turn output (Phase 22) + HITL approve/reject end-to-end on langgraph 1.x | #6 + #7 | f0586a8, 3f0eb5f |
DEC-003 + DEC-010 |
| v1.5-B | Generic-noun pass — concept-leak ratchet 156 → 39 | #8 | 25e363c |
DEC-008 |
| v1.5-C | Per-agent LLM proof point — intake on Ollama Cloud, downstream on llm.default |
#9 | 54a830d |
DEC-006 |
| v1.5-D | 429 rate-limit retry + multi-provider integration driver | #10 | adefae6 |
DEC-009 |
Per-phase artefacts under .planning/phases/<NN>-<slug>/ (gitignored
working state; selected artefacts are committed for historical record).
Stack pick + scaffold + parity-port against the v1.4
/sessions/* REST + SSE/WebSocket API. ~1–2 weeks. The Streamlit
shell stays as the prototype until React reaches parity.
- Duplicate ToolCall audit rows. The gateway records the gated
tool under the FastMCP composite name (
local_remediation:apply_fix, colon form), the harvester records the same tool call under the LLM-visible name (local_remediation__apply_fix, double-underscore form). Cosmetic in the UI; matters if any consumer aggregates tool counts. Fix: align both on the__form. ~30 min. ApprovalWatchdogregression test. PR #6 added gateway saves on resolution transitions; the watchdog should observe a faster cleanup signal but no focused test was added. ~15 min.ASR_LOG_LEVELenv var documentation. Added in PR #6, no README mention. One-line doc fix.src/runtime/locks.py:49—TODO(v2). Evict idle slots to cap memory in long-running servers. Real concern for production; not urgent for HITL-paced workloads.
| You want to… | Look at |
|---|---|
| Add a new skill | examples/<app>/skills/<name>/{config.yaml, system.md} |
| Add a new app | New folder under examples/; subclass Session in state.py; declare App*Config in config.py; write MCP servers and skills |
| Add a tool | App's mcp_server.py; register in YAML; gateway picks up risk policy from cfg.runtime.gateway.policy |
| Change LLM provider | config/config.yaml llm.providers / llm.models; per-agent override on skill.model |
| Change HITL policy | cfg.orchestrator.gate_policy (cross-cutting), cfg.runtime.gateway.policy (per-tool) |
| Trace one session end-to-end | EventLog rows for that session_id; agents_run and tool_calls on the row; session_events table |
| Update the bundle | uv run python scripts/build_single_file.py; commit dist/* |
| Add a new framework module | RUNTIME_MODULE_ORDER in scripts/build_single_file.py (after deps); regen + commit |
| Run live LLM tests | Set OLLAMA_API_KEY + OLLAMA_BASE_URL, OPENROUTER_API_KEY, AZURE_OPENAI_KEY + AZURE_ENDPOINT; uv run pytest tests/test_integration_driver_s1.py -v |
| Reset state for a fresh run | rm /tmp/asr.db /tmp/asr.db-{wal,shm}; rm -rf /tmp/asr-faiss then restart |
docs/DESIGN.md(this file) — architecture, abstractions, decisions, milestone history.docs/DEVELOPMENT.md— day-to-day contributor loop (setup, bundle regeneration, adding modules).docs/AIRGAP_INSTALL.md— corporate-mirror install procedure.README.md(repo root) — one-screen overview pointing at the three docs above.examples/incident_management/README.md— incident-management app surface; per-skill prompts underskills/.examples/code_review/README.md— code-review app surface; per-skill prompts underskills/..planning/(gitignored) — working state for the GSD planning workflow (STATE.md,ROADMAP.md,phases/<NN>-<slug>/). Not shipped; selected phase artefacts are committed for the historical record.