feat(runtimes): native declarative runtimes to express cortex-class pipelines as JSON (reduce python-func reliance) #776

## Summary

Today, cortex-class pipelines (those that turn a GitHub issue into a merged PR) run as **imperative `python-func` code** living in the external `dap-cortex` package. A portable DAP bundle can only *reference* that code (`runtime_id: python-func`, `callable_path: cortex.nodes.coder:run`); it can't *contain* the behaviour, so the JSON does nothing unless the package is installed in the engine venv.

`python-func` is DAP's deliberate escape hatch for imperative logic, and that's fine. But a large slice of what cortex does is **generic** (git, GitHub, guards, retry, sandboxed exec) and could be **native, declaratively-configured DAP runtimes**. If we add them, cortex-class pipelines become *near-pure JSON* (graph + agents that reference built-in runtimes + `backend_profiles`), with `python-func` left only for truly bespoke bits.
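
To make the contrast concrete, here is a minimal sketch of the two node shapes, written as Python dicts and serialized to JSON. `runtime_id` and `callable_path` come from the text above. The node ids, the `github` runtime's `config` keys, and the `{{ state.* }}` placeholders are hypothetical illustrations of the proposal, not an existing schema.

```python
import json

# Today: the bundle only points at code that lives in the external package.
node_today = {
    "id": "coder",  # hypothetical node id
    "runtime_id": "python-func",
    "callable_path": "cortex.nodes.coder:run",
}

# Proposed (hypothetical shape): the behaviour is carried by config
# for a built-in runtime, so the bundle is self-contained.
node_declarative = {
    "id": "open_pr",  # hypothetical node id
    "runtime_id": "github",
    "config": {
        "op": "open_pr",
        "title": "{{ state.issue_title }}",
        "head": "{{ state.branch }}",
        "base": "main",
    },
}


def needs_external_code(node: dict) -> bool:
    """A node depends on an installed package only if it names a callable."""
    return "callable_path" in node


print(json.dumps(node_declarative, indent=2))
```

The second node round-trips through JSON unchanged, which is the property the epic is after: everything the engine needs is in the bundle.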

This epic captures (A) the runtimes/primitives that would make cortex expressible declaratively, (B) the keystone "tool-calling agent" runtime, and (C) a brainstorm of new runtimes beyond what cortex even has.

## Background

- DAP runtimes today: `api-call`, `bash`, `http`, `claude-code`, `gemini-cli`, `codex`, `aider`, `python-func` (adapters in `packages/runtimes/src/dap_runtimes/adapters/`, contract in `base.py`; dashboard schemas in `apps/dashboard/src/components/agents/runtime-config-schemas.ts`).
- Per-node LLM selection already exists: `backend_profiles` (`available` + `agent_assignments`), resolved at run time in `apps/engine/src/dap_engine/execution/backend_profiles.py`.
- Conditional routing already exists (edge `comparison`/`logical` conditions reading state). What's missing for cortex-class flows is the **node-body capabilities** that *write* that state.
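
For readers unfamiliar with the routing side, the sketch below shows how `comparison` and `logical` edge conditions could be evaluated against state. The condition schema (`type`, `field`, `op`, `value`, `conditions`) and the state field names are assumptions made for illustration; DAP's actual edge-condition format may differ.

```python
import operator
from typing import Any

# Hypothetical operator names; ordering comparisons assume the field is present.
_COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def evaluate(condition: dict[str, Any], state: dict[str, Any]) -> bool:
    """Evaluate a comparison or logical condition against pipeline state."""
    kind = condition["type"]
    if kind == "comparison":
        compare = _COMPARATORS[condition["op"]]
        return bool(compare(state.get(condition["field"]), condition["value"]))
    if kind == "logical":
        results = [evaluate(c, state) for c in condition["conditions"]]
        if condition["op"] == "and":
            return all(results)
        if condition["op"] == "or":
            return any(results)
        if condition["op"] == "not":
            return not results[0]
    raise ValueError(f"unsupported condition: {condition!r}")


# Edge that routes back to the coder while tests fail and retries remain.
retry_edge = {
    "type": "logical",
    "op": "and",
    "conditions": [
        {"type": "comparison", "field": "tests_status", "op": "eq", "value": "failed"},
        {"type": "comparison", "field": "retries", "op": "lt", "value": 3},
    ],
}
```

The evaluator only reads `state`. Something still has to write `tests_status` and `retries`, and that is the node-body gap this epic targets.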

Related: #739 (managed/bundled cortex agents), `area:cortex-integration`.

## Goals

- Add generic, reusable runtimes so cortex-class behaviour is authorable as JSON config, not external code.
- Each runtime: a `BaseAdapter` implementation + registry entry + dashboard runtime-config schema + TDD tests + docs.
- Keep the engine generic — these are domain-agnostic primitives (git/GitHub/exec/retry), not cortex-specific.
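
As a rough picture of what "adapter + registry entry" means per runtime, here is a self-contained sketch. `AdapterSketch` is a stand-in for the `BaseAdapter` contract in `base.py`, whose real interface is not reproduced here; the `execute` signature, the `REGISTRY` dict, and the toy `echo` runtime are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any


class AdapterSketch(ABC):
    """Stand-in for the BaseAdapter contract; the real interface may differ."""

    runtime_id: str

    @abstractmethod
    def execute(self, config: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
        """Run one node and return the state updates it produces."""


# Hypothetical registry keyed by runtime_id.
REGISTRY: dict[str, type[AdapterSketch]] = {}


def register(adapter_cls: type[AdapterSketch]) -> type[AdapterSketch]:
    """Class decorator that adds an adapter to the registry by runtime_id."""
    REGISTRY[adapter_cls.runtime_id] = adapter_cls
    return adapter_cls


@register
class EchoAdapter(AdapterSketch):
    """Toy runtime: copies a configured message into state."""

    runtime_id = "echo"

    def execute(self, config, state):
        return {"echo": config["message"]}


def run_node(node: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Look up the adapter for a node and merge its output into state."""
    adapter = REGISTRY[node["runtime_id"]]()
    return {**state, **adapter.execute(node.get("config", {}), state)}
```

Each new runtime in this epic would follow the same pattern: the adapter is domain-agnostic, and everything specific to a pipeline arrives through `config` and `state`.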

## Non-goals

- Deleting `python-func` — it stays the escape hatch for genuinely bespoke logic (e.g. dispatcher issue-decomposition).
- Re-implementing cortex inside this repo. We provide primitives; a thin bundle composes them.
- Absorbing cortex's exact prompts/policies — those stay data (`prompt_template`, `backend_profiles`).

---

## Part A — Runtimes/primitives to make cortex declarative

Each is an independently shippable slice.

- [ ] **`github` runtime** — declarative GitHub ops: `op: read_issue | comment | update_issue_section | create_branch | open_pr | merge_pr | read_pr`, params templated from state; token via DAP env layering (role-separated tokens map to instance/project env vars). Replaces the GitHub half of every cortex node.
- [ ] **`git`/workspace-aware code runtime** — either a dedicated `git` runtime or config flags on the existing code runtimes (`claude-code`/`codex`/`aider`): `{ branch, base, push, workspace }` so the branch/checkout/push lifecycle is declarative instead of hand-coded in `execution.py`.
- [ ] **Declarative guards** (flags on the code/git runtime): `require_nonempty_diff` (silent-zero-output), `append_only` (reflog rewrite guard), `ancestry_guard` (no force-push over operator commits). Lifted 1:1 from `cortex/nodes/execution.py`.
- [ ] **Backend fallback chain** — extend `backend_profiles` with `fallback: [...]` (cortex already has this in `agents.yaml`). Engine tries the chain on `BackendError`.
- [ ] **Retry-policy primitive** — declarative loop: `{ max_retries, on_failure: <route>, feedback_into: <state field> }` so the tester→retry→coder loop (and its `*_status` short-circuits, see cortex `_route_after_tester`) is config, not Python. Needs a small condition/expression story for multi-field guards.
- [ ] **Workspace introspection into state** — runtime exposes `git diff`/changed-files/commits to `state` so downstream `prompt_template` (Jinja) and edge conditions can use them declaratively (cortex's `_collect_commits`, retry-context).
- [ ] **Per-task budgets** — surface `max_turns`/`timeout` per node/task in config (partial support already exists).
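
Two of the primitives above, the backend fallback chain and the retry policy, are small enough to sketch. `BackendError` here is a local stand-in for the engine's exception. The `retries` counter and the `halt` route are hypothetical; only `max_retries`, `on_failure`, and `feedback_into` come from the list above.

```python
from typing import Any, Callable


class BackendError(Exception):
    """Stand-in for the engine's backend failure exception."""


def call_with_fallback(chain: list[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each backend in order; return (backend, result) from the first success."""
    last_error = None
    for backend in chain:
        try:
            return backend, call(backend)
        except BackendError as exc:
            last_error = exc  # remember the failure and move down the chain
    raise BackendError(f"all backends failed: {chain}") from last_error


def next_route(
    policy: dict[str, Any], state: dict[str, Any], failure: str
) -> tuple[str, dict[str, Any]]:
    """Decide where to go after a failed step and return the updated state."""
    attempts = state.get("retries", 0)
    if attempts >= policy["max_retries"]:
        return "halt", state  # retries exhausted; "halt" is a hypothetical route
    updated = {**state, "retries": attempts + 1, policy["feedback_into"]: failure}
    return policy["on_failure"], updated


policy = {"max_retries": 2, "on_failure": "coder", "feedback_into": "tester_feedback"}
route, state = next_route(policy, {}, "2 tests failed")
```

With this policy, a first tester failure routes back to `coder` with the failure text stored in `tester_feedback`, and the third failure routes to `halt`. The multi-field `*_status` short-circuits are not covered by this sketch.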

## Part B — Keystone: tool-calling agent runtime

- [ ] **`agent`/`tool-loop` runtime** — an LLM node that runs an agentic loop with a **declarative toolset composed of other DAP runtimes** (e.g. `tools: [github, git, bash-sandboxed]`). This is the real unlock: "an agent that reads the issue, edits code, runs tests, opens a PR" becomes JSON that wires built-in tools, instead of bespoke `python-func`. Bounded by `max_turns`/budget; every tool call audited.
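
The core of such a runtime is a bounded loop. The sketch below assumes a model callable that returns either a tool request or a final answer; that step format, the `scripted_model` stand-in for an LLM, and the audit-record fields are all hypothetical.

```python
from typing import Any, Callable

Tool = Callable[[dict[str, Any]], str]
# A model step is either {"tool": name, "args": {...}} or {"final": text}.
Model = Callable[[list[dict[str, Any]]], dict[str, Any]]


def run_tool_loop(model: Model, tools: dict[str, Tool], task: str, max_turns: int):
    """Run a bounded agent loop; returns (final_answer_or_None, audit_log)."""
    transcript = [{"role": "task", "content": task}]
    audit = []
    for turn in range(max_turns):
        step = model(transcript)
        if "final" in step:
            return step["final"], audit
        name, args = step["tool"], step.get("args", {})
        if name not in tools:
            # Only declared tools may run; the refusal is fed back to the model.
            result = f"error: tool {name!r} is not in the declared toolset"
        else:
            result = tools[name](args)
        audit.append({"turn": turn, "tool": name, "args": args, "result": result})
        transcript.append({"role": "tool", "name": name, "content": result})
    return None, audit  # turn budget exhausted without a final answer


def scripted_model(transcript):
    """Stand-in for an LLM: read the issue first, then finish."""
    if len(transcript) == 1:
        return {"tool": "github", "args": {"op": "read_issue", "number": 1}}
    return {"final": "done: " + transcript[-1]["content"]}


tools = {"github": lambda args: f"issue {args['number']} body"}
answer, audit = run_tool_loop(scripted_model, tools, "fix the issue", max_turns=5)
```

In a real runtime each entry in `tools` would presumably wrap another DAP adapter, so the toolset stays declarative and the audit log records every call, including refused ones.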

## Part C — New runtimes worth exploring (beyond what cortex has)

Group as exploration; spin promising ones into their own issues.

**Execution & safety**
- [ ] **Sandboxed exec runtime** — run code in an isolated container (not in-process like cortex's `bash`, which runs with engine privileges). A genuine security upgrade cortex lacks.
- [ ] **Cost/budget guard primitive** — halt/reroute when cumulative spend exceeds a cap.
- [ ] **Circuit-breaker / timeout policy primitives** — declarative resilience around flaky external calls.
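
A cost/budget guard could be as small as a running total with a configured reroute. This is a minimal sketch; the field names, the USD unit, and the `continue`/`halt` route names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Tracks cumulative spend and reports which route to take next."""

    cap_usd: float
    on_exceeded: str = "halt"  # hypothetical route name
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> str:
        """Add a node's cost; return "continue" or the configured reroute."""
        self.spent_usd += cost_usd
        return self.on_exceeded if self.spent_usd > self.cap_usd else "continue"
```

Spending exactly the cap still continues in this sketch; only exceeding it triggers the reroute. Whether the cap is inclusive is a policy choice the real primitive would need to state.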

**Composition & scale**
- [ ] **Sub-pipeline / `call` runtime** — a node that invokes another pipeline (reuse/modularity). Huge for building libraries of flows.
- [ ] **Map / fan-out runtime** — run a sub-step over a list (parallel map over files/tasks/items) and gather results.
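
The gather semantics of a map/fan-out runtime can be sketched with the standard library's thread pool. The result record shape (`item`, `ok`, `result`, `error`) and the `max_workers` default are assumptions, and a real runtime would run a sub-step or sub-pipeline rather than a plain function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def fan_out(
    items: list[Any], step: Callable[[Any], Any], max_workers: int = 4
) -> list[dict[str, Any]]:
    """Apply step to every item in parallel; gather results in input order."""

    def safe(item):
        # Capture failures per item so one bad input does not sink the whole map.
        try:
            return {"item": item, "ok": True, "result": step(item)}
        except Exception as exc:
            return {"item": item, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(safe, items))
```

Returning per-item records rather than raising on the first failure lets downstream edge conditions decide whether a partial result is acceptable.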

**Data & retrieval**
- [ ] **SQL/database runtime** — parameterized query → results into state (data pipelines).
- [ ] **RAG/retrieval runtime** — embed a query + retrieve from a vector store, inject context into the next node (grounded generation).
- [ ] **Browser/Playwright runtime** — drive/scrape/test the web, screenshots → state (E2E, web research).
- [ ] **Code-search / static-analysis runtime** — grep/AST/semgrep over a repo → findings into state (feeds reviewer nodes).

**Human & integration**
- [ ] **Human-input / form gate** — collect *structured* input from a human into state (beyond the binary approve/reject gate).
- [ ] **Wait / poll runtime** — block until an external condition holds (CI run, deploy, queue) before proceeding.
- [ ] **Notification runtime** — Slack/email/webhook send, declarative.
- [ ] **Cloud/infra runtime** — `kubectl`/`terraform`/`docker` ops with guards.
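
The wait/poll runtime reduces to a check, an interval, and a deadline. The sketch below is one possible shape; the parameter names are hypothetical, and `sleep` and `clock` are injectable only so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # condition never held within the timeout
        sleep(interval_s)
```

In a pipeline, `check` would be something like "is the CI run green", and the boolean result would be written to state so an edge condition can route on success versus timeout.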

**Quality**
- [ ] **Judge / eval runtime** — LLM-as-judge with a rubric → score into state; generalizes the Code Review Council into a reusable gate.
- [ ] **Multi-agent ensemble / debate runtime** — run N agents, vote/synthesize.

## Suggested phasing

1. **Slice 1 (highest leverage):** `github` runtime + git/push/guard flags on a code runtime → ~80% of cortex's node bodies become declarative.
2. **Slice 2:** retry-policy primitive + workspace-diff-into-state + backend fallback → the tester/retry loop becomes config.
3. **Slice 3 (keystone):** tool-calling `agent` runtime composing the above as tools.
4. **Exploration:** pick 1-2 Part C runtimes by demand (sandboxed exec + sub-pipeline are strong candidates).

## Acceptance criteria (per slice)

- [ ] New adapter implements the `BaseAdapter` contract; registered in the runtime registry.
- [ ] Dashboard runtime-config schema added (`runtime-config-schemas.ts`) so the agent editor renders the config form.
- [ ] TDD: unit + integration tests (real behaviour, not file-existence checks — per `.claude/rules/infrastructure-quality.md`); no clear-text secret logging.
- [ ] Docs + a minimal example bundle showing the runtime in a pipeline.
- [ ] `ruff`/`mypy`/`pytest` + dashboard `typecheck`/`build` green; Council review passes.

## Open questions

- Retry/guard multi-field conditions: extend edge conditions with a small expression language, or a dedicated retry-policy node? 
- Do git/GitHub belong as **standalone runtimes** or as **tools** consumed only by the Part B agent runtime? (Probably both: standalone for simple nodes, tools for the agent loop.)
- Sandboxing model for the exec runtime (container image, network policy, filesystem scope).

## Strategic note

This is a product decision as much as engineering: do we want the **engine** to grow an opinionated SDLC/DevOps toolkit (git/GitHub/guards), or stay generic and keep that domain in packages via `python-func`? Recommendation: build the **generic, domain-agnostic** primitives (github/git/exec/retry/sub-pipeline) — they serve far more than cortex — and let domain *policy* stay in JSON bundles.

