feat(runtimes): native declarative runtimes to express cortex-class pipelines as JSON (reduce python-func reliance) #776

@rafeekpro

Summary

cortex-class pipelines (turn a GitHub issue into a merged PR) run today as imperative python-func code living in the external dap-cortex package. A portable DAP bundle can only reference that code (runtime_id: python-func, callable_path: cortex.nodes.coder:run) — it can't contain the behaviour, so the JSON does nothing without the package installed in the engine venv.

python-func is DAP's deliberate escape hatch for imperative logic, and that's fine. But a large slice of what cortex does is generic (git, GitHub, guards, retry, sandboxed exec) and could be native, declaratively-configured DAP runtimes. If we add them, cortex-class pipelines become near-pure JSON (graph + agents that reference built-in runtimes + backend_profiles), with python-func left only for truly bespoke bits.

This epic captures (A) the runtimes/primitives that would make cortex expressible declaratively, (B) the keystone "tool-calling agent" runtime, and (C) a brainstorm of new runtimes beyond what cortex even has.
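To make the contrast concrete, here is a rough sketch of the two node styles as Python dicts that serialize to JSON. Only `runtime_id: python-func` and `callable_path: cortex.nodes.coder:run` come from the text above. Every other key (`nodes`, `id`, `config`, `op`, `issue_number`, the `{{ ... }}` placeholder) is a hypothetical shape used for illustration, not the current bundle schema.

```python
import json

# A node as bundles express it today: a pointer to code in an external package.
python_func_node = {
    "id": "coder",
    "runtime_id": "python-func",
    "callable_path": "cortex.nodes.coder:run",
}

# Hypothetical declarative node: the behaviour is carried by config alone.
# The "github" runtime and all of its keys are assumptions, not an existing API.
github_node = {
    "id": "read_issue",
    "runtime_id": "github",
    "config": {"op": "read_issue", "issue_number": "{{ state.issue_number }}"},
}

bundle = {"nodes": [python_func_node, github_node]}


def external_code_nodes(bundle: dict) -> list[str]:
    """Return ids of nodes that only work if an external package is installed."""
    return [n["id"] for n in bundle["nodes"] if n["runtime_id"] == "python-func"]


# The bundle round-trips through JSON, but only one node is self-contained.
serialized = json.dumps(bundle, indent=2)
remaining = external_code_nodes(bundle)
```

A check like `external_code_nodes` could also serve as a rough progress metric for this epic: the shorter the list for a cortex-class bundle, the more portable the bundle.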

Background

  • DAP runtimes today: api-call, bash, http, claude-code, gemini-cli, codex, aider, python-func (adapters in packages/runtimes/src/dap_runtimes/adapters/, contract in base.py; dashboard schemas in apps/dashboard/src/components/agents/runtime-config-schemas.ts).
  • Per-node LLM selection already exists: backend_profiles (available + agent_assignments), resolved at run time in apps/engine/src/dap_engine/execution/backend_profiles.py.
  • Conditional routing already exists (edge comparison/logical conditions reading state). What's missing for cortex-class flows is the node-body capabilities that write that state.

Related: #739 (managed/bundled cortex agents), area:cortex-integration.

Goals

  • Add generic, reusable runtimes so cortex-class behaviour is authorable as JSON config, not external code.
  • Each runtime: a BaseAdapter implementation + registry entry + dashboard runtime-config schema + TDD tests + docs.
  • Keep the engine generic — these are domain-agnostic primitives (git/GitHub/exec/retry), not cortex-specific.

Non-goals

  • Deleting python-func — it stays the escape hatch for genuinely bespoke logic (e.g. dispatcher issue-decomposition).
  • Re-implementing cortex inside this repo. We provide primitives; a thin bundle composes them.
  • Absorbing cortex's exact prompts/policies — those stay data (prompt_template, backend_profiles).

Part A — Runtimes/primitives to make cortex declarative

Each is an independently shippable slice.

  • github runtime — declarative GitHub ops: op: read_issue | comment | update_issue_section | create_branch | open_pr | merge_pr | read_pr, params templated from state; token via DAP env layering (role-separated tokens map to instance/project env vars). Replaces the GitHub half of every cortex node.
  • git/workspace-aware code runtime — either a dedicated git runtime or config flags on the existing code runtimes (claude-code/codex/aider): { branch, base, push, workspace } so the branch/checkout/push lifecycle is declarative instead of hand-coded in execution.py.
  • Declarative guards (flags on the code/git runtime): require_nonempty_diff (silent-zero-output), append_only (reflog rewrite guard), ancestry_guard (no force-push over operator commits). Lifted 1:1 from cortex/nodes/execution.py.
  • Backend fallback chain — extend backend_profiles with fallback: [...] (cortex already has this in agents.yaml). Engine tries the chain on BackendError.
  • Retry-policy primitive — declarative loop: { max_retries, on_failure: <route>, feedback_into: <state field> } so the tester→retry→coder loop (and its *_status short-circuits, see cortex _route_after_tester) is config, not Python. Needs a small condition/expression story for multi-field guards.
  • Workspace introspection into state — runtime exposes git diff/changed-files/commits to state so downstream prompt_template (Jinja) and edge conditions can use them declaratively (cortex's _collect_commits, retry-context).
  • Per-task budgets — surface max_turns/timeout per node/task in config (partially exists).
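As one illustration of how small these slices can be, the backend fallback chain might reduce to a loop like the one below. This is a sketch under assumptions: the `BackendError` class here is a local stand-in named after the error mentioned above, and the profile keys `backend` and `fallback` plus the model names are hypothetical, not the current `backend_profiles` schema.

```python
from typing import Callable, Sequence


class BackendError(Exception):
    """Local stand-in for the engine's backend failure type."""


def run_with_fallback(
    chain: Sequence[str],
    call_backend: Callable[[str], str],
) -> tuple[str, str]:
    """Try each backend in order and return (backend_used, result).

    Only BackendError triggers a fallback; any other exception propagates.
    """
    errors: list[str] = []
    for backend in chain:
        try:
            return backend, call_backend(backend)
        except BackendError as exc:
            errors.append(f"{backend}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(errors))


# Hypothetical profile shape: a primary backend plus the proposed fallback list.
profile = {"backend": "primary-model", "fallback": ["secondary-model", "local-model"]}
chain = [profile["backend"], *profile["fallback"]]


def fake_backend(name: str) -> str:
    # Simulate the primary being unavailable so the chain is exercised.
    if name == "primary-model":
        raise BackendError("rate limited")
    return f"ok from {name}"


used, result = run_with_fallback(chain, fake_backend)
```

Restricting the fallback to `BackendError` is a deliberate choice in this sketch: a bug in the node itself should fail loudly rather than be retried on a different model.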

Part B — Keystone: tool-calling agent runtime

  • agent/tool-loop runtime — an LLM node that runs an agentic loop with a declarative toolset composed of other DAP runtimes (e.g. tools: [github, git, bash-sandboxed]). This is the real unlock: "an agent that reads the issue, edits code, runs tests, opens a PR" becomes JSON that wires built-in tools, instead of bespoke python-func. Bounded by max_turns/budget; every tool call audited.
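The loop described above can be sketched in a few lines. This is a minimal illustration, not a proposed implementation: the model is any callable that returns either a tool request or a final answer, the tools are plain callables standing in for other DAP runtimes, and the action shape (`tool`, `args`, `final`) is an assumption made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# A "model" inspects the transcript and returns {"tool": name, "args": {...}}
# or {"final": text}. Real LLM wiring is out of scope for this sketch.
Model = Callable[[list[dict[str, Any]]], dict[str, Any]]
Tool = Callable[..., Any]


@dataclass
class ToolLoopResult:
    final: Optional[str]
    turns: int
    audit: list[dict[str, Any]] = field(default_factory=list)


def run_tool_loop(model: Model, tools: dict[str, Tool], max_turns: int) -> ToolLoopResult:
    transcript: list[dict[str, Any]] = []
    audit: list[dict[str, Any]] = []
    for turn in range(1, max_turns + 1):
        action = model(transcript)
        if "final" in action:
            return ToolLoopResult(final=action["final"], turns=turn, audit=audit)
        name, args = action["tool"], action.get("args", {})
        if name not in tools:
            output: Any = f"error: unknown tool {name!r}"
        else:
            output = tools[name](**args)
        # Every tool call is recorded, including calls to unknown tools.
        audit.append({"turn": turn, "tool": name, "args": args, "output": output})
        transcript.append({"tool": name, "output": output})
    # Turn budget exhausted without a final answer.
    return ToolLoopResult(final=None, turns=max_turns, audit=audit)


def scripted_model(transcript: list[dict[str, Any]]) -> dict[str, Any]:
    # Deterministic stand-in for an LLM: one scripted action per turn.
    script = [
        {"tool": "github", "args": {"op": "read_issue", "number": 1}},
        {"tool": "bash-sandboxed", "args": {"cmd": "pytest"}},
        {"final": "opened PR"},
    ]
    return script[len(transcript)]


tools = {
    "github": lambda op, number: f"{op} #{number}",
    "bash-sandboxed": lambda cmd: "2 passed",
}
done = run_tool_loop(scripted_model, tools, max_turns=5)
cut_off = run_tool_loop(scripted_model, tools, max_turns=2)
```

With `max_turns=5` the scripted run finishes on the third turn with two audited tool calls; with `max_turns=2` the loop stops without a final answer, which is the case a real runtime would need to route somewhere explicit.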

Part C — New runtimes worth exploring (beyond what cortex has)

Group as exploration; spin promising ones into their own issues.

Execution & safety

  • Sandboxed exec runtime — run code in an isolated container (not in-process like cortex's bash, which runs with engine privileges). A genuine security upgrade cortex lacks.
  • Cost/budget guard primitive — halt/reroute when cumulative spend exceeds a cap.
  • Circuit-breaker / timeout policy primitives — declarative resilience around flaky external calls.

Composition & scale

  • Sub-pipeline / call runtime — a node that invokes another pipeline (reuse/modularity). Huge for building libraries of flows.
  • Map / fan-out runtime — run a sub-step over a list (parallel map over files/tasks/items) and gather results.
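A map / fan-out node could behave roughly like the helper below, which runs a step over a list in parallel and gathers results in input order. The function name `fan_out`, the result shape, and the `files` / `line_counts` state fields are hypothetical. Threads are used only because they are in the standard library; a real runtime might schedule sub-steps differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def fan_out(
    step: Callable[[Any], Any],
    items: Iterable[Any],
    max_workers: int = 4,
) -> list[dict[str, Any]]:
    """Run `step` over every item in parallel and gather results in input order.

    A failing item is captured as an error entry instead of aborting the batch.
    """

    def guarded(item: Any) -> dict[str, Any]:
        try:
            return {"item": item, "ok": True, "result": step(item)}
        except Exception as exc:  # keep the failure, do not lose the whole map
            return {"item": item, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input iterable.
        return list(pool.map(guarded, items))


def count_lines(text: str) -> int:
    if not text:
        raise ValueError("empty file")
    return len(text.splitlines())


state: dict[str, Any] = {"files": ["a\nb\n", "", "x\n"]}
state["line_counts"] = fan_out(count_lines, state["files"])
```

Capturing per-item failures keeps the gather step total, so a downstream edge condition can decide whether one bad item should fail the node.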

Data & retrieval

  • SQL/database runtime — parameterized query → results into state (data pipelines).
  • RAG/retrieval runtime — embed a query + retrieve from a vector store, inject context into the next node (grounded generation).
  • Browser/Playwright runtime — drive/scrape/test the web, screenshots → state (E2E, web research).
  • Code-search / static-analysis runtime — grep/AST/semgrep over a repo → findings into state (feeds reviewer nodes).

Human & integration

  • Human-input / form gate — collect structured input from a human into state (beyond the binary approve/reject gate).
  • Wait / poll runtime — block until an external condition holds (CI run, deploy, queue) before proceeding.
  • Notification runtime — Slack/email/webhook send, declarative.
  • Cloud/infra runtime: kubectl/terraform/docker ops with guards.

Quality

  • Judge / eval runtime — LLM-as-judge with a rubric → score into state; generalizes the Code Review Council into a reusable gate.
  • Multi-agent ensemble / debate runtime — run N agents, vote/synthesize.

Suggested phasing

  1. Slice 1 (highest leverage): github runtime + git/push/guard flags on a code runtime → ~80% of cortex's node bodies become declarative.
  2. Slice 2: retry-policy primitive + workspace-diff-into-state + backend fallback → the tester/retry loop becomes config.
  3. Slice 3 (keystone): tool-calling agent runtime composing the above as tools.
  4. Exploration: pick 1-2 Part C runtimes by demand (sandboxed exec + sub-pipeline are strong candidates).

Acceptance criteria (per slice)

  • New adapter implements the BaseAdapter contract; registered in the runtime registry.
  • Dashboard runtime-config schema added (runtime-config-schemas.ts) so the agent editor renders the config form.
  • TDD: unit + integration tests (real behaviour, not file-existence checks — per .claude/rules/infrastructure-quality.md); no clear-text secret logging.
  • Docs + a minimal example bundle showing the runtime in a pipeline.
  • ruff/mypy/pytest + dashboard typecheck/build green; Council review passes.
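The first criterion could be met by something shaped like the sketch below, shown for a github runtime. Nearly everything here is an assumption: the `BaseAdapter` class is a local stand-in (the real contract lives in base.py and may differ), and the `register` decorator, the `run(config, state)` signature, and the `github_result` key are hypothetical. `string.Template` with `$name` placeholders is swapped in for Jinja so the example needs only the standard library.

```python
from abc import ABC, abstractmethod
from string import Template
from typing import Any, Callable


class BaseAdapter(ABC):
    """Hypothetical stand-in; the real contract is defined in base.py."""

    @abstractmethod
    def run(self, config: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]: ...


REGISTRY: dict[str, type[BaseAdapter]] = {}


def register(runtime_id: str) -> Callable[[type[BaseAdapter]], type[BaseAdapter]]:
    def decorator(cls: type[BaseAdapter]) -> type[BaseAdapter]:
        REGISTRY[runtime_id] = cls
        return cls

    return decorator


@register("github")
class GithubAdapter(BaseAdapter):
    """Dispatches a declarative `op` to an injected handler (no network here)."""

    def __init__(self, handlers: dict[str, Callable[..., Any]]) -> None:
        self.handlers = handlers

    def run(self, config: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
        op = config["op"]
        if op not in self.handlers:
            raise ValueError(f"unsupported op: {op}")
        # Render each param from state; a missing state key raises KeyError.
        params = {
            key: Template(str(value)).substitute(state)
            for key, value in config.get("params", {}).items()
        }
        return {"github_result": self.handlers[op](**params)}


adapter = REGISTRY["github"](
    {"comment": lambda issue, body: f"commented on #{issue}: {body}"}
)
result = adapter.run(
    {"op": "comment", "params": {"issue": "$issue_number", "body": "tests: $tester_status"}},
    {"issue_number": 776, "tester_status": "passed"},
)
```

Injecting the handlers keeps the unit tests behavioural without network access. An integration test would swap in a real GitHub client.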

Open questions

  • Retry/guard multi-field conditions: extend edge conditions with a small expression language, or a dedicated retry-policy node?
  • Do git/GitHub belong as standalone runtimes or as tools consumed only by the Part B agent runtime? (Probably both: standalone for simple nodes, tools for the agent loop.)
  • Sandboxing model for the exec runtime (container image, network policy, filesystem scope).
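On the first open question, one possible shape for a small expression language is a JSON-friendly condition tree evaluated against state. This is a sketch of one option, not the existing edge-condition format: the `all` / `any` / `not` combinators, the `field` / `op` / `value` leaves, and the example state fields are all hypothetical.

```python
import operator
from typing import Any

_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def evaluate(cond: dict[str, Any], state: dict[str, Any]) -> bool:
    """Evaluate a nested condition tree against a state dict."""
    if "all" in cond:
        return all(evaluate(c, state) for c in cond["all"])
    if "any" in cond:
        return any(evaluate(c, state) for c in cond["any"])
    if "not" in cond:
        return not evaluate(cond["not"], state)
    # Leaf comparison: a field missing from state never matches.
    if cond["field"] not in state:
        return False
    return bool(_OPS[cond["op"]](state[cond["field"]], cond["value"]))


# Multi-field guard: retry only if the tester failed, retries remain,
# and the coder did not report itself blocked.
should_retry = {
    "all": [
        {"field": "tester_status", "op": "eq", "value": "failed"},
        {"field": "retries", "op": "lt", "value": 3},
        {"not": {"field": "coder_status", "op": "eq", "value": "blocked"}},
    ]
}

retry_now = evaluate(should_retry, {"tester_status": "failed", "retries": 1, "coder_status": "done"})
out_of_retries = evaluate(should_retry, {"tester_status": "failed", "retries": 3, "coder_status": "done"})
```

A tree of data stays serializable and avoids evaluating arbitrary strings, at the cost of being more verbose than an infix expression.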

Strategic note

This is a product decision as much as engineering: do we want the engine to grow an opinionated SDLC/DevOps toolkit (git/GitHub/guards), or stay generic and keep that domain in packages via python-func? Recommendation: build the generic, domain-agnostic primitives (github/git/exec/retry/sub-pipeline) — they serve far more than cortex — and let domain policy stay in JSON bundles.

Metadata

Labels

  • area:cortex-integration (DAP <-> cortex boundary, pipeline bundle)
  • area:engine (Code in apps/engine/*)
  • area:runtimes (Code in packages/runtimes/*)
  • enhancement (New feature or request)
  • size:XL (> 3 days or cross-cutting; usually wants an epic split)
