feat(kernel): PR-D — sub-agents by rtpa25 · Pull Request #17 · rtpa25/agent-os

rtpa25 · 2026-05-15T06:15:27Z

Implements the sub-agent substrate per docs/superpowers/specs/2026-05-14-sub-agents-design.md.

Second of three slices: Tasks (PR-C ✅) → Sub-agents (this PR) → TUI rendering (PR-E).

What ships

spawn_sub_agent tool with { model: 'opus'|'sonnet'|'haiku', prompt, task_id? } → { runId, finalText }. Wrapped via wrappedTool({ touchesFS: false }).
Same-process execution. Nested streamText inside spawn_sub_agent.execute(). Sub-agent shares kernel.db + env bindings directly — no marshalling, no isolate boundary.
Sub-agent tool surface = main's tools minus spawn_sub_agent (no recursion) minus manage_tasks (sub-agents execute, parent plans). Deny-list pattern via destructured rename.
Sub-agent context: trimmed subset (<connected_accounts>, <files>, <computer_sandboxes>, <memory>, <current_date>) + conditional <task> + conditional <prior_attempts> (with full attempt history truncated to ~500 chars per attempt).
Schema additions (migration 0010_lumpy_black_widow.sql): run gains kind/parentRunId/taskId/finalText; message + tool_call gain denormalized kind. New index idx_run_task_kind for retry-context queries; idx_message_thread_created replaced with kind-aware composite.
User-visible reads filter to kind='main' at 13+ sites: runChatTurn's history fetch, paused/complete broadcasts, onConnect initial sync, getSessionInfo count + page, hybridSearch BM25 join + row hydration, vector ingest consumer, <sessions_recent> block subqueries, forkAndCompact summarizer history. Producer-side enqueue untouched (consumer-side filter is DRY).
Sub-agent system prompt (SUB_AGENT_SYSTEM_PROMPT): ~700 tokens, task-focused. No <voice>, no <task_management>, no <roadmap>, no <style>. New <sub_agent_role> section with output guidance.
Main's step cap bumped 50 → 100; sub-agent step cap = 50.
Anthropic options for sub-agent: effort='high', sendReasoning=true, thinking={adaptive,summarized}, cacheControl.ttl='5m' at all 4 breakpoints.
PerTurnContext gains runKind ("main"/"sub" — stamped onto tool_call.kind) + clientTimezone (propagated through sub-agent context build).
Parallel sub-agents: AI SDK's native parallel tool calls work transparently — e.g. 8 spawn_sub_agent calls in one assistant step = 8 concurrent sub-agent runs in the same DO.
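
The deny-list for the sub-agent tool surface can be sketched as follows. This is a minimal illustration, not the PR's code: `ToolLike`, `ToolMap`, the placeholder tool entries, and the body of `buildSubAgentTools` are assumptions made for the example (the real tools are AI SDK tool objects).

```typescript
// Hypothetical stand-in for an AI SDK tool object.
type ToolLike = { description: string };
type ToolMap = Record<string, ToolLike>;

// Deny-list via destructured rename: pull the forbidden tools out of the
// record under throwaway names and keep the rest untouched.
function buildSubAgentTools(mainTools: ToolMap): ToolMap {
  const {
    spawn_sub_agent: _noRecursion, // sub-agents must not spawn sub-agents
    manage_tasks: _parentPlans, // planning stays with the parent
    ...subAgentTools
  } = mainTools;
  return subAgentTools;
}

// Placeholder tool set for illustration only.
const mainTools: ToolMap = {
  spawn_sub_agent: { description: "spawn a sub-agent" },
  manage_tasks: { description: "create and update tasks" },
  get_time: { description: "current time" },
};

console.log(Object.keys(buildSubAgentTools(mainTools)));
```

A deny-list means any tool added to main later is inherited by sub-agents automatically; an allow-list would need a second edit each time.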

Plus a housekeeping pass (e849ce3): stripped PR-A/PR-B/PR-C stale references from apps/kernel/src/. Going forward the codebase doesn't track which PR a line came from — git blame is the right tool for "when did this change."

Pause semantics

When parent run is paused mid-sub-agent:

Sub-agent's run.status='paused' (transcript stays observable post-resume)
The spawn_sub_agent tool_call row marks status='error' via wrappedTool's re-throw path
Main agent sees the failure on its next history scan and can re-spawn
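
The status mapping above can be read as a small decision function. The sketch below is an assumption-laden illustration: `resolveOutcome`, its parameters, and the 'complete' status values are hypothetical; only 'paused' and 'error' come from this PR.

```typescript
// 'paused' and 'error' are named in the PR; 'complete' is an assumed success value.
type RunStatus = "complete" | "paused" | "error";
type ToolCallStatus = "complete" | "error";

interface SubAgentOutcome {
  runStatus: RunStatus; // written to the sub-agent's run row
  toolCallStatus: ToolCallStatus; // written to the spawn_sub_agent tool_call row
}

// Hypothetical helper: decide both statuses after the nested stream ends.
function resolveOutcome(parentAborted: boolean, failure: Error | null): SubAgentOutcome {
  if (failure === null) {
    return { runStatus: "complete", toolCallStatus: "complete" };
  }
  if (parentAborted) {
    // Parent pause: the run mirrors the parent's state, while the tool call
    // is marked as an error so main sees the failure and can re-spawn.
    return { runStatus: "paused", toolCallStatus: "error" };
  }
  // Hard error inside the sub-agent itself.
  return { runStatus: "error", toolCallStatus: "error" };
}

console.log(resolveOutcome(true, new Error("aborted")));
```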

Smoke checklist — pending interactive testing

Deployed to agent-os.pandaronit25.workers.dev. Migration 0010 (0010_lumpy_black_widow.sql) applies automatically.

1. Simple sub-agent invocation: "Use a sub-agent to research Cloudflare Durable Objects, summarize in 3 sentences." → verify run row with kind='sub', parentRunId populated, finalText non-empty.
2. Sub-agent without task_id returns answer to main: same run as scenario 1; verify main's reply references findings, run.taskId IS NULL.
3. Sub-agent WITH task_id writes the task result: create task, spawn sub-agent with task_id; verify task.result populated and task.status left unchanged (main marks the task complete via manage_tasks after evaluating the result).
4. ⭐ Parallel sub-agents in one step: "Spawn 3 sub-agents to fetch current weather in Tokyo, Berlin, São Paulo." → verify 3 sub-runs with overlapping startedAt timestamps.
5. Cross-thread task_id rejected: "belongs to a different thread" error.
6. Sub-agent's tool calls have kind='sub'.
7. Retry context populated: re-spawn on same task; verify <prior_attempts> block contains first attempt's finalText.
8. Sessions tools filter sub-agent transcripts: sessions_search returns NO matches against sub-agent internal text.


Out of scope (PR-E)

TUI rendering of <tasks> as a status panel (Claude Code style)
TUI rendering of sub-agent transcripts (collapsible drill-in under spawn_sub_agent tool call)
Live streaming of sub-agent events to TUI
Unified custom-renderer pattern in apps/cli/src/components/tools/

Reviews (subagent-driven, opus)

All 8 implementation tasks + Task 0 (cleanup) went through spec-compliance + code-quality review per superpowers:subagent-driven-development. Three iterations during Task 3 (filter completeness): initial pass caught 5 sites, follow-up pass added 3 more (broadcasts + onConnect), final pass added 2 more (sessions-recent subqueries + forkAndCompact summarizer history) — totaling 13 user-visible read sites filtered.

Final whole-PR review approved with 5 Minor follow-up items (SQLite ALTER TABLE FK-cascade limitation, defensive consistency nits) — none blocking.

🤖 Generated with Claude Code

spawn_sub_agent tool that runs a fresh agent loop in-process within the Kernel DO. Sub-agents share kernel.db + env bindings directly (no marshalling), get most of main's tool surface minus spawn_sub_agent (no recursion) and minus manage_tasks (sub-agents execute, parent plans).

Key decisions:
- Same-process execution (nested streamText), not Worker Loader isolate
- Depth 1 only: no recursive spawning
- Single message/tool_call tables + denormalized kind column ("main"|"sub")
- Each sub-agent gets own run row with parentRunId + optional taskId FK
- Parallel sub-agents in one step (AI SDK native parallel tool calls)
- Sub-agent context blocks trimmed + conditional <task> + <prior_attempts>
- finalText denormalized on run for fast retry-context lookup
- Vector ingestion and sessions tools filter to kind="main" only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three revisions after spec review:
1. Pause semantics on parent abort: sub-agent run.status = 'paused' (mirrors parent's resulting state). spawn_sub_agent tool_call row separately marks 'error' so main agent sees the failure on resume.
2. Step caps: main bumped 50 → 100 (was hitting cap on complex turns); sub-agent at 50 (was 20). Half-of-main heuristic preserved.
3. Model enum: tier names (opus/sonnet/haiku) instead of concrete IDs. Internal MODEL_ROUTE constant maps tier → current model ID. When opus 4.8 ships, one-line change vs schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
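
The tier-to-model indirection from revision 3 might look roughly like this. Only the name MODEL_ROUTE and the three tier names come from the commit message; `ModelTier`, `resolveModelId`, and the ID strings are placeholders, deliberately not real model identifiers.

```typescript
// Tier names are what the tool schema exposes to the model.
type ModelTier = "opus" | "sonnet" | "haiku";

// Placeholder IDs: swap in the provider's current identifiers here.
// Bumping a model version then touches one line and no schema.
const MODEL_ROUTE: Record<ModelTier, string> = {
  opus: "<current-opus-model-id>",
  sonnet: "<current-sonnet-model-id>",
  haiku: "<current-haiku-model-id>",
};

// Hypothetical helper: map the tier from the tool input to a concrete ID.
function resolveModelId(tier: ModelTier): string {
  return MODEL_ROUTE[tier];
}

console.log(resolveModelId("sonnet"));
```

Because `MODEL_ROUTE` is typed as `Record<ModelTier, string>`, adding a tier to the union without adding a route is a compile-time error.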

8 tasks: schema+migration → PerTurnContext additions → user-visible read filters → sub-agent system prompt → sub-agent context builder → spawn_sub_agent tool → bump main step cap + register → deploy+smoke+PR. Each task self-contained with full code, exact paths, typecheck-as-test gate, individual commit. No TDD scaffolding (per project convention). Final smoke checklist runs 8 scenarios against deployed kernel + d1 verifications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

We denormalized kind onto message precisely so we can filter in the where clause. The JS-side filter was wasteful — fetched sub-agent rows from D1 just to throw them away. SQL filter saves the row-read budget and the JS pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
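
To illustrate the difference, here is a sketch of a history query that filters in the where clause instead of in JavaScript. The real code goes through Drizzle; the raw SQL text, the column names (`thread_id`, `content_text`, `created_at`), and `buildHistoryQuery` are assumptions for this example.

```typescript
interface Query {
  sql: string;
  params: string[];
}

// Hypothetical query builder. The kind filter sits in the WHERE clause, so
// sub-agent rows are never read from D1, rather than being fetched and then
// dropped by a JS-side rows.filter(r => r.kind === "main").
function buildHistoryQuery(threadId: string): Query {
  return {
    sql:
      "SELECT id, role, content_text FROM message " +
      "WHERE thread_id = ? AND kind = ? " +
      "ORDER BY created_at ASC",
    params: [threadId, "main"],
  };
}

console.log(buildHistoryQuery("thread-123"));
```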

Two queries were doing select-* via Drizzle's findFirst/findMany default:
- deriveFinalText: only uses contentText; ship just that column.
- fetchPriorAttempts: only uses id/startedAt/finalText; ship just those.

Local PriorAttempt = Pick<Run, ...> type captures the narrower shape; buildPriorAttemptsBlock signature updated to match. Same principle as the prior 'SQL filter not JS filter' fix: minimize data over the D1 wire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…task

Three-part change:
1. Removed all PR-letter and 'NEW' marker comments from code samples in both the spec and the plan. These don't add value once code lands; git blame is the right tool for 'when did this change'. JSDoc explaining WHY a field exists is kept; sticker comments announcing newness are not.
2. Added Task 0 (Repo hygiene) at the front of the plan: grep the source for PR-A/PR-B/PR-C references and strip them. PR-C was merged 2026-05-14 and several comments still reference it in apps/kernel/src/. Per-task implementer handles each match with contextual judgment (delete vs strip-parenthetical vs re-phrase).
3. New 'House-keeping rules' section in the plan tells future implementers: no PR-letter markers, no '// NEW' stickers, no '// MODIFIED' tags in source. JSDoc for WHY is fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR-A / PR-B / PR-C markers were useful during development but become noise once those PRs are merged. Removed throughout. JSDoc/comments describing substantive WHY logic are kept; only the PR-N tags and 'NEW (PR-N)' sticker comments are stripped. git blame is the right tool for 'when did this change' — comments are for 'why does this exist'.

PR-D foundational schema:
- run: new columns kind (main|sub), parentRunId (self-FK cascade), taskId (FK to task, SET NULL), finalText (nullable text). New index idx_run_task_kind for retry-context lookups.
- message: new kind column (denormalized from run for hot-path filter). Old idx_message_thread_created replaced with kind-aware composite idx_message_thread_kind_created.
- tool_call: new kind column (denormalized for analytical filtering).

Also annotates tasks.runId FK thunk with AnySQLiteColumn return type to break the run<->task circular type inference now that run.taskId references task.id.

All pre-PR-D rows take kind='main' via DEFAULT. No backfill needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two new required fields on PerTurnContext:
- runKind: 'main' | 'sub', stamped on tool_call.kind at INSERT time via wrappedTool's onInputAvailable hook
- clientTimezone: propagates from the parent run's chat-request frame; needed by buildCurrentDateBlock in main's context and by the sub-agent context builder (Tasks 5/6)

runChatTurn sets runKind='main' and passes clientTimezone through from its existing function parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four filter additions at the SQL layer to keep sub-agent transcripts out of user-facing surfaces:
- runChatTurn's messageRows fetch: AND kind='main' so the model's view of the conversation excludes sub-agent internal monologue.
- getSessionInfo (backs session_info tool): kind='main' on both the count/last-activity aggregate and the message page query.
- hybridSearch (backs sessions_search tool): kind='main' on the BM25 recall join (FTS5 trigger mirrors every kind, so the filter lands on the message join) AND on the row-hydration query as defense-in-depth.
- Vector ingestion consumer: skip messages where kind='sub' before embedding. Producer-side enqueue is unchanged (consumer-side filter is DRY across all three current producers: handlers.ts, turn.ts, turn-compaction.ts).

sessions.ts itself is not modified; its tools delegate to kernel methods (searchMessages → hybridSearch, getSessionInfo) where the actual SQL lives.

Sub-agent transcripts remain reachable for retry-context queries via run.taskId + kind='sub' (separate code path, lands in Task 5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three additional message-fetch sites broadcast to the TUI but lacked the kind='main' filter added in c33450e:
- turn.ts:~445 (broadcast after paused-save)
- turn.ts:~540 (broadcast after turn completion)
- kernel.ts:~576 (onConnect initial WS sync)

Without these filters, sub-agent messages would leak to the TUI as soon as Task 6's spawn_sub_agent starts persisting to the message table. Per spec §1, sub-agents run invisibly from the user's POV in PR-D (PR-E adds proper rendering surfaces).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two more message-read sites missed in the plan's enumeration but caught by spec review:
- sessions-recent-block.ts: three subqueries (count + first/last user preview) within the <sessions_recent> context block builder. Without kind='main' filter, sub-agent's INSERTed prompt user-messages would leak into the 'last user preview' line of another thread's row in main agent's context.
- turn-compaction.ts:78: forkAndCompact's history fetch for the Anthropic compaction summarizer. Without kind='main', sub-agent rows would conflate with main's conversation history in the generated <previous_thread_summary>. (turn-compaction.ts:223, the INSERT producer side, is intentionally NOT modified per the consumer-side filter strategy.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Smaller, task-focused variant of main's SYSTEM_PROMPT. Strips <voice> (output goes to LLM not human), <task_management> (no manage_tasks), <roadmap> (no user to admit to), <style> (no TUI). Keeps <honesty>, <capabilities>, <memory_and_skills>, <sessions>. Adds new <sub_agent_role> section explaining scope, task/prior-attempt handling, and output shape. ~700 tokens. Cached at b1; paid once per fresh cache across all sub-agent invocations using the same model.

buildSubAgentContextMessages: trimmed subset of main's blocks (memory, current_date, connected_accounts, files, computer_sandboxes) plus conditional <task> (when task_id provided) and <prior_attempts> (when retry — prior sub-agent runs for the same task with non-null finalText, oldest-first, truncated to ~500 chars each). deriveFinalText: read last assistant message from a run; used by spawn_sub_agent after streamText finishes to derive sub-agent's final answer for run.finalText + task.result.
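
A possible shape for the <prior_attempts> builder is sketched below. The field names follow the id/startedAt/finalText columns named in this PR, but the `startedAt` type (epoch milliseconds), the inner `<attempt>` markup, and the exact truncation rule are assumptions, not the shipped implementation.

```typescript
// Narrow row shape; startedAt as epoch milliseconds is an assumption.
interface PriorAttempt {
  id: string;
  startedAt: number;
  finalText: string | null;
}

const MAX_ATTEMPT_CHARS = 500;

// Returns null when there is nothing to show, so the caller can omit the block.
function buildPriorAttemptsBlock(attempts: PriorAttempt[]): string | null {
  const usable = attempts
    // Keep only attempts that actually produced a final answer.
    .filter((a): a is PriorAttempt & { finalText: string } => a.finalText !== null)
    // Oldest first (filter returned a copy, so sorting in place is safe).
    .sort((a, b) => a.startedAt - b.startedAt);
  if (usable.length === 0) return null;

  const body = usable.map((a, i) => {
    const text =
      a.finalText.length > MAX_ATTEMPT_CHARS
        ? a.finalText.slice(0, MAX_ATTEMPT_CHARS) + "…"
        : a.finalText;
    // Hypothetical inner markup; the real block format may differ.
    return `<attempt n="${i + 1}" run_id="${a.id}">\n${text}\n</attempt>`;
  });
  return `<prior_attempts>\n${body.join("\n")}\n</prior_attempts>`;
}

console.log(
  buildPriorAttemptsBlock([
    { id: "run-b", startedAt: 200, finalText: "second answer" },
    { id: "run-a", startedAt: 100, finalText: "first answer" },
  ]),
);
```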

Single write tool: { model: 'opus'|'sonnet'|'haiku', prompt, task_id? }. Returns { runId, finalText } so main agent gets the sub-agent's answer plus a handle for observability.

Same-process execution: nested streamText inside execute(). Sub-agent shares kernel.db + env bindings directly (no marshalling). Sub-agent's tool surface = main minus spawn_sub_agent + minus manage_tasks (via buildSubAgentTools deny-list, landing in Task 7).

Sub-agent persistence: prompt INSERTed before streamText; assistant + tool messages INSERTed via toUIMessageStream onFinish at end-of-stream (mirrors main's pattern in turn.ts).

Pause semantics: parent abort → sub-agent run.status='paused', tool_call row marked 'error' via wrappedTool re-throw. Hard error → run.status='error'. Cross-thread task_id rejected.

Step cap = 50. Cache TTL = '5m' at all 4 breakpoints. Anthropic effort='high' (one tier below main's xhigh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two changes in lockstep:
- turn.ts: stopWhen stepCountIs(50) → stepCountIs(100). Main was hitting the 50 cap on complex multi-step turns; bumping gives it room. Sub-agent inherits the original 50 (half of main) via spawn_sub_agent's own stopWhen.
- tools/index.ts: register spawn_sub_agent in buildTools spread (before get_time so withTailCache's b1 marker stays on get_time). Add buildSubAgentTools factory: a deny-list filter that strips spawn_sub_agent (no recursion) and manage_tasks (sub-agents execute, parent plans). Consumed by spawn-sub-agent.ts's nested streamText.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…FinalText

Two PR-D review comments addressed:
1. /api/__debug/search now 403s in production. Added NODE_ENV var to wrangler.jsonc (default 'production'), updated .dev.vars.example to suggest NODE_ENV=development locally. Cast through string in the comparison because wrangler types narrows the literal value from wrangler.jsonc and TS would flag the !== check as always-true. Regenerated worker-configuration.d.ts via wrangler types.
2. Removed apps/kernel/src/agent/derive-final-text.ts. Inlined the helper as a module-private function at the bottom of apps/kernel/src/tools/spawn-sub-agent.ts (its sole caller). JSDoc preserved verbatim; the race-safety rationale is still useful at the call site. Imports widened in spawn-sub-agent.ts: added and, desc, type DB to the @agent-os/models line.
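
The cast-through-string detail in item 1 is easier to see in code. The sketch below assumes the generated binding type narrows NODE_ENV to the literal "production"; `Env` here is a hand-written stand-in for the generated type, and `debugSearchStatus` is a hypothetical helper.

```typescript
// Stand-in for the generated binding type: the literal comes from the
// default value in the config file, so the type is narrower than string.
interface Env {
  NODE_ENV: "production";
}

// Hypothetical gate for the debug route.
function debugSearchStatus(env: Env): number {
  // Without the cast, comparing the literal type "production" with
  // "development" is rejected by the type checker as a comparison that can
  // never be equal. Widening to string keeps the runtime check meaningful
  // when a local override sets NODE_ENV=development.
  if ((env.NODE_ENV as string) !== "development") {
    return 403;
  }
  return 200;
}

console.log(debugSearchStatus({ NODE_ENV: "production" }));
```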

CF dashboard's Workers Logs panel will now capture console.log / console.error from all execution contexts — including Durable Object methods, WebSocket onMessage handlers, and async callbacks like streamText.onFinish. Broader coverage than 'wrangler tail' which misses DO message-handler logs. invocation_logs emits a structured entry per request with timing + outcome — useful for 'did this fire at all?' baseline checks during debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Anthropic returns 'adaptive thinking is not supported on this model' when spawn_sub_agent uses model='haiku'. Both opus and sonnet support extended thinking; haiku (the fast tier) doesn't. Conditional providerOptions: for haiku, send only sendReasoning:true (safe no-op since haiku won't emit reasoning content anyway). For opus/sonnet, keep effort='high' + thinking={adaptive,summarized}. Surfaced via Workers Observability log of AI_APICallError on a haiku sub-agent invocation during smoke. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
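
The conditional options could be expressed as a small per-tier function. This is a sketch under assumptions: the option object mirrors the names used in this PR (effort, sendReasoning, thinking), but the nested field names (`type`, `display`) and `subAgentAnthropicOptions` itself are hypothetical, and the real provider option shape may differ.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// Assumed option shape; not checked against the provider's actual types.
interface SubAgentAnthropicOptions {
  sendReasoning: boolean;
  effort?: "high";
  thinking?: { type: "adaptive"; display: "summarized" };
}

function subAgentAnthropicOptions(tier: ModelTier): SubAgentAnthropicOptions {
  if (tier === "haiku") {
    // The fast tier rejects adaptive thinking, so send only the flag that
    // the commit message describes as a safe no-op there.
    return { sendReasoning: true };
  }
  return {
    sendReasoning: true,
    effort: "high",
    thinking: { type: "adaptive", display: "summarized" },
  };
}

console.log(subAgentAnthropicOptions("haiku"));
```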

… status

Main agent's evaluation step was being short-circuited. spawn_sub_agent's execute() was doing:

    UPDATE task SET status='complete', result=finalText WHERE id=?

This conflated 'sub-agent delivered an answer' with 'task is complete'. Two distinct events:
- Sub-agent finishing = a fact (LLM returned text)
- Task being complete = a judgment (the answer satisfies the task)

Auto-marking robbed main of the judgment step and broke the retry workflow: main would see task=complete and never re-evaluate, even when the answer was inadequate. To retry, main would have had to first manage_tasks(update, status='in_progress') to revert the auto-mark, friction the design never intended.

Fix: spawn_sub_agent now writes only task.result. Status is unchanged. Main reads the result, evaluates, then either:
- marks the task complete/failed/cancelled via manage_tasks, OR
- re-spawns the sub-agent on the same task_id with retry guidance (the previous answer stays in task.result and surfaces as <prior_attempts> in the next sub-agent's context)

Tool description updated to teach the model this flow. Spec §1, §3, §9 and plan Task 6 + smoke scenario 3 updated to match.

Surfaced during PR-D smoke scenario 3: Ronit noticed the task was 'complete' before main agent had even shown its analysis of the result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
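
The split between recording a result and judging it can be shown with an in-memory sketch. `Task`, `recordSubAgentResult`, and the status union are illustrative assumptions (the union lists only statuses mentioned in this PR and may be incomplete); the real code updates the task row in the database.

```typescript
// Statuses mentioned in this PR; the real enum may contain more.
type TaskStatus = "in_progress" | "complete" | "failed" | "cancelled";

interface Task {
  id: string;
  status: TaskStatus;
  result: string | null;
}

// What spawn_sub_agent does after the fix: record the fact (the answer)
// and leave the judgment (the status) to the main agent.
function recordSubAgentResult(task: Task, finalText: string): Task {
  return { ...task, result: finalText };
}

const before: Task = { id: "task-1", status: "in_progress", result: null };
const after = recordSubAgentResult(before, "draft answer");
console.log(after.status, after.result);
```

Main can then either mark the task via manage_tasks or re-spawn on the same task_id; in both cases the status changes only through main's own decision.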

rtpa25 and others added 16 commits May 14, 2026 20:02

Comment thread apps/kernel/src/index.ts

Comment thread apps/kernel/src/agent/derive-final-text.ts Outdated

rtpa25 and others added 4 commits May 15, 2026 14:09

rtpa25 merged commit 2c9df4f into main May 15, 2026
3 checks passed

rtpa25 deleted the feat/sub-agents branch May 15, 2026 10:01

rtpa25 merged 20 commits into main from feat/sub-agents
