feat(kernel): PR-D — sub-agents#17

Merged
rtpa25 merged 20 commits into main from feat/sub-agents on May 15, 2026
@rtpa25 rtpa25 commented May 15, 2026

Implements the sub-agent substrate per docs/superpowers/specs/2026-05-14-sub-agents-design.md.

Second of three slices: Tasks (PR-C ✅) → Sub-agents (this PR) → TUI rendering (PR-E).

What ships

  • spawn_sub_agent tool with { model: 'opus'|'sonnet'|'haiku', prompt, task_id? } → { runId, finalText }. Wrapped via wrappedTool({ touchesFS: false }).
  • Same-process execution. Nested streamText inside spawn_sub_agent.execute(). Sub-agent shares kernel.db + env bindings directly — no marshalling, no isolate boundary.
  • Sub-agent tool surface = main's tools minus spawn_sub_agent (no recursion) minus manage_tasks (sub-agents execute, parent plans). Deny-list pattern via destructured rename.
  • Sub-agent context: trimmed subset (<connected_accounts>, <files>, <computer_sandboxes>, <memory>, <current_date>) + conditional <task> + conditional <prior_attempts> (with full attempt history truncated to ~500 chars per attempt).
  • Schema additions (migration 0010_lumpy_black_widow.sql): run gains kind/parentRunId/taskId/finalText; message + tool_call gain denormalized kind. New index idx_run_task_kind for retry-context queries; idx_message_thread_created replaced with kind-aware composite.
  • User-visible reads filter to kind='main' at 13+ sites: runChatTurn's history fetch, paused/complete broadcasts, onConnect initial sync, getSessionInfo count + page, hybridSearch BM25 join + row hydration, vector ingest consumer, <sessions_recent> block subqueries, forkAndCompact summarizer history. Producer-side enqueue untouched (consumer-side filter is DRY).
  • Sub-agent system prompt (SUB_AGENT_SYSTEM_PROMPT): ~700 tokens, task-focused. No <voice>, no <task_management>, no <roadmap>, no <style>. New <sub_agent_role> section with output guidance.
  • Main's step cap bumped 50 → 100; sub-agent step cap = 50.
  • Anthropic options for sub-agent: effort='high', sendReasoning=true, thinking={adaptive,summarized}, cacheControl.ttl='5m' at all 4 breakpoints.
  • PerTurnContext gains runKind ("main"/"sub" — stamped onto tool_call.kind) + clientTimezone (propagated through sub-agent context build).
  • Parallel sub-agents: AI SDK's native parallel tool calls work transparently — e.g. 8 spawn_sub_agent calls in one assistant step = 8 concurrent sub-agent runs in the same DO.
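
The deny-list for the sub-agent tool surface can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the kernel's actual code: the `Tool` and `ToolSet` types and the three stub tools are hypothetical stand-ins, and only the names `spawn_sub_agent`, `manage_tasks`, `get_time` and `buildSubAgentTools` come from this PR.

```typescript
// Hypothetical stand-in for the real tool type; the kernel's tools carry more.
type Tool = { description: string };
type ToolSet = Record<string, Tool>;

// Assumed shape of main's tool surface, reduced to three entries for brevity.
const mainTools: ToolSet = {
  spawn_sub_agent: { description: "run a sub-agent" },
  manage_tasks: { description: "plan and update tasks" },
  get_time: { description: "current time" },
};

// Deny-list via destructured rename: pull the two denied keys out under
// throwaway names and keep everything else as the sub-agent's surface.
function buildSubAgentTools(tools: ToolSet): ToolSet {
  const {
    spawn_sub_agent: _noRecursion,
    manage_tasks: _parentPlans,
    ...rest
  } = tools;
  return rest;
}

const subAgentTools = buildSubAgentTools(mainTools);
console.log(Object.keys(subAgentTools)); // only get_time is left
```

A deny-list means any tool later added to main reaches sub-agents by default; an allow-list would need an explicit opt-in per tool.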

Plus a housekeeping pass (e849ce3): stripped PR-A/PR-B/PR-C stale references from apps/kernel/src/. Going forward the codebase doesn't track which PR a line came from — git blame is the right tool for "when did this change."

Pause semantics

When parent run is paused mid-sub-agent:

  • Sub-agent's run.status='paused' (transcript stays observable post-resume)
  • The spawn_sub_agent tool_call row marks status='error' via wrappedTool's re-throw path
  • Main agent sees the failure on its next history scan and can re-spawn
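
The two status writes above can be pictured with a small synchronous sketch. It rests on several assumptions: `wrappedToolSketch` only imitates the re-throw behaviour attributed to wrappedTool, the abort is modelled as an `Error` named `AbortError`, and the `running` and `complete` status values are placeholders that this PR does not confirm.

```typescript
// Assumed status vocabularies; only 'paused' and 'error' appear in the PR text.
type RunStatus = "running" | "complete" | "paused" | "error";
type ToolCallStatus = "running" | "complete" | "error";

interface RunRow { status: RunStatus }
interface ToolCallRow { status: ToolCallStatus }

// Hypothetical way to signal a parent pause.
function makeAbortError(): Error {
  const err = new Error("parent run paused");
  err.name = "AbortError";
  return err;
}

function isAbort(err: unknown): boolean {
  return err instanceof Error && err.name === "AbortError";
}

// Sub-agent side: record its own run status, then re-throw.
function executeSubAgent(run: RunRow, loop: () => string): string {
  try {
    const finalText = loop();
    run.status = "complete";
    return finalText;
  } catch (err) {
    run.status = isAbort(err) ? "paused" : "error";
    throw err; // let the wrapper see the failure
  }
}

// Wrapper side: any throw from execute() marks the tool_call row as 'error'.
function wrappedToolSketch<T>(toolCall: ToolCallRow, execute: () => T): T {
  try {
    const result = execute();
    toolCall.status = "complete";
    return result;
  } catch (err) {
    toolCall.status = "error";
    throw err;
  }
}

const run: RunRow = { status: "running" };
const toolCall: ToolCallRow = { status: "running" };
try {
  wrappedToolSketch(toolCall, () =>
    executeSubAgent(run, () => {
      throw makeAbortError();
    }),
  );
} catch {
  // The parent turn is paused at this point.
}
console.log(run.status, toolCall.status); // paused error
```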

Smoke checklist — pending interactive testing

Deployed to agent-os.pandaronit25.workers.dev. Migration 0010 (0010_lumpy_black_widow.sql) applies automatically.

  • 1. Simple sub-agent invocation: "Use a sub-agent to research Cloudflare Durable Objects, summarize in 3 sentences." → verify run row with kind='sub', parentRunId populated, finalText non-empty.
  • 2. Sub-agent without task_id returns answer to main: same run as scenario 1; verify main's reply references findings, run.taskId IS NULL.
  • 3. Sub-agent WITH task_id auto-completes task: create task, spawn sub-agent with task_id; verify task.status='complete', task.result populated.
  • 4. Parallel sub-agents in one step: "Spawn 3 sub-agents to fetch current weather in Tokyo, Berlin, São Paulo." → verify 3 sub-runs with overlapping startedAt timestamps.
  • 5. Cross-thread task_id rejected: "belongs to a different thread" error.
  • 6. Sub-agent's tool calls have kind='sub'.
  • 7. Retry context populated: re-spawn on same task; verify <prior_attempts> block contains first attempt's finalText.
  • 8. Sessions tools filter sub-agent transcripts: sessions_search returns NO matches against sub-agent internal text.

Out of scope (PR-E)

  • TUI rendering of <tasks> as a status panel (Claude Code style)
  • TUI rendering of sub-agent transcripts (collapsible drill-in under spawn_sub_agent tool call)
  • Live streaming of sub-agent events to TUI
  • Unified custom-renderer pattern in apps/cli/src/components/tools/

Reviews (subagent-driven, opus)

All 8 implementation tasks + Task 0 (cleanup) went through spec-compliance + code-quality review per superpowers:subagent-driven-development. Three iterations during Task 3 (filter completeness): initial pass caught 5 sites, follow-up pass added 3 more (broadcasts + onConnect), final pass added 2 more (sessions-recent subqueries + forkAndCompact summarizer history) — totaling 13 user-visible read sites filtered.

Final whole-PR review approved with 5 Minor follow-up items (SQLite ALTER TABLE FK-cascade limitation, defensive consistency nits) — none blocking.

🤖 Generated with Claude Code

rtpa25 and others added 16 commits May 14, 2026 20:02
spawn_sub_agent tool that runs a fresh agent loop in-process within the
Kernel DO. Sub-agents share kernel.db + env bindings directly (no
marshalling), get most of main's tool surface minus spawn_sub_agent
(no recursion) and minus manage_tasks (sub-agents execute, parent plans).

Key decisions:
- Same-process execution (nested streamText), not Worker Loader isolate
- Depth 1 only — no recursive spawning
- Single message/tool_call tables + denormalized kind column ("main"|"sub")
- Each sub-agent gets own run row with parentRunId + optional taskId FK
- Parallel sub-agents in one step (AI SDK native parallel tool calls)
- Sub-agent context blocks trimmed + conditional <task> + <prior_attempts>
- finalText denormalized on run for fast retry-context lookup
- Vector ingestion and sessions tools filter to kind="main" only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three revisions after spec review:

1. Pause semantics on parent abort: sub-agent run.status = 'paused'
   (mirrors parent's resulting state). spawn_sub_agent tool_call row
   separately marks 'error' so main agent sees the failure on resume.

2. Step caps: main bumped 50 → 100 (was hitting cap on complex turns);
   sub-agent at 50 (was 20). Half-of-main heuristic preserved.

3. Model enum: tier names (opus/sonnet/haiku) instead of concrete IDs.
   Internal MODEL_ROUTE constant maps tier → current model ID.
   When opus 4.8 ships, one-line change vs schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
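
A tier-to-model routing table of the kind described in revision 3 might look like the sketch below. The model ID strings are deliberate placeholders, since the real values of MODEL_ROUTE are not shown in this PR, and `resolveModel` is a hypothetical helper name.

```typescript
// The tool schema exposes tiers only, so it stays stable across model releases.
type ModelTier = "opus" | "sonnet" | "haiku";

// Placeholder IDs: substitute whatever concrete model IDs are current.
const MODEL_ROUTE: Record<ModelTier, string> = {
  opus: "<current-opus-model-id>",
  sonnet: "<current-sonnet-model-id>",
  haiku: "<current-haiku-model-id>",
};

// Hypothetical helper: the one place where a tier becomes a concrete ID.
function resolveModel(tier: ModelTier): string {
  return MODEL_ROUTE[tier];
}

console.log(resolveModel("sonnet"));
```

Typing the table as `Record<ModelTier, string>` makes the compiler reject a tier that lacks a route.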
8 tasks: schema+migration → PerTurnContext additions → user-visible
read filters → sub-agent system prompt → sub-agent context builder →
spawn_sub_agent tool → bump main step cap + register → deploy+smoke+PR.

Each task self-contained with full code, exact paths, typecheck-as-test
gate, individual commit. No TDD scaffolding (per project convention).
Final smoke checklist runs 8 scenarios against deployed kernel + d1
verifications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We denormalized kind onto message precisely so we can filter in the
where clause. The JS-side filter was wasteful — fetched sub-agent rows
from D1 just to throw them away. SQL filter saves the row-read budget
and the JS pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
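
As a rough picture of the change above, the kind predicate moves into the WHERE clause so that sub-agent rows are never read. The sketch builds a parameterized query by hand rather than through Drizzle, and the column names (`thread_id`, `content_text`, `created_at`) and the `buildHistoryQuery` helper are assumptions made for illustration.

```typescript
type MessageKind = "main" | "sub";

interface Query {
  sql: string;
  params: string[];
}

// Hypothetical query builder: the filter is part of the SQL, so the database
// returns only rows of the requested kind and no JS-side pass is needed.
function buildHistoryQuery(threadId: string, kind: MessageKind = "main"): Query {
  return {
    sql:
      "SELECT id, role, content_text FROM message " +
      "WHERE thread_id = ? AND kind = ? " +
      "ORDER BY created_at ASC",
    params: [threadId, kind],
  };
}

const historyQuery = buildHistoryQuery("thread-123");
console.log(historyQuery.sql, historyQuery.params);
```

A query of this shape is presumably what the kind-aware composite index on message is meant to serve.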
Two queries were doing select-* via Drizzle's findFirst/findMany default:

- deriveFinalText: only uses contentText; ship just that column.
- fetchPriorAttempts: only uses id/startedAt/finalText; ship just those.
  Local PriorAttempt = Pick<Run, ...> type captures the narrower shape;
  buildPriorAttemptsBlock signature updated to match.

Same principle as the prior 'SQL filter not JS filter' fix — minimize
data over the D1 wire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…task

Three-part change:

1. Removed all PR-letter and 'NEW' marker comments from code samples
   in both the spec and the plan. These don't add value once code
   lands — git blame is the right tool for 'when did this change'.
   JSDoc explaining WHY a field exists is kept; sticker comments
   announcing newness are not.

2. Added Task 0 (Repo hygiene) at the front of the plan: grep the
   source for PR-A/PR-B/PR-C references and strip them. PR-C was
   merged 2026-05-14 and several comments still reference it in
   apps/kernel/src/. Per-task implementer handles each match with
   contextual judgment (delete vs strip-parenthetical vs re-phrase).

3. New 'House-keeping rules' section in the plan tells future
   implementers: no PR-letter markers, no '// NEW' stickers, no
   '// MODIFIED' tags in source. JSDoc for WHY is fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-A / PR-B / PR-C markers were useful during development but become
noise once those PRs are merged. Removed throughout. JSDoc/comments
describing substantive WHY logic are kept; only the PR-N tags and
'NEW (PR-N)' sticker comments are stripped.

git blame is the right tool for 'when did this change' — comments
are for 'why does this exist'.

PR-D foundational schema:
- run: new columns kind (main|sub), parentRunId (self-FK cascade),
  taskId (FK to task, SET NULL), finalText (nullable text).
  New index idx_run_task_kind for retry-context lookups.
- message: new kind column (denormalized from run for hot-path filter).
  Old idx_message_thread_created replaced with kind-aware composite
  idx_message_thread_kind_created.
- tool_call: new kind column (denormalized for analytical filtering).

Also annotates tasks.runId FK thunk with AnySQLiteColumn return type
to break the run<->task circular type inference now that run.taskId
references task.id.

All pre-PR-D rows take kind='main' via DEFAULT. No backfill needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new required fields on PerTurnContext:
- runKind: 'main' | 'sub' — stamped on tool_call.kind at INSERT time
  via wrappedTool's onInputAvailable hook
- clientTimezone — propagates from the parent run's chat-request frame;
  needed by buildCurrentDateBlock in main's context and by the sub-agent
  context builder (Tasks 5/6)

runChatTurn sets runKind='main' and passes clientTimezone through from
its existing function parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four filter additions at the SQL layer to keep sub-agent transcripts
out of user-facing surfaces:

- runChatTurn's messageRows fetch: AND kind='main' so the model's view
  of the conversation excludes sub-agent internal monologue.
- getSessionInfo (backs session_info tool): kind='main' on both the
  count/last-activity aggregate and the message page query.
- hybridSearch (backs sessions_search tool): kind='main' on the BM25
  recall join (FTS5 trigger mirrors every kind, so the filter lands on
  the message join) AND on the row-hydration query as defense-in-depth.
- Vector ingestion consumer: skip messages where kind='sub' before
  embedding. Producer-side enqueue is unchanged (consumer-side filter
  is DRY across all three current producers — handlers.ts, turn.ts,
  turn-compaction.ts).

sessions.ts itself is not modified; its tools delegate to kernel
methods (searchMessages → hybridSearch, getSessionInfo) where the
actual SQL lives.

Sub-agent transcripts remain reachable for retry-context queries via
run.taskId + kind='sub' (separate code path, lands in Task 5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additional message-fetch sites broadcast to the TUI but lacked
the kind='main' filter added in c33450e:

- turn.ts:~445 — broadcast after paused-save
- turn.ts:~540 — broadcast after turn completion
- kernel.ts:~576 — onConnect initial WS sync

Without these filters, sub-agent messages would leak to the TUI as soon
as Task 6's spawn_sub_agent starts persisting to the message table.
Per spec §1, sub-agents run invisibly from the user's POV in PR-D
(PR-E adds proper rendering surfaces).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more message-read sites missed in the plan's enumeration but
caught by spec review:

- sessions-recent-block.ts: three subqueries (count + first/last user
  preview) within the <sessions_recent> context block builder. Without
  kind='main' filter, sub-agent's INSERTed prompt user-messages would
  leak into the 'last user preview' line of another thread's row in
  main agent's context.

- turn-compaction.ts:78: forkAndCompact's history fetch for the
  Anthropic compaction summarizer. Without kind='main', sub-agent
  rows would conflate with main's conversation history in the
  generated <previous_thread_summary>.

(turn-compaction.ts:223, the INSERT producer side, is intentionally
NOT modified per the consumer-side filter strategy.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller, task-focused variant of main's SYSTEM_PROMPT. Strips <voice>
(output goes to LLM not human), <task_management> (no manage_tasks),
<roadmap> (no user to admit to), <style> (no TUI). Keeps <honesty>,
<capabilities>, <memory_and_skills>, <sessions>. Adds new <sub_agent_role>
section explaining scope, task/prior-attempt handling, and output shape.

~700 tokens. Cached at b1; paid once per fresh cache across all
sub-agent invocations using the same model.

buildSubAgentContextMessages: trimmed subset of main's blocks (memory,
current_date, connected_accounts, files, computer_sandboxes) plus
conditional <task> (when task_id provided) and <prior_attempts>
(when retry — prior sub-agent runs for the same task with non-null
finalText, oldest-first, truncated to ~500 chars each).

deriveFinalText: read last assistant message from a run; used by
spawn_sub_agent after streamText finishes to derive sub-agent's
final answer for run.finalText + task.result.
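
The <prior_attempts> rules described above (non-null finalText, oldest-first, roughly 500 characters each) could be sketched as follows. The inner `<attempt>` markup, the numeric `startedAt`, and the `truncate` helper are assumptions; only `PriorAttempt`, `buildPriorAttemptsBlock` and the three field names come from this PR.

```typescript
// Narrow shape mirroring the three columns the commit log says are fetched.
interface PriorAttempt {
  id: string;
  startedAt: number; // assumed to be an epoch timestamp
  finalText: string | null;
}

const MAX_ATTEMPT_CHARS = 500;

// Hypothetical helper: cut long answers and mark the cut.
function truncate(text: string, max: number = MAX_ATTEMPT_CHARS): string {
  return text.length <= max ? text : text.slice(0, max) + "...";
}

// Returns null when there is nothing to show, so the block stays conditional.
function buildPriorAttemptsBlock(attempts: PriorAttempt[]): string | null {
  const usable = attempts
    .filter((a): a is PriorAttempt & { finalText: string } => a.finalText !== null)
    .sort((a, b) => a.startedAt - b.startedAt); // oldest first
  if (usable.length === 0) return null;
  const body = usable
    .map((a, i) => `<attempt n="${i + 1}" run_id="${a.id}">${truncate(a.finalText)}</attempt>`)
    .join("\n");
  return `<prior_attempts>\n${body}\n</prior_attempts>`;
}

const block = buildPriorAttemptsBlock([
  { id: "run-2", startedAt: 200, finalText: "second answer" },
  { id: "run-1", startedAt: 100, finalText: "first answer" },
  { id: "run-3", startedAt: 300, finalText: null },
]);
console.log(block);
```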
Single write tool: { model: 'opus'|'sonnet'|'haiku', prompt, task_id? }.
Returns { runId, finalText } so main agent gets the sub-agent's answer
plus a handle for observability.

Same-process execution: nested streamText inside execute(). Sub-agent
shares kernel.db + env bindings directly (no marshalling). Sub-agent's
tool surface = main minus spawn_sub_agent + minus manage_tasks (via
buildSubAgentTools deny-list — landing in Task 7).

Sub-agent persistence: prompt INSERTed before streamText; assistant +
tool messages INSERTed via toUIMessageStream onFinish at end-of-stream
(mirrors main's pattern in turn.ts).

Pause semantics: parent abort → sub-agent run.status='paused', tool_call
row marked 'error' via wrappedTool re-throw. Hard error → run.status='error'.

Cross-thread task_id rejected. Step cap = 50. Cache TTL = '5m' at all
4 breakpoints. Anthropic effort='high' (one tier below main's xhigh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
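
The cross-thread rejection mentioned above might be enforced with a guard like this one. It is a sketch under assumptions: the `TaskRow` shape, the `assertTaskInThread` name and the "not found" branch are hypothetical, and only the "belongs to a different thread" wording is taken from the smoke checklist.

```typescript
// Assumed minimal task shape; the real task row has more columns.
interface TaskRow {
  id: string;
  threadId: string;
}

// Hypothetical guard, run before any sub-agent run row would be created.
function assertTaskInThread(
  task: TaskRow | undefined,
  taskId: string,
  currentThreadId: string,
): TaskRow {
  if (!task) {
    throw new Error(`task ${taskId} not found`);
  }
  if (task.threadId !== currentThreadId) {
    throw new Error(`task ${taskId} belongs to a different thread`);
  }
  return task;
}

const ownTask = assertTaskInThread({ id: "t1", threadId: "A" }, "t1", "A");
console.log(ownTask.id);
```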
Two changes in lockstep:
- turn.ts: stopWhen stepCountIs(50) → stepCountIs(100). Main was hitting
  the 50 cap on complex multi-step turns; bumping gives it room. Sub-agent
  inherits the original 50 (half of main) via spawn_sub_agent's own
  stopWhen.
- tools/index.ts: register spawn_sub_agent in buildTools spread (before
  get_time so withTailCache's b1 marker stays on get_time). Add
  buildSubAgentTools factory — deny-list filter that strips
  spawn_sub_agent (no recursion) and manage_tasks (sub-agents execute,
  parent plans). Consumed by spawn-sub-agent.ts's nested streamText.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread apps/kernel/src/index.ts
Comment thread apps/kernel/src/agent/derive-final-text.ts Outdated
rtpa25 and others added 4 commits May 15, 2026 14:09
…FinalText

Two PR-D review comments addressed:

1. /api/__debug/search now 403s in production. Added NODE_ENV var
   to wrangler.jsonc (default 'production'), updated .dev.vars.example
   to suggest NODE_ENV=development locally. Cast through string in
   the comparison because wrangler types narrows the literal value
   from wrangler.jsonc and TS would flag the !== check as always-true.
   Regenerated worker-configuration.d.ts via wrangler types.

2. Removed apps/kernel/src/agent/derive-final-text.ts. Inlined the
   helper as a module-private function at the bottom of
   apps/kernel/src/tools/spawn-sub-agent.ts (its sole caller). JSDoc
   preserved verbatim — the race-safety rationale is still useful at
   the call site. Imports widened in spawn-sub-agent.ts: added and,
   desc, type DB to the @agent-os/models line.
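
The "cast through string" mentioned in item 1 can be illustrated as below. This is a sketch, not the actual handler: the `Env` interface stands in for the generated worker-configuration.d.ts, `debugSearchStatus` is a hypothetical helper that returns only a status code, and the exact comparison used in the kernel is assumed.

```typescript
// Stand-in for the generated Env type: a var declared in wrangler.jsonc is
// typed as its literal value, here "production".
interface Env {
  NODE_ENV: "production";
}

// Hypothetical gate for the debug route: allowed only in development.
function debugSearchStatus(env: Env): number {
  // Widen to string first. Comparing the literal type "production" with
  // "development" directly is rejected by TypeScript (no overlap), even
  // though .dev.vars can override the value at runtime.
  const nodeEnv = env.NODE_ENV as string;
  return nodeEnv !== "development" ? 403 : 200;
}

console.log(debugSearchStatus({ NODE_ENV: "production" })); // 403
```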
CF dashboard's Workers Logs panel will now capture console.log /
console.error from all execution contexts — including Durable Object
methods, WebSocket onMessage handlers, and async callbacks like
streamText.onFinish. Broader coverage than 'wrangler tail' which
misses DO message-handler logs.

invocation_logs emits a structured entry per request with timing +
outcome — useful for 'did this fire at all?' baseline checks during
debugging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Anthropic returns 'adaptive thinking is not supported on this model'
when spawn_sub_agent uses model='haiku'. Both opus and sonnet support
extended thinking; haiku (the fast tier) doesn't.

Conditional providerOptions: for haiku, send only sendReasoning:true
(safe no-op since haiku won't emit reasoning content anyway). For
opus/sonnet, keep effort='high' + thinking={adaptive,summarized}.

Surfaced via Workers Observability log of AI_APICallError on a haiku
sub-agent invocation during smoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
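
The per-tier branching described above could be sketched like this. The option object is a local type invented for illustration: `effort`, `sendReasoning` and `thinking` are named in this PR, but the keys inside `thinking` are placeholders and should not be read as the provider's real schema.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// Local illustrative type, not the provider's published options type.
interface SubAgentAnthropicOptions {
  sendReasoning: true;
  effort?: "high";
  // Placeholder keys standing in for thinking={adaptive,summarized}.
  thinking?: { mode: "adaptive"; display: "summarized" };
}

// Hypothetical builder: haiku gets the minimal set, other tiers the full set.
function subAgentAnthropicOptions(tier: ModelTier): SubAgentAnthropicOptions {
  if (tier === "haiku") {
    return { sendReasoning: true };
  }
  return {
    sendReasoning: true,
    effort: "high",
    thinking: { mode: "adaptive", display: "summarized" },
  };
}

console.log(subAgentAnthropicOptions("haiku"), subAgentAnthropicOptions("opus"));
```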
… status

Main agent's evaluation step was being short-circuited. spawn_sub_agent's
execute() was doing:
  UPDATE task SET status='complete', result=finalText WHERE id=?

This conflated 'sub-agent delivered an answer' with 'task is complete'.
Two distinct events:
- Sub-agent finishing = a fact (LLM returned text)
- Task being complete = a judgment (the answer satisfies the task)

Auto-marking robbed main of the judgment step and broke the retry
workflow: main would see task=complete and never re-evaluate, even when
the answer was inadequate. To retry, main would have had to first
manage_tasks(update, status='in_progress') to revert the auto-mark —
friction the design never intended.

Fix: spawn_sub_agent now writes only task.result. Status is unchanged.
Main reads the result, evaluates, then either:
  - marks the task complete/failed/cancelled via manage_tasks, OR
  - re-spawns the sub-agent on the same task_id with retry guidance
    (the previous answer stays in task.result and surfaces as
    <prior_attempts> in the next sub-agent's context)

Tool description updated to teach the model this flow. Spec §1, §3, §9
and plan Task 6 + smoke scenario 3 updated to match.

Surfaced during PR-D smoke scenario 3 — Ronit noticed the task was
'complete' before main agent had even shown its analysis of the result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
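
The split between "the sub-agent delivered an answer" and "the task is complete" can be sketched with two separate writers. Everything here is illustrative: the in-memory `TaskRow`, `recordSubAgentResult` and `markTask` are hypothetical stand-ins for the spawn_sub_agent write and the manage_tasks update, and the status list contains only values mentioned in this PR.

```typescript
type TaskStatus = "in_progress" | "complete" | "failed" | "cancelled";

interface TaskRow {
  id: string;
  status: TaskStatus;
  result: string | null;
}

// What spawn_sub_agent does after the fix: record the fact, leave status alone.
function recordSubAgentResult(task: TaskRow, finalText: string): TaskRow {
  return { ...task, result: finalText };
}

// What only the main agent does (via manage_tasks): record the judgment.
function markTask(task: TaskRow, status: TaskStatus): TaskRow {
  return { ...task, status };
}

const before: TaskRow = { id: "t1", status: "in_progress", result: null };
const afterSubAgent = recordSubAgentResult(before, "draft answer");
const afterReview = markTask(afterSubAgent, "complete");
console.log(afterSubAgent.status, afterReview.status); // in_progress complete
```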
@rtpa25 rtpa25 merged commit 2c9df4f into main May 15, 2026
3 checks passed
@rtpa25 rtpa25 deleted the feat/sub-agents branch May 15, 2026 10:01