feat(kernel): PR-D — sub-agents #17
Merged
spawn_sub_agent tool that runs a fresh agent loop in-process within the
Kernel DO. Sub-agents share kernel.db + env bindings directly (no
marshalling), get most of main's tool surface minus spawn_sub_agent
(no recursion) and minus manage_tasks (sub-agents execute, parent plans).
Key decisions:
- Same-process execution (nested streamText), not Worker Loader isolate
- Depth 1 only — no recursive spawning
- Single message/tool_call tables + denormalized kind column ("main"|"sub")
- Each sub-agent gets own run row with parentRunId + optional taskId FK
- Parallel sub-agents in one step (AI SDK native parallel tool calls)
- Sub-agent context blocks trimmed + conditional <task> + <prior_attempts>
- finalText denormalized on run for fast retry-context lookup
- Vector ingestion and sessions tools filter to kind="main" only
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
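The run-row shape these decisions imply, together with the depth-1 rule, can be sketched in TypeScript as below. Field names follow the commit (kind, parentRunId, taskId, finalText); RunRow, assertSpawnAllowed, and newSubRun are hypothetical helpers for illustration, not code from the kernel.

```typescript
type RunKind = "main" | "sub";

// Hypothetical row shape; column names follow the commit message.
interface RunRow {
  id: string;
  threadId: string;
  kind: RunKind;
  parentRunId: string | null; // set only for sub-agent runs (self-reference to run.id)
  taskId: string | null; // optional link to the task being worked
  finalText: string | null; // denormalized for fast retry-context lookups
}

// Depth 1 only: a sub-agent may be spawned from a main run, never from another sub-agent.
function assertSpawnAllowed(parent: RunRow): void {
  if (parent.kind !== "main") {
    throw new Error("sub-agents cannot spawn sub-agents (depth 1 only)");
  }
}

// Builds the sub-agent's own run row, linked to its parent and optionally to a task.
function newSubRun(parent: RunRow, id: string, taskId: string | null): RunRow {
  assertSpawnAllowed(parent);
  return {
    id,
    threadId: parent.threadId,
    kind: "sub",
    parentRunId: parent.id,
    taskId,
    finalText: null,
  };
}
```

In the PR, recursion is prevented structurally by leaving spawn_sub_agent out of the sub-agent's tool set; a runtime check like assertSpawnAllowed would only be a second line of defense.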
Three revisions after spec review:
1. Pause semantics on parent abort: sub-agent run.status = 'paused' (mirrors the parent's resulting state). The spawn_sub_agent tool_call row is separately marked 'error' so the main agent sees the failure on resume.
2. Step caps: main bumped 50 → 100 (was hitting the cap on complex turns); sub-agent at 50 (was 20). Half-of-main heuristic preserved.
3. Model enum: tier names (opus/sonnet/haiku) instead of concrete IDs. An internal MODEL_ROUTE constant maps tier → current model ID. When opus 4.8 ships, it's a one-line change vs a schema migration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
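The tier indirection in item 3 and the caps in item 2 can be sketched as typed constants. The model ID strings are placeholders, not real Anthropic model IDs, and resolveModel and the cap constant names are illustrative assumptions.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// Placeholder IDs. Moving a tier to a newer model is a one-line edit here,
// while the tool schema keeps exposing only the tier names.
const MODEL_ROUTE: Record<ModelTier, string> = {
  opus: "placeholder-opus-model-id",
  sonnet: "placeholder-sonnet-model-id",
  haiku: "placeholder-haiku-model-id",
};

const MAIN_STEP_CAP = 100;
// Half-of-main heuristic from item 2.
const SUB_AGENT_STEP_CAP = MAIN_STEP_CAP / 2;

function resolveModel(tier: ModelTier): string {
  return MODEL_ROUTE[tier];
}
```

Typing the map as Record<ModelTier, string> makes the compiler reject a missing tier, so adding a tier to the enum forces a routing decision.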
8 tasks: schema+migration → PerTurnContext additions → user-visible read filters → sub-agent system prompt → sub-agent context builder → spawn_sub_agent tool → bump main step cap + register → deploy+smoke+PR. Each task self-contained with full code, exact paths, typecheck-as-test gate, individual commit. No TDD scaffolding (per project convention). Final smoke checklist runs 8 scenarios against deployed kernel + d1 verifications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We denormalized kind onto message precisely so we can filter in the where clause. The JS-side filter was wasteful — fetched sub-agent rows from D1 just to throw them away. SQL filter saves the row-read budget and the JS pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two queries were doing select-* via Drizzle's findFirst/findMany default:
- deriveFinalText: only uses contentText; ship just that column.
- fetchPriorAttempts: only uses id/startedAt/finalText; ship just those.
Local PriorAttempt = Pick<Run, ...> type captures the narrower shape; buildPriorAttemptsBlock signature updated to match. Same principle as the prior 'SQL filter not JS filter' fix: minimize data over the D1 wire.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
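Both principles (filter in the WHERE clause, select only the columns the caller reads) can be sketched as dependency-free parameterized SQL builders rather than Drizzle calls. The table and column names (message, run, run_id, role, content_text, created_at, task_id, started_at, final_text) are assumptions about the physical schema.

```typescript
interface Query {
  sql: string;
  params: unknown[];
}

// deriveFinalText needs one column from one row: the run's last assistant message.
function lastAssistantTextQuery(runId: string): Query {
  return {
    sql:
      "SELECT content_text FROM message " +
      "WHERE run_id = ? AND role = 'assistant' " +
      "ORDER BY created_at DESC LIMIT 1",
    params: [runId],
  };
}

// fetchPriorAttempts needs three columns. kind and final_text are filtered in SQL,
// so rows the caller would discard never cross the D1 wire.
function priorAttemptsQuery(taskId: string): Query {
  return {
    sql:
      "SELECT id, started_at, final_text FROM run " +
      "WHERE task_id = ? AND kind = 'sub' AND final_text IS NOT NULL " +
      "ORDER BY started_at ASC",
    params: [taskId],
  };
}
```

With a D1 binding, such a query would run as db.prepare(q.sql).bind(...q.params).all(). The task_id/kind predicate is presumably what idx_run_task_kind from the schema commit is meant to serve.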
…task
Three-part change:
1. Removed all PR-letter and 'NEW' marker comments from code samples in both the spec and the plan. These don't add value once code lands; git blame is the right tool for 'when did this change'. JSDoc explaining WHY a field exists is kept; sticker comments announcing newness are not.
2. Added Task 0 (Repo hygiene) at the front of the plan: grep the source for PR-A/PR-B/PR-C references and strip them. PR-C was merged 2026-05-14 and several comments in apps/kernel/src/ still reference it. The per-task implementer handles each match with contextual judgment (delete vs strip-parenthetical vs re-phrase).
3. New 'House-keeping rules' section in the plan tells future implementers: no PR-letter markers, no '// NEW' stickers, no '// MODIFIED' tags in source. JSDoc for WHY is fine.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-A / PR-B / PR-C markers were useful during development but become noise once those PRs are merged. Removed throughout. JSDoc/comments describing substantive WHY logic are kept; only the PR-N tags and 'NEW (PR-N)' sticker comments are stripped. git blame is the right tool for 'when did this change' — comments are for 'why does this exist'.
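Task 0's grep can be approximated as a line scanner. The regexes below (PR-A/B/C tags and '// NEW' or '// MODIFIED' stickers) are an assumption about what the grep looked for, and findStaleMarkers is a hypothetical helper; each hit still needs the per-site judgment the commit describes.

```typescript
interface MarkerHit {
  line: number; // 1-based line number
  text: string;
}

// PR-letter tags for already-merged slices, plus sticker comments.
// Non-global regexes, so .test() carries no lastIndex state between lines.
const STALE_MARKERS: RegExp[] = [/\bPR-[ABC]\b/, /\/\/\s*(NEW|MODIFIED)\b/];

function findStaleMarkers(source: string): MarkerHit[] {
  const hits: MarkerHit[] = [];
  source.split("\n").forEach((text, i) => {
    if (STALE_MARKERS.some((re) => re.test(text))) {
      hits.push({ line: i + 1, text: text.trim() });
    }
  });
  return hits;
}
```

PR-D references are deliberately not matched, since that slice was still in flight when the cleanup ran.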
PR-D foundational schema:
- run: new columns kind (main|sub), parentRunId (self-FK, cascade), taskId (FK to task, SET NULL), finalText (nullable text). New index idx_run_task_kind for retry-context lookups.
- message: new kind column (denormalized from run for hot-path filtering). Old idx_message_thread_created replaced with the kind-aware composite idx_message_thread_kind_created.
- tool_call: new kind column (denormalized for analytical filtering).
Also annotates the tasks.runId FK thunk with an AnySQLiteColumn return type to break the run<->task circular type inference now that run.taskId references task.id. All pre-PR-D rows take kind='main' via DEFAULT; no backfill needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new required fields on PerTurnContext:
- runKind: 'main' | 'sub', stamped on tool_call.kind at INSERT time via wrappedTool's onInputAvailable hook.
- clientTimezone: propagates from the parent run's chat-request frame; needed by buildCurrentDateBlock in main's context and by the sub-agent context builder (Tasks 5/6).
runChatTurn sets runKind='main' and passes clientTimezone through from its existing function parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
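Why the timezone has to travel with the turn: the same instant falls on different calendar dates in different zones. In this sketch PerTurnContext is reduced to the two new fields, and the buildCurrentDateBlock body and tag format are assumptions, not the kernel's implementation; it relies only on the built-in Intl API.

```typescript
interface PerTurnContext {
  runKind: "main" | "sub"; // copied onto tool_call.kind when a tool call row is inserted
  clientTimezone: string; // IANA zone from the chat-request frame, e.g. "America/New_York"
}

// Formats `now` as YYYY-MM-DD in the client's zone.
function buildCurrentDateBlock(now: Date, ctx: PerTurnContext): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: ctx.clientTimezone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(now);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return `<current_date>${get("year")}-${get("month")}-${get("day")} (${ctx.clientTimezone})</current_date>`;
}
```

A sub-agent spawned at 19:00 Pacific would see the next day's date if its context builder fell back to UTC instead of the propagated zone.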
Four filter additions at the SQL layer to keep sub-agent transcripts out of user-facing surfaces:
- runChatTurn's messageRows fetch: AND kind='main', so the model's view of the conversation excludes sub-agent internal monologue.
- getSessionInfo (backs the session_info tool): kind='main' on both the count/last-activity aggregate and the message page query.
- hybridSearch (backs the sessions_search tool): kind='main' on the BM25 recall join (the FTS5 trigger mirrors every kind, so the filter lands on the message join) AND on the row-hydration query as defense-in-depth.
- Vector ingestion consumer: skip messages where kind='sub' before embedding. Producer-side enqueue is unchanged (the consumer-side filter is DRY across all three current producers: handlers.ts, turn.ts, turn-compaction.ts).
sessions.ts itself is not modified; its tools delegate to kernel methods (searchMessages → hybridSearch, getSessionInfo) where the actual SQL lives. Sub-agent transcripts remain reachable for retry-context queries via run.taskId + kind='sub' (separate code path, lands in Task 5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
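The consumer-side filter can be sketched as a batch handler that drops kind='sub' rows before calling the embedder, leaving every producer untouched. IngestMessage, consumeIngestBatch, and the embed callback are hypothetical stand-ins for the real queue consumer and embedding call.

```typescript
interface IngestMessage {
  id: string;
  kind: "main" | "sub";
  contentText: string;
}

type Embed = (texts: string[]) => Promise<number[][]>;

// Returns the ids that were embedded. Sub-agent rows are skipped rather than
// failed, so the queue can treat the whole batch as handled.
async function consumeIngestBatch(batch: IngestMessage[], embed: Embed): Promise<string[]> {
  const visible = batch.filter((m) => m.kind === "main");
  if (visible.length === 0) return [];
  // The returned vectors would then be upserted to the index (omitted here).
  await embed(visible.map((m) => m.contentText));
  return visible.map((m) => m.id);
}
```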
Three additional message-fetch sites broadcast to the TUI but lacked the kind='main' filter added in c33450e:
- turn.ts:~445 (broadcast after paused-save)
- turn.ts:~540 (broadcast after turn completion)
- kernel.ts:~576 (onConnect initial WS sync)
Without these filters, sub-agent messages would leak to the TUI as soon as Task 6's spawn_sub_agent starts persisting to the message table. Per spec §1, sub-agents run invisibly from the user's POV in PR-D (PR-E adds proper rendering surfaces).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more message-read sites missed in the plan's enumeration but caught by spec review:
- sessions-recent-block.ts: three subqueries (count + first/last user preview) within the <sessions_recent> context block builder. Without a kind='main' filter, a sub-agent's INSERTed prompt user-messages would leak into the 'last user preview' line of another thread's row in the main agent's context.
- turn-compaction.ts:78: forkAndCompact's history fetch for the Anthropic compaction summarizer. Without kind='main', sub-agent rows would be conflated with main's conversation history in the generated <previous_thread_summary>.
(turn-compaction.ts:223, the INSERT producer side, is intentionally NOT modified per the consumer-side filter strategy.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller, task-focused variant of main's SYSTEM_PROMPT. Strips <voice> (output goes to LLM not human), <task_management> (no manage_tasks), <roadmap> (no user to admit to), <style> (no TUI). Keeps <honesty>, <capabilities>, <memory_and_skills>, <sessions>. Adds new <sub_agent_role> section explaining scope, task/prior-attempt handling, and output shape. ~700 tokens. Cached at b1; paid once per fresh cache across all sub-agent invocations using the same model.
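One way to keep the two prompts from drifting is to assemble both from shared section strings. This composition approach, the placeholder section bodies, and the tag and buildSubAgentPrompt helpers are assumptions; the commit only says which sections are kept, stripped, or added.

```typescript
type SectionName =
  | "voice"
  | "honesty"
  | "capabilities"
  | "task_management"
  | "memory_and_skills"
  | "sessions"
  | "roadmap"
  | "style";

// Placeholder bodies; the real section text is not reproduced here.
const SECTIONS: Record<SectionName, string> = {
  voice: "(voice guidance)",
  honesty: "(honesty rules)",
  capabilities: "(capabilities)",
  task_management: "(manage_tasks usage)",
  memory_and_skills: "(memory and skills)",
  sessions: "(sessions tools)",
  roadmap: "(roadmap)",
  style: "(TUI style)",
};

// Wraps a section body in the XML-style tag the prompts use.
const tag = (name: string, body: string): string => `<${name}>\n${body}\n</${name}>`;

const SUB_AGENT_KEEP: SectionName[] = ["honesty", "capabilities", "memory_and_skills", "sessions"];

function buildSubAgentPrompt(roleText: string): string {
  const kept = SUB_AGENT_KEEP.map((name) => tag(name, SECTIONS[name]));
  return [...kept, tag("sub_agent_role", roleText)].join("\n\n");
}
```

Because the output is a deterministic string, every sub-agent call on the same model sends a byte-identical prefix, which is what a prompt-cache hit at b1 requires.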
buildSubAgentContextMessages: trimmed subset of main's blocks (memory, current_date, connected_accounts, files, computer_sandboxes) plus a conditional <task> (when task_id is provided) and <prior_attempts> (on retry: prior sub-agent runs for the same task with non-null finalText, oldest-first, each truncated to ~500 chars).
deriveFinalText: reads the last assistant message from a run; used by spawn_sub_agent after streamText finishes to derive the sub-agent's final answer for run.finalText + task.result.
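The two conditional blocks can be sketched as below. PriorAttempt follows the Pick<Run, ...> shape from the column-trimming commit, approximated with a local Run type; the ~500-character limit comes from the text, while the tag attributes, the ellipsis marker, the task title field, and the helper names are assumptions. The sketch filters and orders in memory so it runs standalone; per the earlier commits, doing that work in SQL is preferable.

```typescript
interface Run {
  id: string;
  startedAt: Date;
  finalText: string | null;
  status: string;
}

type PriorAttempt = Pick<Run, "id" | "startedAt" | "finalText">;

const ATTEMPT_CHAR_LIMIT = 500;

function truncate(text: string, limit: number): string {
  return text.length <= limit ? text : text.slice(0, limit) + "…";
}

// Returns null when there is nothing to show, so the caller can omit the block.
function buildPriorAttemptsBlock(attempts: PriorAttempt[]): string | null {
  const usable = attempts
    .filter((a): a is PriorAttempt & { finalText: string } => a.finalText !== null)
    .sort((a, b) => a.startedAt.getTime() - b.startedAt.getTime()); // oldest first
  if (usable.length === 0) return null;
  const body = usable
    .map((a, i) => `<attempt n="${i + 1}" run_id="${a.id}">\n${truncate(a.finalText, ATTEMPT_CHAR_LIMIT)}\n</attempt>`)
    .join("\n");
  return `<prior_attempts>\n${body}\n</prior_attempts>`;
}

// Emitted only when a task_id was passed to spawn_sub_agent.
function buildTaskBlock(task: { id: string; title: string } | null): string | null {
  return task ? `<task id="${task.id}">${task.title}</task>` : null;
}
```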
Single write tool: { model: 'opus'|'sonnet'|'haiku', prompt, task_id? }.
Returns { runId, finalText } so main agent gets the sub-agent's answer
plus a handle for observability.
Same-process execution: nested streamText inside execute(). Sub-agent
shares kernel.db + env bindings directly (no marshalling). Sub-agent's
tool surface = main minus spawn_sub_agent + minus manage_tasks (via
buildSubAgentTools deny-list — landing in Task 7).
Sub-agent persistence: prompt INSERTed before streamText; assistant +
tool messages INSERTed via toUIMessageStream onFinish at end-of-stream
(mirrors main's pattern in turn.ts).
Pause semantics: parent abort → sub-agent run.status='paused', tool_call
row marked 'error' via wrappedTool re-throw. Hard error → run.status='error'.
Cross-thread task_id rejected. Step cap = 50. Cache TTL = '5m' at all
4 breakpoints. Anthropic effort='high' (one tier below main's xhigh).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
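A condensed sketch of the execute() control flow described above, with every kernel dependency injected so it runs standalone. SpawnDeps and its method names, the 'complete'/'running' run-status names, and abort detection via err.name === "AbortError" are assumptions; the nested streamText call is abstracted as runLoop, and the task.result write is omitted.

```typescript
type Tier = "opus" | "sonnet" | "haiku";
type RunStatus = "running" | "complete" | "paused" | "error";

interface SpawnInput {
  model: Tier;
  prompt: string;
  task_id?: string;
}

interface SpawnDeps {
  threadId: string; // thread of the parent (main) run
  taskThreadId(taskId: string): Promise<string | null>; // null if the task does not exist
  createSubRun(input: SpawnInput): Promise<string>; // inserts run(kind='sub') plus the prompt message
  runLoop(runId: string, input: SpawnInput, maxSteps: number): Promise<string>; // resolves to final text
  setRunStatus(runId: string, status: RunStatus, finalText?: string): Promise<void>;
}

const SUB_AGENT_MAX_STEPS = 50;

async function executeSpawn(
  input: SpawnInput,
  deps: SpawnDeps,
): Promise<{ runId: string; finalText: string }> {
  if (input.task_id !== undefined) {
    const owner = await deps.taskThreadId(input.task_id);
    if (owner === null) throw new Error(`task ${input.task_id} not found`);
    if (owner !== deps.threadId) throw new Error(`task ${input.task_id} belongs to a different thread`);
  }
  const runId = await deps.createSubRun(input);
  try {
    const finalText = await deps.runLoop(runId, input, SUB_AGENT_MAX_STEPS);
    await deps.setRunStatus(runId, "complete", finalText);
    return { runId, finalText };
  } catch (err) {
    // Parent abort pauses the sub-run (mirroring the parent); anything else is a hard error.
    const aborted = err instanceof Error && err.name === "AbortError";
    await deps.setRunStatus(runId, aborted ? "paused" : "error");
    // Re-throw so the tool wrapper records the spawn_sub_agent tool_call as 'error'.
    throw err;
  }
}
```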
Two changes in lockstep:
- turn.ts: stopWhen stepCountIs(50) → stepCountIs(100). Main was hitting the 50 cap on complex multi-step turns; bumping gives it room. Sub-agent inherits the original 50 (half of main) via spawn_sub_agent's own stopWhen.
- tools/index.ts: register spawn_sub_agent in the buildTools spread (before get_time, so withTailCache's b1 marker stays on get_time). Add a buildSubAgentTools factory: a deny-list filter that strips spawn_sub_agent (no recursion) and manage_tasks (sub-agents execute, parent plans). Consumed by spawn-sub-agent.ts's nested streamText.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
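The deny-list factory can be sketched with a destructured rename plus an object rest, which also gives the result a type without the two tools. The generic signature is an assumption; only the two stripped tool names come from the commit.

```typescript
function buildSubAgentTools<T extends { spawn_sub_agent: unknown; manage_tasks: unknown }>(
  tools: T,
): Omit<T, "spawn_sub_agent" | "manage_tasks"> {
  // The renamed bindings exist only to pull these two keys out of `rest`.
  const { spawn_sub_agent: _noRecursion, manage_tasks: _parentPlans, ...rest } = tools;
  return rest;
}
```

A deny-list rather than an allow-list means any tool later added to buildTools reaches sub-agents automatically; only the two tools that break the execute/plan split are excluded.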
rtpa25 commented May 15, 2026
…FinalText
Two PR-D review comments addressed:
1. /api/__debug/search now 403s in production. Added a NODE_ENV var to wrangler.jsonc (default 'production') and updated .dev.vars.example to suggest NODE_ENV=development locally. The comparison casts through string because wrangler types narrows the literal value from wrangler.jsonc, and TS would otherwise flag the !== check as always-true. Regenerated worker-configuration.d.ts via wrangler types.
2. Removed apps/kernel/src/agent/derive-final-text.ts. Inlined the helper as a module-private function at the bottom of apps/kernel/src/tools/spawn-sub-agent.ts (its sole caller). JSDoc preserved verbatim; the race-safety rationale is still useful at the call site. Imports widened in spawn-sub-agent.ts: added and, desc, type DB to the @agent-os/models line.
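A sketch of the production gate and the cast the first item mentions. The Env shape mimics what wrangler types would generate for a literal NODE_ENV of "production" (an assumption about the generated file), and returning a status code instead of a Response keeps the sketch runnable outside Workers.

```typescript
// With NODE_ENV set to "production" in wrangler.jsonc, the generated type is that literal.
interface Env {
  NODE_ENV: "production";
}

function debugSearchStatus(env: Env): 200 | 403 {
  // Without the widening cast, TS2367 rejects comparing "production" with "development"
  // because the two literal types have no overlap.
  if ((env.NODE_ENV as string) !== "development") return 403;
  return 200;
}
```

Comparing against "development" makes this sketch fail closed: an unset or misspelled value still yields 403.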
CF dashboard's Workers Logs panel will now capture console.log / console.error from all execution contexts — including Durable Object methods, WebSocket onMessage handlers, and async callbacks like streamText.onFinish. Broader coverage than 'wrangler tail' which misses DO message-handler logs. invocation_logs emits a structured entry per request with timing + outcome — useful for 'did this fire at all?' baseline checks during debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Anthropic returns 'adaptive thinking is not supported on this model'
when spawn_sub_agent uses model='haiku'. Both opus and sonnet support
extended thinking; haiku (the fast tier) doesn't.
Conditional providerOptions: for haiku, send only sendReasoning:true
(safe no-op since haiku won't emit reasoning content anyway). For
opus/sonnet, keep effort='high' + thinking={adaptive,summarized}.
Surfaced via Workers Observability log of AI_APICallError on a haiku
sub-agent invocation during smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
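The per-tier branch can be sketched as a pure function. The object shape mirrors the commit's shorthand (effort, sendReasoning, thinking={adaptive,summarized}); the interface name and the thinking field names are hypothetical, and the exact structure the AI SDK's Anthropic provider expects should be checked against its documentation.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

interface AnthropicSubAgentOptions {
  sendReasoning: boolean;
  effort?: "high";
  thinking?: { type: "adaptive"; display: "summarized" }; // hypothetical field names
}

function anthropicOptionsFor(tier: Tier): AnthropicSubAgentOptions {
  if (tier === "haiku") {
    // haiku rejects adaptive thinking, so send only the harmless reasoning flag.
    return { sendReasoning: true };
  }
  return { sendReasoning: true, effort: "high", thinking: { type: "adaptive", display: "summarized" } };
}
```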
… status
Main agent's evaluation step was being short-circuited. spawn_sub_agent's
execute() was doing:
UPDATE task SET status='complete', result=finalText WHERE id=?
This conflated 'sub-agent delivered an answer' with 'task is complete'.
Two distinct events:
- Sub-agent finishing = a fact (LLM returned text)
- Task being complete = a judgment (the answer satisfies the task)
Auto-marking robbed main of the judgment step and broke the retry
workflow: main would see task=complete and never re-evaluate, even when
the answer was inadequate. To retry, main would have had to first
manage_tasks(update, status='in_progress') to revert the auto-mark —
friction the design never intended.
Fix: spawn_sub_agent now writes only task.result. Status is unchanged.
Main reads the result, evaluates, then either:
- marks the task complete/failed/cancelled via manage_tasks, OR
- re-spawns the sub-agent on the same task_id with retry guidance
(the previous answer stays in task.result and surfaces as
<prior_attempts> in the next sub-agent's context)
Tool description updated to teach the model this flow. Spec §1, §3, §9
and plan Task 6 + smoke scenario 3 updated to match.
Surfaced during PR-D smoke scenario 3 — Ronit noticed the task was
'complete' before main agent had even shown its analysis of the result.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
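The separation the fix introduces can be sketched as two pure functions: the sub-agent path touches only result, and only the main agent's manage_tasks path moves status. The Task shape, the 'pending' status, and both function names are assumptions; the other status values come from the commit.

```typescript
type TaskStatus = "pending" | "in_progress" | "complete" | "failed" | "cancelled";

interface Task {
  id: string;
  status: TaskStatus;
  result: string | null;
}

// Sub-agent path: records a fact (an answer came back) without judging it.
function recordSubAgentResult(task: Task, finalText: string): Task {
  return { ...task, result: finalText };
}

// Main-agent path (manage_tasks): the only place a judgment changes status.
function judgeTask(task: Task, verdict: "complete" | "failed" | "cancelled"): Task {
  return { ...task, status: verdict };
}
```

Because status stays in_progress after the sub-agent writes its answer, a retry needs no revert: main can re-spawn on the same task_id directly.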
Implements the sub-agent substrate per docs/superpowers/specs/2026-05-14-sub-agents-design.md.
Second of three slices: Tasks (PR-C ✅) → Sub-agents (this PR) → TUI rendering (PR-E).
What ships
- spawn_sub_agent tool with { model: 'opus'|'sonnet'|'haiku', prompt, task_id? } → { runId, finalText }. Wrapped via wrappedTool({ touchesFS: false }).
- Same-process execution: nested streamText inside spawn_sub_agent.execute(). Sub-agent shares kernel.db + env bindings directly: no marshalling, no isolate boundary.
- Sub-agent tool surface: main's tools minus spawn_sub_agent (no recursion) minus manage_tasks (sub-agents execute, parent plans). Deny-list pattern via destructured rename.
- Trimmed context: a subset of main's blocks (<connected_accounts>, <files>, <computer_sandboxes>, <memory>, <current_date>) + conditional <task> + conditional <prior_attempts> (with full attempt history truncated to ~500 chars per attempt).
- Schema (migration 0010_lumpy_black_widow.sql): run gains kind/parentRunId/taskId/finalText; message + tool_call gain a denormalized kind. New index idx_run_task_kind for retry-context queries; idx_message_thread_created replaced with a kind-aware composite.
- Read filters on kind='main' at 13+ sites: runChatTurn's history fetch, paused/complete broadcasts, onConnect initial sync, getSessionInfo count + page, hybridSearch BM25 join + row hydration, vector ingest consumer, <sessions_recent> block subqueries, forkAndCompact summarizer history. Producer-side enqueue untouched (consumer-side filter is DRY).
- Sub-agent system prompt (SUB_AGENT_SYSTEM_PROMPT): ~700 tokens, task-focused. No <voice>, no <task_management>, no <roadmap>, no <style>. New <sub_agent_role> section with output guidance.
- Provider options: effort='high', sendReasoning=true, thinking={adaptive,summarized}, cacheControl.ttl='5m' at all 4 breakpoints.
- PerTurnContext additions: runKind ("main"/"sub", stamped onto tool_call.kind) + clientTimezone (propagated through the sub-agent context build).
- Parallel sub-agents: 8 parallel spawn_sub_agent calls in one assistant step = 8 concurrent sub-agent runs in the same DO.
Plus a housekeeping pass (e849ce3): stripped stale PR-A/PR-B/PR-C references from apps/kernel/src/. Going forward the codebase doesn't track which PR a line came from; git blame is the right tool for "when did this change."
Pause semantics
When parent run is paused mid-sub-agent:
- Sub-agent run.status='paused' (transcript stays observable post-resume)
- The parent's spawn_sub_agent tool_call row marks status='error' via wrappedTool's re-throw path
Smoke checklist (pending interactive testing)
Deployed to agent-os.pandaronit25.workers.dev. Migration 0010 (0010_lumpy_black_widow.sql) applies automatically.
- … run row with kind='sub', parentRunId populated, finalText non-empty.
- … run.taskId IS NULL.
- … task.status='complete', task.result populated.
- … startedAt timestamps.
- Cross-thread task_id rejected: "belongs to a different thread" error.
- … kind='sub'.
- … <prior_attempts> block contains the first attempt's finalText.
- … sessions_search returns NO matches against sub-agent internal text.
Out of scope (PR-E)
- … <tasks> as a status panel (Claude Code style)
- … apps/cli/src/components/tools/
Reviews (subagent-driven, opus)
All 8 implementation tasks + Task 0 (cleanup) went through spec-compliance + code-quality review per superpowers:subagent-driven-development. Three iterations during Task 3 (filter completeness): the initial pass caught 5 sites, a follow-up pass added 3 more (broadcasts + onConnect), and a final pass added 2 more (sessions-recent subqueries + forkAndCompact summarizer history), totaling 13 user-visible read sites filtered.
Final whole-PR review approved with 5 Minor follow-up items (SQLite ALTER TABLE FK-cascade limitation, defensive consistency nits), none blocking.
🤖 Generated with Claude Code