feat(kernel): PR-C — task management#16
Merged
Merged
Conversation
Per-thread task list with single batchable manage_tasks tool + <tasks> context
block refreshed per-step via streamText prepareStep. First of three slices:
Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E).
Key decisions:
- Tasks live AFTER b2 cache marker (in dynamic suffix), not in cached prefix
- prepareStep callback owns tasks block render — never buildContextMessages
- Soft 'one in_progress' rule via system prompt, NOT DB constraint
(preserves PR-D parallel sub-agent option)
- result field optional but encouraged via system prompt
- 5-state enum: pending/in_progress/complete/failed/cancelled
- No get_task read tool (context block + tool_result history covers retrieval)
Adjacent provider-option additions in §9: toolStreaming, effort=xhigh,
sendReasoning, thinking={adaptive,summarized}, explicit cacheControl.ttl=5m.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 tasks: schema+migration → tasks-block renderer → manage_tasks tool → prepareStep wiring → system prompt section → Anthropic provider options → deploy+smoke+PR. Each task self-contained with full code, exact paths, typecheck-as-test gate, individual commit. No TDD scaffolding (per project convention). Final smoke checklist runs 8 scenarios against deployed kernel + d1 verifications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-thread tasks with 5-state status enum (pending/in_progress/complete/ failed/cancelled), nullable result, composite index on (threadId, createdAt). Cascade-delete with both thread and run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
$defaultFn only fires on INSERT; without $onUpdate the column drifts stale unless every writer remembers to set it manually. Match the thread.updatedAt precedent (threads.ts:13-14). Spec §2 corrected to match — keeps PR-D / PR-E writers honest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure SQL fetch from task table. Status-aware description truncation (in_progress full / pending 120 / terminal 80 chars), result shown in full for terminal entries when non-null. Returns null on empty thread so the block is omitted from prompts entirely. Consumed by prepareStep callback in turn.ts (Task 4).
Use Task['status'] instead of inline 5-state union, extract isTerminalStatus() helper so formatDescription and the row map share one source of truth for terminal-vs-non-terminal. Add explicit Task[] type on the rows local. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single write tool: ops:[{action:'create',name,description}|{action:'update',
id,...patch}]. Returns full thread task list sorted by createdAt ASC.
threadId/runId injected server-side from perTurn; cross-thread updates
throw. Wrapped via wrappedTool({touchesFS:false}) so audit rows write
without per-call worktree overhead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
streamText runs the entire react loop internally; without prepareStep the context blocks would be frozen at turn start and the agent would see stale task state across iterations. Capture stableBlockCount after buildContext- Messages and splice the freshly-rendered <tasks> block at that index every step. Splice invariant: only touches indices >= stableBlockCount, leaving the b2-cached stable prefix byte-identical so b2 keeps hitting across steps. Costs ~300 tokens of tasks-render per step vs ~5-10k of cache rebuild if tasks were inside the b2 prefix.
Teaches when to plan vs just do (substantial vs trivial test), lifecycle with status transitions, description quality bar with good/bad pair, discipline rules (one in_progress, immediate status updates), and don'ts. ~370 tokens, cached in b1 — paid once per fresh cache, then free. Lifted from Dimension AI's <delegation>/<task_descriptions> framing, trimmed to exclude sub-agent guidance (PR-D scope).
- toolStreaming: true — TUI sees partial tool args mid-emit
- effort: 'xhigh' — max thinking budget per step
- sendReasoning: true — reasoning sent back across react steps
- thinking.adaptive+summary — auto-tune per step, lean conv history
- cacheControl.ttl: '5m' — explicit (was implicit default) at all
cacheControl marker sites so future 1h
experiment is a one-character diff
Also widened ContextMessage.providerOptions.anthropic.cacheControl to
accept an optional 'ttl' field so the b2 marker site typechecks; uses
the same '5m' | '1h' literal union the Anthropic SDK exposes.
Combined with adaptive: trivial requests barely think, substantial ones
think hard. Net positive for multi-step workflows that PR-C enables.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old entry said 'Plans / task lists' were unshipped, contradicting the new <task_management> block in the same prompt. Rescope to specifically the TUI-side-panel rendering (the actual remaining unshipped piece) so the agent doesn't carry contradictory instructions about whether task tracking exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements the task-list substrate per docs/superpowers/specs/2026-05-14-task-management-design.md.
First of three slices: Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E).
What ships
taskDrizzle table (5-state enum, cascade with thread + run,$onUpdateonupdatedAt).manage_taskstool — single batchable write (ops:[{action:'create',name,description}|{action:'update',id,...patch}]), returns full thread state, cross-thread guard.<tasks>context block — pure SQL renderer, status-aware truncation (in_progress full / pending 120 / terminal 80 chars), omitted on empty threads.prepareStepwiring — refreshes<tasks>per react-loop step so the model sees fresh state across iterations without busting b2 cache.<task_management>system prompt section — when-to-plan decision rule, lifecycle, description quality, discipline, don'ts (~370 tokens, cached in b1).toolStreaming,effort: "xhigh",sendReasoning,thinking: { adaptive, summarized }, explicitcacheControl.ttl: "5m"at all 5 marker sites.Cache topology preserved
The
<tasks>block lives after the b2 cache marker (post<current_date>). Per-step refresh costs ~300 tokens; b2 stays hit across all within-turn steps. b4 misses on steps that mutate tasks — acceptable trade vs the ~5–10K rebuild cost if tasks were inside the b2 prefix. See spec §6 worked example.Smoke checklist — PENDING (8 scenarios)
Deployed to
https://agent-os.pandaronit25.workers.devand*.ronit.one. Migration 0009 (0009_melodic_multiple_man.sql) applies automatically on first DO write per thread.Interactive smoke when ready to test (from spec §8):
<tasks>block. Fresh thread, ask "what time is it" — no manage_taskstool_callrow.tasktable.in_progressin the very next step after creating tasks.<tasks>block.manage_tasksreturns full state. Inspecttool_call.response— contains{tasks:[...]}with all thread tasks.Cache verification (scenario 10) deferred per memory —
wrangler taildoesn't capture DO logs reliably.Out of scope (PR-D / PR-E)
spawn_sub_agenttool, sub-agent runtime, parallel execution<tasks>as a sidebar/panelparentTaskId),get_task/list_tasksread tools, task templates — YAGNIReviews (subagent-driven, opus)
All 6 implementation tasks went through spec-compliance + code-quality review per
superpowers:subagent-driven-development. Two iterations during implementation:$defaultFnonly fires on INSERT — fixed with.$onUpdate(() => new Date())matchingthread.updatedAtprecedent (commit b7c8946). Spec updated in lockstep.TASK_STATUS/isTerminalStatus()(commit 0f39944).Final whole-PR review caught one cross-task seam: stale
<roadmap>entry contradicting the new<task_management>block — fixed in d112dd5.🤖 Generated with Claude Code