feat(kernel): PR-C — task management by rtpa25 · Pull Request #16 · rtpa25/agent-os

rtpa25 · 2026-05-14T11:53:09Z

Implements the task-list substrate per docs/superpowers/specs/2026-05-14-task-management-design.md.

First of three slices: Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E).

What ships

New per-thread task Drizzle table (5-state enum, cascade with thread + run, $onUpdate on updatedAt).
manage_tasks tool — single batchable write (ops:[{action:'create',name,description}|{action:'update',id,...patch}]), returns full thread state, cross-thread guard.
<tasks> context block — pure SQL renderer, status-aware truncation (in_progress full / pending 120 / terminal 80 chars), omitted on empty threads.
prepareStep wiring — refreshes <tasks> per react-loop step so the model sees fresh state across iterations without busting b2 cache.
<task_management> system prompt section — when-to-plan decision rule, lifecycle, description quality, discipline, don'ts (~370 tokens, cached in b1).
Anthropic provider options: toolStreaming, effort: "xhigh", sendReasoning, thinking: { adaptive, summarized }, explicit cacheControl.ttl: "5m" at all 5 marker sites.

Cache topology preserved

The <tasks> block lives after the b2 cache marker (post <current_date>). Per-step refresh costs ~300 tokens; b2 stays hit across all within-turn steps. b4 misses on steps that mutate tasks — acceptable trade vs the ~5–10K rebuild cost if tasks were inside the b2 prefix. See spec §6 worked example.

Smoke checklist — PENDING (8 scenarios)

Deployed to https://agent-os.pandaronit25.workers.dev and *.ronit.one. Migration 0009 (0009_melodic_multiple_man.sql) applies automatically on first DO write per thread.

Interactive smoke when ready to test (from spec §8):

1. Empty thread renders no <tasks> block. Fresh thread, ask "what time is it" — no manage_tasks tool_call row.
2. Trivial request doesn't trigger planning. "Read README.md" — no manage_tasks call.
3. Multi-step request creates 2–3 tasks. "Research bun vs deno, write a short summary, save it to summary.md" — verify rows in task table.
4. Mid-turn state visibility (the prepareStep test). Same run — agent should mark a task in_progress in the very next step after creating tasks.
5. Persistence across user turns. Send a second message in the same thread; tasks still visible.
6. Cross-thread isolation. New thread renders no <tasks> block.
7. Inline completion with result. Multi-step turn; agent marks task complete with 1–3 sentence result.
8. manage_tasks returns full state. Inspect tool_call.response — contains {tasks:[...]} with all thread tasks.

Cache verification (scenario 10) deferred per memory — wrangler tail doesn't capture DO logs reliably.

Out of scope (PR-D / PR-E)

spawn_sub_agent tool, sub-agent runtime, parallel execution
TUI rendering of <tasks> as a sidebar/panel
2× result duplication handling (decided: accept; lands in PR-D)
Nested tasks (parentTaskId), get_task/list_tasks read tools, task templates — YAGNI

Reviews (subagent-driven, opus)

All 6 implementation tasks went through spec-compliance + code-quality review per superpowers:subagent-driven-development. Two iterations during implementation:

Task 1 code-quality caught $defaultFn only fires on INSERT — fixed with .$onUpdate(() => new Date()) matching thread.updatedAt precedent (commit b7c8946). Spec updated in lockstep.
Task 2 code-quality caught status union duplication — extracted TASK_STATUS/isTerminalStatus() (commit 0f39944).

Final whole-PR review caught one cross-task seam: stale <roadmap> entry contradicting the new <task_management> block — fixed in d112dd5.

🤖 Generated with Claude Code

Per-thread task list with single batchable manage_tasks tool + <tasks> context block refreshed per-step via streamText prepareStep. First of three slices: Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E). Key decisions: - Tasks live AFTER b2 cache marker (in dynamic suffix), not in cached prefix - prepareStep callback owns tasks block render — never buildContextMessages - Soft 'one in_progress' rule via system prompt, NOT DB constraint (preserves PR-D parallel sub-agent option) - result field optional but encouraged via system prompt - 5-state enum: pending/in_progress/complete/failed/cancelled - No get_task read tool (context block + tool_result history covers retrieval) Adjacent provider-option additions in §9: toolStreaming, effort=xhigh, sendReasoning, thinking={adaptive,summarized}, explicit cacheControl.ttl=5m. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

7 tasks: schema+migration → tasks-block renderer → manage_tasks tool → prepareStep wiring → system prompt section → Anthropic provider options → deploy+smoke+PR. Each task self-contained with full code, exact paths, typecheck-as-test gate, individual commit. No TDD scaffolding (per project convention). Final smoke checklist runs 8 scenarios against deployed kernel + d1 verifications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-thread tasks with 5-state status enum (pending/in_progress/complete/ failed/cancelled), nullable result, composite index on (threadId, createdAt). Cascade-delete with both thread and run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

$defaultFn only fires on INSERT; without $onUpdate the column drifts stale unless every writer remembers to set it manually. Match the thread.updatedAt precedent (threads.ts:13-14). Spec §2 corrected to match — keeps PR-D / PR-E writers honest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pure SQL fetch from task table. Status-aware description truncation (in_progress full / pending 120 / terminal 80 chars), result shown in full for terminal entries when non-null. Returns null on empty thread so the block is omitted from prompts entirely. Consumed by prepareStep callback in turn.ts (Task 4).

Use Task['status'] instead of inline 5-state union, extract isTerminalStatus() helper so formatDescription and the row map share one source of truth for terminal-vs-non-terminal. Add explicit Task[] type on the rows local. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single write tool: ops:[{action:'create',name,description}|{action:'update', id,...patch}]. Returns full thread task list sorted by createdAt ASC. threadId/runId injected server-side from perTurn; cross-thread updates throw. Wrapped via wrappedTool({touchesFS:false}) so audit rows write without per-call worktree overhead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

streamText runs the entire react loop internally; without prepareStep the context blocks would be frozen at turn start and the agent would see stale task state across iterations. Capture stableBlockCount after buildContext- Messages and splice the freshly-rendered <tasks> block at that index every step. Splice invariant: only touches indices >= stableBlockCount, leaving the b2-cached stable prefix byte-identical so b2 keeps hitting across steps. Costs ~300 tokens of tasks-render per step vs ~5-10k of cache rebuild if tasks were inside the b2 prefix.

Teaches when to plan vs just do (substantial vs trivial test), lifecycle with status transitions, description quality bar with good/bad pair, discipline rules (one in_progress, immediate status updates), and don'ts. ~370 tokens, cached in b1 — paid once per fresh cache, then free. Lifted from Dimension AI's <delegation>/<task_descriptions> framing, trimmed to exclude sub-agent guidance (PR-D scope).

- toolStreaming: true — TUI sees partial tool args mid-emit - effort: 'xhigh' — max thinking budget per step - sendReasoning: true — reasoning sent back across react steps - thinking.adaptive+summary — auto-tune per step, lean conv history - cacheControl.ttl: '5m' — explicit (was implicit default) at all cacheControl marker sites so future 1h experiment is a one-character diff Also widened ContextMessage.providerOptions.anthropic.cacheControl to accept an optional 'ttl' field so the b2 marker site typechecks; uses the same '5m' | '1h' literal union the Anthropic SDK exposes. Combined with adaptive: trivial requests barely think, substantial ones think hard. Net positive for multi-step workflows that PR-C enables. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Old entry said 'Plans / task lists' were unshipped, contradicting the new <task_management> block in the same prompt. Rescope to specifically the TUI-side-panel rendering (the actual remaining unshipped piece) so the agent doesn't carry contradictory instructions about whether task tracking exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rtpa25 and others added 11 commits May 14, 2026 16:20

rtpa25 merged commit 50e5047 into main May 14, 2026
3 checks passed

rtpa25 deleted the feat/task-management branch May 14, 2026 14:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kernel): PR-C — task management#16

feat(kernel): PR-C — task management#16
rtpa25 merged 11 commits into
mainfrom
feat/task-management

rtpa25 commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rtpa25 commented May 14, 2026

What ships

Cache topology preserved

Smoke checklist — PENDING (8 scenarios)

Out of scope (PR-D / PR-E)

Reviews (subagent-driven, opus)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant