Skip to content

feat(kernel): PR-C — task management#16

Merged
rtpa25 merged 11 commits into
mainfrom
feat/task-management
May 14, 2026
Merged

feat(kernel): PR-C — task management#16
rtpa25 merged 11 commits into
mainfrom
feat/task-management

Conversation

@rtpa25

@rtpa25 rtpa25 commented May 14, 2026

Copy link
Copy Markdown
Owner

Implements the task-list substrate per docs/superpowers/specs/2026-05-14-task-management-design.md.

First of three slices: Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E).

What ships

  • New per-thread task Drizzle table (5-state enum, cascade with thread + run, $onUpdate on updatedAt).
  • manage_tasks tool — single batchable write (ops:[{action:'create',name,description}|{action:'update',id,...patch}]), returns full thread state, cross-thread guard.
  • <tasks> context block — pure SQL renderer, status-aware truncation (in_progress full / pending 120 / terminal 80 chars), omitted on empty threads.
  • prepareStep wiring — refreshes <tasks> per react-loop step so the model sees fresh state across iterations without busting b2 cache.
  • <task_management> system prompt section — when-to-plan decision rule, lifecycle, description quality, discipline, don'ts (~370 tokens, cached in b1).
  • Anthropic provider options: toolStreaming, effort: "xhigh", sendReasoning, thinking: { adaptive, summarized }, explicit cacheControl.ttl: "5m" at all 5 marker sites.

Cache topology preserved

The <tasks> block lives after the b2 cache marker (post <current_date>). Per-step refresh costs ~300 tokens; b2 stays hit across all within-turn steps. b4 misses on steps that mutate tasks — acceptable trade vs the ~5–10K rebuild cost if tasks were inside the b2 prefix. See spec §6 worked example.

Smoke checklist — PENDING (8 scenarios)

Deployed to https://agent-os.pandaronit25.workers.dev and *.ronit.one. Migration 0009 (0009_melodic_multiple_man.sql) applies automatically on first DO write per thread.

Interactive smoke when ready to test (from spec §8):

  • 1. Empty thread renders no <tasks> block. Fresh thread, ask "what time is it" — no manage_tasks tool_call row.
  • 2. Trivial request doesn't trigger planning. "Read README.md" — no manage_tasks call.
  • 3. Multi-step request creates 2–3 tasks. "Research bun vs deno, write a short summary, save it to summary.md" — verify rows in task table.
  • 4. Mid-turn state visibility (the prepareStep test). Same run — agent should mark a task in_progress in the very next step after creating tasks.
  • 5. Persistence across user turns. Send a second message in the same thread; tasks still visible.
  • 6. Cross-thread isolation. New thread renders no <tasks> block.
  • 7. Inline completion with result. Multi-step turn; agent marks task complete with 1–3 sentence result.
  • 8. manage_tasks returns full state. Inspect tool_call.response — contains {tasks:[...]} with all thread tasks.

Cache verification (scenario 10) deferred per memory — wrangler tail doesn't capture DO logs reliably.

Out of scope (PR-D / PR-E)

  • spawn_sub_agent tool, sub-agent runtime, parallel execution
  • TUI rendering of <tasks> as a sidebar/panel
  • 2× result duplication handling (decided: accept; lands in PR-D)
  • Nested tasks (parentTaskId), get_task/list_tasks read tools, task templates — YAGNI

Reviews (subagent-driven, opus)

All 6 implementation tasks went through spec-compliance + code-quality review per superpowers:subagent-driven-development. Two iterations during implementation:

  • Task 1 code-quality caught $defaultFn only fires on INSERT — fixed with .$onUpdate(() => new Date()) matching thread.updatedAt precedent (commit b7c8946). Spec updated in lockstep.
  • Task 2 code-quality caught status union duplication — extracted TASK_STATUS/isTerminalStatus() (commit 0f39944).

Final whole-PR review caught one cross-task seam: stale <roadmap> entry contradicting the new <task_management> block — fixed in d112dd5.

🤖 Generated with Claude Code

rtpa25 and others added 11 commits May 14, 2026 16:20
Per-thread task list with single batchable manage_tasks tool + <tasks> context
block refreshed per-step via streamText prepareStep. First of three slices:
Tasks (this PR) → Sub-agents (PR-D) → Streaming (PR-E).

Key decisions:
- Tasks live AFTER b2 cache marker (in dynamic suffix), not in cached prefix
- prepareStep callback owns tasks block render — never buildContextMessages
- Soft 'one in_progress' rule via system prompt, NOT DB constraint
  (preserves PR-D parallel sub-agent option)
- result field optional but encouraged via system prompt
- 5-state enum: pending/in_progress/complete/failed/cancelled
- No get_task read tool (context block + tool_result history covers retrieval)

Adjacent provider-option additions in §9: toolStreaming, effort=xhigh,
sendReasoning, thinking={adaptive,summarized}, explicit cacheControl.ttl=5m.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 tasks: schema+migration → tasks-block renderer → manage_tasks tool →
prepareStep wiring → system prompt section → Anthropic provider options
→ deploy+smoke+PR. Each task self-contained with full code, exact paths,
typecheck-as-test gate, individual commit.

No TDD scaffolding (per project convention). Final smoke checklist runs
8 scenarios against deployed kernel + d1 verifications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-thread tasks with 5-state status enum (pending/in_progress/complete/
failed/cancelled), nullable result, composite index on (threadId, createdAt).
Cascade-delete with both thread and run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
$defaultFn only fires on INSERT; without $onUpdate the column drifts
stale unless every writer remembers to set it manually. Match the
thread.updatedAt precedent (threads.ts:13-14). Spec §2 corrected to
match — keeps PR-D / PR-E writers honest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure SQL fetch from task table. Status-aware description truncation
(in_progress full / pending 120 / terminal 80 chars), result shown in
full for terminal entries when non-null. Returns null on empty thread
so the block is omitted from prompts entirely. Consumed by prepareStep
callback in turn.ts (Task 4).
Use Task['status'] instead of inline 5-state union, extract
isTerminalStatus() helper so formatDescription and the row map share
one source of truth for terminal-vs-non-terminal. Add explicit Task[]
type on the rows local.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single write tool: ops:[{action:'create',name,description}|{action:'update',
id,...patch}]. Returns full thread task list sorted by createdAt ASC.
threadId/runId injected server-side from perTurn; cross-thread updates
throw. Wrapped via wrappedTool({touchesFS:false}) so audit rows write
without per-call worktree overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
streamText runs the entire react loop internally; without prepareStep the
context blocks would be frozen at turn start and the agent would see stale
task state across iterations. Capture stableBlockCount after buildContext-
Messages and splice the freshly-rendered <tasks> block at that index every
step.

Splice invariant: only touches indices >= stableBlockCount, leaving the
b2-cached stable prefix byte-identical so b2 keeps hitting across steps.
Costs ~300 tokens of tasks-render per step vs ~5-10k of cache rebuild if
tasks were inside the b2 prefix.
Teaches when to plan vs just do (substantial vs trivial test), lifecycle
with status transitions, description quality bar with good/bad pair,
discipline rules (one in_progress, immediate status updates), and don'ts.
~370 tokens, cached in b1 — paid once per fresh cache, then free.

Lifted from Dimension AI's <delegation>/<task_descriptions> framing,
trimmed to exclude sub-agent guidance (PR-D scope).
- toolStreaming: true        — TUI sees partial tool args mid-emit
- effort: 'xhigh'            — max thinking budget per step
- sendReasoning: true        — reasoning sent back across react steps
- thinking.adaptive+summary  — auto-tune per step, lean conv history
- cacheControl.ttl: '5m'     — explicit (was implicit default) at all
                               cacheControl marker sites so future 1h
                               experiment is a one-character diff

Also widened ContextMessage.providerOptions.anthropic.cacheControl to
accept an optional 'ttl' field so the b2 marker site typechecks; uses
the same '5m' | '1h' literal union the Anthropic SDK exposes.

Combined with adaptive: trivial requests barely think, substantial ones
think hard. Net positive for multi-step workflows that PR-C enables.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old entry said 'Plans / task lists' were unshipped, contradicting the
new <task_management> block in the same prompt. Rescope to specifically
the TUI-side-panel rendering (the actual remaining unshipped piece) so
the agent doesn't carry contradictory instructions about whether task
tracking exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rtpa25 rtpa25 merged commit 50e5047 into main May 14, 2026
3 checks passed
@rtpa25 rtpa25 deleted the feat/task-management branch May 14, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant