
feat(tasks): turn-boundary background-task inbox (M3.4 / §3)#20

Merged
wusijian007 merged 1 commit into main from feat/m3.4-background-tasks
Jun 17, 2026

@wusijian007

Owner

Fourth and final v3 milestone (§3 True Background Task Control). First cut: the "task inbox" -- §3's defining capability, turning background task observation from polling into push. DAG dependencies + concurrency control are deferred to §3 follow-ups per the roadmap. Includes the written §3 design section in docs/v3-kernel-roadmap.md.

M3.4a -- the inbox (query.ts + types.ts + tools/index.ts):

  • ToolContext.startedBackgroundTaskIds: a shared, mutable Set scoped to THIS query run. runBackgroundSubAgent adds each local_agent task id it starts. (The per-turn ToolContext is built by shallow spread, so every turn holds the same Set reference.)
  • QueryOptions.drainBackgroundTasks: opt-in.
  • At each turn boundary (the same spot as M3.2b compaction / M3.3 verify), drainBackgroundTasks scans the run's OWN task ids for terminal, not-yet-notified tasks, injects a synthetic user observation ([background tasks finished] + state + bounded output tail), stamps them with notifiedAt (for dedup), and yields a `background_tasks` LoopEvent.
  • Strictly scoped: leftover tasks from prior sessions or from CLI `task start-bash` are NEVER drained (they're not in the run's registry). This was the key correctness constraint -- a naive collectTaskNotifications over the whole store would surface unrelated tasks into a fresh agent's context.
  • Injection is append-only, so unlike compaction's rewrite it only invalidates the cache after the injection point (gentler on §2's cached prefix).
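A minimal sketch of the turn-boundary drain described above, for illustration only. It assumes the TaskStore behaves like an in-memory Map, that the terminal states are `completed`/`failed`/`killed`, and that the output tail is capped at 200 characters. `drainInbox`, `TaskRecord`, `TaskStore`, and `TAIL_CHARS` are hypothetical names, not the actual query.ts API.

```typescript
// Hypothetical sketch: names, state values, and the tail bound are assumptions.
type TaskState = "running" | "completed" | "failed" | "killed";

interface TaskRecord {
  id: string;
  state: TaskState;
  output: string;
  notifiedAt?: number; // set once the task has been surfaced to the model
}

type TaskStore = Map<string, TaskRecord>;

interface ToolContext {
  // Shared, mutable Set of task ids started during ONE query run.
  startedBackgroundTaskIds: Set<string>;
}

type Message = { role: "user"; content: string };
type LoopEvent = {
  type: "background_tasks";
  tasks: { id: string; state: TaskState }[];
};

// Assumption: these are the terminal states.
const TERMINAL: ReadonlySet<TaskState> = new Set<TaskState>(["completed", "failed", "killed"]);
const TAIL_CHARS = 200; // assumed bound on the per-task output tail

// Called at each turn boundary when drainBackgroundTasks is on.
function drainInbox(
  ctx: ToolContext,
  store: TaskStore,
  now: number,
): { message: Message; event: LoopEvent } | null {
  const finished: TaskRecord[] = [];
  // Scan only this run's own ids; other tasks in the store are never touched.
  for (const id of ctx.startedBackgroundTaskIds) {
    const task = store.get(id);
    if (task !== undefined && TERMINAL.has(task.state) && task.notifiedAt === undefined) {
      finished.push(task);
    }
  }
  if (finished.length === 0) return null;

  // Mark before returning so a later boundary never re-surfaces the same task.
  for (const task of finished) task.notifiedAt = now;

  const lines = finished.map((t) => `- ${t.id} (${t.state}): ${t.output.slice(-TAIL_CHARS)}`);
  return {
    // Appended as a synthetic user observation, so earlier messages stay untouched.
    message: { role: "user", content: ["[background tasks finished]", ...lines].join("\n") },
    event: {
      type: "background_tasks",
      tasks: finished.map((t) => ({ id: t.id, state: t.state })),
    },
  };
}

// The per-turn context is a shallow spread, so it shares the run's Set.
const runCtx: ToolContext = { startedBackgroundTaskIds: new Set<string>() };
const turnCtx: ToolContext = { ...runCtx };
turnCtx.startedBackgroundTaskIds.add("bg-1"); // also visible through runCtx
```

Stamping `notifiedAt` on the stored record rather than in a local variable keeps the dedup state next to the task it describes, so repeated boundaries in the same run see it.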

M3.4b -- CLI + eval:
The agent path seeds startedBackgroundTaskIds and sets drainBackgroundTasks; the CLI prints `[background]` lines. An 8th eval task, "background-inbox", seeds a pre-completed background task that the run "started" and asserts the inbox drains it deterministically. Eval fingerprint updated (tasks 8, turns 17, in 11850, out 695).
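A hedged sketch of how a CLI could render the loop's events as `[background]` lines. Only the `background_tasks` event type and the `[background]` prefix come from this PR; the `assistant_text` variant, the exact line format, and the names `formatEvent`/`printEvents` are hypothetical.

```typescript
// Hypothetical sketch: the event union and line format are assumptions.
type TaskState = "running" | "completed" | "failed" | "killed";

type LoopEvent =
  | { type: "assistant_text"; text: string } // hypothetical stand-in for ordinary model output
  | { type: "background_tasks"; tasks: { id: string; state: TaskState }[] };

// Turn one loop event into the lines the CLI would print.
function formatEvent(ev: LoopEvent): string[] {
  switch (ev.type) {
    case "assistant_text":
      return [ev.text];
    case "background_tasks":
      // One prefixed line per finished task, so inbox notices stand apart from model output.
      return ev.tasks.map((t) => `[background] ${t.id} ${t.state}`);
  }
}

// The CLI consumes the loop's event stream and prints as events arrive.
async function printEvents(events: AsyncIterable<LoopEvent>): Promise<void> {
  for await (const ev of events) {
    for (const line of formatEvent(ev)) console.log(line);
  }
}
```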

Tests: 3 query-loop cases (drains this-run task; does NOT drain leftover tasks not in the registry; no-op when the flag is off). Determinism holds via seeded TaskStore records -- no real sub-agent spawn needed to exercise the inbox (the sub-agent machinery is tested separately, and FakeModel's shared cursor makes an end-to-end spawn awkward).
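The three cases can be exercised purely from seeded records, roughly as below. This is a framework-free, hypothetical sketch: `turnBoundary`, `seed`, and `check` are illustrative names, the actual tests drive the full query loop, and here any non-running state counts as terminal.

```typescript
// Hypothetical sketch of the three inbox test cases, using only seeded records.
type Task = { id: string; state: "running" | "completed" | "failed"; notifiedAt?: number };

interface QueryOpts {
  drainBackgroundTasks?: boolean; // opt-in flag
}

// Compact stand-in for the turn-boundary hook; returns the ids it surfaced.
function turnBoundary(opts: QueryOpts, started: Set<string>, store: Map<string, Task>, now: number): string[] {
  if (!opts.drainBackgroundTasks) return []; // flag off: no-op
  const drained: string[] = [];
  for (const id of started) {
    const t = store.get(id);
    if (t !== undefined && t.state !== "running" && t.notifiedAt === undefined) {
      t.notifiedAt = now; // dedup marker
      drained.push(id);
    }
  }
  return drained;
}

// Seeded records: one task this run "started", one leftover from a prior session.
function seed(): { started: Set<string>; store: Map<string, Task> } {
  const store = new Map<string, Task>([
    ["mine", { id: "mine", state: "completed" }],
    ["leftover", { id: "leftover", state: "completed" }],
  ]);
  return { started: new Set<string>(["mine"]), store };
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(`check failed: ${msg}`);
}

// Case 1: drains this run's finished task, and only once.
{
  const { started, store } = seed();
  check(turnBoundary({ drainBackgroundTasks: true }, started, store, 1).join(",") === "mine", "drains this-run task");
  check(turnBoundary({ drainBackgroundTasks: true }, started, store, 2).length === 0, "dedup via notifiedAt");
}

// Case 2: a leftover task outside the run's registry is never drained.
{
  const { started, store } = seed();
  turnBoundary({ drainBackgroundTasks: true }, started, store, 1);
  check(store.get("leftover")?.notifiedAt === undefined, "leftover untouched");
}

// Case 3: with the flag off, nothing is drained or marked.
{
  const { started, store } = seed();
  check(turnBoundary({}, started, store, 1).length === 0, "flag off drains nothing");
  check(store.get("mine")?.notifiedAt === undefined, "flag off marks nothing");
}
```

Passing `now` explicitly instead of reading the clock keeps every assertion deterministic.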

This completes the v3 kernel-excellence track: §2 cache alignment, §1 smart compaction, §4 self-correction, §3 background inbox -- all shipped, all deterministically testable offline.

Local: 204 tests, 3/3 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
wusijian007 merged commit c962135 into main Jun 17, 2026
3 checks passed
wusijian007 deleted the feat/m3.4-background-tasks branch June 17, 2026 00:27