feat(tasks): turn-boundary background-task inbox (M3.4 / §3) #20
Merged
Fourth and final v3 milestone (§3 True Background Task Control). First
cut: the "task inbox" -- §3's defining capability, turning background
task observation from polling into push. DAG dependencies + concurrency
control are deferred to §3 follow-ups per the roadmap. Includes the
written §3 design section in docs/v3-kernel-roadmap.md.
M3.4a -- the inbox (query.ts + types.ts + tools/index.ts):
- ToolContext.startedBackgroundTaskIds: a shared, mutable Set scoped
to THIS query run. runBackgroundSubAgent adds each local_agent task
id it starts. (The per-turn ToolContext is a shallow spread, so the
Set reference is shared across the run.)
- QueryOptions.drainBackgroundTasks: opt-in.
- At each turn boundary (same spot as M3.2b compaction / M3.3 verify),
drainBackgroundTasks scans the run's OWN task ids for terminal,
not-yet-notified tasks, injects a synthetic user observation
([background tasks finished] + state + bounded output tail), marks
them notifiedAt (dedup), and yields a `background_tasks` LoopEvent.
- Strictly scoped: leftover tasks from prior sessions or CLI
`task start-bash` are NEVER drained (they're not in the run's
registry). This was the key correctness constraint -- a naive
collectTaskNotifications over the whole store would surface
unrelated tasks into a fresh agent's context.
- Injection is append-only, so unlike compaction's rewrite it keeps the
cached prefix up to the injection point intact and only loses cache
after it (gentler on §2's prefix).
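
The drain step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in query.ts: the `TaskRecord`, `Message`, and `BackgroundTasksEvent` shapes, the `nextTurnContext` helper, the `TAIL_CHARS` bound, and the `Map`-backed store are all hypothetical. Only `startedBackgroundTaskIds`, `drainBackgroundTasks`, `notifiedAt`, the `background_tasks` event name, and the `[background tasks finished]` header come from the description.

```typescript
// Hypothetical shapes for illustration; the real definitions in types.ts may differ.
type TaskState = "running" | "completed" | "failed";

interface TaskRecord {
  id: string;
  state: TaskState;
  output: string;
  notifiedAt?: number; // set once the task has been surfaced (dedup marker)
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

interface ToolContext {
  turn: number;
  startedBackgroundTaskIds: Set<string>; // ids started by THIS run only
}

interface BackgroundTasksEvent {
  type: "background_tasks";
  taskIds: string[];
}

const TAIL_CHARS = 200; // assumed bound on the injected output tail

// A shallow spread copies the Set reference, so every turn sees the same registry.
function nextTurnContext(ctx: ToolContext): ToolContext {
  return { ...ctx, turn: ctx.turn + 1 };
}

function drainBackgroundTasks(
  store: Map<string, TaskRecord>,
  ctx: ToolContext,
  messages: Message[],
  now: number,
): BackgroundTasksEvent | null {
  const finished: TaskRecord[] = [];
  // Walk the run's own ids, never the whole store, so leftovers stay invisible.
  ctx.startedBackgroundTaskIds.forEach((id) => {
    const task = store.get(id);
    if (task && task.state !== "running" && task.notifiedAt === undefined) {
      finished.push(task);
    }
  });
  if (finished.length === 0) return null;

  const lines = finished.map(
    (t) => `${t.id} [${t.state}] ${t.output.slice(-TAIL_CHARS)}`,
  );
  // Append-only: earlier messages are left untouched.
  messages.push({
    role: "user",
    content: ["[background tasks finished]", ...lines].join("\n"),
  });
  for (const t of finished) t.notifiedAt = now;
  return { type: "background_tasks", taskIds: finished.map((t) => t.id) };
}
```

Iterating the run's registry rather than the store is what keeps leftover tasks out of a fresh agent's context: a task that is not in the Set is never looked up at all.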
M3.4b -- CLI + eval:
The agent path seeds startedBackgroundTaskIds + sets
drainBackgroundTasks; the CLI prints `[background]` lines. An 8th eval
task "background-inbox" seeds a pre-completed background task the run
"started" and asserts the inbox drains it deterministically. Eval
fingerprint updated (tasks 8, turns 17, in 11850, out 695).
Tests: 3 query-loop cases (drains this-run task; does NOT drain leftover
tasks not in the registry; no-op when the flag is off). Determinism
holds via seeded TaskStore records -- no real sub-agent spawn needed to
exercise the inbox (the sub-agent machinery is tested separately, and
FakeModel's shared cursor makes an end-to-end spawn awkward).
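
As a rough illustration of why seeded records are enough, the three cases can be exercised against a toy loop like the one below. Everything here is a hypothetical stand-in (`SeededTask`, `LoopOptions`, `seedStore`, `runLoop`, `drainedIds`), and the loop collects events in an array instead of yielding them, to keep the sketch short. It is not the project's query loop or test harness.

```typescript
// Hypothetical stand-ins for illustration; not the project's real test helpers.
interface SeededTask {
  id: string;
  done: boolean;
  notifiedAt?: number;
}

interface LoopOptions {
  turns: number;
  store: Map<string, SeededTask>;
  startedBackgroundTaskIds: Set<string>;
  drainBackgroundTasks?: boolean; // opt-in; absent means off
}

type LoopEvent =
  | { type: "turn"; index: number }
  | { type: "background_tasks"; taskIds: string[] };

// Seed one already-finished task; no sub-agent is spawned.
function seedStore(id: string): Map<string, SeededTask> {
  const store = new Map<string, SeededTask>();
  store.set(id, { id, done: true });
  return store;
}

// Simplified loop: collects events in an array instead of yielding them.
function runLoop(opts: LoopOptions): LoopEvent[] {
  const events: LoopEvent[] = [];
  for (let i = 0; i < opts.turns; i++) {
    events.push({ type: "turn", index: i });
    if (!opts.drainBackgroundTasks) continue; // flag off: boundary is a no-op
    const ready: string[] = [];
    opts.startedBackgroundTaskIds.forEach((id) => {
      const task = opts.store.get(id);
      if (task && task.done && task.notifiedAt === undefined) {
        task.notifiedAt = i; // turn index doubles as a deterministic timestamp
        ready.push(id);
      }
    });
    if (ready.length > 0) events.push({ type: "background_tasks", taskIds: ready });
  }
  return events;
}

// Flatten every background_tasks event into the list of drained ids.
function drainedIds(events: LoopEvent[]): string[] {
  const ids: string[] = [];
  for (const ev of events) {
    if (ev.type === "background_tasks") ids.push(...ev.taskIds);
  }
  return ids;
}
```

With the flag on and the seeded id in the registry, the task is drained exactly once even across several turns; with an empty registry or the flag off, no `background_tasks` event appears. No clock or model call is involved, which is what makes the result repeatable.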
This completes the v3 kernel-excellence track: §2 cache aligning,
§1 smart compaction, §4 self-correction, §3 background inbox -- all
shipped, all offline-deterministic-testable.
Local: 204 tests, 3/3 green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>