feat(compaction): tiered smart compaction + proactive trigger (M3.2 /…#18
Merged
Conversation
… §1)
Second v3 milestone (§1 Smart Compaction). Upgrades context management
from reactive dumb-snip to proactive, tiered, deterministic compaction
that cooperates with the M3.1a cache prefix breakpoint. Includes the
written §1 design section in docs/v3-kernel-roadmap.md (design-before-
code per the roadmap's blast-radius tiering).
M3.2a -- tiered deterministic compaction (context.ts):
compactMessagesTiered() shrinks stale messages IN PLACE instead of
dropping a middle slice (which risks orphaning a tool_use/tool_result
pair -> Anthropic 400). Policy: root task + recent window kept
verbatim; in the stale zone, large tool_result blocks become compact
pointers ("[archived Read(src/x.ts) result: N chars omitted ->
<artifactPath>]") and long text is snipped. Token reduction comes
from pointer-izing the whales (tool_results dominate an agent
transcript), so message COUNT is preserved and all pairing stays
valid. Falls back to legacy drop-middle only if shrink isn't enough.
Fully deterministic (no model call) -> eval-safe by construction.
M3.2b -- proactive trigger (query.ts):
At each turn boundary, if the transcript estimate crosses
proactiveCompactionSoftLimitRatio (default 75%) of contextBudgetTokens,
compact down to proactiveCompactionTargetRatio (50%) BEFORE the next
request and yield a `compaction` LoopEvent. Compacting aggressively
(well below the soft limit) maximizes turns-between-compactions --
each compaction is one cache miss (the amortized cache-reset framing
in the roadmap). The reactive prompt_too_long/max_output path stays
as the safety net. The CLI prints a `[compaction]` line.
M3.1c -- cache attribution (scoped down, with rationale):
Within a single run the ONLY event that invalidates the rolling
message-prefix cache is a compaction (it rewrites stale messages); a
normal turn merely appends and keeps hitting the cache. fork.ts's
prefixHash is a whole-list hash that changes every appended turn, so
a literal per-turn fork-trace would mislabel every turn as a prefix
miss. Instead we mark the real reset with a `query.cache_prefix_reset`
profile event at the compaction site. Documented in the roadmap.
Tests: context-budget.test.ts (tiered pointer-ization, root+recent
verbatim, archive-once, under-target no-op); query.test.ts (proactive
fires past soft limit / stays quiet under it); a 6th eval task
"proactive-compaction" that drives a big-file Read past a tiny budget
and asserts the compaction event -- the eval gate's deterministic
fingerprint updated accordingly (tasks 6, turns 13, in 9600, out 555).
CLAUDE.md + roadmap updated; M3.2c (LLM summarizer) stays deferred.
Local: 197 tests, 3/3 green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
… §1)
Second v3 milestone (§1 Smart Compaction). Upgrades context management from reactive dumb-snip to proactive, tiered, deterministic compaction that cooperates with the M3.1a cache prefix breakpoint. Includes the written §1 design section in docs/v3-kernel-roadmap.md (design-before- code per the roadmap's blast-radius tiering).
M3.2a -- tiered deterministic compaction (context.ts):
compactMessagesTiered() shrinks stale messages IN PLACE instead of
dropping a middle slice (which risks orphaning a tool_use/tool_result
pair -> Anthropic 400). Policy: root task + recent window kept
verbatim; in the stale zone, large tool_result blocks become compact
pointers ("[archived Read(src/x.ts) result: N chars omitted ->
]") and long text is snipped. Token reduction comes
from pointer-izing the whales (tool_results dominate an agent
transcript), so message COUNT is preserved and all pairing stays
valid. Falls back to legacy drop-middle only if shrink isn't enough.
Fully deterministic (no model call) -> eval-safe by construction.
M3.2b -- proactive trigger (query.ts):
At each turn boundary, if the transcript estimate crosses
proactiveCompactionSoftLimitRatio (default 75%) of contextBudgetTokens,
compact down to proactiveCompactionTargetRatio (50%) BEFORE the next
request and yield a
compactionLoopEvent. Compacting aggressively(well below the soft limit) maximizes turns-between-compactions --
each compaction is one cache miss (the amortized cache-reset framing
in the roadmap). The reactive prompt_too_long/max_output path stays
as the safety net. The CLI prints a
[compaction]line.M3.1c -- cache attribution (scoped down, with rationale):
Within a single run the ONLY event that invalidates the rolling
message-prefix cache is a compaction (it rewrites stale messages); a
normal turn merely appends and keeps hitting the cache. fork.ts's
prefixHash is a whole-list hash that changes every appended turn, so
a literal per-turn fork-trace would mislabel every turn as a prefix
miss. Instead we mark the real reset with a
query.cache_prefix_resetprofile event at the compaction site. Documented in the roadmap.
Tests: context-budget.test.ts (tiered pointer-ization, root+recent verbatim, archive-once, under-target no-op); query.test.ts (proactive fires past soft limit / stays quiet under it); a 6th eval task "proactive-compaction" that drives a big-file Read past a tiny budget and asserts the compaction event -- the eval gate's deterministic fingerprint updated accordingly (tasks 6, turns 13, in 9600, out 555). CLAUDE.md + roadmap updated; M3.2c (LLM summarizer) stays deferred.
Local: 197 tests, 3/3 green.