feat(compaction): tiered smart compaction + proactive trigger (M3.2 /…#18

Merged: wusijian007 merged 1 commit into main from feat/m3.2-smart-compaction on Jun 16, 2026

@wusijian007 (Owner)

… §1)

Second v3 milestone (§1 Smart Compaction). Upgrades context management
from reactive dumb-snip to proactive, tiered, deterministic compaction
that cooperates with the M3.1a cache prefix breakpoint. Includes the
written §1 design section in docs/v3-kernel-roadmap.md
(design-before-code per the roadmap's blast-radius tiering).

M3.2a -- tiered deterministic compaction (context.ts):
  compactMessagesTiered() shrinks stale messages IN PLACE instead of
  dropping a middle slice (which risks orphaning a tool_use/tool_result
  pair -> Anthropic 400). Policy: root task + recent window kept
  verbatim; in the stale zone, large tool_result blocks become compact
  pointers ("[archived Read(src/x.ts) result: N chars omitted ->
  <artifactPath>]") and long text is snipped. Token reduction comes
  from pointer-izing the whales (tool_results dominate an agent
  transcript), so message COUNT is preserved and all pairing stays
  valid. Falls back to legacy drop-middle only if shrink isn't enough.
  Fully deterministic (no model call) -> eval-safe by construction.
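
A minimal sketch of the tiered policy described above, not the actual context.ts code. The `Block`, `Message`, and `TieredOptions` shapes, the thresholds, and the name `compactTieredSketch` are assumptions made for illustration; the pointer text also omits the tool arguments and artifact path that the real pointer carries.

```typescript
// Hypothetical message shapes; the real types in context.ts may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Hypothetical knobs for this sketch.
interface TieredOptions {
  recentWindow: number;     // trailing messages kept verbatim
  largeResultChars: number; // stale tool_results above this become pointers
  maxTextChars: number;     // stale text above this is snipped
}

const ARCHIVED_PREFIX = "[archived ";

function compactTieredSketch(messages: Message[], opts: TieredOptions): Message[] {
  // Map tool_use ids to tool names so each pointer can name the call it replaced.
  const toolNames = new Map<string, string>();
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_use") toolNames.set(b.id, b.name);
    }
  }

  const staleEnd = messages.length - opts.recentWindow;
  return messages.map((m, i) => {
    // Index 0 is the root task; indices >= staleEnd form the recent window.
    if (i === 0 || i >= staleEnd) return m;

    const content = m.content.map((b): Block => {
      if (b.type === "tool_result") {
        if (b.content.startsWith(ARCHIVED_PREFIX)) return b; // archive once
        if (b.content.length <= opts.largeResultChars) return b;
        const name = toolNames.get(b.tool_use_id) ?? "tool";
        // The block stays in the same message with the same tool_use_id,
        // so every tool_use/tool_result pair remains valid.
        return {
          ...b,
          content: `${ARCHIVED_PREFIX}${name} result: ${b.content.length} chars omitted]`,
        };
      }
      if (b.type === "text" && b.text.length > opts.maxTextChars) {
        return { ...b, text: b.text.slice(0, opts.maxTextChars) + " [snipped]" };
      }
      return b;
    });
    return { ...m, content };
  });
}
```

The sketch returns a new array of the same length rather than mutating its input, which is one way to read "in place": every message keeps its position and role, and only block bodies in the stale zone shrink. Running it twice produces the same output as running it once, because blocks that are already archived are skipped.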

M3.2b -- proactive trigger (query.ts):
  At each turn boundary, if the transcript estimate crosses
  proactiveCompactionSoftLimitRatio (default 75%) of contextBudgetTokens,
  compact down to proactiveCompactionTargetRatio (50%) BEFORE the next
  request and yield a `compaction` LoopEvent. Compacting aggressively
  (well below the soft limit) maximizes turns-between-compactions --
  each compaction is one cache miss (the amortized cache-reset framing
  in the roadmap). The reactive prompt_too_long/max_output path stays
  as the safety net. The CLI prints a `[compaction]` line.
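
The trigger arithmetic above can be sketched as follows. The config field names come from the description; the `ProactiveConfig` and `CompactionDecision` types, both function names, and the reading of "crosses" as strictly greater than the soft limit are assumptions, not the actual query.ts code.

```typescript
interface ProactiveConfig {
  contextBudgetTokens: number;
  proactiveCompactionSoftLimitRatio: number; // 0.75 by default per the description
  proactiveCompactionTargetRatio: number;    // 0.5 per the description
}

// Hypothetical result type for this sketch.
type CompactionDecision =
  | { compact: false }
  | { compact: true; targetTokens: number };

// Checked at each turn boundary, before the next request is built.
function decideProactiveCompaction(
  estimatedTokens: number,
  cfg: ProactiveConfig,
): CompactionDecision {
  const softLimit = cfg.contextBudgetTokens * cfg.proactiveCompactionSoftLimitRatio;
  // Assumption: "crosses" means strictly above the soft limit.
  if (estimatedTokens <= softLimit) return { compact: false };
  return {
    compact: true,
    targetTokens: Math.floor(cfg.contextBudgetTokens * cfg.proactiveCompactionTargetRatio),
  };
}

// Rough number of turns that fit between the target and the soft limit,
// i.e. turns between two compactions (and so between two cache misses).
function turnsBetweenCompactions(cfg: ProactiveConfig, avgTokensPerTurn: number): number {
  const headroom =
    cfg.contextBudgetTokens * cfg.proactiveCompactionSoftLimitRatio -
    cfg.contextBudgetTokens * cfg.proactiveCompactionTargetRatio;
  return Math.floor(headroom / avgTokensPerTurn);
}
```

For illustration only: with a 100,000-token budget and the 75%/50% ratios, a transcript estimated at 80,000 tokens would be compacted toward 50,000, leaving 25,000 tokens of headroom. At an assumed 5,000 tokens per turn, that is about five turns before the next compaction. Compacting only to just under the soft limit would leave almost no headroom and trigger a cache miss nearly every turn.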

M3.1c -- cache attribution (scoped down, with rationale):
  Within a single run the ONLY event that invalidates the rolling
  message-prefix cache is a compaction (it rewrites stale messages); a
  normal turn merely appends and keeps hitting the cache. fork.ts's
  prefixHash is a whole-list hash that changes every appended turn, so
  a literal per-turn fork-trace would mislabel every turn as a prefix
  miss. Instead we mark the real reset with a `query.cache_prefix_reset`
  profile event at the compaction site. Documented in the roadmap.
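
To illustrate the rationale, here is a small sketch of why a whole-list hash is a poor per-turn cache signal. It does not reproduce fork.ts: messages are modeled as plain strings, and `wholeListHash`, `prefixSurvives`, `cacheEventFor`, and the FNV-1a hash are stand-ins chosen for this example. Only the event name `query.cache_prefix_reset` comes from the description.

```typescript
// 32-bit FNV-1a over a string; a stand-in for whatever hash fork.ts really uses.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Whole-list hash: covers every message, so any append changes it.
function wholeListHash(messages: string[]): string {
  return fnv1a(JSON.stringify(messages));
}

// The cached prefix survives only if the old list is an exact prefix of the new one.
function prefixSurvives(before: string[], after: string[]): boolean {
  return before.length <= after.length && before.every((m, i) => m === after[i]);
}

// Emit the reset marker only when earlier messages were rewritten.
function cacheEventFor(before: string[], after: string[]): "query.cache_prefix_reset" | null {
  return prefixSurvives(before, after) ? null : "query.cache_prefix_reset";
}

const before = ["root task", "big tool result"];
const appended = [...before, "next turn"];                           // normal turn
const compacted = ["root task", "[archived pointer]", "next turn"]; // compaction

// The whole-list hash differs after a plain append, even though the prefix is intact.
console.log(wholeListHash(before) === wholeListHash(appended));
console.log(cacheEventFor(before, appended));
console.log(cacheEventFor(before, compacted));
```

In this model a plain append keeps the old list as an exact prefix, so no reset is reported, while a compaction rewrites an earlier message and is the only case that yields the reset marker. This matches the choice to emit the profile event at the compaction site instead of tracing a hash every turn.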

Tests: context-budget.test.ts (tiered pointer-ization, root+recent
verbatim, archive-once, under-target no-op); query.test.ts (proactive
fires past soft limit / stays quiet under it); a 6th eval task
"proactive-compaction" that drives a big-file Read past a tiny budget
and asserts the compaction event. The eval gate's deterministic
fingerprint was updated accordingly (tasks 6, turns 13, in 9600, out 555).
CLAUDE.md + roadmap updated; M3.2c (LLM summarizer) stays deferred.

Local: 197 tests, 3/3 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wusijian007 wusijian007 merged commit 60a3bef into main Jun 16, 2026
3 checks passed
@wusijian007 wusijian007 deleted the feat/m3.2-smart-compaction branch June 16, 2026 13:22