feat(cache): rolling message-prefix cache breakpoint + hit-rate telem…#17

Merged
wusijian007 merged 1 commit into main from feat/m3.1-cache-aligning
Jun 16, 2026
Conversation

@wusijian007
Owner

…etry (M3.1a/b)

First v3 milestone (§2 Cache Aligning). Adds the third prompt-cache breakpoint and makes cache effectiveness measurable -- the prerequisite for tuning §1 compaction without flying blind.

M3.1a -- message-prefix cache breakpoint:

  • ModelRequest gains cacheConversation?: boolean. The agent loop (query.ts -> collectModelTurnWithRetry) sets it true on every stream request; the plain chat path (which doesn't go through query()) leaves it off.
  • toAnthropicMessages(messages, { cacheLastMessage }) attaches cache_control: ephemeral to the LAST block of the LAST message, normalizing a trailing string body into a single text block first (cache_control can't ride a bare string). Both adapter call sites (create + stream) thread request.cacheConversation through.
  • Result: 3 of Anthropic's 4 breakpoints in use (system + tools + conversation prefix). Each turn is a strict extension of the prior request, so the prefix hits the incremental cache. The breakpoint rolls forward to the new last message every turn.
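The breakpoint placement described in M3.1a can be sketched roughly as follows. This is a minimal illustration, not the adapter code from this PR: the `Message` and `TextBlock` shapes and the function name `toAnthropicMessagesSketch` are hypothetical simplifications, and only the `cacheLastMessage` option and the `cache_control: { type: "ephemeral" }` marker come from the description above.

```typescript
// Hypothetical, simplified shapes; the real adapter types are not shown in this PR.
type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "user" | "assistant"; content: string | TextBlock[] };

function toAnthropicMessagesSketch(
  messages: Message[],
  opts: { cacheLastMessage?: boolean } = {},
): Message[] {
  // Copy messages and blocks so the caller's history is never mutated.
  const out: Message[] = messages.map((m) => ({
    role: m.role,
    content:
      typeof m.content === "string" ? m.content : m.content.map((b) => ({ ...b })),
  }));

  const last = out[out.length - 1];
  // Default off, and nothing to mark on an empty list.
  if (!opts.cacheLastMessage || last === undefined) return out;

  // cache_control cannot ride a bare string, so normalize to a single text block.
  const blocks: TextBlock[] =
    typeof last.content === "string"
      ? [{ type: "text", text: last.content }]
      : last.content;

  // Mark only the LAST block of the LAST message.
  const tail = blocks[blocks.length - 1];
  if (tail !== undefined) {
    blocks[blocks.length - 1] = { ...tail, cache_control: { type: "ephemeral" } };
  }
  last.content = blocks;
  return out;
}
```

Because the marker is recomputed from the current history on every call and never written back into it, the breakpoint naturally rolls forward: the next turn's request carries the marker on its own last message only.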

M3.1b -- hit-rate telemetry:

  • myagent usage <id> adds a cache hit ratio line: cache_read / (cache_read + input), with the raw fraction shown.
  • myagent eval run totals line + REPORT.md gain the same ratio (on the deterministic fixture it's a stable 43.6%).
  • Pure render layer; the tokens were already collected in M1.5a.
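As a rough illustration of the ratio in M3.1b, the sketch below computes cache_read / (cache_read + input) and renders it with the raw fraction. The `UsageTotals` field names, the helper names, and the exact line format are assumptions for illustration; the actual output of `myagent usage` is not shown in this PR.

```typescript
// Hypothetical totals shape; field names are assumptions, not the project's types.
interface UsageTotals {
  inputTokens: number; // uncached input tokens
  cacheReadTokens: number; // tokens served from the prompt cache
}

// cache_read / (cache_read + input); null when there is nothing to divide by.
function cacheHitRatio(u: UsageTotals): number | null {
  const denom = u.cacheReadTokens + u.inputTokens;
  return denom === 0 ? null : u.cacheReadTokens / denom;
}

// Render the percentage together with the raw fraction (format is illustrative).
function formatCacheHitLine(u: UsageTotals): string {
  const ratio = cacheHitRatio(u);
  if (ratio === null) return "cache hit ratio: n/a";
  const denom = u.cacheReadTokens + u.inputTokens;
  return `cache hit ratio: ${(ratio * 100).toFixed(1)}% (${u.cacheReadTokens}/${denom})`;
}

// Example: 750 cached tokens out of 1000 total input tokens.
const exampleLine = formatCacheHitLine({ inputTokens: 250, cacheReadTokens: 750 });
// exampleLine === "cache hit ratio: 75.0% (750/1000)"
```

Guarding the zero denominator keeps the line printable for sessions that recorded no input tokens at all, instead of rendering `NaN%`.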

M3.1c (main-loop fork-trace attribution -- "why did this turn miss") is deferred to when §1 compaction lands and actually needs the alarm, per the roadmap.

Tests: prompt-caching.test.ts gains 5 cases pinning breakpoint placement (default off / last-block-only / block-form last message / empty list); query.test.ts asserts cacheConversation reaches every stream request; cli.test.ts asserts the usage hit-ratio line. Roadmap and CLAUDE.md updated.

Local: 192 tests, 3/3 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wusijian007 wusijian007 merged commit 012e6df into main Jun 16, 2026
3 checks passed
@wusijian007 wusijian007 deleted the feat/m3.1-cache-aligning branch June 16, 2026 12:56