feat(cache): rolling message-prefix cache breakpoint + hit-rate telemetry (M3.1a/b) #17
Merged
First v3 milestone (§2 Cache Aligning). Adds the third prompt-cache
breakpoint and makes cache effectiveness measurable -- the prerequisite
for tuning §1 compaction without flying blind.
M3.1a -- message-prefix cache breakpoint:
- ModelRequest gains `cacheConversation?: boolean`. The agent loop
  (query.ts -> collectModelTurnWithRetry) sets it true on every stream
  request; the plain `chat` path (which doesn't go through query())
  leaves it off.
- toAnthropicMessages(messages, { cacheLastMessage }) attaches
  cache_control: ephemeral to the LAST block of the LAST message,
  normalizing a trailing string body into a single text block first
  (cache_control can't ride a bare string). Both adapter call sites
  (create + stream) thread request.cacheConversation through.
- Result: 3 of Anthropic's 4 breakpoints in use (system + tools +
  conversation prefix). Each turn is a strict extension of the prior
  request, so the prefix hits the incremental cache. The breakpoint
  rolls forward to the new last message every turn.
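A minimal sketch of the breakpoint placement described above, not the code in this PR. The message and block types here are simplified assumptions (text blocks only; the real adapter presumably also handles tool and image blocks), and skipping an empty block list is an illustrative choice:

```typescript
// Hypothetical, simplified shapes; the real types in the repo may differ.
type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "user" | "assistant"; content: string | TextBlock[] };

function toAnthropicMessages(
  messages: Message[],
  opts: { cacheLastMessage?: boolean } = {},
): Message[] {
  // Copy messages and blocks so the caller's history is never mutated.
  const out: Message[] = messages.map((m) => ({
    role: m.role,
    content:
      typeof m.content === "string" ? m.content : m.content.map((b) => ({ ...b })),
  }));
  // Default off, and nothing to mark on an empty list.
  if (!opts.cacheLastMessage || out.length === 0) return out;

  const last = out[out.length - 1];
  // cache_control can't ride a bare string: normalize to a single text block.
  const blocks: TextBlock[] =
    typeof last.content === "string"
      ? [{ type: "text", text: last.content }]
      : last.content;
  // Assumption: a message with no blocks is left unmarked.
  if (blocks.length === 0) return out;

  // Mark only the LAST block of the LAST message.
  blocks[blocks.length - 1] = {
    ...blocks[blocks.length - 1],
    cache_control: { type: "ephemeral" },
  };
  last.content = blocks;
  return out;
}
```

Because only the final block carries the marker and each call rebuilds it from unmarked history, the breakpoint moves to the new last message on the next turn without leaving stale markers behind.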
M3.1b -- hit-rate telemetry:
- `myagent usage <id>` adds a `cache hit ratio` line:
  cache_read / (cache_read + input), with the raw fraction shown.
- `myagent eval run` totals line + REPORT.md gain the same ratio
  (on the deterministic fixture it's a stable 43.6%).
- Pure render layer; the tokens were already collected in M1.5a.
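The ratio above can be sketched as a pure function over already-collected token counts. The `UsageTotals` shape, the function names, and the exact output format are assumptions for illustration; only the formula cache_read / (cache_read + input) comes from this PR:

```typescript
// Hypothetical totals shape; field names are illustrative, not the repo's.
type UsageTotals = { inputTokens: number; cacheReadTokens: number };

// cache_read / (cache_read + input); null when no tokens were recorded,
// so callers never render a NaN.
function cacheHitRatio(u: UsageTotals): number | null {
  const denom = u.cacheReadTokens + u.inputTokens;
  return denom === 0 ? null : u.cacheReadTokens / denom;
}

// Renders the percentage alongside the raw fraction.
function formatCacheHitRatio(u: UsageTotals): string {
  const ratio = cacheHitRatio(u);
  if (ratio === null) return "cache hit ratio: n/a";
  const denom = u.cacheReadTokens + u.inputTokens;
  return `cache hit ratio: ${(ratio * 100).toFixed(1)}% (${u.cacheReadTokens}/${denom})`;
}
```

Keeping the computation separate from the formatting would let the `usage` line, the eval totals line, and REPORT.md share one definition of the ratio.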
M3.1c (main-loop fork-trace attribution -- "why did this turn miss")
is deferred to when §1 compaction lands and actually needs the alarm,
per the roadmap.
Tests: prompt-caching.test.ts gains 5 cases pinning breakpoint
placement (default off / last-block-only / block-form last message /
empty list); query.test.ts asserts cacheConversation reaches every
stream request; cli.test.ts asserts the usage hit-ratio line. Roadmap
+ CLAUDE.md updated.
Local: 192 tests, 3/3 green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>