44 changes: 44 additions & 0 deletions backend/tests/unit/test_prompt_cache_integration.py
@@ -820,6 +820,50 @@ def test_page_context_in_dynamic_section():
    assert "Meeting with team" in dynamic_suffix


# ---------------------------------------------------------------------------
# Tests: Anthropic cache_control includes TTL
# ---------------------------------------------------------------------------


def test_anthropic_cache_control_has_ttl():
    """
    The cache_control dict in _run_anthropic_agent_stream must include
    ttl="1h" so that interactive chat sessions (with gaps >5min between
    turns) get cache hits instead of re-writing on every request.

    Regression: Anthropic changed default TTL from 1h→5m on 2026-03-06.
    """
    agentic_mod = _get_agentic_module()

    # Inspect the source to find the system_blocks construction
    import inspect

P1: Move inspect imports to module scope

Please move this inspect import (and the duplicate one added in the next test) to the module imports. backend/AGENTS.md applies to this file and explicitly requires “No in-function imports — all imports at module top level,” so these new tests currently violate the backend import policy.


    src = inspect.getsource(agentic_mod._run_anthropic_agent_stream)
    assert '"ttl": "1h"' in src or "'ttl': '1h'" in src, (
        "cache_control must include ttl='1h' to avoid 5-min default "
        f"(source excerpt: ...{src[src.find('cache_control'):src.find('cache_control')+120]}...)"
Comment on lines +843 to +844

P2 Duplicate in-function import — move import inspect to module level

import inspect appears inside both test_anthropic_cache_control_has_ttl and test_anthropic_cache_control_not_5min_default. In-function imports are against the project's backend import rules, and this one is duplicated across two functions anyway. Moving it to the top of the module removes both violations at once.

Context Used: Backend Python import rules - no in-function impor... (source)

    )
    assert "ephemeral" in src, "cache type must be ephemeral"


def test_anthropic_cache_control_not_5min_default():
    """
    Guard against regression: ensure we are NOT relying on the 5-minute
    default TTL that Anthropic introduced in March 2026.
    """
    agentic_mod = _get_agentic_module()
    import inspect

    src = inspect.getsource(agentic_mod._run_anthropic_agent_stream)
    # The old (broken) pattern was just {"type": "ephemeral"} with no ttl field
    # Find the cache_control line(s)
    lines_with_cache_ctrl = [l for l in src.splitlines() if "cache_control" in l]
    for line in lines_with_cache_ctrl:
        # Must NOT be the bare {"type": "ephemeral"} form
        if '"type": "ephemeral"' in line or "'type': 'ephemeral'" in line:
            assert "ttl" in line, f"cache_control line missing ttl field: {line.strip()}"

Comment on lines +857 to +865

P2 Per-line check silently misses multi-line cache_control dicts

lines_with_cache_ctrl collects only lines that contain the string "cache_control". The subsequent guard checks those same lines for "type": "ephemeral". If the dict is ever reformatted to span multiple lines (e.g., "cache_control": {\n "type": "ephemeral"\n}), the line with "cache_control" won't contain "type": "ephemeral", so the assert "ttl" in line branch is never reached and the test passes silently — even when ttl is absent. The first test (test_anthropic_cache_control_has_ttl) already asserts "ttl": "1h" positively and is the more reliable guard; this second test adds limited extra safety while its negative-path logic has this gap.
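
One way to close this gap is to check the parsed syntax tree instead of individual source lines, so the result does not depend on how the dict is formatted. A minimal sketch, assuming cache_control is written as a dict literal with a string key; the helper name cache_control_dicts_missing_ttl and the two sample snippets are hypothetical and not part of this PR:

```python
import ast
import textwrap


def cache_control_dicts_missing_ttl(source: str) -> list[int]:
    """Return the line numbers of cache_control dict literals that have no "ttl" key."""
    missing = []
    # dedent so that source taken from an indented function still parses
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if not isinstance(node, ast.Dict):
            continue
        for key, value in zip(node.keys, node.values):
            is_cache_control = isinstance(key, ast.Constant) and key.value == "cache_control"
            if is_cache_control and isinstance(value, ast.Dict):
                inner_keys = {k.value for k in value.keys if isinstance(k, ast.Constant)}
                if "ttl" not in inner_keys:
                    missing.append(value.lineno)
    return missing


# Hypothetical samples: a multi-line dict without ttl, and a one-line dict with ttl.
multi_line_without_ttl = '''
system_blocks = [{
    "type": "text",
    "text": system_prompt,
    "cache_control": {
        "type": "ephemeral"
    },
}]
'''
one_line_with_ttl = (
    'system_blocks = [{"type": "text", "text": system_prompt, '
    '"cache_control": {"type": "ephemeral", "ttl": "1h"}}]'
)

missing_in_multi_line = cache_control_dicts_missing_ttl(multi_line_without_ttl)
missing_in_one_line = cache_control_dicts_missing_ttl(one_line_with_ttl)
```

In the test itself, the string passed in would be the output of inspect.getsource, and the assertion would be that the returned list is empty. This still inspects source text rather than runtime behaviour, so it would not see a cache_control dict that is built dynamically.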


# ---------------------------------------------------------------------------
# Utility
# ---------------------------------------------------------------------------
4 changes: 3 additions & 1 deletion backend/utils/retrieval/agentic.py
@@ -362,7 +362,9 @@ async def _run_anthropic_agent_stream(
    and feeds results back until the model stops requesting tools.
    """
    # System prompt with cache_control for Anthropic prompt caching
    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}]
    # TTL=1h: Anthropic changed default from 1h→5m on 2026-03-06; interactive chat
    # sessions have gaps >5min between turns, so the 5-min default kills cache hit rate.
    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral", "ttl": "1h"}}]

P2 1-hour TTL cache writes are billed at 2× the base input-token price

The Anthropic API docs confirm "ttl": "1h" is valid syntax and that cache writes with the 1h TTL are charged at 2× the standard input-token price (vs ~1.25× for the 5-minute default). The PR's cost projections ($1,273–$2,918/mo savings) are compelling when the expected hit rate jumps from 9% to 40–65%, and the math does pencil out (≥2 messages per user per hour breaks even), but the increased write price for every cold request was not mentioned in the description. Worth verifying the savings estimates account for this pricing tier when the monitoring data comes in post-deploy.
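
To make the effect of the write price concrete, here is a rough per-request cost model for the cached prefix, in units of the base input-token price. It is a sketch under stated assumptions, not the PR's own projection: every cache miss is treated as a full cache write, and cache reads are assumed to be billed at 0.1× the base price (that read multiplier is not stated in this thread and should be checked against current pricing). The function names are hypothetical.

```python
# Price multipliers relative to the base input-token price.
WRITE_5M = 1.25  # 5-minute TTL cache write (figure quoted above)
WRITE_1H = 2.0   # 1-hour TTL cache write (figure quoted above)
READ = 0.1       # ASSUMPTION: cache-read multiplier, not stated in this thread


def cost_per_request(hit_rate: float, write_multiplier: float) -> float:
    """Expected cost of the cached prefix per request: hits are reads, misses are writes."""
    return hit_rate * READ + (1.0 - hit_rate) * write_multiplier


def break_even_hit_rate(baseline_cost: float, write_multiplier: float) -> float:
    """Hit rate at which a write tier costs exactly baseline_cost per request."""
    return (write_multiplier - baseline_cost) / (write_multiplier - READ)


baseline = cost_per_request(0.09, WRITE_5M)        # current: 9% hit rate, 5m TTL
projected_low = cost_per_request(0.40, WRITE_1H)   # projected low end: 40% hit rate, 1h TTL
projected_high = cost_per_request(0.65, WRITE_1H)  # projected high end: 65% hit rate, 1h TTL
threshold = break_even_hit_rate(baseline, WRITE_1H)

print(f"baseline 5m @ 9%:  {baseline:.4f}")
print(f"1h @ 40%:          {projected_low:.4f}")
print(f"1h @ 65%:          {projected_high:.4f}")
print(f"break-even hit rate for 1h: {threshold:.3f}")
```

Under these assumptions the 1h tier only beats the current baseline once the hit rate passes roughly 45%, so the 40% end of the projected range would cost slightly more per request than today, while the 65% end would be clearly cheaper. That is the reason to confirm the savings estimate against the real post-deploy hit rate.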


    loop_iteration = 0
