fix(desktop): enable Anthropic prompt caching for macOS chat #7951
Git-on-my-level wants to merge 2 commits
Greptile Summary
This PR enables Anthropic prompt caching for the macOS desktop Rust backend by changing AnthropicRequest.system from Option<String> to Option<Vec<AnthropicSystemContentBlock>>.
Confidence Score: 4/5
Safe to merge: the change is narrow, well-tested, and correctly handles the empty-prompt edge case that would cause a 400 from Anthropic.
The core logic is correct: the trim + empty-check guard prevents the Anthropic 400 on blank cached blocks, the skip_serializing_if on system means the field is omitted when absent, and existing tests were updated alongside four new targeted tests. The only rough edge is that block_type and cache_type are raw strings rather than typed enums, so a typo would be invisible to the compiler and only surface as a runtime API error. That is non-blocking but worth addressing before the pattern spreads to more content block types.
File needing attention: desktop/macos/Backend-Rust/src/models/chat_completions.rs (the two new structs use plain String discriminants).
Sequence Diagram
sequenceDiagram
participant Client as macOS Client (OpenAI API)
participant Router as chat_completions route
participant Translator as translate_request()
participant Anthropic as Anthropic API
Client->>Router: POST /v1/chat/completions (OpenAI format w/ system message)
Router->>Translator: translate_request(req, model)
Translator->>Translator: Extract system/developer message text
Translator->>Translator: trim() → empty check
alt system prompt non-empty
Translator->>Translator: "Wrap in AnthropicSystemContentBlock { type: text, cache_control: ephemeral }"
Translator-->>Router: "AnthropicRequest { system: Some(Vec[block]), ... }"
Router->>Anthropic: "POST /messages system:[{ type, text, cache_control }]"
Anthropic-->>Router: Response (cached after first call)
else system prompt empty/whitespace/absent
Translator-->>Router: "AnthropicRequest { system: None, ... }"
Router->>Anthropic: POST /messages (no system field)
Anthropic-->>Router: Response
end
Router-->>Client: OpenAI-format response
Reviews (1): Last reviewed commit: "fix: skip caching empty/whitespace syste..."
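The two branches in the sequence diagram (non-empty prompt versus empty, whitespace-only, or absent prompt) can be sketched in plain Rust. This is a minimal illustration, not the PR's actual `translate_request()` code: the type names `SystemContentBlock` and `CacheControl` and the function `build_system_blocks` are hypothetical stand-ins, serde derives are omitted so the snippet compiles with the standard library alone, and sending the trimmed text (rather than the original) is an assumption.

```rust
// Hypothetical, simplified stand-ins for the PR's types (serde derives omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheControl {
    pub cache_type: String, // serialized as "type" in the real request
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SystemContentBlock {
    pub block_type: String, // serialized as "type" in the real request
    pub text: String,
    pub cache_control: CacheControl,
}

/// Returns `None` for an absent, empty, or whitespace-only prompt, so the
/// `system` field can be omitted entirely. Otherwise returns a single text
/// block marked for ephemeral caching.
///
/// Assumption: the trimmed text is what gets sent; the real translator may
/// keep the original text and use `trim()` only for the emptiness check.
pub fn build_system_blocks(prompt: Option<&str>) -> Option<Vec<SystemContentBlock>> {
    // `?` short-circuits to `None` when no system/developer message exists.
    let trimmed = prompt?.trim();
    if trimmed.is_empty() {
        return None;
    }
    Some(vec![SystemContentBlock {
        block_type: "text".to_string(),
        text: trimmed.to_string(),
        cache_control: CacheControl {
            cache_type: "ephemeral".to_string(),
        },
    }])
}
```

Returning `Option<Vec<...>>` keeps the "no system field" case distinct from "empty list", which matters because a blank cached block is what triggers the 400 mentioned in the review.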
#[derive(Debug, Clone, Serialize, PartialEq, Eq)]
pub struct AnthropicSystemContentBlock {
    #[serde(rename = "type")]
    pub block_type: String,
    pub text: String,
    pub cache_control: AnthropicCacheControl,
}

#[derive(Debug, Clone, Serialize, PartialEq, Eq)]
pub struct AnthropicCacheControl {
    #[serde(rename = "type")]
    pub cache_type: String,
}
The block_type and cache_type fields are plain String, so a typo (e.g. "Ephemeral" or "Text") compiles cleanly but produces a 400 from Anthropic at runtime. Since these fields are discriminants with a fixed, known set of valid values, typed enums would catch mistakes at compile time with zero runtime cost.
Suggested change
- #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
- pub struct AnthropicSystemContentBlock {
-     #[serde(rename = "type")]
-     pub block_type: String,
-     pub text: String,
-     pub cache_control: AnthropicCacheControl,
- }
- #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
- pub struct AnthropicCacheControl {
-     #[serde(rename = "type")]
-     pub cache_type: String,
- }
+ #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+ #[serde(rename_all = "lowercase")]
+ pub enum AnthropicContentBlockType {
+     Text,
+ }
+ #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+ #[serde(rename_all = "lowercase")]
+ pub enum AnthropicCacheControlType {
+     Ephemeral,
+ }
+ #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
+ pub struct AnthropicSystemContentBlock {
+     #[serde(rename = "type")]
+     pub block_type: AnthropicContentBlockType,
+     pub text: String,
+     pub cache_control: AnthropicCacheControl,
+ }
+ #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
+ pub struct AnthropicCacheControl {
+     #[serde(rename = "type")]
+     pub cache_type: AnthropicCacheControlType,
+ }
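To make the compile-time argument concrete, here is a std-only sketch of typed discriminants with a single constructor. It is an illustration under assumptions, not the suggested serde code: the names `ContentBlockType`, `CacheControlType`, `SystemContentBlock`, `as_wire_str`, and `cached_text` are hypothetical, and `as_wire_str` only mirrors the lowercase strings that `#[serde(rename_all = "lowercase")]` would be expected to emit for these variants.

```rust
/// Discriminant for a system content block; only `text` is modelled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentBlockType {
    Text,
}

impl ContentBlockType {
    /// Lowercase wire value for this variant.
    pub fn as_wire_str(self) -> &'static str {
        match self {
            ContentBlockType::Text => "text",
        }
    }
}

/// Discriminant for cache control; only `ephemeral` is modelled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheControlType {
    Ephemeral,
}

impl CacheControlType {
    /// Lowercase wire value for this variant.
    pub fn as_wire_str(self) -> &'static str {
        match self {
            CacheControlType::Ephemeral => "ephemeral",
        }
    }
}

/// Hypothetical flattened block: call sites never spell a discriminant string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SystemContentBlock {
    pub block_type: ContentBlockType,
    pub text: String,
    pub cache_type: CacheControlType,
}

impl SystemContentBlock {
    /// The only place where the discriminants are chosen.
    pub fn cached_text(text: impl Into<String>) -> Self {
        SystemContentBlock {
            block_type: ContentBlockType::Text,
            text: text.into(),
            cache_type: CacheControlType::Ephemeral,
        }
    }
}
```

With this shape, a misspelled variant such as `ContentBlockType::Txt` is rejected by the compiler, whereas a misspelled string like "Text" would only fail once the request reaches the API.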
Summary
Enables Anthropic prompt caching for the macOS desktop Rust OpenAI-compatible chat completions translator.
The translator previously sent the extracted system/developer prompt to Anthropic as a plain string. Anthropic prompt caching requires cache control on content blocks, so this change serializes system prompts as an array containing a single text block with cache_control: { type: "ephemeral" }.
No behavior changes when no system/developer prompt is present; the system field remains omitted.
Cost / production impact
This addresses a cost optimization finding.
User impact: neutral-to-positive (same behavior, lower cost after cache warmup, potentially lower latency on cache hits).
Implementation details
- Changed AnthropicRequest.system from Option<String> to Option<Vec<AnthropicSystemContentBlock>>.
- Added AnthropicSystemContentBlock and AnthropicCacheControl.
- Wrapped the system/developer prompt in a single text block with cache_control: { type: "ephemeral" }.
Validation
Commands were run from desktop/macos/Backend-Rust.
Results:
- cargo check ✅ passed
Rollback
Safe rollback: revert these commits to return system serialization to the previous plain-string shape.
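For reference, the two wire shapes this description talks about (the previous plain string and the new cached block array) can be compared with a small std-only sketch. The real backend serializes through serde; the helpers `json_escape`, `system_field_plain`, and `system_field_cached` below are hypothetical and hand-render only the `system` field, with minimal JSON escaping, to show the difference in shape.

```rust
/// Escapes a string for use inside a JSON string literal (minimal, std-only).
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Previous shape: the prompt as a plain JSON string.
pub fn system_field_plain(prompt: &str) -> String {
    format!("\"system\":\"{}\"", json_escape(prompt))
}

/// New shape: an array with one text block carrying ephemeral cache control.
pub fn system_field_cached(prompt: &str) -> String {
    format!(
        "\"system\":[{{\"type\":\"text\",\"text\":\"{}\",\"cache_control\":{{\"type\":\"ephemeral\"}}}}]",
        json_escape(prompt)
    )
}
```

Reverting the commits, as described under Rollback, corresponds to going from the second shape back to the first.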