Summary
When using calculate mode, ccusage prices all cache_creation_input_tokens at the single cache_creation_input_token_cost rate from LiteLLM, which corresponds to the 5-minute cache write multiplier (1.25× base input). However, Claude Code predominantly uses 1-hour caching, which Anthropic prices at 2× base input. That is 60% higher than the rate ccusage applies (2 / 1.25 = 1.6).
Anthropic official pricing
From Anthropic's pricing page:
| Cache operation | Multiplier | Duration |
| --- | --- | --- |
| 5-minute cache write | 1.25× base input price | Cache valid for 5 minutes |
| 1-hour cache write | 2× base input price | Cache valid for 1 hour |
| Cache read (hit) | 0.1× base input price | Same duration as the preceding write |
Full model pricing table (relevant columns):
| Model | Base Input | 5m Cache Writes | 1h Cache Writes | Cache Hits |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5/MTok | $6.25/MTok | $10/MTok | $0.50/MTok |
| Claude Sonnet 4.6 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Sonnet 4.5 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Haiku 4.5 | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok |
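The per-model columns follow directly from the multipliers in the first table. As a quick sanity check, here is a minimal sketch that derives the cache rates from a base input price; `deriveCacheRates` and `CacheRates` are hypothetical names for illustration and are not part of ccusage or LiteLLM.

```typescript
// Multipliers taken from the cache pricing table above.
const CACHE_WRITE_5M_MULTIPLIER = 1.25;
const CACHE_WRITE_1H_MULTIPLIER = 2;
const CACHE_READ_MULTIPLIER = 0.1;

// All rates are in USD per million tokens (MTok).
interface CacheRates {
  write5m: number;
  write1h: number;
  read: number;
}

// Hypothetical helper: derive cache rates from a model's base input price.
function deriveCacheRates(baseInputPerMTok: number): CacheRates {
  return {
    write5m: baseInputPerMTok * CACHE_WRITE_5M_MULTIPLIER,
    write1h: baseInputPerMTok * CACHE_WRITE_1H_MULTIPLIER,
    read: baseInputPerMTok * CACHE_READ_MULTIPLIER,
  };
}

// Claude Opus 4.6 has a base input price of $5/MTok.
const opusRates = deriveCacheRates(5);
console.log(opusRates);

// Ratio between the 1-hour and 5-minute write rates (the "60% higher" figure).
const writeRatio = opusRates.write1h / opusRates.write5m;
console.log(writeRatio);
```

For Opus 4.6 this reproduces the $6.25, $10 and $0.50 figures in the table, and the ratio between the two write rates is 1.6 regardless of the model.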
Data available in JSONL
Claude Code's JSONL files already include a cache_creation breakdown inside the usage object that distinguishes the two durations:
{
  "usage": {
    "input_tokens": 3,
    "output_tokens": 10,
    "cache_creation_input_tokens": 23566,
    "cache_read_input_tokens": 19357,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 23566
    }
  }
}
In my dataset (~40k usage records), the vast majority of cache creation tokens are 1-hour:
| Model | 5m tokens | 1h tokens | % 1h |
| --- | --- | --- | --- |
| claude-opus-4-6 | 9.6M | 116.5M | 92% |
| claude-sonnet-4-5-20250929 | 0 | 7.8M | 100% |
| claude-sonnet-4-6 | 2.1M | 0 | 0% |
| claude-haiku-4-5-20251001 | 15.8M | 30.9M | 66% |
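For anyone who wants to reproduce this kind of breakdown on their own logs, a rough sketch of the tally follows. It assumes each record has already been reduced to a `{ model, usage }` pair; that `UsageEntry` shape and the function names are my own simplification for illustration, not the actual JSONL record layout or any ccusage API.

```typescript
interface CacheCreationBreakdown {
  ephemeral_5m_input_tokens?: number;
  ephemeral_1h_input_tokens?: number;
}

interface Usage {
  cache_creation_input_tokens?: number;
  cache_creation?: CacheCreationBreakdown;
}

// Assumed, simplified shape: one entry per usage record, with the model name attached.
interface UsageEntry {
  model: string;
  usage: Usage;
}

interface Tally {
  tokens5m: number;
  tokens1h: number;
}

// Sum 5-minute and 1-hour cache creation tokens per model.
// Records without a cache_creation breakdown contribute 0 to both buckets.
function tallyCacheWrites(entries: UsageEntry[]): Map<string, Tally> {
  const totals = new Map<string, Tally>();
  for (const { model, usage } of entries) {
    const tally = totals.get(model) ?? { tokens5m: 0, tokens1h: 0 };
    tally.tokens5m += usage.cache_creation?.ephemeral_5m_input_tokens ?? 0;
    tally.tokens1h += usage.cache_creation?.ephemeral_1h_input_tokens ?? 0;
    totals.set(model, tally);
  }
  return totals;
}

// Fraction of cache creation tokens written with the 1-hour TTL.
function share1h(tally: Tally): number {
  const total = tally.tokens5m + tally.tokens1h;
  return total === 0 ? 0 : tally.tokens1h / total;
}

// The sample record shown earlier, attributed to a model for illustration.
const totals = tallyCacheWrites([
  {
    model: "claude-opus-4-6",
    usage: {
      cache_creation_input_tokens: 23566,
      cache_creation: { ephemeral_5m_input_tokens: 0, ephemeral_1h_input_tokens: 23566 },
    },
  },
]);
console.log(totals.get("claude-opus-4-6"));
```

Applying `share1h` to the Opus row above (9.6M 5-minute tokens, 116.5M 1-hour tokens) gives roughly 0.92, matching the table.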
Current behavior in ccusage
In packages/internal/src/pricing.ts, calculateCostFromPricing uses a single cache_creation_input_token_cost (sourced from LiteLLM) for all cache creation tokens:
const cacheCreationCost = calculateTieredCost(
  tokens.cache_creation_input_tokens,
  pricing.cache_creation_input_token_cost, // 1.25x rate
  pricing.cache_creation_input_token_cost_above_200k_tokens,
);
There is no reference to ephemeral_5m_input_tokens or ephemeral_1h_input_tokens anywhere in the codebase (gh search code "ephemeral" --repo ryoppippi/ccusage returns 0 results).
Impact
Cost comparison on my usage data:
| Model | Cost (all at 1.25×) | Cost (5m/1h split) | Under-reported |
| --- | --- | --- | --- |
| claude-opus-4-6 | $2,388 | $2,826 | $438 (18%) |
| claude-haiku-4-5-20251001 | $112 | $135 | $23 (21%) |
| claude-sonnet-4-5-20250929 | $46 | $63 | $17 (38%) |
| Total | $2,568 | $3,047 | $479 (19%) |
Suggested fix
When parsing JSONL records, check whether usage.cache_creation is an object containing ephemeral_5m_input_tokens and ephemeral_1h_input_tokens. If so, apply the correct rate to each:
- ephemeral_5m_input_tokens × 1.25 × base input price
- ephemeral_1h_input_tokens × 2 × base input price
Fall back to the current single-rate behavior when the breakdown is not available.
This is partly an upstream issue in LiteLLM's pricing data (cache_creation_input_token_cost only has one rate), but ccusage can handle the split independently since the token breakdown is already in the JSONL data.
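To make the suggested fix concrete, here is a minimal sketch of the split-rate calculation with the single-rate fallback. The function name and parameters are hypothetical, rates are assumed to be USD per token as in LiteLLM's pricing data, and the sketch deliberately ignores the above-200k tiering that calculateTieredCost handles, so it is an illustration of the idea rather than a drop-in patch.

```typescript
interface CacheCreationBreakdown {
  ephemeral_5m_input_tokens?: number;
  ephemeral_1h_input_tokens?: number;
}

interface Usage {
  cache_creation_input_tokens?: number;
  cache_creation?: CacheCreationBreakdown;
}

// Multipliers from Anthropic's pricing table.
const WRITE_5M_MULTIPLIER = 1.25;
const WRITE_1H_MULTIPLIER = 2;

// Hypothetical helper (not existing ccusage code).
// baseInputCostPerToken: base input price in USD per token.
// singleRateCostPerToken: the existing single cache creation rate in USD per token.
function cacheCreationCost(
  usage: Usage,
  baseInputCostPerToken: number,
  singleRateCostPerToken: number,
): number {
  const breakdown = usage.cache_creation;
  if (
    breakdown != null &&
    typeof breakdown.ephemeral_5m_input_tokens === "number" &&
    typeof breakdown.ephemeral_1h_input_tokens === "number"
  ) {
    // Breakdown available: price each TTL bucket at its own rate.
    return (
      breakdown.ephemeral_5m_input_tokens * baseInputCostPerToken * WRITE_5M_MULTIPLIER +
      breakdown.ephemeral_1h_input_tokens * baseInputCostPerToken * WRITE_1H_MULTIPLIER
    );
  }
  // No breakdown: fall back to the current single-rate behavior.
  return (usage.cache_creation_input_tokens ?? 0) * singleRateCostPerToken;
}

// The sample record from above, priced with Claude Opus 4.6 rates
// ($5/MTok base input, $6.25/MTok single cache creation rate).
const sampleUsage: Usage = {
  cache_creation_input_tokens: 23566,
  cache_creation: { ephemeral_5m_input_tokens: 0, ephemeral_1h_input_tokens: 23566 },
};
const splitCost = cacheCreationCost(sampleUsage, 5e-6, 6.25e-6);
const legacyCost = cacheCreationCost({ cache_creation_input_tokens: 23566 }, 5e-6, 6.25e-6);
console.log(splitCost.toFixed(5)); // about 0.23566 USD
console.log(legacyCost.toFixed(5)); // about 0.14729 USD
```

For this one record the split calculation comes out at roughly $0.236 versus $0.147 under the single rate, which is the same 1.6× gap described in the summary.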