Summary
Health currently reports agent token/cost usage from the latest runtime session per agent. That is useful for context pressure, but it does not answer historical questions like daily totals, trends over time, or usage across multiple sessions.
This follow-up is split out from #358 so the cost-accuracy audit can stay focused on current-source correctness and honest runtime-reported cost provenance.
Problem
- Latest-session usage is not the same as a day/week/month aggregate.
- Docs/UI can accidentally imply broader history than the current data source supports.
- Operators care more about reliable token usage trends than highly precise dollar totals.
Goals
- Design a durable usage-history model for agent token counts over time.
- Track input, output, cache-read, cache-write, and total tokens consistently.
- Preserve runtime-reported cost when available, but keep token history as the primary reliable metric.
- Avoid double-counting across retries, aborted runs, deleted sessions, or reindexed runtime transcript files.
- Expose clear Health UI views for latest session vs historical windows.
Questions to answer
- Should history be derived from runtime session JSONL scans, the existing usage recorder, or a new durable usage rollup?
- What time windows should Health show first: 24h, 7d, 30d?
- What is the stable event identity used to prevent duplicate counting?
- How should deleted/rotated runtime sessions affect historical totals?
- Should cost history be shown only when runtime-reported cost exists?
Acceptance criteria
- Document the difference between latest-session context usage and historical token usage.
- Define a durable aggregation strategy and duplicate-counting rules.
- Add tests for multi-session aggregation, retries/aborts where observable, missing cost, and deleted/rotated sessions.
- Update Health UI labels so latest-session and historical usage are not conflated.
Privacy
Public/sanitized issue. Do not include private usage records, prompts, transcripts, API keys, or billing exports.
Summary
Health currently reports agent token/cost usage from the latest runtime session per agent. That is useful for context pressure, but it does not answer historical questions like daily totals, trends over time, or usage across multiple sessions.
This follow-up is split out from #358 so the cost-accuracy audit can stay focused on current-source correctness and honest runtime-reported cost provenance.
Problem
Goals
Questions to answer
Acceptance criteria
Privacy
Public/sanitized issue. Do not include private usage records, prompts, transcripts, API keys, or billing exports.