
fix: record answer prompt size for cross-provider context-cost reporting #24

Merged: groksrc merged 1 commit into main from fix/qa-context-size-reporting on Jun 12, 2026


groksrc commented Jun 12, 2026


Summary

The claude CLI transport reports the user prompt inside cache_creation_input_tokens, together with ~26K tokens of its own system overhead. As a result, usage.input_tokens (~10 per case) wildly under-reports the context actually fed to the model, and no clean per-prompt token figure can be separated out.
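The following sketch shows why usage.input_tokens alone is misleading. It totals the input-side fields of a usage record and assumes the record is a plain dict keyed by the field names the Anthropic Messages API uses (input_tokens, cache_creation_input_tokens, cache_read_input_tokens). The helpers total_input_tokens and input_tokens_share are hypothetical and are not the runner's code. Under the claude CLI the total still includes the CLI's own system overhead, so it bounds the fed context from above rather than isolating the prompt.

```python
from typing import Mapping, Optional

# Input-side token fields reported in an Anthropic Messages API usage block.
INPUT_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def total_input_tokens(usage: Mapping[str, Optional[int]]) -> int:
    """Sum every input-side field; missing or null fields count as zero.

    Under the claude CLI this total mixes the user prompt with the CLI's own
    system overhead, so it is an upper bound on fed context, not a clean
    per-prompt figure.
    """
    return sum(usage.get(field) or 0 for field in INPUT_FIELDS)


def input_tokens_share(usage: Mapping[str, Optional[int]]) -> float:
    """Fraction of all input-side tokens that usage['input_tokens'] accounts for."""
    total = total_input_tokens(usage)
    return (usage.get("input_tokens") or 0) / total if total else 0.0
```

At roughly 10 input_tokens beside ~26K or more cache-creation tokens, input_tokens_share comes out well under 0.1%. That share is the under-reporting described here.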

This PR records answer_prompt_chars for each QA case and mean_answer_prompt_chars for each provider. Because the same answerer sees every provider's assembled prompt, its length in characters is a comparable cross-provider measure of context size. It anchors the accuracy-vs-context-cost axis that any credible comparison needs (full-context baselines will show ~10-50x the context of retrieval systems). Runner-reported token fields are kept for cost telemetry.
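The sketch below covers the bookkeeping. It assumes each QA case result is a dict that carries a provider name and the assembled answer prompt. Only the field names answer_prompt_chars and mean_answer_prompt_chars come from this PR. The helpers record_case and summarize_prompt_chars are hypothetical and are not the actual implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


def record_case(provider: str, answer_prompt: str, **fields: Any) -> Dict[str, Any]:
    """Build one per-case result, storing the assembled prompt's length in characters."""
    return {
        "provider": provider,
        "answer_prompt_chars": len(answer_prompt),
        **fields,  # e.g. runner-reported token fields, kept for cost telemetry
    }


def summarize_prompt_chars(cases: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Group cases by provider and report mean_answer_prompt_chars for each."""
    by_provider: Dict[str, List[int]] = defaultdict(list)
    for case in cases:
        by_provider[case["provider"]].append(case["answer_prompt_chars"])
    return {
        provider: {"mean_answer_prompt_chars": mean(sizes)}
        for provider, sizes in by_provider.items()
    }
```

Counting characters sidesteps transport-specific token accounting. The same string is measured the same way regardless of which provider assembled it.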

Adds 1 new test; the suite is green (100 tests) and lint is clean.

🤖 Generated with Claude Code

Runner token accounting is transport-dependent: the claude CLI reports
the user prompt inside cache_creation_input_tokens alongside ~26K of
its own system overhead, so usage.input_tokens (~10/case) wildly
under-reports the fed context, and no clean per-prompt figure is
separable.

Record len(answer_prompt) per case (answer_prompt_chars) and the
per-provider mean in qa-summary. The character count of the assembled
prompt is the comparable cross-provider context-size measure, since the
same answerer sees every prompt, and it anchors the
accuracy-vs-context-cost axis of any published comparison.
Runner-reported token fields are kept as-is for cost telemetry.
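Once each provider has both an accuracy and a mean_answer_prompt_chars, the two can be placed on one accuracy-vs-context-cost axis. This sketch assumes a hypothetical per-provider summary dict with accuracy and mean_answer_prompt_chars keys, and context_cost_axis is a hypothetical helper. It reports each provider's context size as a multiple of the leanest provider's. A full-context baseline's gap over retrieval systems would show up in that multiple.

```python
from typing import Dict, List, Tuple


def context_cost_axis(
    summary: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float, float]]:
    """Return (provider, accuracy, context multiple) rows sorted by context size.

    The context multiple is mean_answer_prompt_chars divided by the smallest
    mean across providers, so the leanest provider reads 1.0.
    """
    leanest = min(s["mean_answer_prompt_chars"] for s in summary.values())
    if leanest <= 0:
        raise ValueError("mean_answer_prompt_chars must be positive")
    rows = [
        (name, s["accuracy"], s["mean_answer_prompt_chars"] / leanest)
        for name, s in summary.items()
    ]
    # Sort by context multiple so the axis reads from leanest to heaviest.
    return sorted(rows, key=lambda row: row[2])
```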

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
groksrc merged commit 209beaa into main on Jun 12, 2026
1 check passed