fix: record answer prompt size for cross-provider context-cost reporting by groksrc · Pull Request #24 · basicmachines-co/basic-memory-benchmarks

groksrc · 2026-06-12T21:59:57Z

Summary

The claude CLI transport buries the user prompt in cache_creation_input_tokens along with ~26K of its own system overhead, so usage.input_tokens (~10/case) wildly under-reports the context actually fed to the answerer, and no clean per-prompt token figure can be separated out.
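To make the accounting problem concrete, here is a minimal sketch. It assumes a usage payload with the field names of the Anthropic usage object; `total_input_tokens` is a hypothetical helper, not code from this repository, and the numbers are illustrative rather than measured.

```python
def total_input_tokens(usage: dict) -> int:
    """Sum every input-side token field the transport reports.

    Hypothetical helper for illustration; missing fields count as 0.
    """
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )


# Illustrative values only: a tiny uncached count next to a large
# cache-creation count that mixes the user prompt with system overhead.
usage = {
    "input_tokens": 10,
    "cache_creation_input_tokens": 27_500,
    "cache_read_input_tokens": 0,
}

uncached = usage["input_tokens"]
total = total_input_tokens(usage)
```

Even the summed figure does not help here: it still includes the CLI's own overhead, so the prompt's share cannot be recovered from token fields alone.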

This PR records answer_prompt_chars per QA case and mean_answer_prompt_chars per provider. Because the same answerer sees every provider's assembled prompt, prompt length in characters is a context-size measure that is comparable across providers. It anchors the accuracy-vs-context-cost axis that any credible comparison needs (full-context baselines will show ~10-50x the context of retrieval systems). Runner-reported token fields stay for cost telemetry.
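A minimal sketch of the recording step, under stated assumptions: the field names answer_prompt_chars and mean_answer_prompt_chars come from this PR, while the function names, the `provider` key, and the dict-shaped case records are hypothetical and may not match the actual runner code.

```python
from collections import defaultdict
from statistics import mean


def record_prompt_size(case: dict, answer_prompt: str) -> dict:
    """Return a copy of the case with the assembled prompt length in chars."""
    return {**case, "answer_prompt_chars": len(answer_prompt)}


def mean_prompt_chars_by_provider(cases: list[dict]) -> dict:
    """Compute mean_answer_prompt_chars for each provider."""
    sizes = defaultdict(list)
    for case in cases:
        # "provider" is an assumed key naming the system under test.
        sizes[case["provider"]].append(case["answer_prompt_chars"])
    return {
        provider: {"mean_answer_prompt_chars": mean(values)}
        for provider, values in sizes.items()
    }


# Hypothetical providers and prompt lengths, for illustration only.
cases = [
    record_prompt_size({"provider": "retrieval", "case_id": 1}, "x" * 100),
    record_prompt_size({"provider": "retrieval", "case_id": 2}, "x" * 300),
    record_prompt_size({"provider": "full-context", "case_id": 1}, "x" * 4000),
]
summary = mean_prompt_chars_by_provider(cases)
```

Measuring characters of the assembled prompt keeps the figure independent of how any one transport reports tokens, which is the property the comparison relies on.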

1 new test; suite green (100), lint clean.

🤖 Generated with Claude Code

Runner token accounting is transport-dependent: the claude CLI reports the user prompt inside cache_creation_input_tokens alongside ~26K of its own system overhead, so usage.input_tokens (~10/case) wildly under-reports the fed context, and no clean per-prompt figure is separable.

Record len(answer_prompt) per case (answer_prompt_chars) and the per-provider mean in qa-summary. Chars of assembled prompt are the comparable cross-provider context-size measure (the same answerer sees them all), and they anchor the accuracy-vs-context-cost axis of any published comparison. Runner-reported token fields are kept as-is for cost telemetry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>

groksrc merged commit 209beaa into main Jun 12, 2026
1 check passed

groksrc merged 1 commit into main from fix/qa-context-size-reporting
