feat: configurable QA context budget; single-hit providers get full budget #26

Merged: groksrc merged 1 commit into main from feat/qa-context-budget on Jun 12, 2026
Conversation

@groksrc groksrc commented Jun 12, 2026

Member

Matrix v1 finding: the fixed 2,500-char per-hit slice truncated the full-context baseline's single mega-hit to ~3K chars of a ~115K-token haystack. The model therefore abstained on all 25 LME (LongMemEval) questions, which made the baseline meaningless.

  • assemble_context per-hit cap is now max(slice, budget // hit_count) — a lone hit can use the whole budget; full hit lists keep existing slicing.
  • run qa --max-context-chars overrides the total budget (default 12,000); full-context baseline runs pass a large value, and mean_answer_prompt_chars keeps the context-cost axis visible.
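
The per-hit cap rule above can be illustrated with a minimal sketch. This is not the repository's code: `per_hit_cap` and `assemble_context_sketch` are hypothetical names, the real `assemble_context` signature is not shown in this PR, and the choices to treat hits as plain strings, join them with blank lines, and leave separators out of the budget are assumptions. Only the formula `max(slice, budget // hit_count)` and the 2,500 and 12,000 defaults come from the description.

```python
from typing import Sequence

DEFAULT_BUDGET_CHARS = 12_000  # total budget default stated in the PR
DEFAULT_SLICE_CHARS = 2_500    # fixed per-hit slice stated in the PR


def per_hit_cap(slice_chars: int, budget_chars: int, hit_count: int) -> int:
    """Per-hit cap as described in the PR: max(slice, budget // hit_count)."""
    if hit_count <= 0:
        return 0  # assumption: with no hits there is nothing to cap
    return max(slice_chars, budget_chars // hit_count)


def assemble_context_sketch(
    hits: Sequence[str],
    budget_chars: int = DEFAULT_BUDGET_CHARS,
    slice_chars: int = DEFAULT_SLICE_CHARS,
) -> str:
    """Hypothetical stand-in for assemble_context (real signature unknown).

    Each hit is truncated to the per-hit cap, and hits are added until the
    total budget is used up. Separator characters are not counted against
    the budget in this sketch.
    """
    cap = per_hit_cap(slice_chars, budget_chars, len(hits))
    pieces = []
    remaining = budget_chars
    for hit in hits:
        if remaining <= 0:
            break
        piece = hit[: min(cap, remaining)]
        pieces.append(piece)
        remaining -= len(piece)
    return "\n\n".join(pieces)
```

With these defaults a lone hit gets a cap of 12,000 chars instead of 2,500, while a list of ten hits still gets 2,500 per hit, because `12_000 // 10` is smaller than the slice.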

2 new tests + 1 updated; suite green (103), lint clean.

🤖 Generated with Claude Code

…udget

Matrix v1 exposed that the fixed 2,500-char per-hit slice gutted the
full-context baseline: its single mega-hit was truncated to ~3K chars
of a ~115K-token haystack, forcing abstention on all 25 LongMemEval
questions — making the baseline meaningless.

- assemble_context per-hit cap is now max(slice, budget // hit_count):
  a lone hit (full-context) can use the entire budget; full hit lists
  keep the existing slicing.
- run qa --max-context-chars overrides the total budget (default still
  12,000); full-context baseline runs pass a large value, reported via
  mean_answer_prompt_chars so the context-cost axis stays visible.
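
As a rough illustration of the flag described above, here is a hedged sketch of how a `run qa --max-context-chars` option could be declared. The project's actual CLI framework and wiring are not shown in this PR, so the use of `argparse`, the `build_parser` helper, and the help text are all assumptions; only the command name, the flag name, and the 12,000 default come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real CLI may be built differently."""
    parser = argparse.ArgumentParser(prog="run")
    subparsers = parser.add_subparsers(dest="command", required=True)

    qa = subparsers.add_parser("qa", help="run the QA evaluation")
    qa.add_argument(
        "--max-context-chars",
        type=int,
        default=12_000,  # default total budget stated in the PR
        help="total context budget in characters",
    )
    return parser


# A full-context baseline run would pass a large value (500000 is illustrative).
baseline_args = build_parser().parse_args(["qa", "--max-context-chars", "500000"])
default_args = build_parser().parse_args(["qa"])
```

Keeping the default at 12,000 means existing runs are unchanged unless the flag is passed explicitly.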

2 new tests; per-hit cap test updated for the intentional single-hit
semantics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
@groksrc groksrc merged commit 7afda42 into main Jun 12, 2026
1 check passed