feat: configurable QA context budget; single-hit providers get full budget #26
Merged
…udget

Matrix v1 exposed that the fixed 2,500-char per-hit slice gutted the full-context baseline: its single mega-hit was truncated to ~3K chars of a ~115K-token haystack, forcing abstention on all 25 LongMemEval questions and making the baseline meaningless.

- `assemble_context` per-hit cap is now `max(slice, budget // hit_count)`: a lone hit (full-context) can use the entire budget; full hit lists keep the existing slicing.
- `run qa --max-context-chars` overrides the total budget (default still 12,000); full-context baseline runs pass a large value, reported via `mean_answer_prompt_chars` so the context-cost axis stays visible.

2 new tests; per-hit cap test updated for the intentional single-hit semantics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
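A minimal Python sketch of the per-hit cap rule, assuming a hit is a plain string and that assembly also stops once the total budget is spent (the PR does not say whether `assemble_context` does this). `per_hit_cap`, `SLICE_CHARS`, `DEFAULT_BUDGET`, `SEPARATOR`, and this simplified `assemble_context` signature are hypothetical; only the formula `max(slice, budget // hit_count)` and the 2,500 / 12,000 values come from the PR.

```python
# Hypothetical stand-in for assemble_context; names and signature are assumptions.
SLICE_CHARS = 2_500      # fixed per-hit slice mentioned in the PR
DEFAULT_BUDGET = 12_000  # default total context budget mentioned in the PR
SEPARATOR = "\n\n"       # assumed separator; not counted against the budget here


def per_hit_cap(budget: int, hit_count: int, slice_chars: int = SLICE_CHARS) -> int:
    """Return max(slice, budget // hit_count); 0 when there are no hits."""
    if hit_count <= 0:
        return 0
    return max(slice_chars, budget // hit_count)


def assemble_context(hits: list[str], budget: int = DEFAULT_BUDGET,
                     slice_chars: int = SLICE_CHARS) -> str:
    """Truncate each hit to the per-hit cap, stopping once the budget is spent."""
    cap = per_hit_cap(budget, len(hits), slice_chars)
    parts: list[str] = []
    used = 0
    for hit in hits:
        remaining = budget - used
        if remaining <= 0:
            break  # total budget exhausted; drop the remaining hits
        piece = hit[:min(cap, remaining)]
        parts.append(piece)
        used += len(piece)
    return SEPARATOR.join(parts)


if __name__ == "__main__":
    mega_hit = "x" * 460_000  # stand-in for one very large full-context hit
    print(len(assemble_context([mega_hit])))                  # 12000
    print(len(assemble_context([mega_hit], budget=500_000)))  # 460000
```

With the default 12,000 budget, a single hit gets all 12,000 chars, while five or more hits fall back to the 2,500 slice. Two to four hits land in between (three hits get 4,000 each), so "full hit lists keep the existing slicing" holds once `budget // hit_count` drops below the slice.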
Matrix v1 finding: the fixed 2,500-char per-hit slice truncated the full-context baseline's single mega-hit to ~3K chars of a ~115K-token haystack → abstention on all 25 LongMemEval (LME) questions → meaningless baseline.
- `assemble_context` per-hit cap is now `max(slice, budget // hit_count)`: a lone hit can use the whole budget; full hit lists keep the existing slicing.
- `run qa --max-context-chars` overrides the total budget (default 12,000) for full-context baseline passes; `mean_answer_prompt_chars` keeps the context-cost axis visible.
- 2 new tests + 1 updated; suite green (103), lint clean.
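A sketch of how the budget override and the reporting metric might fit together, using `argparse` and `statistics.fmean` from the standard library. The PR does not say which CLI framework the project uses; the program name `harness`, `build_parser`, and `mean_answer_prompt_chars` as a standalone function are assumptions. Only the flag name, the `run qa` command path, the 12,000 default, and the metric name come from the PR.

```python
import argparse
from statistics import fmean

DEFAULT_MAX_CONTEXT_CHARS = 12_000  # default total budget stated in the PR


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `<prog> run qa` command tree exposing --max-context-chars."""
    parser = argparse.ArgumentParser(prog="harness")  # program name is made up
    commands = parser.add_subparsers(dest="command", required=True)
    run = commands.add_parser("run", help="run an evaluation")
    run_commands = run.add_subparsers(dest="run_command", required=True)
    qa = run_commands.add_parser("qa", help="answer questions from retrieved context")
    qa.add_argument(
        "--max-context-chars",
        type=int,
        default=DEFAULT_MAX_CONTEXT_CHARS,
        help="total character budget for context in the answer prompt",
    )
    return parser


def mean_answer_prompt_chars(prompts: list[str]) -> float:
    """Average answer-prompt length in characters (0.0 for no prompts)."""
    return fmean(len(p) for p in prompts) if prompts else 0.0


if __name__ == "__main__":
    default_args = build_parser().parse_args(["run", "qa"])
    baseline_args = build_parser().parse_args(["run", "qa", "--max-context-chars", "600000"])
    print(default_args.max_context_chars)   # 12000
    print(baseline_args.max_context_chars)  # 600000
```

A full-context baseline run would pass a large value (600,000 here is purely illustrative). Because the metric averages the actual prompt lengths, the baseline's much larger prompts still show up on the context-cost axis rather than being hidden by the override.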
🤖 Generated with Claude Code