feat: stratified LongMemEval dev slices#28

Merged
groksrc merged 1 commit into main from feat/lme-stratified-slice on Jun 12, 2026

groksrc commented on Jun 12, 2026

--max-questions takes the file-order prefix, and the dataset is sorted by question type, so dev25 was 25× single-session-user. That is fine for smoke tests but misleading for category comparisons (matrix v1's LME QA numbers cover one category only).
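
To make the prefix problem concrete, here is a minimal sketch on synthetic data. The question type names other than single-session-user, the `question_type` field name, and the 30-per-type counts are assumptions for illustration, not taken from the converter or the real dataset.

```python
from collections import Counter

# Hypothetical type labels; only "single-session-user" is named in this PR.
TYPES = [
    "single-session-user",
    "single-session-assistant",
    "single-session-preference",
    "multi-session",
    "temporal-reasoning",
    "knowledge-update",
]

# Synthetic stand-in for a dataset that is sorted by question type.
questions = [
    {"question_id": f"{t}-{i}", "question_type": t}
    for t in TYPES
    for i in range(30)
]

# A file-order prefix of 25 never leaves the first type's block.
prefix = questions[:25]
prefix_counts = Counter(q["question_type"] for q in prefix)
print(prefix_counts)  # Counter({'single-session-user': 25})
```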

convert longmemeval --stratified samples the --max-questions budget evenly across the six question types (fixed seed, composition recorded in sampling.json). Prefix behavior is unchanged without the flag.
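
A rough sketch of what stratified sampling with a fixed seed could look like, not the code merged in this PR. The function name `stratified_sample`, the `question_type` field, the type labels, the remainder handling, and the shape of the recorded composition are all assumptions made for the example.

```python
import json
import random
from collections import defaultdict


def stratified_sample(questions, max_questions, seed=0, type_key="question_type"):
    """Pick up to max_questions items, spread as evenly as possible across types.

    Hypothetical helper: returns (picked, composition), where composition
    maps each question type to the number of items drawn from it.
    """
    by_type = defaultdict(list)
    for q in questions:
        by_type[q[type_key]].append(q)

    rng = random.Random(seed)  # fixed seed -> same slice on every run
    types = sorted(by_type)    # stable order, independent of file order
    base, extra = divmod(max_questions, len(types))

    picked, composition = [], {}
    for i, t in enumerate(types):
        # The first `extra` types absorb the remainder, one item each.
        quota = base + (1 if i < extra else 0)
        pool = by_type[t]
        chosen = rng.sample(pool, min(quota, len(pool)))
        picked.extend(chosen)
        composition[t] = len(chosen)
    return picked, composition


# Synthetic data: six assumed type labels, ten questions each, sorted by type.
TYPES = [
    "single-session-user", "single-session-assistant",
    "single-session-preference", "multi-session",
    "temporal-reasoning", "knowledge-update",
]
questions = [
    {"question_id": f"{t}-{i}", "question_type": t}
    for t in TYPES
    for i in range(10)
]

picked, composition = stratified_sample(questions, max_questions=25, seed=0)

# Assumed layout for the composition record; the real sampling.json may differ.
record = {"seed": 0, "max_questions": 25, "composition": composition}
print(json.dumps(record, indent=2))
```

With 25 questions over six types, this sketch draws five from one type and four from each of the others; how the actual flag distributes the remainder is not stated here.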

3 new tests; suite green, lint clean.

🤖 Generated with Claude Code

--max-questions previously took the file-order prefix; the dataset is
sorted by question type, so dev25 was 25x single-session-user — fine
for smoke tests, misleading for category comparisons (matrix v1's LME
QA numbers cover one category only).

convert longmemeval --stratified samples max-questions evenly across
the six question types with a fixed seed and writes the composition to
sampling.json. Prefix behavior unchanged without the flag.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
groksrc merged commit b60bd2b into main on Jun 12, 2026
1 check passed