feat: filesystem-grep and full-context baseline providers by groksrc · Pull Request #18 · basicmachines-co/basic-memory-benchmarks

groksrc · 2026-06-12T18:53:52Z

Summary

Adds the two honesty-floor providers every credible comparison needs:

- baseline-grep: deterministic, log-damped term-frequency (TF) matching over the raw corpus markdown. No index, no embeddings, no LLM. A squared coverage factor makes docs that match every query term outrank single-term spam. The Letta-style argument: a memory system that can't beat grep isn't adding retrieval value.
- baseline-fullcontext: no retrieval; the whole corpus is returned as one hit (capped at 600K chars) for the QA stage to answer over. Full-context repeatedly beats dedicated memory systems on window-sized corpora, so QA accuracy and token cost should be read against it. Returns no doc ids, so retrieval metrics are meaningless for it by design, and it says so.
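A minimal sketch of the grep scoring idea described above. The PR states only "log-damped TF" and a "squared coverage factor"; the exact formula, the tokenizer, and the names `tokenize`, `grep_score`, and `rank` are assumptions made for illustration, not the code in this PR.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_score(query: str, doc: str) -> float:
    """Log-damped TF summed over matched query terms, times coverage squared."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    counts = Counter(tokenize(doc))
    matched = [t for t in terms if counts[t] > 0]
    if not matched:
        return 0.0
    # Log damping: 50 repeats of one term count far less than 50x.
    damped_tf = sum(1.0 + math.log(counts[t]) for t in matched)
    # Coverage: fraction of distinct query terms present in the doc.
    coverage = len(matched) / len(terms)
    return damped_tf * coverage ** 2


def rank(query: str, docs: dict[str, str], k: int = 5) -> list[str]:
    """Return up to k doc ids with a non-zero score, best first."""
    scored = [(grep_score(query, text), doc_id) for doc_id, text in docs.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    # Sort by score descending, then doc id, so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Under these assumptions, a doc containing both terms of a two-term query once each scores 2.0, while a doc repeating one of the terms 50 times scores roughly 1.23, so the full-coverage doc ranks first.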

Both work in grouped (LongMemEval) and flat (LoCoMo) modes since they implement the standard provider interface.
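The full-context provider can be sketched in the same spirit. The 600K-character cap, the single hit, the absence of doc ids, and the truncation flag come from this PR; the `Hit` type, the `full_context_search` function, and the document join order are hypothetical stand-ins for the repository's actual provider interface.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_CHARS = 600_000  # cap stated in the PR


@dataclass
class Hit:
    # Hypothetical hit type; the real provider interface may differ.
    text: str
    doc_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def full_context_search(
    corpus: dict[str, str], query: str, max_chars: int = MAX_CHARS
) -> list[Hit]:
    """Ignore the query and return the whole corpus as one hit."""
    # Assumed ordering: sort by doc id so the output is deterministic.
    joined = "\n\n".join(corpus[doc_id] for doc_id in sorted(corpus))
    truncated = len(joined) > max_chars
    # No doc_id is attached: this provider makes no retrieval claim.
    return [Hit(text=joined[:max_chars], metadata={"truncated": truncated})]
```

Because the hit carries no doc id, any id-based retrieval metric computed over it is zero by construction, which matches the intent stated in the summary.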

Verification

- 9 new tests (ranking, coverage-vs-spam, truncation flag, factory registration, cleanup).
- Smoke run on the synthetic corpus: grep recall@5 0.75 / content-hit 0.75; full-context recall 0 / content-hit 0.75, both as designed.
- Full suite green.
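To illustrate why full-context can show recall 0 alongside a non-zero content-hit rate, here is a rough sketch of the two metrics. The definitions are assumptions (recall@k as the fraction of relevant doc ids in the top k, content-hit as a case-insensitive substring match of the answer in any returned text); the benchmark's actual metric code may differ.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int = 5) -> float:
    """Fraction of relevant doc ids found among the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def content_hit(hit_texts: list[str], answer: str) -> bool:
    """True if the expected answer string appears in any returned hit text."""
    needle = answer.lower()
    return any(needle in text.lower() for text in hit_texts)


# A full-context style result: one big text blob, no doc ids.
full_context_ids: list[str] = []
full_context_texts = ["Alice moved to Paris in 2021. Bob stayed in Oslo."]
```

With no doc ids returned, `recall_at_k(full_context_ids, ["doc-1"])` is 0.0 regardless of content, while `content_hit(full_context_texts, "Paris")` is true because the answer text is present in the single hit.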

🤖 Generated with Claude Code

Published memory-system comparisons are only credible read against two cheap floors:

- baseline-grep: deterministic TF-with-log-damping term matching over the raw corpus markdown (no index, no embeddings, no LLM). Squared coverage factor so a doc matching every distinct query term outranks a doc spamming one term. A memory system that can't beat this isn't adding retrieval value.
- baseline-fullcontext: no retrieval at all; the whole corpus is returned as a single hit (capped at 600K chars ≈ 150K tokens) for the QA stage to answer over. Published results repeatedly show full-context beating dedicated memory systems on corpora that fit a context window; QA accuracy and token cost should be read against this provider's. Makes no doc-id claims, so retrieval metrics are meaningless for it by design.

Both registered in the provider factory; 9 new tests; smoke-verified on the synthetic corpus (grep recall@5 0.75; full-context recall 0 / content-hit 0.75 as expected).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>

groksrc merged commit 9c22bdc into main Jun 12, 2026
1 check passed

groksrc merged 1 commit into main from feat/baseline-providers
