
feat: filesystem-grep and full-context baseline providers #18

Merged: groksrc merged 1 commit into main from feat/baseline-providers on Jun 12, 2026

groksrc commented on Jun 12, 2026

Summary

Adds the two honesty-floor providers every credible comparison needs:

  • baseline-grep: deterministic, log-damped TF term matching over raw corpus markdown. No index, no embeddings, no LLM. A squared coverage factor makes docs that match every distinct query term outrank docs that spam a single term. The Letta-style argument: a memory system that can't beat grep isn't adding retrieval value.
  • baseline-fullcontext: no retrieval; the whole corpus is returned as one hit (capped at 600K chars) for the QA stage to answer over. Full-context repeatedly beats dedicated memory systems on window-sized corpora, so QA accuracy and token cost should be read against it. It returns no doc ids, so retrieval metrics are meaningless for it by design, and it says so.
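A minimal sketch of the grep scoring idea, assuming natural-log damping (`1 + ln(tf)` per matched term) and coverage defined as the fraction of distinct query terms present in the doc. The tokenizer and the names `tokenize`, `grep_score`, and `rank` are hypothetical illustrations, not the PR's code.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_score(query: str, doc: str) -> float:
    """Log-damped TF over distinct query terms, scaled by squared coverage."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    counts = Counter(tokenize(doc))
    matched = [t for t in terms if counts[t] > 0]
    if not matched:
        return 0.0
    # 1 + ln(tf): the 50th repeat of a term adds far less than the first hit.
    tf = sum(1.0 + math.log(counts[t]) for t in matched)
    # Fraction of distinct query terms present; squaring punishes partial matches.
    coverage = len(matched) / len(terms)
    return tf * coverage ** 2


def rank(query: str, docs: Dict[str, str], k: int = 5) -> List[str]:
    """Return up to k doc ids with a positive score, best first."""
    scored = [(grep_score(query, text), doc_id) for doc_id, text in docs.items()]
    # Descending score, then doc id, so ties are broken deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0][:k]
```

With `docs = {"spam": "paris " * 50, "full": "Alice moved to Paris in May."}`, `rank("alice paris", docs)` returns `["full", "spam"]`: the full-coverage doc scores 2.0 against roughly 1.23 for the spam doc. With unsquared coverage the spam doc would score roughly 2.46 and win, which is the failure the squared factor is there to prevent.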

Both work in grouped (LongMemEval) and flat (LoCoMo) modes since they implement the standard provider interface.
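The PR doesn't show the provider interface itself. A minimal sketch of how one provider shape can serve both modes, assuming a three-method interface (`ingest`, `search`, `cleanup`) and a name-keyed factory; `Hit`, `Provider`, `register`, `make_provider`, `run_flat`, `run_grouped`, and the toy substring provider are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass
class Hit:
    doc_id: Optional[str]  # None when a provider makes no doc-id claim
    text: str
    score: float = 0.0


class Provider(Protocol):
    def ingest(self, docs: Dict[str, str]) -> None: ...
    def search(self, query: str, k: int) -> List[Hit]: ...
    def cleanup(self) -> None: ...


# Hypothetical name -> constructor registry standing in for the provider factory.
_REGISTRY: Dict[str, Callable[[], Provider]] = {}


def register(name: str):
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


def make_provider(name: str) -> Provider:
    return _REGISTRY[name]()


@register("toy-substring")
class SubstringProvider:
    """Toy provider for illustration: hits are docs containing the query verbatim."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def ingest(self, docs: Dict[str, str]) -> None:
        self.docs = dict(docs)

    def search(self, query: str, k: int) -> List[Hit]:
        hits = [Hit(d, t, 1.0) for d, t in sorted(self.docs.items()) if query in t]
        return hits[:k]

    def cleanup(self) -> None:
        self.docs = {}


def run_flat(name: str, docs: Dict[str, str], queries: List[str], k: int = 5) -> List[List[Hit]]:
    """Flat mode: one provider instance over the whole corpus."""
    provider = make_provider(name)
    try:
        provider.ingest(docs)
        return [provider.search(q, k) for q in queries]
    finally:
        provider.cleanup()


def run_grouped(
    name: str, groups: Dict[str, Tuple[Dict[str, str], List[str]]], k: int = 5
) -> Dict[str, List[List[Hit]]]:
    """Grouped mode: a fresh provider per group, so no group sees another's docs."""
    return {gid: run_flat(name, docs, queries, k) for gid, (docs, queries) in groups.items()}
```

Because grouped mode is just flat mode applied per group, any provider that honours the three methods gets both modes without extra code, which is the property the sentence above relies on.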

Verification

  • 9 new tests (ranking, coverage-vs-spam, truncation flag, factory registration, cleanup).
  • Smoke run on the synthetic corpus: grep recall@5 0.75 / content-hit 0.75; full-context recall 0 / content-hit 0.75 — both as designed.
  • Full suite green.
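The two reported metrics aren't defined in the PR. One plausible reading, sketched here as an assumption rather than the benchmark's actual definitions: recall@k as the fraction of gold doc ids among the first k retrieved ids, and content-hit as a case-insensitive substring check of the gold answer against the returned text. The function names `recall_at_k` and `content_hit` are hypothetical.

```python
from typing import Optional, Sequence


def recall_at_k(retrieved_ids: Sequence[Optional[str]], gold_ids: Sequence[str], k: int = 5) -> float:
    """Fraction of gold doc ids found among the first k retrieved ids."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    # None ids (no doc-id claim) can never match a gold id.
    top = {doc_id for doc_id in retrieved_ids[:k] if doc_id is not None}
    return len(top & gold) / len(gold)


def content_hit(retrieved_texts: Sequence[str], answer: str) -> float:
    """1.0 if the gold answer string appears (case-insensitively) in any hit's text."""
    needle = answer.lower()
    return 1.0 if any(needle in text.lower() for text in retrieved_texts) else 0.0
```

Under these definitions full-context's recall is 0 by construction (its single hit carries no id), while its content-hit can still be high whenever the answer text survives in the returned corpus, which matches the shape of the smoke-run numbers.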

🤖 Generated with Claude Code

Published memory-system comparisons are only credible read against two
cheap floors:

- baseline-grep: deterministic TF-with-log-damping term matching over
  the raw corpus markdown — no index, no embeddings, no LLM. Squared
  coverage factor so a doc matching every distinct query term outranks
  a doc spamming one term. A memory system that can't beat this isn't
  adding retrieval value.
- baseline-fullcontext: no retrieval at all; the whole corpus is
  returned as a single hit (capped at 600K chars ≈ 150K tokens) for the
  QA stage to answer over. Published results repeatedly show
  full-context beating dedicated memory systems on corpora that fit a
  context window; QA accuracy and token cost should be read against
  this provider's. Makes no doc-id claims, so retrieval metrics are
  meaningless for it by design.
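The 600K-char cap works out to about 150K tokens under the rough heuristic of 4 characters per token. A minimal sketch of the provider's output, assuming the corpus is a dict of doc id to markdown joined in sorted-id order with blank lines between docs; `full_context`, `FullContextHit`, and the join order are hypothetical, and `truncated` stands in for the truncation flag the tests cover.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# 600K chars; at a rough 4 chars/token this is about 150K tokens.
MAX_CHARS = 600_000


@dataclass
class FullContextHit:
    doc_id: Optional[str]  # always None: the provider makes no doc-id claim
    text: str
    truncated: bool  # True when the corpus exceeded the cap and was cut


def full_context(docs: Dict[str, str], max_chars: int = MAX_CHARS) -> List[FullContextHit]:
    """Return the whole corpus as a single hit, cut to max_chars. No query is consulted."""
    # Join in sorted id order so repeated runs produce an identical blob.
    blob = "\n\n".join(docs[doc_id] for doc_id in sorted(docs))
    truncated = len(blob) > max_chars
    return [FullContextHit(doc_id=None, text=blob[:max_chars], truncated=truncated)]
```

Surfacing `truncated` matters because a capped corpus can silently drop the evidence a question needs, which would depress full-context QA accuracy for reasons unrelated to the memory systems it is compared against.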

Both registered in the provider factory; 9 new tests; smoke-verified on
the synthetic corpus (grep recall@5 0.75; full-context recall 0 /
content-hit 0.75 as expected).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
groksrc merged commit 9c22bdc into main on Jun 12, 2026
1 check passed