docs: matrix v1.2 internal results + #994 FTS-revival impact by groksrc · Pull Request #30 · basicmachines-co/basic-memory-benchmarks

groksrc · 2026-06-13T05:05:48Z

Internal benchmark results record (run 2026-06-12, local, zero API spend).

Headlines

- BM leads QA accuracy on LongMemEval-S (0.617 vs mem0 0.417) and is a close second on ConvoMem (0.792, behind full-context at 0.825; mem0 scores 0.474). mem0 edges BM on retrieval recall but abstains 2-3x as often: its retrieved chunks are less answer-bearing. Retrieval recall ≠ answer quality.
- PR #994 (FTS-revival): on the full corrected-LoCoMo set, retrieval gains +7.9 recall@5 and +10.0 MRR (every category up); QA accuracy gains +3.7 on the non-adversarial q300 subset.
- multi_hop stays at ~0.08. The bottleneck is not FTS but BM returning bullet-level chunks that strip document-level context (the session date is in the title). That's the next product fix.
- Full-context is a poor baseline at small-model scale (LongMemEval-S 0.217) but wins on the smaller ConvoMem cs10 (0.825), which confirms that full-context only beats retrieval while the corpus fits the model's working window.
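
For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of how recall@k and MRR are commonly computed. It is not taken from the benchmark harness: the function names are hypothetical, and the harness may define recall@5 differently (for example as a per-question hit rate).

```python
from typing import Iterable, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """Average over questions; MRR is the mean of per-question reciprocal ranks."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

Both metrics only score whether the gold chunk was retrieved and how high it ranked. Neither measures whether the retrieved text lets the model answer, which is why a system can win on recall and still lose on QA accuracy.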

Fairness note

mem0 ran in raw-add mode (infer=false) to match the June 10 baseline; its published numbers use infer=true (LLM extraction). A future matrix should run both and document the extraction model.
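
One way a future matrix could cover both mem0 modes is to expand the mem0 provider into two runs and record the extraction model with the infer=true run. The sketch below is illustrative only: `RunConfig`, `build_matrix`, and the placeholder model name are hypothetical and do not come from the benchmark repo.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RunConfig:
    dataset: str
    provider: str
    mem0_infer: Optional[bool]       # None for providers without this switch
    extraction_model: Optional[str]  # recorded only when infer is True


def build_matrix(
    datasets: Sequence[str],
    providers: Sequence[str],
    extraction_model: str,
) -> List[RunConfig]:
    """Expand datasets x providers, running mem0 in both infer modes."""
    runs: List[RunConfig] = []
    for dataset, provider in product(datasets, providers):
        if provider == "mem0-local":
            # Raw-add mode, comparable to the June 10 baseline.
            runs.append(RunConfig(dataset, provider, False, None))
            # LLM-extraction mode, with the extraction model documented.
            runs.append(RunConfig(dataset, provider, True, extraction_model))
        else:
            runs.append(RunConfig(dataset, provider, None, None))
    return runs


matrix = build_matrix(
    ["LongMemEval-S", "ConvoMem"],
    ["bm-local", "mem0-local"],
    extraction_model="<extraction-model>",  # placeholder, not a real model name
)
```

Storing the mode and the model on each run record keeps the results table self-describing, so a reader can tell which mem0 numbers are comparable to which baseline.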

Run artifacts under benchmarks/runs/ are gitignored; this is the human-readable record.

🤖 Generated with Claude Code

Records the v1.2 benchmark matrix (LongMemEval-S, ConvoMem; bm-local, mem0-local, baselines) and PR #994's measured impact.

Headlines:
- BM leads QA accuracy on LongMemEval-S (0.617 vs mem0 0.417) and is a close second on ConvoMem (0.792, behind full-context at 0.825; mem0 scores 0.474); mem0 edges BM on retrieval recall but abstains 2-3x as often: its chunks are less answer-bearing.
- #994 (FTS-revival): retrieval +7.9 recall@5 / +10.0 MRR on the full corrected-LoCoMo set (every category up); QA accuracy +3.7 on the non-adversarial q300 subset.
- multi_hop stays at ~0.08, bottlenecked not by FTS but by BM returning bullet-level chunks that strip document context (next product fix).

Also ignores the local .supermemory/ server data dir. Run artifacts live under benchmarks/runs/ (gitignored); this is the human-readable record.

Signed-off-by: Drew Cain <groksrc@gmail.com>

groksrc merged commit 259f72c into main Jun 13, 2026
1 check passed

groksrc deleted the results/matrix-v1.2 branch June 13, 2026 05:08

groksrc merged 1 commit into main from results/matrix-v1.2