Context: fresh benchmark run (2026-06-10)
Full LoCoMo retrieval run via basic-memory-benchmarks (run 3e11241b9d56, 1,986 queries): Basic Memory main @ 0.21.6 (commit de53e0e) vs mem0ai 2.0.5, both at current latest with out-of-the-box defaults. Reproducible via the benchmarks repo with basic-memory-benchmarks#13 and basic-memory-benchmarks#14 applied.
Headline (LoCoMo categories 1–4):
| | recall@5 | recall@10 | MRR | content-hit | mean latency | p95 latency |
|---|---|---|---|---|---|---|
| bm-local | 0.733 | 0.839 | 0.619 | 0.277 | 45ms | 53ms |
| mem0-local | 0.791 | 0.891 | 0.648 | 0.344 | 882ms | 1,603ms |
(Good news vs. the earlier benchmark issue basic-memory-benchmarks#2: content-hit went from 15.5% to ~30% overall since February.)
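For readers less familiar with the retrieval metrics in the headline, here is a minimal sketch of recall@k and reciprocal rank using their common textbook definitions. Whether basic-memory-benchmarks computes them exactly this way (in particular for multi-doc answers) is an assumption, and the function names and document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold docs found in the top-k results (gold_ids must be non-empty)."""
    gold = set(gold_ids)
    return len(gold & set(ranked_ids[:k])) / len(gold)


def reciprocal_rank(ranked_ids, gold_ids):
    """1 / rank of the first gold doc, or 0.0 if none was retrieved."""
    gold = set(gold_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical query whose two gold docs sit at ranks 6 and 8.
ranked = ["a", "b", "c", "d", "e", "g1", "f", "g2"]
gold = ["g1", "g2"]

r5 = recall_at_k(ranked, gold, 5)
r10 = recall_at_k(ranked, gold, 10)
rr = reciprocal_rank(ranked, gold)
```

MRR is the mean of `reciprocal_rank` over all queries. In this toy case the query scores 0 at recall@5 but full credit at recall@10, which is the rank 6–10 pattern this issue is about.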
The finding
Head-to-head per query, BM uniquely missed 281 queries (recall@5) that mem0 got; mem0 uniquely missed 160. Decomposing BM's 281 misses:
- 135 (~48%) are ranking failures, not retrieval failures — the gold doc IS in BM's results, just at rank 6–10 (clustered at 6–8), or partially retrieved for multi-doc answers. recall@10 is already 0.843; the problem is converting it to recall@5.
- 146 are true top-10 misses (separate issue on entity boosting).
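The decomposition above can be sketched as follows. This is an illustration of the bucketing logic only, not the script used for the run: the record fields (`bm_hit5`, `mem0_hit5`, `bm_top10`, `gold`) are hypothetical names, and the rule "any gold doc in BM's top 10 counts as a ranking failure" is an assumption based on the description of the two buckets.

```python
from collections import Counter


def unique_misses(records):
    """Split queries into those only BM missed and those only mem0 missed at recall@5."""
    bm_only = [r for r in records if not r["bm_hit5"] and r["mem0_hit5"]]
    mem0_only = [r for r in records if r["bm_hit5"] and not r["mem0_hit5"]]
    return bm_only, mem0_only


def classify_miss(record):
    """Ranking failure if any gold doc is in BM's top 10, otherwise a true miss."""
    found = set(record["gold"]) & set(record["bm_top10"])
    return "ranking_failure" if found else "true_miss"


# Hypothetical per-query records.
records = [
    # Gold doc at rank 6: retrieved, but below the recall@5 cutoff.
    {"bm_hit5": False, "mem0_hit5": True, "gold": ["g1"],
     "bm_top10": ["a", "b", "c", "d", "e", "g1"]},
    # Gold doc absent from BM's top 10.
    {"bm_hit5": False, "mem0_hit5": True, "gold": ["g2"],
     "bm_top10": ["a", "b", "c", "d", "e", "f"]},
    # Multi-doc answer, only one of two gold docs retrieved.
    {"bm_hit5": False, "mem0_hit5": True, "gold": ["g3a", "g3b"],
     "bm_top10": ["g3a", "b", "c", "d", "e"]},
    # mem0 missed, BM hit.
    {"bm_hit5": True, "mem0_hit5": False, "gold": ["g4"],
     "bm_top10": ["g4", "b", "c"]},
]

bm_only, mem0_only = unique_misses(records)
counts = Counter(classify_miss(r) for r in bm_only)
```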
Proposal
Add an optional rerank stage over the top-N (e.g. 20) hybrid candidates before final ordering. BM answers in 45ms mean vs mem0's 882ms, so there is latency headroom: a cross-encoder reranker (fastembed ships bge/jina reranker models, consistent with the existing fastembed dependency) costs roughly 50–150ms on this corpus size. That puts BM at roughly 95–195ms mean, still about 4.5–9x faster than mem0, while directly targeting nearly half (~48%) of BM's unique recall@5 misses. Converting even most of the 135 rank-6–10 misses flips the headline recall@5 comparison.
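A minimal sketch of what such a stage could look like, assuming candidates arrive as `(doc_id, text)` pairs already ordered by the hybrid search. `rerank_top_n`, `score_fn`, and `toy_overlap_score` are hypothetical names, not Basic Memory APIs; the toy scorer only stands in for a cross-encoder so the example runs without downloading a model.

```python
def rerank_top_n(query, candidates, score_fn, top_n=20):
    """Reorder the first top_n candidates by score_fn; leave the tail untouched.

    candidates: list of (doc_id, text) in hybrid-search order.
    score_fn:   callable(query, texts) -> list of floats, higher is better.
    """
    head = list(candidates[:top_n])
    tail = list(candidates[top_n:])
    scores = score_fn(query, [text for _, text in head])
    # sorted() is stable, so ties keep their original hybrid order.
    order = sorted(range(len(head)), key=lambda i: scores[i], reverse=True)
    return [head[i] for i in order] + tail


def toy_overlap_score(query, texts):
    """Placeholder scorer: number of query tokens shared with each text."""
    q = set(query.lower().split())
    return [float(len(q & set(t.lower().split()))) for t in texts]


candidates = [
    ("d1", "Bob likes hiking"),
    ("d2", "Alice adopted a cat"),
    ("d3", "Alice did move to Lisbon"),
]
reranked = rerank_top_n("where did Alice move", candidates, toy_overlap_score)
reranked_ids = [doc_id for doc_id, _ in reranked]
```

Keeping the scorer behind a plain callable would let the stage stay optional: with no reranker configured, the hybrid order passes through unchanged, and a real cross-encoder could be wrapped to match the `score_fn` shape.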
Worth a diagnostic pass on hybrid fusion weights at the same time — near-misses clustered just below the cutoff suggest the semantic/keyword legs may fuse suboptimally for short queries.
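One way to run that diagnostic is to sweep the fusion weight and watch where the gold doc lands. The sketch below assumes a weighted linear combination of per-leg scores, which may not be how Basic Memory actually fuses its legs; the function names and all scores are made up for illustration.

```python
def fuse(semantic, keyword, w_semantic):
    """Weighted linear fusion of two score dicts; returns doc IDs best-first."""
    w_keyword = 1.0 - w_semantic
    doc_ids = set(semantic) | set(keyword)
    fused = {
        d: w_semantic * semantic.get(d, 0.0) + w_keyword * keyword.get(d, 0.0)
        for d in doc_ids
    }
    # Sort by descending score, breaking ties by doc ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def rank_of(doc_id, ranking):
    """1-based position of doc_id in a ranking."""
    return ranking.index(doc_id) + 1


# Hypothetical short query: the semantic leg is confident about the gold doc,
# while the keyword leg favors documents that merely share common terms.
SEMANTIC = {"gold": 0.9, "a": 0.6, "b": 0.6, "c": 0.55, "d": 0.5, "e": 0.5}
KEYWORD = {"gold": 0.1, "a": 0.8, "b": 0.75, "c": 0.8, "d": 0.85, "e": 0.8}

sweep = {w: rank_of("gold", fuse(SEMANTIC, KEYWORD, w)) for w in (0.5, 0.65, 0.8)}
```

In this toy data the gold doc sits just outside the top 5 at equal weights and moves to the top once the semantic leg is weighted more heavily. Running the same sweep over the 135 near-miss queries would show whether a weight change alone recovers a meaningful share of them.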
Measurable commit-to-commit with the benchmarks repo's worktree workflow (--bm-local-path + deterministic run IDs).
🤖 Generated with Claude Code