Context: fresh benchmark run (2026-06-10)
Same run as #950: full LoCoMo via basic-memory-benchmarks (run 3e11241b9d56, 1,986 queries), BM main @ 0.21.6 (de53e0e) vs mem0ai 2.0.5. See #950 for headline numbers and setup.
The finding
Of the 281 queries BM uniquely missed at recall@5, 146 are true retrieval misses (gold doc absent from top-10). The dominant pattern is cross-conversation entity confusion — proper nouns in the query don't pull their weight against generic semantic similarity, so documents from the wrong conversation outrank the gold doc:
- "What are Joanna's hobbies?" — gold is locomo-c03-s01/s02; BM's top hits lead with locomo-c07-* docs (different people entirely)
- "Who is Anthony?" — gold is locomo-c04-s04; BM's top 5 are spread across c06/c01/c05, none from the right conversation
- "What symbolic gifts do Deborah and Jolene have from their mothers?" — right conversation, but generic Deborah/Jolene chatter outranks the session that answers it
This concentrates in the single-hop category (entity-attribute lookups), BM's weakest: R@5 0.486 vs mem0's 0.592.
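To make the miss categories above concrete, here is a minimal sketch of how a query's outcome could be bucketed from its ranked result list. The function names and the list-of-IDs input shape are assumptions for illustration, not the benchmarks repo's actual schema; the k=5 / top-10 thresholds are the ones quoted above.

```python
from typing import Sequence


def hit_at_k(ranked_ids: Sequence[str], gold_ids: set[str], k: int) -> bool:
    """True if any gold document appears in the first k results."""
    return any(doc_id in gold_ids for doc_id in ranked_ids[:k])


def classify(ranked_ids: Sequence[str], gold_ids: set[str],
             k: int = 5, deep_k: int = 10) -> str:
    """Bucket one query (hypothetical helper, not part of the benchmark code).

    "hit"       -> gold doc in top-k
    "near_miss" -> gold doc in top-deep_k but outside top-k
    "true_miss" -> gold doc absent from top-deep_k
    """
    if hit_at_k(ranked_ids, gold_ids, k):
        return "hit"
    if hit_at_k(ranked_ids, gold_ids, deep_k):
        return "near_miss"
    return "true_miss"
```

Under this split, a near miss is still in the candidate set, while a true miss is not in it at all.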
Proposal
This is where the knowledge graph should be an unfair advantage — entities are first-class in Basic Memory. Boost search candidates whose extracted entities match proper nouns / entity mentions in the query, so "Joanna" strongly prefers documents whose entity set actually contains Joanna. Possible shapes:
- a ranking boost term when query-detected entities intersect the doc/chunk's linked entities
- or an entity-filtered first pass with semantic fallback, fused with the existing hybrid score
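Both shapes can be sketched in a few lines. Everything below is an assumption for illustration: the `Candidate` type, the function names, the `weight` default, and the token-matching entity detection are hypothetical and not Basic Memory's search API. The sketch also assumes the existing hybrid score is "higher is better".

```python
import re
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str
    score: float  # existing hybrid score; assumed higher is better
    entities: set[str] = field(default_factory=set)  # entities linked to the doc


def detect_entities(query: str, known_entities: set[str]) -> set[str]:
    """Return known entity names (lowercased) that appear as tokens in the query.

    Single-token matching only; multi-word entity names would need phrase matching.
    """
    tokens = {t.lower() for t in re.findall(r"[A-Za-z]+", query)}
    return tokens & {e.lower() for e in known_entities}


def entity_boost(candidates, query, known_entities, weight=0.3):
    """Shape 1: add a boost proportional to the share of query entities a doc links."""
    q_ents = detect_entities(query, known_entities)

    def boosted(c: Candidate) -> float:
        if not q_ents:
            return c.score  # no entity in the query: ranking is unchanged
        doc_ents = {e.lower() for e in c.entities}
        return c.score + weight * len(q_ents & doc_ents) / len(q_ents)

    return sorted(candidates, key=boosted, reverse=True)


def entity_first_pass(candidates, query, known_entities):
    """Shape 2: keep entity-matching docs, fall back to all candidates if none match."""
    q_ents = detect_entities(query, known_entities)
    matched = [c for c in candidates if q_ents & {e.lower() for e in c.entities}]
    pool = matched if matched else list(candidates)
    return sorted(pool, key=lambda c: c.score, reverse=True)
```

With made-up scores, a `locomo-c07-*` doc at 0.80 with no link to Joanna would drop below a `locomo-c03-s01` doc at 0.70 that does link to her, since the latter is boosted to 1.00. The second function only filters and re-sorts by the existing score; a fuller version would fuse the two passes rather than discard non-matching candidates outright.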
This is complementary to the rerank stage proposed in #950: reranking fixes near-misses already in the candidate set; entity boosting fixes the candidate set itself. Both are measurable commit-to-commit with the benchmarks repo's worktree workflow.
🤖 Generated with Claude Code