perf: warm-session reuse across groups for bm-local (3x grouped speedup) by groksrc · Pull Request #22 · basicmachines-co/basic-memory-benchmarks

groksrc · 2026-06-12T19:59:02Z

Summary

Grouped runs created a fresh bm-local provider per group — new MCP session, full poll cycle — costing ~2.3 min/group (~19h extrapolated for full LongMemEval-S). This PR introduces opt-in group reuse: one provider instance serves every group.

Changes

- BenchmarkProvider.supports_group_reuse (default False): when True, the grouped executor runs all groups through one instance, ingesting per group and cleaning up once at the end with the base run config. Non-reuse providers keep exact prior semantics (covered by existing tests, unchanged).
- bm-local opts in: empirically verified that a warm bm mcp session serves projects added after it started. Project-per-group namespacing is unchanged; resolved project names are cached per run id.
- Fresh isolated config dir per instance (benchmarks/.bm-homes/, gitignored). This fixes a real bug found during this work: the persistent shared benchmarks/bm-home rotted across BM versions, and a dev build's alembic migrations bricked the brew-installed bm 0.22.0 (Can't locate revision n7i8j9k0l1m2). BASIC_MEMORY_HOME is also dropped from the env.
- Status polling backs off from 0.25s instead of a fixed 2s floor.
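
The reuse semantics described in the list above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual executor: only `supports_group_reuse` comes from this PR, while `run_grouped`, `ingest`, `cleanup`, `CountingProvider`, and `ReusableProvider` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


class BenchmarkProvider:
    # supports_group_reuse is the flag named in the PR; ingest/cleanup are
    # hypothetical stand-ins for the real provider interface.
    supports_group_reuse = False

    def ingest(self, group: str, docs: List[str]) -> None:
        raise NotImplementedError

    def cleanup(self) -> None:
        raise NotImplementedError


class CountingProvider(BenchmarkProvider):
    """Toy provider that records what the executor does to it."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        log.append("create")

    def ingest(self, group: str, docs: List[str]) -> None:
        if not docs:
            raise ValueError("no docs")
        self.log.append(f"ingest:{group}")

    def cleanup(self) -> None:
        self.log.append("cleanup")


class ReusableProvider(CountingProvider):
    supports_group_reuse = True  # opt in, as bm-local does in this PR


def run_grouped(make_provider: Callable[[], BenchmarkProvider],
                groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every group, reusing one provider instance when it opts in."""
    results: Dict[str, str] = {}
    probe = make_provider()  # capability-probe instance; also serves group 0
    shared = probe if probe.supports_group_reuse else None
    for index, (group, docs) in enumerate(groups.items()):
        if shared is not None:
            provider = shared
        else:
            provider = probe if index == 0 else make_provider()
        try:
            provider.ingest(group, docs)
            results[group] = "ok"
        except Exception as exc:  # a failing group must not abort the others
            results[group] = f"failed: {exc}"
        finally:
            if shared is None:
                provider.cleanup()  # prior semantics: cleanup per group
    if shared is not None:
        shared.cleanup()  # reuse: cleanup once at end of run
    return results


log: List[str] = []
results = run_grouped(lambda: ReusableProvider(log), {"g0": ["a"], "g1": ["b"]})
print(log)  # ['create', 'ingest:g0', 'ingest:g1', 'cleanup']
```

With the flag left at its default, the same call creates one provider per group and cleans each up in turn, which is the behaviour non-reuse providers are said to keep.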

Measured (real BM 0.22.0, LongMemEval dev slice, 3 groups × 54 docs)

	before	after
wall clock	~7 min (~2.3 min/group)	2:14 (~45s/group)
full 500-question extrapolation	~19 h	~6.2 h
per-query search latency	1550 ms	527 ms
recall@5 / MRR	1.0 / 1.0	1.0 / 1.0 (identical)

Remaining per-group cost is embedding compute + reindex CLI startup — basic-memory-side, not harness overhead.
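
The extrapolated figures in the table can be reproduced with simple arithmetic. The sketch below assumes one group per question, so 500 groups for the full run; that mapping is an assumption consistent with the reported numbers, and `extrapolate_hours` is a hypothetical helper, not part of the harness.

```python
def extrapolate_hours(seconds_per_group: float, groups: int = 500) -> float:
    """Scale a measured per-group wall-clock cost to a full run, in hours."""
    return seconds_per_group * groups / 3600


before_hours = extrapolate_hours(2.3 * 60)  # ~2.3 min/group before the change
after_hours = extrapolate_hours(134 / 3)    # 2:14 total over 3 groups after
wall_clock_speedup = (7 * 60) / 134         # ~7 min vs 2:14 on the dev slice
latency_speedup = 1550 / 527                # per-query search latency, in ms

print(round(before_hours, 1), round(after_hours, 1))
print(round(wall_clock_speedup, 1), round(latency_speedup, 1))
```

Both ratios come out close to 3, which matches the "3x grouped speedup" in the title.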

Testing

2 new runner tests (single-instance reuse, per-group namespacing, end-of-run cleanup, failure isolation under reuse); full suite green (91), lint clean.

🤖 Generated with Claude Code

Grouped runs (LongMemEval, ConvoMem) previously created a fresh bm-local provider per group: new MCP session, full status-poll cycle, and a shared persistent config dir. Measured ~2.3 min/group on the LongMemEval dev slice (~19h extrapolated for the full 500 questions).

- BenchmarkProvider.supports_group_reuse (default False): when True the grouped executor runs every group through ONE provider instance (ingest per group, cleanup once at end of run with the base run config). Non-reuse providers keep exact prior semantics; the shared capability-probe instance serves group 0 so no extra instances are created either way.
- bm-local opts in: empirically verified that a warm bm mcp session serves projects added after it started, so one session covers all groups (project-per-group namespacing unchanged). Resolved project names are now cached per run id rather than once per instance.
- Fresh isolated config dir per provider instance under benchmarks/.bm-homes/ (gitignored). The previous persistent shared benchmarks/bm-home rotted across basic-memory versions (a dev build's alembic migrations bricked the brew-installed binary with 'Can't locate revision') and leaked projects between runs. BASIC_MEMORY_HOME is also dropped from the env for the same reason.
- Status polling backs off from 0.25s instead of a fixed 2s floor.

Measured: LongMemEval dev slice (3 groups, 54 docs each, real BM 0.22.0 via --bm-local-path) 2:14 total (~45s/group) vs ~2.3 min/group before; identical retrieval results (recall@5 1.0, MRR 1.0). Remaining per-group cost is embedding compute + reindex CLI start, not harness overhead.

2 new runner tests cover single-instance reuse, per-group namespacing, end-of-run cleanup, and failure isolation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
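
The polling change mentioned in the commit message could look something like the sketch below. Only the 0.25s starting delay comes from the PR; the doubling factor, the 2s cap, the timeout, and the name `poll_until_ready` are assumptions made for illustration.

```python
import time
from typing import Callable


def poll_until_ready(is_ready: Callable[[], bool],
                     *,
                     initial: float = 0.25,   # starting delay, from the PR
                     factor: float = 2.0,     # assumed growth factor
                     max_delay: float = 2.0,  # assumed cap on the delay
                     timeout: float = 60.0,   # assumed overall deadline
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_ready with a growing delay; return False if the deadline passes."""
    deadline = clock() + timeout
    delay = initial
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(delay)
        delay = min(delay * factor, max_delay)
```

A status check that succeeds quickly returns after a fraction of a second instead of waiting out a fixed 2s interval, while slow indexing still settles at the capped delay. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.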

groksrc merged commit 9e868c0 into main Jun 12, 2026
1 check passed

groksrc merged 1 commit into main from perf/grouped-provider-reuse
