Graph revision #130
Open
wangyu-ustc wants to merge 12 commits into
- New: temporal knowledge graph (entity_nodes, entity_edges, episode_nodes, involves_edges)
- New: GraphMemoryManager with write path (W1-W5) and read path (R1-R4)
- Toggle: MIRIX_ENABLE_GRAPH_MEMORY=true/false (default off)
- LoCoMo benchmark: +3.05 percentage points on LLM Judge (0.5429 → 0.5734) on 1540 questions
- Zero changes to original logic when disabled
Graph memory returns {"context": "<pre-formatted str>"} instead of the
{"total_count": N, "items": [...]} shape used by other memory types, so
the existing total_count==0 short-circuit dropped graph context entirely.
Split the empty-data check from the count check and add a graph-specific
branch that reads the context string directly.
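The split described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `extract_memory_text` and the `"graph"` type label are hypothetical, and only the two payload shapes are taken from the commit message.

```python
from typing import Any, Optional


def extract_memory_text(memory_type: str, data: Optional[dict[str, Any]]) -> Optional[str]:
    """Return renderable text for one memory type, or None if there is nothing to show."""
    # Empty-data check, kept separate from the count check.
    if not data:
        return None
    # Graph branch: the payload is {"context": "<pre-formatted str>"} and has no total_count.
    if memory_type == "graph":
        context = data.get("context", "")
        return context if context.strip() else None
    # Other memory types: {"total_count": N, "items": [...]}.
    if data.get("total_count", 0) == 0:
        return None
    return "\n".join(str(item) for item in data.get("items", []))
```

With a single `total_count == 0` check placed first, the graph payload would fall into the "empty" path because it carries no `total_count` key, which is the failure mode the commit describes.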
Introduces a deterministic conflict-resolution path for semantic memory
inserts, with source provenance (turn_id / chunk_id / serial / occurred_at)
flowing from /memory/add through to stored records. Enabled per meta-agent
via the new `enable_conflict_resolution` flag; legacy free-form inserts
remain the default.
Schema:
- `users.turn_counter`, `users.chunk_counter` — per-user monotonic counters
used by `/memory/add` to fill in fallback provenance when the client does
not provide source_meta.
- `episodic_memory.source_refs`, `semantic_memory.source_refs` —
provenance pointers from stored memories back to their source units.
- `semantic_memory.prior_values` — history of values that have been
superseded under the conflict-resolution path.
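To illustrate how `source_refs` and `prior_values` might interact, here is a small in-memory sketch. The `Fact` class, the `upsert_fact` helper, and the "higher serial wins" rule are assumptions made for illustration; the commit only states that resolution is deterministic and that superseded values are kept as history.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Fact:
    key: str
    value: str
    source_refs: list[dict[str, Any]] = field(default_factory=list)
    prior_values: list[dict[str, Any]] = field(default_factory=list)


def upsert_fact(store: dict[str, Fact], key: str, value: str, source_ref: dict[str, Any]) -> Fact:
    """Insert or update a fact, keeping superseded values as history."""
    existing = store.get(key)
    if existing is None:
        store[key] = Fact(key, value, [source_ref])
        return store[key]
    if existing.value == value:
        # Same value seen again: only the provenance grows.
        existing.source_refs.append(source_ref)
        return existing
    # Assumed rule: the value with the highest source serial is current.
    newest_serial = max(ref["serial"] for ref in existing.source_refs)
    if source_ref["serial"] > newest_serial:
        existing.prior_values.append(
            {"value": existing.value, "source_refs": existing.source_refs}
        )
        existing.value = value
        existing.source_refs = [source_ref]
    else:
        # An older value arriving late goes straight into history.
        existing.prior_values.append({"value": value, "source_refs": [source_ref]})
    return existing
```

Because the outcome depends only on the stored serials and not on arrival order or an LLM judgment, replaying the same inputs in any order yields the same current value.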
Services:
- `UserManager.reserve_source_ids` — atomic counter bump used by the
/memory/add fallback.
- New `semantic_memory_upsert_fact` tool gated by the agent flag.
- `MetaAgent` system prompt augmentation when the flag is on.
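An atomic counter bump of the kind `reserve_source_ids` performs can be sketched as a single `UPDATE` inside one transaction. The version below uses SQLite so it runs standalone; the table layout, the signature, and the `make_demo_db` helper are assumptions, not the PR's implementation.

```python
import sqlite3


def reserve_source_ids(conn: sqlite3.Connection, user_id: str,
                       turns: int = 1, chunks: int = 1) -> tuple[range, range]:
    """Reserve `turns` turn ids and `chunks` chunk ids for one user.

    Returns (turn_ids, chunk_ids) as ranges of newly reserved, 1-based ids.
    """
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE users SET turn_counter = turn_counter + ?, "
            "chunk_counter = chunk_counter + ? WHERE id = ?",
            (turns, chunks, user_id),
        )
        if cur.rowcount != 1:
            raise KeyError(f"unknown user: {user_id}")
        turn_end, chunk_end = conn.execute(
            "SELECT turn_counter, chunk_counter FROM users WHERE id = ?",
            (user_id,),
        ).fetchone()
    return (
        range(turn_end - turns + 1, turn_end + 1),
        range(chunk_end - chunks + 1, chunk_end + 1),
    )


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory schema with the two per-user counters."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id TEXT PRIMARY KEY, "
        "turn_counter INTEGER NOT NULL DEFAULT 0, "
        "chunk_counter INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO users (id) VALUES ('user-1')")
    conn.commit()
    return conn
```

On PostgreSQL the same idea is usually written as one `UPDATE ... RETURNING` statement, which avoids the follow-up `SELECT`.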
Docs: `docs/mab_conflict_resolution_and_provenance.md`,
`docs/mab_raw_chunk_side_channel.md`,
`docs/mab_user_id_isolation_fix.md`.
Replaces v2 single-graph memory with two independent Neo4j graphs — one
per existing MIRIX memory layer:
- G_episodic: (:Episode) + (:EpisodicEntity), with [:NEXT] temporal edges,
[:EP_RELATES] entity edges (with keywords + embedding), and [:MENTIONS]
episode→entity links. Driven by EpisodicMemoryManager.insert_event.
- G_semantic: (:Concept) + (:SemanticEntity), with [:CONCEPT_RELATES]
concept-concept edges (LLM-judged at insert time), [:SEM_RELATES] entity
edges, and [:MENTIONS]. Driven by SemanticMemoryManager.insert_semantic_item.
Retrieval (GraphRetrieverDispatcher):
- 1 LLM call to split the query into ll/hl keywords (cached in Redis)
- 1 batch embed call for both keyword sets
- Parallel asyncio.gather over EpisodicRetriever + SemanticRetriever
- Each retriever runs LightRAG dual-level vector search (ll → entity name
vector index, hl → relation keyword vector index), round-robin merges,
reverses MENTIONS to fetch items, then one-hop expands (NEXT for
episodes, CONCEPT_RELATES for concepts).
- 50/50 token budget split across the two graphs, format as a combined
"## Episodic KG / ## Semantic KG" markdown payload.
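Two of the retrieval steps above, the round-robin merge of ll/hl hits and the 50/50 token budget split, can be sketched as follows. All function names are hypothetical, and the 4-characters-per-token estimate is a crude stand-in for whatever tokenizer the dispatcher actually uses.

```python
from itertools import zip_longest


def round_robin_merge(ll_hits: list[str], hl_hits: list[str]) -> list[str]:
    """Interleave low-level and high-level hits, dropping duplicates."""
    merged, seen = [], set()
    for pair in zip_longest(ll_hits, hl_hits):
        for hit in pair:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def take_within_budget(lines: list[str], budget: int) -> list[str]:
    """Keep lines in order until the next one would exceed the budget."""
    kept, used = [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return kept


def format_payload(episodic: list[str], semantic: list[str], max_total_tokens: int) -> str:
    """Give each graph half of the budget and emit the combined markdown."""
    half = max_total_tokens // 2
    ep = take_within_budget(episodic, half)
    sem = take_within_budget(semantic, half)
    return "## Episodic KG\n" + "\n".join(ep) + "\n\n## Semantic KG\n" + "\n".join(sem)
```

A fixed split is simple but means an unused half is not handed to the other graph; whether the dispatcher rebalances is not stated in the commit.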
Zero-overhead default:
- All hooks gated on settings.enable_graph_memory (default False).
- Neo4j compose service is profile-gated ("graph"); mirix_api's depends_on
is required: false, so plain `docker compose up` skips Neo4j.
- Token tracker (mirix/database/token_tracker.py) is disabled by default;
record() is a no-op until enable() is called by the eval harness via
POST /debug/token_stats/reset.
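The "no-op until enabled" behaviour of the token tracker can be pictured with a small class like the one below. This is a sketch of the pattern only; the class layout and method names other than `record()` and `enable()` are assumptions.

```python
from collections import Counter


class TokenTracker:
    """Accumulates token counts per label, but only after enable() is called."""

    def __init__(self) -> None:
        self._enabled = False
        self._totals: Counter = Counter()

    def enable(self) -> None:
        self._enabled = True

    def reset(self) -> None:
        self._totals.clear()

    def record(self, label: str, tokens: int) -> None:
        # Disabled by default: return immediately without touching any state.
        if not self._enabled:
            return
        self._totals[label] += tokens

    def stats(self) -> dict[str, int]:
        return dict(self._totals)
```

Call sites can then invoke `record()` unconditionally, and the cost when the tracker is off is one attribute check.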
Schema bootstrap (mirix/database/neo4j_client.py):
- 6 unique constraints, 2 btree indexes, 5 vector indexes (Neo4j 5.13+)
- v3 (:Entity / :Event) cleanup runs first; safe on fresh DBs
- Idempotent: re-running on existing DBs is a no-op
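Idempotence in a Neo4j bootstrap typically comes from `IF NOT EXISTS` on every DDL statement. The sketch below only builds the Cypher strings and hands them to a caller-supplied `run` callable. The node labels come from this commit; the property names (`id`, `embedding`), the index names, and the cosine similarity function are assumptions, and the statement set is not the actual 6/2/5 set created by `neo4j_client.py`.

```python
from typing import Callable

NODE_LABELS = ("Episode", "EpisodicEntity", "Concept", "SemanticEntity")


def unique_constraint(label: str, prop: str = "id") -> str:
    name = f"{label.lower()}_{prop}_unique"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


def vector_index(label: str, prop: str = "embedding", dim: int = 1536) -> str:
    # CREATE VECTOR INDEX requires Neo4j 5.13 or later.
    name = f"{label.lower()}_{prop}_vec"
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dim}, "
        f"`vector.similarity_function`: 'cosine'}}}}"
    )


def bootstrap_statements(dim: int = 1536) -> list[str]:
    stmts = [unique_constraint(label) for label in NODE_LABELS]
    stmts += [vector_index(label, dim=dim) for label in NODE_LABELS]
    return stmts


def bootstrap(run: Callable[[str], object], dim: int = 1536) -> None:
    """Send every statement to `run`, e.g. a Neo4j session's run method."""
    for stmt in bootstrap_statements(dim):
        run(stmt)
```

Because each statement is guarded, calling `bootstrap` a second time against the same database changes nothing.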
Removed:
- mirix/orm/graph_memory.py (v2 single-graph ORM)
- mirix/services/graph_memory_manager.py (v2 manager)
Docs:
- docs/graph_memory_v4/README.md: design overview + zero-overhead notes
- docs/graph_memory_v4/v4_graph_memory.md: per-file source + diffs
- docs/graph_memory_v4/kg_overview_{episodic,semantic}.png: top-N visualizations
- docs/graph_memory_v4/kg_subgraph_{identity,family_camping,art_creativity}.png:
paired episodic-vs-semantic zoom-ins on shared themes (conv-26)
Configuration:
- MIRIX_ENABLE_GRAPH_MEMORY=true
- MIRIX_NEO4J_URI=bolt://neo4j:7687
- MIRIX_NEO4J_USER, MIRIX_NEO4J_PASSWORD, MIRIX_NEO4J_DATABASE
- MIRIX_NEO4J_VECTOR_DIM (default 1536, match the embedding model in use)
Tested with gpt-4.1-mini + text-embedding-3-small + Neo4j 5.20-community
on LoCoMo conv-26 (154 QA non-adversarial). See docs/graph_memory_v4/.
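For reference, the environment variables listed above could be read along these lines. The `GraphSettings` dataclass and `load_graph_settings` function are hypothetical; MIRIX's real settings object may be structured differently. Only the variable names and the two stated defaults (graph memory off, vector dimension 1536) come from the commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GraphSettings:
    enable_graph_memory: bool
    neo4j_uri: Optional[str]
    neo4j_user: Optional[str]
    neo4j_password: Optional[str]
    neo4j_database: Optional[str]
    vector_dim: int


def load_graph_settings(env: Mapping[str, str] = os.environ) -> GraphSettings:
    """Read graph-memory settings from an environment-like mapping."""
    enabled = env.get("MIRIX_ENABLE_GRAPH_MEMORY", "false").strip().lower() == "true"
    return GraphSettings(
        enable_graph_memory=enabled,
        neo4j_uri=env.get("MIRIX_NEO4J_URI"),
        neo4j_user=env.get("MIRIX_NEO4J_USER"),
        neo4j_password=env.get("MIRIX_NEO4J_PASSWORD"),
        neo4j_database=env.get("MIRIX_NEO4J_DATABASE"),
        # Must match the output dimension of the embedding model in use.
        vector_dim=int(env.get("MIRIX_NEO4J_VECTOR_DIM", "1536")),
    )
```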
One-shot script: runs main_eval.py (LoCoMo sample 0) followed by organize_results.py, then prints overall accuracy + per-category breakdown. Pre-flight checks that server is up on :8531 and that locomo10.json exists. Output goes to evals/results/locomo/v4_<timestamp>/.
Combines main and graph_revision: v2/v4 graph memory, dual-graph LightRAG retrieval, MAB conflict resolution, graph retrievers.
1. episodic_memory_manager: pgvector embedding-search SELECT was missing
source_refs, causing to_pydantic() to receive None for a non-nullable
List field — every episodic search threw a Pydantic ValidationError
and silently returned no memories. Add source_refs to the explicit
select() column list.
2. semantic_memory_manager: same bug on the semantic side, plus
prior_values. Add both to the embedding-search SELECT.
3. memory_tools.semantic_memory_insert: indexed item['source'] directly,
so any LLM call that omitted the source field (which it commonly
does — source is the least essential field) crashed with KeyError
and lost the whole item. Switch to item.get('source', '').
Net effect on LoCoMo conv-26 with 0201c config: 20.4% -> 80.3% (the
SELECT fix alone). The source KeyError was masking real semantic
memory writes in graph-mode LongMemEval ingest.
Also ignores evals/snapshots/ — local-only memory dumps, large and
regenerable.
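The third fix is the standard "optional key" pattern. A minimal sketch, assuming a hypothetical `build_semantic_record` helper (the real tool's field handling is not shown in this PR page):

```python
def build_semantic_record(item: dict) -> dict:
    """Copy an LLM-produced item, tolerating a missing 'source' field."""
    record = dict(item)
    # item['source'] would raise KeyError when the LLM omits the field;
    # .get() with a default keeps the rest of the item.
    record["source"] = item.get("source", "")
    return record
```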
…h is on retrieve_memories_by_keywords: when MIRIX_ENABLE_GRAPH_MEMORY=true, episodic and semantic retrieval is served entirely by the v4 dual-graph dispatcher. The flat PG episodic/semantic search is gated behind 'not settings.enable_graph_memory' (kept as a fallback for graph-off mode). The other four memory types (resource / procedural / knowledge_vault / core) have no graph counterpart and are always retrieved flat, unchanged.

On LoCoMo conv-26, v5 (pure graph) scores 84.2% vs v4 (graph+flat side-by-side) 84.9%, effectively a wash (1 question), confirming the flat layer is redundant once the graph layer covers the same recall.

graph_retriever_dispatcher.DEFAULT_MAX_TOTAL_TOKENS: 12000 -> 24000. At 12k the formatted graph context measured ~13k tokens on LongMemEval-S, i.e. already over budget, so apply_budget_to_search was truncating the tail. Doubled to give counting / multi-session questions recall headroom. Still well under the 'graph should be ≤1/3 of the 128k window' discipline.
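The routing rule in this commit reduces to a small decision table. The sketch below is illustrative only; `plan_retrieval` is a hypothetical name and the real gating lives inside retrieve_memories_by_keywords.

```python
FLAT_ONLY = ("resource", "procedural", "knowledge_vault", "core")
GRAPH_BACKED = ("episodic", "semantic")


def plan_retrieval(enable_graph_memory: bool) -> dict[str, str]:
    """Map each memory type to the backend that serves it."""
    # Types without a graph counterpart are always retrieved flat.
    plan = {memory_type: "flat" for memory_type in FLAT_ONLY}
    # Episodic and semantic use the graph when enabled, else the flat PG search.
    for memory_type in GRAPH_BACKED:
        plan[memory_type] = "graph" if enable_graph_memory else "flat"
    return plan
```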
evals/longmem_eval.py (new): runner for MemoryAgentBench LongMemEval-S (longmemeval_s* split). Reuses MirixMemorySystem + TaskAgent + organize_results unchanged; only the data layer differs. Parses each context's per-session [Chat Time, messages] structure, then further splits each session into <=4096-char chunks on message boundaries so the extractor sees small blocks (a whole session is ~14k chars and measurably dilutes LightRAG recall). Every sub-chunk inherits its session's chat time as occurred_at. Also records memory_stats (stored chars across PG flat + Neo4j graph) so no-graph and graph runs can be compared on the same yardstick.

evals/memory_snapshot.py (new): save / load / list / delete memory snapshots so an expensive ingest (hours, real OpenAI cost) can be reused. pg_dump for the seven memory tables + agents/messages, plus a full Cypher-based Neo4j node/relationship export to JSON. load truncates first then restores, so the snapshot is the exact state afterwards.

evals/mirix_memory_system.py: add_chunk now accepts an optional occurred_at parameter and forwards it to client.add. Without it the episodic agent guesses a year (on LongMemEval it guesses the ingest year, 2026, which collapses temporal questions). LoCoMo's runner already embeds dates in the chunk text so it didn't need this; LongMemEval's dates live in the per-session Chat Time, which the new runner now passes through.
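The session chunking described for evals/longmem_eval.py can be sketched as a greedy packer that never cuts inside a message. Function names and the newline joiner are assumptions; in this sketch a single message longer than the limit is kept whole in its own chunk rather than split.

```python
def split_session(messages: list[str], max_chars: int = 4096) -> list[list[str]]:
    """Greedily pack whole messages into chunks of at most max_chars."""
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for msg in messages:
        extra = len(msg) + (1 if current else 0)  # +1 for the newline joiner
        if current and size + extra > max_chars:
            chunks.append(current)
            current, size = [], 0
            extra = len(msg)
        current.append(msg)
        size += extra
    if current:
        chunks.append(current)
    return chunks


def chunk_session(messages: list[str], chat_time: str, max_chars: int = 4096) -> list[dict]:
    """Attach the session's chat time to every sub-chunk as occurred_at."""
    return [
        {"text": "\n".join(chunk), "occurred_at": chat_time}
        for chunk in split_session(messages, max_chars)
    ]
```

Each resulting dict could then be passed to add_chunk together with its occurred_at value, so the episodic agent receives the session's chat time instead of guessing a year.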
No description provided.