Graph revision #130

Open
wangyu-ustc wants to merge 12 commits into main from graph_revision

@wangyu-ustc
Collaborator

No description provided.

Jasonya added 12 commits April 3, 2026 21:33
- New: temporal knowledge graph (entity_nodes, entity_edges, episode_nodes, involves_edges)
- New: GraphMemoryManager with write path (W1-W5) and read path (R1-R4)
- Toggle: MIRIX_ENABLE_GRAPH_MEMORY=true/false (default off)
- LoCoMo benchmark: +3.05 points LLM Judge (0.5429 → 0.5734) on 1540 questions
- Zero changes to original logic when disabled
Graph memory returns {"context": "<pre-formatted str>"} instead of the
{"total_count": N, "items": [...]} shape used by other memory types, so
the existing total_count==0 short-circuit dropped graph context entirely.
Split the empty-data check from the count check and add a graph-specific
branch that reads the context string directly.
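
A minimal sketch of the split described above, assuming a hypothetical helper `render_memory_section` and a memory-type label `"graph"` (neither name is taken from the codebase). It only illustrates why the empty-data check and the count check have to be separate once one memory type returns a different shape:

```python
from typing import Any, Dict, Optional


def render_memory_section(memory_type: str, data: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the text for one memory type, or None when there is nothing to show.

    Hypothetical helper: the name and the "graph" label are assumptions.
    """
    # Empty-data check, kept separate from the count check.
    if not data:
        return None

    # Graph-specific branch: the payload is a pre-formatted string under "context",
    # so there is no total_count to inspect.
    if memory_type == "graph":
        context = data.get("context", "")
        return context if context.strip() else None

    # All other memory types use {"total_count": N, "items": [...]}.
    if data.get("total_count", 0) == 0:
        return None
    return "\n".join(str(item) for item in data.get("items", []))
```

With a single `total_count == 0` short-circuit, the graph payload would fall into the "no results" path because it carries no `total_count` key at all.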
Introduces a deterministic conflict-resolution path for semantic memory
inserts, with source provenance (turn_id / chunk_id / serial / occurred_at)
flowing from /memory/add through to stored records. Enabled per meta-agent
via the new `enable_conflict_resolution` flag; legacy free-form inserts
remain the default.

Schema:
- `users.turn_counter`, `users.chunk_counter` — per-user monotonic counters
  used by `/memory/add` to fill in fallback provenance when the client does
  not provide source_meta.
- `episodic_memory.source_refs`, `semantic_memory.source_refs` —
  provenance pointers from stored memories back to their source units.
- `semantic_memory.prior_values` — history of values that have been
  superseded under the conflict-resolution path.

Services:
- `UserManager.reserve_source_ids` — atomic counter bump used by the
  /memory/add fallback.
- New `semantic_memory_upsert_fact` tool gated by the agent flag.
- `MetaAgent` system prompt augmentation when the flag is on.

Docs: `docs/mab_conflict_resolution_and_provenance.md`,
      `docs/mab_raw_chunk_side_channel.md`,
      `docs/mab_user_id_isolation_fix.md`.
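
The interaction between `source_refs` and `prior_values` can be sketched as follows. This is an illustration only: the `Fact` class, the `upsert_fact` function, the in-memory `store`, and the "higher `serial` wins" rule are all assumptions made for the example, not the behaviour of `semantic_memory_upsert_fact`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Fact:
    """Hypothetical in-memory stand-in for a semantic_memory row."""
    value: str
    source_refs: List[Dict[str, int]] = field(default_factory=list)
    prior_values: List[Dict[str, Any]] = field(default_factory=list)


def upsert_fact(store: Dict[str, Fact], key: str, value: str, source_ref: Dict[str, int]) -> Fact:
    """Insert or update a fact deterministically (assumed rule: higher serial wins)."""
    existing = store.get(key)
    if existing is None:
        store[key] = Fact(value=value, source_refs=[source_ref])
        return store[key]

    if existing.value == value:
        # Same value seen again: only add the new provenance pointer.
        existing.source_refs.append(source_ref)
        return existing

    latest_serial = max(ref["serial"] for ref in existing.source_refs)
    if source_ref["serial"] > latest_serial:
        # The newer source supersedes the stored value; keep the old one as history.
        existing.prior_values.append(
            {"value": existing.value, "source_refs": existing.source_refs}
        )
        existing.value = value
        existing.source_refs = [source_ref]
    # A stale source (lower serial) leaves the stored value untouched.
    return existing
```

Because the outcome depends only on the stored provenance and the incoming `serial`, replaying the same inputs yields the same final record, which is the property a deterministic path is after.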
Replaces v2 single-graph memory with two independent Neo4j graphs — one
per existing MIRIX memory layer:

- G_episodic: (:Episode) + (:EpisodicEntity), with [:NEXT] temporal edges,
  [:EP_RELATES] entity edges (with keywords + embedding), and [:MENTIONS]
  episode→entity links. Driven by EpisodicMemoryManager.insert_event.
- G_semantic: (:Concept) + (:SemanticEntity), with [:CONCEPT_RELATES]
  concept-concept edges (LLM-judged at insert time), [:SEM_RELATES] entity
  edges, and [:MENTIONS]. Driven by SemanticMemoryManager.insert_semantic_item.

Retrieval (GraphRetrieverDispatcher):
- 1 LLM call to split the query into ll/hl keywords (cached in Redis)
- 1 batch embed call for both keyword sets
- Parallel asyncio.gather over EpisodicRetriever + SemanticRetriever
- Each retriever runs LightRAG dual-level vector search (ll → entity name
  vector index, hl → relation keyword vector index), round-robin merges,
  reverses MENTIONS to fetch items, then one-hop expands (NEXT for
  episodes, CONCEPT_RELATES for concepts).
- 50/50 token budget split across the two graphs, format as a combined
  "## Episodic KG / ## Semantic KG" markdown payload.

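The round-robin merge and the 50/50 budget split from the retrieval steps above can be sketched like this. The function names, the whitespace-based token count, and the exact payload layout are assumptions for illustration; the real retrievers operate on Neo4j results and a proper tokenizer.

```python
from itertools import zip_longest
from typing import List


def round_robin_merge(ll_hits: List[str], hl_hits: List[str]) -> List[str]:
    """Interleave low-level and high-level hits, dropping duplicates."""
    merged, seen = [], set()
    for pair in zip_longest(ll_hits, hl_hits):
        for hit in pair:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


def take_within_budget(lines: List[str], max_tokens: int) -> List[str]:
    """Keep leading lines until the budget is exhausted."""
    kept, used = [], 0
    for line in lines:
        cost = len(line.split())  # crude whitespace token count (assumption)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return kept


def format_payload(episodic: List[str], semantic: List[str], max_total_tokens: int) -> str:
    """Give each graph half of the budget and join both under markdown headings."""
    half = max_total_tokens // 2
    ep = take_within_budget(episodic, half)
    sem = take_within_budget(semantic, half)
    return "## Episodic KG\n" + "\n".join(ep) + "\n\n## Semantic KG\n" + "\n".join(sem)
```

Interleaving keeps either keyword level from crowding out the other when the budget later cuts the tail of the list.
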
Zero-overhead default:
- All hooks gated on settings.enable_graph_memory (default False).
- Neo4j compose service is profile-gated ("graph"); mirix_api's depends_on
  is required: false, so plain `docker compose up` skips Neo4j.
- Token tracker (mirix/database/token_tracker.py) is disabled by default;
  record() is a no-op until enable() is called by the eval harness via
  POST /debug/token_stats/reset.
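
The "no-op until enabled" behaviour of the token tracker can be pictured with the sketch below. The class layout and the `totals` method are assumptions; only `record()` and `enable()` are named in this message, and this is not the contents of mirix/database/token_tracker.py.

```python
from typing import Dict


class TokenTracker:
    """Hypothetical tracker that costs nothing until explicitly enabled."""

    def __init__(self) -> None:
        self._enabled = False
        self._totals: Dict[str, int] = {}

    def enable(self) -> None:
        """Start recording from a clean slate."""
        self._enabled = True
        self._totals = {}

    def record(self, label: str, tokens: int) -> None:
        # Disabled by default: return immediately without touching any state.
        if not self._enabled:
            return
        self._totals[label] = self._totals.get(label, 0) + tokens

    def totals(self) -> Dict[str, int]:
        """Return a copy of the per-label counts."""
        return dict(self._totals)
```

Call sites can invoke `record()` unconditionally; the early return keeps the default path free of bookkeeping.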

Schema bootstrap (mirix/database/neo4j_client.py):
- 6 unique constraints, 2 btree indexes, 5 vector indexes (Neo4j 5.13+)
- v3 (:Entity / :Event) cleanup runs first; safe on fresh DBs
- Idempotent: re-running on existing DBs is a no-op

Removed:
- mirix/orm/graph_memory.py (v2 single-graph ORM)
- mirix/services/graph_memory_manager.py (v2 manager)

Docs:
- docs/graph_memory_v4/README.md: design overview + zero-overhead notes
- docs/graph_memory_v4/v4_graph_memory.md: per-file source + diffs
- docs/graph_memory_v4/kg_overview_{episodic,semantic}.png: top-N visualizations
- docs/graph_memory_v4/kg_subgraph_{identity,family_camping,art_creativity}.png:
  paired episodic-vs-semantic zoom-ins on shared themes (conv-26)

Configuration:
- MIRIX_ENABLE_GRAPH_MEMORY=true
- MIRIX_NEO4J_URI=bolt://neo4j:7687
- MIRIX_NEO4J_USER, MIRIX_NEO4J_PASSWORD, MIRIX_NEO4J_DATABASE
- MIRIX_NEO4J_VECTOR_DIM (default 1536, match the embedding model in use)

Tested with gpt-4.1-mini + text-embedding-3-small + Neo4j 5.20-community
on LoCoMo conv-26 (154 QA non-adversarial). See docs/graph_memory_v4/.
One-shot script: runs main_eval.py (LoCoMo sample 0) followed by
organize_results.py, then prints overall accuracy + per-category breakdown.

Pre-flight checks that server is up on :8531 and that locomo10.json exists.
Output goes to evals/results/locomo/v4_<timestamp>/.
Combines main and graph_revision: v2/v4 graph memory, dual-graph LightRAG
retrieval, MAB conflict resolution, graph retrievers.
1. episodic_memory_manager: pgvector embedding-search SELECT was missing
   source_refs, causing to_pydantic() to receive None for a non-nullable
   List field — every episodic search threw a Pydantic ValidationError
   and silently returned no memories. Add source_refs to the explicit
   select() column list.

2. semantic_memory_manager: same bug on the semantic side, plus
   prior_values. Add both to the embedding-search SELECT.

3. memory_tools.semantic_memory_insert: indexed item['source'] directly,
   so any LLM call that omitted the source field (which it commonly
   does — source is the least essential field) crashed with KeyError
   and lost the whole item. Switch to item.get('source', '').

Net effect on LoCoMo conv-26 with 0201c config: 20.4% -> 80.3% (the
SELECT fix alone). The source KeyError was masking real semantic
memory writes in graph-mode LongMemEval ingest.
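
The third fix amounts to the pattern below. `with_default_source` is a hypothetical wrapper written for this note; only the `item.get('source', '')` call comes from the change itself.

```python
from typing import Any, Dict


def with_default_source(item: Dict[str, Any]) -> Dict[str, Any]:
    """Copy an LLM-produced item, defaulting a missing 'source' to an empty string."""
    normalized = dict(item)
    # item['source'] would raise KeyError here when the model omits the field.
    normalized["source"] = item.get("source", "")
    return normalized
```

The item is preserved with an empty source instead of being lost to an exception.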

Also ignores evals/snapshots/ — local-only memory dumps, large and
regenerable.
…h is on

retrieve_memories_by_keywords: when MIRIX_ENABLE_GRAPH_MEMORY=true,
episodic and semantic retrieval is served entirely by the v4 dual-graph
dispatcher. The flat PG episodic/semantic search is gated behind
'not settings.enable_graph_memory' (kept as a fallback for graph-off
mode). The other four memory types (resource / procedural /
knowledge_vault / core) have no graph counterpart and are always
retrieved flat — unchanged.
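
The gating can be summarised with a small routing sketch. `plan_retrieval` and the two constants are hypothetical names; the real logic lives inside retrieve_memories_by_keywords and is driven by settings.enable_graph_memory.

```python
from typing import Dict, List

# Memory types with a graph counterpart, per the description above.
GRAPH_BACKED = ("episodic", "semantic")
# Memory types that are always retrieved flat.
FLAT_ONLY = ("resource", "procedural", "knowledge_vault", "core")


def plan_retrieval(enable_graph_memory: bool) -> Dict[str, List[str]]:
    """Decide which memory types go to the graph dispatcher and which stay flat."""
    if enable_graph_memory:
        # Graph on: episodic/semantic come only from the dual-graph dispatcher.
        return {"graph": list(GRAPH_BACKED), "flat": list(FLAT_ONLY)}
    # Graph off: the flat PG search remains the fallback for all six types.
    return {"graph": [], "flat": list(GRAPH_BACKED) + list(FLAT_ONLY)}
```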

On LoCoMo conv-26, v5 (pure graph) scores 84.2% vs v4 (graph+flat
side-by-side) 84.9% — effectively a wash (1 question), confirming the
flat layer is redundant once the graph layer covers the same recall.

graph_retriever_dispatcher.DEFAULT_MAX_TOTAL_TOKENS: 12000 -> 24000.
With the 12k cap, the formatted graph context measured ~13k tokens on
LongMemEval-S, i.e. already over budget, so apply_budget_to_search was
truncating the tail. Doubled to give counting / multi-session questions
recall headroom. Still well under the 'graph should be ≤1/3 of the
128k window' discipline (about 42.7k tokens).
evals/longmem_eval.py (new): runner for MemoryAgentBench LongMemEval-S
(longmemeval_s* split). Reuses MirixMemorySystem + TaskAgent +
organize_results unchanged — only the data layer differs. Parses each
context's per-session [Chat Time, messages] structure, then further
splits each session into <=4096-char chunks on message boundaries so
the extractor sees small blocks (a whole session is ~14k chars and
measurably dilutes LightRAG recall). Every sub-chunk inherits its
session's chat time as occurred_at. Also records memory_stats
(stored chars across PG flat + Neo4j graph) so no-graph and graph runs
can be compared on the same yardstick.
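
The per-session chunking can be sketched as below. `split_session` and `chunk_session` are hypothetical names and the newline joiner is an assumption; the sketch shows only the greedy packing on message boundaries and the inherited `occurred_at`.

```python
from typing import Dict, List


def split_session(messages: List[str], max_chars: int = 4096) -> List[str]:
    """Greedily pack whole messages into chunks of at most max_chars characters.

    A single message longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for msg in messages:
        added = len(msg) + (1 if current else 0)  # +1 for the newline joiner
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(msg)
        current.append(msg)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def chunk_session(chat_time: str, messages: List[str], max_chars: int = 4096) -> List[Dict[str, str]]:
    """Attach the session's chat time to every sub-chunk as occurred_at."""
    return [
        {"text": text, "occurred_at": chat_time}
        for text in split_session(messages, max_chars)
    ]
```

Splitting only between messages means no utterance is cut in half, at the cost of chunks that are sometimes well under the limit.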

evals/memory_snapshot.py (new): save / load / list / delete memory
snapshots so an expensive ingest (hours, real OpenAI cost) can be
reused. A snapshot is a pg_dump of the seven memory tables plus
agents/messages, together with a full Cypher-based Neo4j
node/relationship export to JSON. load truncates first, then restores,
so the stored state afterwards is exactly the snapshot.

evals/mirix_memory_system.py: add_chunk now accepts an optional
occurred_at parameter and forwards it to client.add. Without it the
episodic agent guesses a year; on LongMemEval it guesses the ingest
year (2026), which collapses temporal questions. LoCoMo's runner already
embeds dates in the chunk text, so it didn't need this; LongMemEval's
dates live in the per-session Chat Time, which the new runner now
passes through.