rjalexa/ontocontext

ontocontext

Ontology-grounded memory primitive for LLM agents. Tiered salience decay, closed-world fact extraction, per-agent filtered views. Pure-stdlib storage, no vector store, no graph database.

ontocontext is a small library you wire into your own LLM agent or multi-agent system to give it selective, decaying long-term context — the kind of memory that remembers a user's allergy across an entire session, lets a "where's the bathroom?" question fade after three turns, and lets each subagent see only the slice of state it needs.

The repository also ships a museum-chatbot example as a reference integration, but the library is the product. The example is in examples/museum_chatbot/.


Why this, and not a vector store?

| | ontocontext | Vector-store memory (mem0, etc.) |
| --- | --- | --- |
| What it remembers | Only what your ontology declares | Whatever the LLM thinks is salient |
| Forgetting | Declarative per-tier decay + bounded cardinality with recency/salience eviction | LLM-judged DELETE; otherwise accumulates |
| Multi-agent slicing | First-class: filter by concept prefix or class | Bolt-on at the orchestrator |
| Auditability | Read one JSON file to know what the system can remember | Inspect emitted facts across thousands of turns |
| Cost per turn | One LLM call (extraction) | Two-plus (extraction + reconcile) |
| Storage | dict[str, Fact] in process | Vector DB + optional graph DB |

If your domain is bounded enough that you can name the categories of things worth remembering, ontocontext is a much smaller, more predictable lever. If it isn't, pair it with a vector store — or use mem0. There's a longer comparison in docs/oc_vs_mem0.md.


Install

pip install ontocontext
# or
uv add ontocontext

Python 3.10+. Runtime dependencies: requests (used by the bundled OpenRouter extractor; swappable) and rapidfuzz (used by the optional fuzzy-merging matcher; pluggable).

from ontocontext import MemoryStore, extract_facts, filtered_block

Quickstart

The minimum per-turn loop. No museum framing — bring your own ontology.

sequenceDiagram
    autonumber
    actor User
    participant App as Your app
    participant Ext as extract_facts<br/>(extractor LLM)
    participant Mem as MemoryStore
    participant Chat as Chat LLM

    User->>App: user_msg
    App->>Ext: user_msg + ontology
    Ext-->>App: facts: [{concept, value,<br/>polarity, evidence}]
    loop for each fact
        App->>Mem: upsert(concept, value, polarity, evidence)
    end
    App->>Mem: tick() — decay all facts,<br/>prune below threshold
    Mem-->>App: context_block()
    App->>Chat: system_prompt(memory) + user_msg
    Chat-->>App: bot reply
    App-->>User: bot reply

Two LLM calls per turn: one cheap, structured extraction call (steps 2–3) and one chat completion (steps 7–8). Everything between is deterministic.

import json
from ontocontext import MemoryStore, extract_facts

with open("your_ontology.json") as f:
    ontology = json.load(f)

memory = MemoryStore(ontology)

def handle_turn(user_msg: str, llm_call) -> str:
    # 1. Extract ontology-mapped facts from the user turn.
    facts = extract_facts(
        user_msg=user_msg,
        bot_msg=None,                 # optionally pass last assistant turn for confirmations
        ontology=ontology,
        model="openai/gpt-4o-mini",   # any OpenRouter model
    )

    # 2. Upsert. The store handles negation, capacity/eviction, and reinforcement.
    for f in facts:
        memory.upsert(
            concept=f["concept"],
            value=f["value"],
            polarity=f["polarity"],
            evidence=f["evidence"],
        )

    # 3. Decay everything one turn; prune below threshold.
    memory.tick()

    # 4. Inject the current memory block into your agent's system prompt.
    system_prompt = (
        "You are <agent role>.\n\n"
        f"What we know about the user:\n{memory.context_block()}"
    )

    # 5. Hand off to whatever LLM you use.
    return llm_call(system_prompt, user_msg)

That's the whole integration. MemoryStore is pure in-process state — serialize it yourself if sessions need to survive restarts (see Persistence).


Public API

from ontocontext import (
    MemoryStore,            # tiered fact store with per-turn decay
    Fact,                   # dataclass for one stored fact
    extract_facts,          # closed-world fact extractor (OpenRouter)
    filtered_block,         # per-agent context view, filtered by concept/class
    build_concept_catalog,  # render ontology as extractor catalog text
    EXTRACTOR_SYSTEM,       # extractor system-prompt template with {catalog} slot
    DECAY,                  # dict: per-class decay multipliers
    PRUNE_THRESHOLD,        # float: salience floor for non-permanent facts
)

All other symbols (modules, helpers prefixed with _) are internal and may change without notice.


Persistence classes

| Class | Per-turn decay | Examples |
| --- | --- | --- |
| permanent | 1.00 (none) | disability, language preference, allergy |
| long_term | 0.99 | preferences, repeated interests |
| session | 0.85 | "with my daughter today", "I have 1 hour" |
| ephemeral | 0.55 | "I need a coffee", transient questions |

Below salience = 0.10, non-permanent facts are pruned. Permanent facts are removed only by explicit negation.
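The lifetimes implied by these multipliers can be checked with a few lines of arithmetic. The sketch below is not library code: it copies the multipliers and the 0.10 floor from the text above, ignores reinforcement, and assumes a starting salience of 1.0 (a lower salience_weight shortens every lifetime). The helper name ticks_until_pruned is made up for this illustration.

```python
# Per-turn multipliers copied from the table above; they are not imported from
# ontocontext so that the sketch runs without the library installed.
DECAY = {"permanent": 1.00, "long_term": 0.99, "session": 0.85, "ephemeral": 0.55}
PRUNE_THRESHOLD = 0.10


def ticks_until_pruned(start_salience, decay, threshold=PRUNE_THRESHOLD):
    """Count decay steps until salience first drops strictly below threshold.

    Returns None when the fact never decays (multiplier of 1.0 or more).
    """
    if decay >= 1.0:
        return None
    salience, ticks = start_salience, 0
    while salience >= threshold:
        salience *= decay
        ticks += 1
    return ticks


# Assumption: every fact starts at salience 1.0 and is never re-mentioned.
lifetimes = {tier: ticks_until_pruned(1.0, factor) for tier, factor in DECAY.items()}
print(lifetimes)
```

Under those assumptions the loop gives 230 ticks for long_term, 15 for session, and 4 for ephemeral; a concept that starts at 0.7 instead of 1.0 fades sooner.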

stateDiagram-v2
    direction LR
    [*] --> alive : upsert() — salience = salience_weight
    alive --> alive : tick() — salience × decay
    alive --> alive : re-mention — salience + bump
    alive --> pruned : salience < 0.10
    alive --> pruned : polarity negated
    alive --> evicted : concept over cardinality
    pruned --> [*]
    evicted --> [*]

    note right of alive
        permanent ×1.00 — never pruned automatically
        long_term ×0.99 — ~230 turns to threshold
        session   ×0.85 —  ~14 turns to threshold
        ephemeral ×0.55 —   ~4 turns to threshold
        eviction — recency drops the oldest,
                   salience drops the weakest
    end note

Defining your ontology

The ontology is the contract between the extractor (what it is allowed to emit) and the memory store (how each fact decays and updates). Concept IDs are arbitrary strings; convention is Domain.Subtype.

flowchart LR
    O[("ontology.json<br/>concepts + metadata")]
    U([user message]) --> E
    E[extract_facts<br/>LLM extractor]
    S[MemoryStore<br/>upsert + tick]
    CB[/context_block /<br/>filtered_block/]

    O -. "concept catalog<br/>(label, examples)" .-> E
    O -. "per-concept rules<br/>(persistence_class,<br/>salience_weight,<br/>cardinality, eviction)" .-> S
    E -- "facts:<br/>concept, value,<br/>polarity, evidence" --> S
    S --> CB

    classDef contract fill:#fef3c7,stroke:#d97706,stroke-width:2px
    class O contract

The ontology sits between the two consumers as a single source of truth: the extractor reads it to know what it's allowed to emit, and the store reads it to know how each emitted fact should age and update. If a concept isn't in the ontology, the extractor can't emit it, and the store wouldn't know how to handle it if it did.

{
  "version": "0.2",
  "description": "Closed ontology for <your domain>",
  "concepts": {
    "Domain.SomeConcept": {
      "label": "Human-readable name shown in the LLM prompt",
      "examples": ["I want X", "give me Y"],
      "persistence_class": "permanent | long_term | session | ephemeral",
      "salience_weight": 0.0,
      "cardinality": 1,
      "eviction": "recency | salience"
    }
  }
}

| Field | What it controls |
| --- | --- |
| label | Text rendered in the context block injected into your prompt. Write it for the LLM, not the developer. |
| examples | Few-shot anchors handed to the extractor. 2–3 short, varied examples are enough. |
| persistence_class | Which decay tier the fact lives in, i.e. when it fades. |
| salience_weight | Starting salience when first asserted; also caps reinforcement. Higher = survives more decay turns. |
| cardinality | How many values may coexist under this concept: a positive int (1, 5, …) or "unlimited". |
| eviction | When the concept is over capacity, which fact loses: recency (drop the oldest) or salience (drop the weakest). |
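Because the ontology is plain JSON, a typo in one of these fields is easy to miss in review. The following is a minimal lint sketch built only from the field descriptions above; lint_ontology is a hypothetical helper, not part of ontocontext, and whether MemoryStore runs comparable checks itself is not stated here. Concepts that still use the deprecated update_policy field would be flagged by it.

```python
ALLOWED_CLASSES = {"permanent", "long_term", "session", "ephemeral"}
ALLOWED_EVICTION = {"recency", "salience"}


def lint_ontology(ontology):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for concept_id, spec in ontology.get("concepts", {}).items():
        if spec.get("persistence_class") not in ALLOWED_CLASSES:
            problems.append(f"{concept_id}: unknown persistence_class")
        weight = spec.get("salience_weight")
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            problems.append(f"{concept_id}: salience_weight must be a number")
        card = spec.get("cardinality")
        positive_int = isinstance(card, int) and not isinstance(card, bool) and card >= 1
        if not (positive_int or card == "unlimited"):
            problems.append(f"{concept_id}: cardinality must be a positive int or 'unlimited'")
        if spec.get("eviction") not in ALLOWED_EVICTION:
            problems.append(f"{concept_id}: unknown eviction")
    return problems


good = {"concepts": {"Position.CurrentArea": {
    "label": "Area the visitor is in", "examples": ["I'm in the lobby"],
    "persistence_class": "session", "salience_weight": 0.8,
    "cardinality": 1, "eviction": "recency"}}}
bad = {"concepts": {"Position.CurrentArea": {
    "persistence_class": "forever", "salience_weight": "high",
    "cardinality": 0, "eviction": "oldest"}}}

print(lint_ontology(good))
print(lint_ontology(bad))
```

Running a check like this in CI keeps ontology edits reviewable before they reach the extractor.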

Memory shape: three orthogonal axes

Changed in 0.2.0. Memory shape is declared on three independent axes. The old update_policy field is deprecated but still honoured — see Migrating.

  • persistence_class — the time axis: how fast a fact decays. Unchanged.
  • cardinality — the count axis: how many values may live under one concept at once. 1 keeps a single value; an integer K keeps a bounded set; "unlimited" accumulates.
  • eviction — the conflict axis: when a concept exceeds its cardinality, recency drops the oldest fact and salience drops the weakest.

Two worked concepts:

| concept | persistence | cardinality | eviction | behaviour |
| --- | --- | --- | --- | --- |
| Position.CurrentArea | session | 1 | recency | only the latest place; a new area overwrites the old |
| ArtInterest.Medium | long_term | 8 | salience | up to eight interests; the weakest is dropped when a ninth arrives |

cardinality: 1, eviction: recency is exactly the old superseded behaviour — now expressible alongside bounded plurality ("keep the strongest 5"), which update_policy could not express.
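To make the count and conflict axes concrete, here is a toy model of the trimming step. It illustrates the rule as described above and is not the library's implementation: ToyFact and enforce_cardinality are invented for the sketch, and details such as tie-breaking are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ToyFact:
    value: str
    salience: float
    last_turn: int  # turn on which the value was last asserted


def enforce_cardinality(facts, cardinality, eviction):
    """Return the facts that survive once the concept is trimmed to capacity."""
    if cardinality == "unlimited" or len(facts) <= cardinality:
        return list(facts)
    if eviction == "recency":
        ranked = sorted(facts, key=lambda f: f.last_turn, reverse=True)  # newest first
    elif eviction == "salience":
        ranked = sorted(facts, key=lambda f: f.salience, reverse=True)   # strongest first
    else:
        raise ValueError(f"unknown eviction policy: {eviction!r}")
    return ranked[:cardinality]


# cardinality 1 + recency: the newer area replaces the older one.
areas = [ToyFact("lobby", 0.60, last_turn=1), ToyFact("gallery 3", 0.80, last_turn=4)]
kept_areas = [f.value for f in enforce_cardinality(areas, 1, "recency")]
print(kept_areas)

# cardinality 2 + salience: the weakest of three values is dropped.
media = [ToyFact("oil", 0.9, 1), ToyFact("ink", 0.2, 2), ToyFact("video", 0.5, 3)]
kept_media = [f.value for f in enforce_cardinality(media, 2, "salience")]
print(kept_media)
```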

Migrating from update_policy (deprecated)

Concepts that still declare update_policy keep working and emit a DeprecationWarning at MemoryStore construction. Replace it with explicit cardinality/eviction using this mapping:

| legacy update_policy | cardinality | eviction |
| --- | --- | --- |
| superseded | 1 | recency |
| mutable | "unlimited" | salience |
| monotonic | "unlimited" | salience |

(monotonic and mutable behaved identically in practice — both accumulated and never evicted — so both map to unlimited + salience.)
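For larger ontologies the mapping can be applied mechanically. The sketch below rewrites an ontology dict in memory using only the table above; migrate_ontology is a hypothetical helper, and letting already-present cardinality/eviction values take precedence is an assumption of this sketch.

```python
import copy

# Mapping copied from the migration table above.
LEGACY_POLICY = {
    "superseded": {"cardinality": 1, "eviction": "recency"},
    "mutable": {"cardinality": "unlimited", "eviction": "salience"},
    "monotonic": {"cardinality": "unlimited", "eviction": "salience"},
}


def migrate_ontology(ontology):
    """Return a copy with update_policy replaced by explicit cardinality/eviction."""
    migrated = copy.deepcopy(ontology)
    for concept_id, spec in migrated.get("concepts", {}).items():
        policy = spec.pop("update_policy", None)
        if policy is None:
            continue  # already on the new schema
        if policy not in LEGACY_POLICY:
            raise ValueError(f"{concept_id}: unknown update_policy {policy!r}")
        for key, value in LEGACY_POLICY[policy].items():
            spec.setdefault(key, value)  # assumption: explicit new-style fields win
    return migrated


legacy = {"concepts": {
    "Position.CurrentArea": {"persistence_class": "session", "update_policy": "superseded"},
    "ArtInterest.Medium": {"persistence_class": "long_term", "update_policy": "mutable"},
}}
migrated = migrate_ontology(legacy)
print(migrated["concepts"]["Position.CurrentArea"])
```

The input dict is left untouched, so the result can be diffed against the original before it is written back to disk.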

Heuristics for assigning a persistence class:

  • permanent — things you must not forget mid-session: disabilities, language preference, allergies, account tier, jurisdiction.
  • long_term — soft preferences worth carrying forward across turns and ideally across sessions: favorite styles, repeated requests, named entities the user cares about.
  • session — bounds of this conversation: time budget, current goal, who's in the room.
  • ephemeral — should vanish in a few turns if not re-mentioned: immediate needs, transient mood, just-asked questions.

When in doubt, choose the shorter lifetime. Over-retention causes prompt bloat and stale recommendations; under-retention is recoverable because the user usually re-states what matters.

A complete worked ontology lives in examples/museum_chatbot/ontology.json.


Swapping the LLM provider

extract_facts() is a thin wrapper over OpenRouter. To swap, copy it and replace just the HTTP block. The system prompt and catalog builder are part of the public API:

import json
from ontocontext import build_concept_catalog, EXTRACTOR_SYSTEM

def extract_facts_anthropic(user_msg, bot_msg, ontology, client):
    system_prompt = EXTRACTOR_SYSTEM.format(catalog=build_concept_catalog(ontology))
    convo = f"USER: {user_msg}" + (f"\nASSISTANT: {bot_msg}" if bot_msg else "")
    resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        system=system_prompt,
        messages=[{"role": "user", "content": convo}],
        temperature=0,
        max_tokens=512,
    )
    data = json.loads(resp.content[0].text)
    # Keep the defensive validation from the bundled extractor verbatim:
    return [f for f in data.get("facts", [])
            if isinstance(f, dict)
            and f.get("concept") in ontology["concepts"]
            and f.get("polarity") in ("asserted", "negated")
            and f.get("value") and f.get("evidence")]

The validation block is provider-agnostic and load-bearing — keep it whatever LLM you use.
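One way to keep that block reusable across providers is to factor it into a helper that also reports what it rejected. The sketch below applies the same checks as the snippet above; the handling of malformed JSON and the "rejected" list are additions of this sketch, and how the bundled extractor treats those cases is not shown here. split_facts is a hypothetical name.

```python
import json


def split_facts(raw_text, ontology):
    """Parse an extractor reply and separate valid facts from rejected items."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return [], []  # assumption: an unparseable reply counts as "no facts"
    candidates = data.get("facts", []) if isinstance(data, dict) else []
    if not isinstance(candidates, list):
        candidates = []
    kept, rejected = [], []
    for item in candidates:
        ok = (
            isinstance(item, dict)
            and isinstance(item.get("concept"), str)
            and item["concept"] in ontology["concepts"]
            and item.get("polarity") in ("asserted", "negated")
            and bool(item.get("value"))
            and bool(item.get("evidence"))
        )
        (kept if ok else rejected).append(item)
    return kept, rejected


ontology = {"concepts": {"ArtInterest.Medium": {}}}
reply = json.dumps({"facts": [
    {"concept": "ArtInterest.Medium", "value": "ink", "polarity": "asserted", "evidence": "I love ink"},
    {"concept": "Weather.Today", "value": "rain", "polarity": "asserted", "evidence": "it's raining"},
    {"concept": "ArtInterest.Medium", "value": "oil", "polarity": "asserted"},  # no evidence
]})
kept, rejected = split_facts(reply, ontology)
print(len(kept), len(rejected))
```

Logging the rejected items is optional, but it gives a cheap record of what the closed-world filter is discarding.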


Multi-agent / subagent patterns

memory.context_block() returns all surviving facts as one string. For a multi-agent system you typically want each subagent to see only the slice it needs. Three patterns work today.

Pattern A — Shared store, full context to every agent

Simplest. The orchestrator runs extraction once per turn, then every subagent receives the same context block. Works fine when agents are few and most concepts are cross-cutting.

shared_memory = MemoryStore(ontology)
# orchestrator does extract → upsert → tick once
context = shared_memory.context_block()

accessibility_agent.run(system=ACCESS_PROMPT + context, user=user_msg)
wayfinding_agent.run(system=WAY_PROMPT + context, user=user_msg)
flowchart TB
    U([user turn]) --> EX[extract_facts]
    EX --> MS[("MemoryStore\n(shared)")]
    MS --> CB["context_block\n(all facts)"]
    CB --> A1[agent 1]
    CB --> A2[agent 2]
    CB --> A3[agent 3]

Downside: prompt bloat scales with subagent count, and small models get distracted by irrelevant facts.

Pattern B — Memory subscriptions per agent (recommended)

Think of each subagent as subscribing to specific kinds of memories. A wayfinding agent subscribes to SpecialNeed.* and ImmediateNeed.*. A recommendations agent subscribes to ArtInterest.* and VisitPlan.*. The subscription declaration sits with the agent definition, so the agent structurally cannot answer without seeing its relevant memory slice; no prompt wording is needed to remind the LLM.

This matters more than it looks. Consider a wheelchair user who asks "where is the bathroom?" In a single-agent setup (Pattern A) the LLM has the wheelchair fact in context but has to remember to use it on every directions query, which LLMs do inconsistently. With a subscribed wayfinding agent, SpecialNeed.MobilityImpairment is always in the slice that agent sees — the cross-reference is forced by the architecture, not requested by the prompt.

Use the shipped filtered_block helper to render an agent's subscription:

from ontocontext import filtered_block

wayfinding_agent.run(
    system=WAY_PROMPT + filtered_block(memory, concept_prefixes=["SpecialNeed.", "ImmediateNeed."]),
    user=user_msg,
)
recommendations_agent.run(
    system=REC_PROMPT + filtered_block(memory, concept_prefixes=["ArtInterest.", "VisitPlan."]),
    user=user_msg,
)
accessibility_agent.run(
    system=ACCESS_PROMPT + filtered_block(memory, concept_prefixes=["SpecialNeed."]),
    user=user_msg,
)

Single source of truth (one extraction pass, one decay clock) with role-relevant slices. Privacy bonus: the recommendations agent never sees disability disclosures it doesn't need.

A clean way to make subscriptions explicit is to bind them to the agent definition itself:

AGENTS = {
    "wayfinding":      {"prefixes": ["SpecialNeed.", "ImmediateNeed."], "prompt": WAY_PROMPT},
    "accessibility":   {"prefixes": ["SpecialNeed."],                   "prompt": ACCESS_PROMPT},
    "recommendations": {"prefixes": ["ArtInterest.", "VisitPlan."],     "prompt": REC_PROMPT},
    "immediate":       {"classes":  ["ephemeral"],                      "prompt": NEED_PROMPT},
}

for name, cfg in AGENTS.items():
    block = filtered_block(memory,
                           concept_prefixes=cfg.get("prefixes"),
                           classes=cfg.get("classes"))
    dispatch(name, system=cfg["prompt"] + block, user=user_msg)

The dict is the subscription contract: changing what an agent "knows about the user" is a one-line edit, reviewable in a PR, and impossible to forget in a prompt.
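Since the subscription dict is plain data, it can be linted against the ontology. The sketch below flags prefixes that match no concept ID, which usually means a typo. It assumes prefix matching is a plain string-prefix test on the concept ID (an assumption about filtered_block, not something confirmed above), and unmatched_prefixes is a hypothetical helper.

```python
def unmatched_prefixes(agents, ontology):
    """Map agent name -> prefixes that match no concept ID in the ontology."""
    concept_ids = list(ontology["concepts"])
    report = {}
    for name, cfg in agents.items():
        dead = [
            prefix for prefix in (cfg.get("prefixes") or [])
            if not any(cid.startswith(prefix) for cid in concept_ids)
        ]
        if dead:
            report[name] = dead
    return report


ontology = {"concepts": {
    "SpecialNeed.MobilityImpairment": {},
    "ArtInterest.Medium": {},
}}
agents = {
    "wayfinding": {"prefixes": ["SpecialNeed."]},
    "recommendations": {"prefixes": ["ArtInterests.", "ArtInterest."]},  # first one is a typo
    "immediate": {"classes": ["ephemeral"]},  # no prefixes, so nothing to check
}
report = unmatched_prefixes(agents, ontology)
print(report)
```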

flowchart TB
    U([user turn]) --> EX["extract_facts\n(one pass)"]
    EX --> MS[("MemoryStore\n(shared)")]

    MS --> FB1["filtered_block\nSpecialNeed.*, ImmediateNeed.*"]
    MS --> FB2["filtered_block\nArtInterest.*, VisitPlan.*"]
    MS --> FB3["filtered_block\nSpecialNeed.*"]

    FB1 --> WA[wayfinding_agent]
    FB2 --> RA[recommendations_agent]
    FB3 --> AA[accessibility_agent]

    classDef filter fill:#ede9fe,stroke:#7c3aed,stroke-width:1px
    class FB1,FB2,FB3 filter

Pattern C — Per-agent stores with separate ontologies

For fully independent subagents (different domains, different LLMs, different lifecycles) give each its own MemoryStore and its own ontology fragment. Run extraction once per agent. More expensive, maximally isolated. Reach for it only when state really must not cross.
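If the per-agent ontologies start life as one shared file, each fragment can be carved out by concept prefix. This is a sketch under that assumption; ontology_fragment is a hypothetical helper, and the resulting dict would then be passed to that agent's own MemoryStore and extraction call.

```python
def ontology_fragment(ontology, prefixes):
    """Return a new ontology dict holding only concepts under the given prefixes."""
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of alternatives
    fragment = {key: value for key, value in ontology.items() if key != "concepts"}
    fragment["concepts"] = {
        cid: spec for cid, spec in ontology["concepts"].items() if cid.startswith(wanted)
    }
    return fragment


full = {
    "version": "0.2",
    "concepts": {
        "SpecialNeed.MobilityImpairment": {"persistence_class": "permanent"},
        "ArtInterest.Medium": {"persistence_class": "long_term"},
        "Position.CurrentArea": {"persistence_class": "session"},
    },
}
ontology_a = ontology_fragment(full, ["SpecialNeed.", "Position."])
ontology_b = ontology_fragment(full, ["ArtInterest."])
print(sorted(ontology_a["concepts"]), sorted(ontology_b["concepts"]))
```

The concept specs are shared by reference with the source dict, so treat the fragments as read-only or deep-copy them first.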

flowchart TB
    U([user turn])

    U --> EX1["extract_facts\n(ontology A)"]
    U --> EX2["extract_facts\n(ontology B)"]

    EX1 --> MS1[("MemoryStore A\n+ ontology A")]
    EX2 --> MS2[("MemoryStore B\n+ ontology B")]

    MS1 --> AG1[agent A]
    MS2 --> AG2[agent B]

Persistence across sessions

MemoryStore.facts is a dict[str, Fact] of plain dataclasses, so serialization takes only a few lines in each direction.

from dataclasses import asdict
from ontocontext import Fact, MemoryStore

# Save (e.g. on session end)
snapshot = {
    "turn": memory.turn,
    "facts": [asdict(f) for f in memory.facts.values()
              if f.persistence_class in ("permanent", "long_term")],
}

# Load (e.g. on session start)
memory = MemoryStore(ontology)
memory.turn = snapshot["turn"]
for d in snapshot["facts"]:
    f = Fact(**d)
    memory.facts[f.id] = f
flowchart LR
    subgraph SN["Session N"]
        M1[("MemoryStore")]
    end

    SNAP["snapshot\n(permanent + long_term only)"]

    subgraph SN1["Session N+1"]
        M2[("MemoryStore\n(restored)")]
    end

    GONE(["session + ephemeral\ndropped intentionally"])

    M1 -->|"asdict()"| SNAP
    M1 -.->|"not saved"| GONE
    SNAP -->|"Fact(**d)"| M2

    classDef dropped fill:#fee2e2,stroke:#dc2626,stroke-width:1px
    class GONE dropped

Round-tripping session and ephemeral facts across sessions is almost always wrong — those tiers exist precisely because they should not survive.
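The snapshot dict still has to reach disk somewhere. A minimal sketch of that last step follows, using an atomic write so a crash cannot leave a half-written file. DemoFact is a stand-in dataclass with made-up fields (the real Fact fields may differ), and save_snapshot, load_snapshot, and the concept names are hypothetical.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class DemoFact:  # hypothetical stand-in; the real Fact fields may differ
    id: str
    concept: str
    value: str
    salience: float
    persistence_class: str


def save_snapshot(path, turn, facts):
    """Write durable facts to path atomically (temp file, then rename)."""
    snapshot = {
        "turn": turn,
        "facts": [asdict(f) for f in facts
                  if f.persistence_class in ("permanent", "long_term")],
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_snapshot(path):
    """Return (turn, list of fact dicts) read back from path."""
    with open(path, encoding="utf-8") as handle:
        snapshot = json.load(handle)
    return snapshot["turn"], snapshot["facts"]


facts = [
    DemoFact("f1", "SpecialNeed.Hearing", "deaf", 1.0, "permanent"),
    DemoFact("f2", "ImmediateNeed.Drink", "coffee", 0.4, "ephemeral"),
]
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "user-123.json")
    save_snapshot(target, turn=7, facts=facts)
    restored_turn, restored = load_snapshot(target)
print(restored_turn, [d["value"] for d in restored])
```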


Tuning knobs

The library has four tuning knobs: two module-level constants, one per-concept ontology field, and one factor hard-coded in memory.py:

| Knob | Where | Effect |
| --- | --- | --- |
| DECAY | ontocontext.memory.DECAY | Per-turn multiplier per persistence class. Lower = faster forgetting. |
| PRUNE_THRESHOLD | ontocontext.memory.PRUNE_THRESHOLD (also reads PRUNE_THRESHOLD env var) | Salience floor below which non-permanent facts are dropped. |
| salience_weight | per-concept in your ontology JSON | Starting salience and reinforcement ceiling. |
| Reinforcement bump | memory.py (0.3 * base) | How much a re-mention raises salience. |

Decay defaults are tuned for one fact extraction per conversational turn. If your app extracts per websocket message, per token, or on a timer, move the decay factors closer to 1.0 or you will prune everything.
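How much closer to 1.0 follows from the arithmetic: if decay is applied k times per conversational turn, using the k-th root of each default keeps the per-turn forgetting rate unchanged. The sketch below only computes the adjusted values from the table in Persistence classes; rescale_decay is a hypothetical helper, and how you apply the result to the library (for example, whether changing DECAY at runtime is supported) is not covered here.

```python
# Default per-turn multipliers, copied from the persistence-class table.
DECAY = {"permanent": 1.00, "long_term": 0.99, "session": 0.85, "ephemeral": 0.55}


def rescale_decay(per_turn, ticks_per_turn):
    """Return multipliers that compound to the per-turn values after k ticks."""
    if ticks_per_turn <= 0:
        raise ValueError("ticks_per_turn must be positive")
    return {tier: factor ** (1.0 / ticks_per_turn) for tier, factor in per_turn.items()}


# Example assumption: decay is applied four times per conversational turn.
per_message = rescale_decay(DECAY, ticks_per_turn=4)
for tier, factor in per_message.items():
    print(f"{tier}: {factor:.4f} per tick, {factor ** 4:.2f} per turn")
```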

The bundled museum example also reads CHAT_MODEL, EXTRACTOR_MODEL, and HISTORY_SIZE from the environment, but those are conventions of the example, not the library.

Fuzzy value matching (paraphrase merging)

By default, MemoryStore dedupes facts within a concept by exact lower-stripped string equality. That means value="photography" on turn 1 and value="photos" on turn 5 produce two facts under the same ArtInterest.Style concept, when you probably wanted one reinforced fact. Two kwargs on MemoryStore.__init__ enable optional fuzzy merging:

from ontocontext import MemoryStore

# Off by default: strict exact match, current behavior.
memory = MemoryStore(ontology)

# Opt in: rapidfuzz WRatio with a sane starting threshold.
memory = MemoryStore(ontology, similarity_threshold=0.80)
# Now "photography" + "photos" → 1 fact (reinforced)
#     "Pablo Picasso" + "Picasso" → 1 fact
#     "Monet" + "Picasso" → 2 facts (correctly rejected)

| Knob | Default | Effect |
| --- | --- | --- |
| similarity_threshold | 1.0 | Cutoff in [0, 1]. 1.0 = strict exact match; < 1.0 enables fuzzy merging within a concept. ~0.80 works well with the default matcher. |
| similarity_fn | None | Optional (a, b) -> float callable. None uses rapidfuzz.fuzz.WRatio on lower-stripped strings. |
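Any callable with that (a, b) -> float shape fits the similarity_fn slot. As a small illustration, here is one built on the standard library's difflib. Its scores are on a different scale from WRatio, so the ~0.80 guidance does not carry over and a threshold would need to be re-tuned; difflib_similarity is a hypothetical name.

```python
from difflib import SequenceMatcher


def difflib_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed with the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


pairs = [("photography", "photos"), ("Pablo Picasso", "Picasso"), ("Monet", "Picasso")]
for left, right in pairs:
    print(f"{left!r} vs {right!r}: {difflib_similarity(left, right):.2f}")

# Wiring it in would use the two documented kwargs, with a threshold you have
# tuned for this scorer (the value is deliberately left open here):
# memory = MemoryStore(ontology, similarity_threshold=..., similarity_fn=difflib_similarity)
```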

Worked example: same five turns, with and without fuzzy merging

Suppose the extractor sees a museum visitor talk about photography and Picasso across a session, naturally varying the phrasing turn over turn. With an unbounded ArtInterest.Style concept (cardinality "unlimited", because one user can have several styles), each new value is normally treated as a new style. That's the right semantics for genuinely different styles, but morphological variants of the same one should be a single reinforced fact.

The snippet below runs both configurations against the same input and prints memory.dump() after each:

from ontocontext import MemoryStore

ontology = {"concepts": {"ArtInterest.Style": {
    "label": "Art style or medium",
    "examples": ["I love X", "I'm into X"],
    "persistence_class": "long_term",
    "salience_weight": 0.7,
    "cardinality": "unlimited",
    "eviction": "salience",
}}}

# Values the extractor emits over five conversational turns:
turns = [
    ("photography",   "I really love photography, especially mid-century stuff."),
    ("Picasso",       "Picasso is incredible."),
    ("photos",        "Honestly, I just adore old photos."),
    ("Pablo Picasso", "Anything by Pablo Picasso, really."),
    ("photographs",   "These vintage photographs are amazing."),
]

def run(store):
    for value, evidence in turns:
        store.upsert("ArtInterest.Style", value, "asserted", evidence)
        store.tick()
    return store.dump()

print(run(MemoryStore(ontology)))                              # default
print(run(MemoryStore(ontology, similarity_threshold=0.80)))   # fuzzy

Default (similarity_threshold=1.0): five separate facts, each slightly below its starting salience of 0.7 because none reinforced the others:

(turn 5) 5 fact(s):
  • [ArtInterest.Style] photographs (sal=0.69, long_term)    [from: "These vintage photographs are amazing."]
  • [ArtInterest.Style] Pablo Picasso (sal=0.69, long_term)  [from: "Anything by Pablo Picasso, really."]
  • [ArtInterest.Style] photos (sal=0.68, long_term)         [from: "Honestly, I just adore old photos."]
  • [ArtInterest.Style] Picasso (sal=0.67, long_term)        [from: "Picasso is incredible."]
  • [ArtInterest.Style] photography (sal=0.67, long_term)    [from: "I really love photography, especially mid-century stuff."]
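The saliences in this default run can be reproduced by hand: each fact starts at the concept's salience_weight (0.7) and is multiplied by the long_term factor (0.99) once per tick, including the tick that ends the turn on which it was asserted. The sketch below redoes that arithmetic without the library; the variable names are illustrative only.

```python
START = 0.7      # salience_weight from the example ontology
DECAY = 0.99     # long_term multiplier from the persistence-class table
TOTAL_TURNS = 5

# Turn on which each value was first (and only) asserted in the default run.
first_asserted = {
    "photography": 1, "Picasso": 2, "photos": 3, "Pablo Picasso": 4, "photographs": 5,
}


def salience_after(first_turn):
    ticks = TOTAL_TURNS - first_turn + 1  # the assertion turn also ends with a tick
    return round(START * DECAY ** ticks, 2)


expected = {value: salience_after(turn) for value, turn in first_asserted.items()}
print(expected)
```

The rounded results match the sal= values listed above, from 0.67 for the oldest fact to 0.69 for the newest.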

Fuzzy (similarity_threshold=0.80) — two reinforced canonical facts. "photography" absorbed "photos" and "photographs" (now near-ceiling salience); "Picasso" absorbed "Pablo Picasso":

(turn 5) 2 fact(s):
  • [ArtInterest.Style] photography (sal=0.99, long_term)  [from: "These vintage photographs are amazing."]
  • [ArtInterest.Style] Picasso (sal=0.88, long_term)      [from: "Anything by Pablo Picasso, really."]

Three things worth noticing:

  1. The original value survives. "photography" was asserted first, so it stays — even though the most recent mention used "photographs". This keeps the context block stable across turns.
  2. The latest evidence wins. The audit string updates on every reinforcement so you can trace back to the most recent supporting utterance.
  3. Salience accumulates correctly. "photography" was asserted once and reinforced twice in five turns and is now near ceiling, exactly the signal a downstream agent needs.

Negation also goes through the fuzzy match: upsert(..., "photos", polarity="negated") removes the stored "photography" fact if it's within threshold.

Embedding-based merging without changing the library. Swap similarity_fn for any callable that returns a [0, 1] similarity score — sentence-transformers, OpenAI embeddings, Qwen3 cosine, your own model. The library never sees the embedding stack:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
cache: dict[str, np.ndarray] = {}

def embed_cosine(a: str, b: str) -> float:
    for s in (a, b):
        if s not in cache:
            cache[s] = model.encode(s, normalize_embeddings=True)
    return float(np.dot(cache[a], cache[b]))

memory = MemoryStore(ontology, similarity_threshold=0.70, similarity_fn=embed_cosine)

This is the documented extension point for true synonym handling (car/automobile, cross-lingual paraphrases) — the bundled rapidfuzz default catches character-overlap cases (morphology, typos, name partials) at zero install cost.


What this system is not

  • Not a vector store. Retrieval is "everything currently above threshold," not query-conditioned. If a user asks "where can I get one?" three turns after mentioning coffee, the coffee fact must still be alive on its own salience — there is no semantic recall of pruned facts. Pair with a vector store if you need that.
  • Not persistent by default. State lives in process memory; serialize yourself.
  • Not multi-user. One MemoryStore instance per user/conversation. User isolation is your orchestrator's job.
  • Not concurrency-safe. No locks; wrap with your own if you have async writers.
  • Closed-world by design. The extractor will silently drop anything outside the ontology. This is the feature, not a bug — but it means evolving the ontology is a deliberate, reviewed act.
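The multi-user and concurrency points can be handled together in the orchestrator. The sketch below keeps one store per conversation and serializes access to each with a lock. StoreRegistry is a hypothetical wrapper, not part of ontocontext, and a plain list stands in for MemoryStore so the example runs without the library. It covers threaded callers; asyncio code would want an asyncio.Lock instead.

```python
import threading


class StoreRegistry:
    """Hand out one store per conversation and serialize access to each."""

    def __init__(self, factory):
        self._factory = factory          # zero-argument callable that builds a store
        self._guard = threading.Lock()   # protects the dict below
        self._entries = {}               # conversation_id -> (store, lock)

    def _entry(self, conversation_id):
        with self._guard:
            if conversation_id not in self._entries:
                self._entries[conversation_id] = (self._factory(), threading.Lock())
            return self._entries[conversation_id]

    def run(self, conversation_id, action):
        """Call action(store) while holding that conversation's lock."""
        store, lock = self._entry(conversation_id)
        with lock:
            return action(store)


# In real use the factory would be something like: lambda: MemoryStore(ontology)
registry = StoreRegistry(factory=list)


def worker(conversation_id, count):
    for i in range(count):
        registry.run(conversation_id, lambda store: store.append(i))


threads = [threading.Thread(target=worker, args=(cid, 100))
           for cid in ("alice", "alice", "bob")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

sizes = {cid: registry.run(cid, len) for cid in ("alice", "bob")}
print(sizes)
```

Passing a whole turn (the upsert loop plus tick) as one action keeps the sequence atomic with respect to other writers on the same conversation.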

Example: museum chatbot

A complete reference integration lives in examples/museum_chatbot/ — a CLI chatbot that greets visitors at a modern art museum and selectively remembers what matters (special needs, art interests, time budgets, immediate needs).

# from the repo root
cp .secrets.example .secrets        # paste your OPENROUTER_API_KEY
uv sync
make run
you> Hi! I'm visiting today, and I should mention I'm deaf.
you> /memory
you> I really love photography, especially mid-century stuff.
you> I only have about an hour.
you> I could really use a coffee.
you> /memory
you> Where can I have one?
you> /memory                  # coffee should already be decaying
you> [...a few more turns...]
you> /memory                  # coffee is gone, deafness and photography remain

Read examples/museum_chatbot/README.md for the full walkthrough. Nothing in examples/ is imported by the library; you do not need any of it to use ontocontext.


Contributing

Issues and PRs are very welcome — especially:

  • New example integrations (a different domain, a different LLM provider, a multi-agent orchestrator).
  • Ontology evolution tooling: scripts that mine production logs for unanticipated concepts and propose ontology extensions.
  • Persistence adapters (SQLite, Redis, Postgres) for the permanent and long_term tiers across sessions.
  • A second extraction pass on bot replies to capture clarifications and confirmations.
  • Better decay calibration for non-per-turn applications (token-based, time-based).

Open an issue to discuss design changes before sending a PR, or just open a PR for bug fixes, doc improvements, and obvious wins. If you build something on top of ontocontext, drop a link — I'd love to see it.


Roadmap

  • Per-user persistence layer (SQLite) for the permanent and long_term tiers.
  • Optional vector index over fact value strings for query-conditioned retrieval (so "where can I have one?" lights up the still-warm coffee fact even if it weren't already in the always-on block).
  • A /forget API and a "what do you remember about me?" affordance — important for trust given that we're storing things like disability status.
  • A second extraction pass on bot replies.

License

MIT — see LICENSE.
