justrach · May 21, 2026
diff --git a/docs/design/mnemon-takeaways.md b/docs/design/mnemon-takeaways.md
@@ -0,0 +1,120 @@
+# Takeaways from mnemon-dev/mnemon
+
+**Reviewed:** 2026-05-21 — [mnemon-dev/mnemon](https://github.com/mnemon-dev/mnemon) at HEAD
+**Author:** justrach (review session notes, not a roadmap commitment)
+
+## What mnemon is
+
+Persistent cross-session memory for LLM agents. Single Go binary + SQLite WAL. Four-graph knowledge store (temporal / entity / causal / semantic), intent-aware recall, importance-with-decay, automatic deduplication. Integrates with Claude Code, Codex, OpenClaw, Nanobot, NanoClaw via a markdown-installable harness.
+
+272 stars, Go, MIT, actively maintained.
+
+**The category is different from codedb.** Mnemon is *agent memory* (insights, decisions, context across sessions). Codedb is *code search* (sub-ms index over a single project's source tree). They're stack-complementary, like ACE in the previous spec at `docs/design/ace-integration.md`.
+
+## The design ideas worth stealing
+
+### 1. LLM-Supervised vs LLM-Embedded — same pattern codedb already uses
+
+> "Most memory tools embed their own LLM inside the pipeline. Mnemon takes a different approach: **your host LLM is the supervisor.** The binary handles deterministic computation (storage, graph indexing, search, decay); the LLM makes judgment calls (what to remember, how to link, when to forget). No middleman, no extra inference cost."
+
+This is exactly the shape codedb has: codedb does the deterministic index work (trigram / word / outline / deps), and the agent makes judgment calls about what to query for. The shape is validated — mnemon explicitly contrasts it with Mem0/Letta (LLM-embedded) and Claude Code Memory (file injection).
+
+**Takeaway:** keep the LLM-Supervised pattern as codedb's identifying architecture. Resist the temptation to bake an LLM into codedb (e.g., for the reader.md regeneration loop — leave that to the host agent).
+
+### 2. Intent-native protocol — `remember / link / recall`
+
+Mnemon's three primary verbs are:
+- `remember` (write)
+- `link` (graph edge)
+- `recall` (read)
+
+…not `INSERT`, `UPSERT`, `SELECT`. The argument is that command names should map to the LLM's cognitive vocabulary so the agent can use them without translation.
+
+Codedb has a mix today: `codedb_search` and `codedb_outline` are operation-named; `codedb_callers` and `codedb_context` lean cognition-named.
+
+**Takeaway:** when adding new MCP tools, prefer cognition-named verbs over operation-named ones. E.g. a future "who-calls-this-API-from-outside-this-package" tool should be `codedb_external_callers` (intent) not `codedb_xref_filter` (operation).
+
+### 3. Effective Importance (EI) decay formula
+
+```
+EI = base_weight(importance) × access_factor × decay_factor × edge_factor
+
+base_weight:   imp 5 → 1.0, … 1 → 0.15
+access_factor: max(1.0, log(1 + access_count))
+decay_factor:  0.5 ^ (days_since_access / 30)  — half-life of 30 days
+edge_factor:   1.0 + 0.1 × min(edge_count, 5)  — up to +0.5
+```
+
+Auto-pruning fires at >1000 active insights; immunity for importance ≥4 or access_count ≥3.
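+
+A minimal Python sketch of the formula above, to show how the four factors interact. The function names are hypothetical (not mnemon's API), `base_weight` is passed in directly because the weights between importance 5 and 1 are elided here, and natural log is an assumption since the formula only says `log`.
+
+```python
+import math
+
+HALF_LIFE_DAYS = 30.0  # half-life stated in the formula
+
+
+def access_factor(access_count: int) -> float:
+    # Natural log is assumed; the floor of 1.0 means low access never penalizes.
+    return max(1.0, math.log(1 + access_count))
+
+
+def decay_factor(days_since_access: float) -> float:
+    # Halves every HALF_LIFE_DAYS days without access.
+    return 0.5 ** (days_since_access / HALF_LIFE_DAYS)
+
+
+def edge_factor(edge_count: int) -> float:
+    # Each edge adds 0.1, capped at 5 edges (+0.5).
+    return 1.0 + 0.1 * min(edge_count, 5)
+
+
+def effective_importance(base_weight: float, access_count: int,
+                         days_since_access: float, edge_count: int) -> float:
+    return (base_weight * access_factor(access_count)
+            * decay_factor(days_since_access) * edge_factor(edge_count))
+
+
+def is_prune_immune(importance: int, access_count: int) -> bool:
+    # Immunity rule from the text: importance >= 4 or access_count >= 3.
+    return importance >= 4 or access_count >= 3
+```
+
+Under these assumptions, an insight with base weight 1.0 that has no accesses, no edges, and has sat untouched for 30 days scores 0.5.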
+
+**Takeaway for codedb's reader.md staleness model:** today reader.md is binary `ready | stale | missing | malformed`. A graceful-decay analog would be:
+
+```
+freshness = 1.0 × decay(time_since_generation) × structural_match(source_hash_partial)
+
+structural_match: 1.0 if hash exact match, 0.9 if same files but small edits,
+                  0.5 if same files with significant edits, 0.0 if files renamed/removed
+```
+
+This would let reader.md remain "useful but aging" for a while instead of cliff-edging into stale on the first whitespace change. **Not a current priority** — the binary hash is simpler and conservative — but worth keeping in the design folder.
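+
+A rough Python sketch of that freshness score, under stated assumptions: the `decay` term is taken to be an exponential half-life (borrowing mnemon's 30 days, which is not specified for reader.md), and the caller is assumed to have already classified the structural match into one of the four tiers. All names are hypothetical.
+
+```python
+from enum import Enum
+
+
+class StructuralMatch(Enum):
+    # Tier values copied from the proposal above.
+    EXACT = 1.0
+    SMALL_EDITS = 0.9
+    SIGNIFICANT_EDITS = 0.5
+    FILES_RENAMED_OR_REMOVED = 0.0
+
+
+def freshness(days_since_generation: float, match: StructuralMatch,
+              half_life_days: float = 30.0) -> float:
+    # Exponential decay is an assumption; the proposal leaves decay() open.
+    decay = 0.5 ** (days_since_generation / half_life_days)
+    return 1.0 * decay * match.value
+```
+
+With these numbers, a reader.md generated 30 days ago over files with small edits scores 0.45, while any rename or removal drops it to 0.0 regardless of age.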
+
+### 4. Hybrid extraction (regex + tech dictionary + LLM-assisted)
+
+For entity extraction (binding insights to common terms like `Qdrant`, `Kubernetes`, `React`), mnemon uses:
+
+1. Regex patterns (CamelCase, ALL_CAPS, file paths, URLs)
+2. A 200+ entry technical dictionary
+3. User-provided `--entities` flag
+4. LLM-assisted causal-edge candidacy
+
+Codedb's `extractContextCandidates` (in `handleContext`) already does (1) via CamelCase / snake_case / quoted-string heuristics. It could borrow (2) — a small technical-term dictionary would catch keywords like `WSGI`, `JIT`, `IPC`, `TLS` that the current pattern misses.
+
+**Takeaway:** consider augmenting the keyword extractor with a small (~100-entry) curated tech dictionary. Cheap, deterministic, no LLM call. File as a P3 enhancement.
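+
+To illustrate the idea (not codedb's actual `extractContextCandidates`, which is Zig), here is a small Python sketch that combines CamelCase / snake_case regexes with a dictionary lookup. The regexes and function name are hypothetical, quoted-string handling is omitted, and the dictionary holds only the four terms named above.
+
+```python
+import re
+
+# Seed terms taken from the examples in the text; a real dictionary would be curated.
+TECH_TERMS = {"wsgi", "jit", "ipc", "tls"}
+
+CAMEL_CASE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")
+SNAKE_CASE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")
+WORD = re.compile(r"[A-Za-z0-9]+")
+
+
+def extract_candidates(query: str) -> list[str]:
+    # Regex heuristics first, then dictionary hits the regexes cannot see.
+    hits = CAMEL_CASE.findall(query) + SNAKE_CASE.findall(query)
+    hits += [w for w in WORD.findall(query) if w.lower() in TECH_TERMS]
+    seen: set[str] = set()
+    ordered = []
+    for hit in hits:
+        if hit not in seen:  # de-duplicate while keeping first-seen order
+            seen.add(hit)
+            ordered.append(hit)
+    return ordered
+```
+
+An all-caps acronym such as `WSGI` matches neither regex here, so it only surfaces through the dictionary pass.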
+
+### 5. Lifecycle hooks — Prime / Remind / Nudge / Compact
+
+Mnemon installs hooks at four phases:
+
+| phase | trigger | mnemon action |
+|---|---|---|
+| Prime | session start | make skill, guideline, active store visible |
+| Remind | user prompt arrives | decide whether recall could change this task |
+| Nudge | mid-conversation | suggest writing important moments |
+| Compact | before context compression | persist what would be lost |
+
+**Takeaway for codedb:** ship a `codedb hooks install` mode that writes `.claude/hooks.json` entries for:
+- `SessionStart`: print `codedb status` + reader.md staleness summary
+- `Stop`: if reader.md was marked stale during the session, prompt the agent to regenerate before context-compact
+
+Closes critical-review I06 (`codedb_status` doesn't surface reader.md state). Small, concrete, follow-up.
+
+### 6. Skill + Guideline split
+
+Mnemon ships **two** markdown files for agent integration:
+- `SKILL.md` — the commands (what)
+- `GUIDELINE.md` — the judgment (when, why)
+
+The split is intentional: SKILL teaches syntax, GUIDELINE teaches taste. Pasting both into an agent's prompt is the markdown-installable harness.
+
+Codedb has `docs/skills.md` (similar to SKILL.md) but no separate GUIDELINE. A short `docs/guideline.md` could codify things like:
+- When to use `codedb_context` vs `codedb_search`
+- When the reader.md prepend is helping vs noise (and how to tell)
+- How to interpret "stale" hints
+- When to write a new `.codedb/reader.md` vs let the existing one stay
+
+**Takeaway:** add `docs/guideline.md` as a v0.2.5818 follow-up. ~150 lines max.
+
+## What NOT to steal
+
+- **Knowledge graph storage** (the four-graph model). Code is structural — it already has graphs (`codedb_callers`, `codedb_deps`). Adding a temporal/causal/semantic memory graph on top is the wrong shape for a code-search tool.
+- **Auto-pruning + soft-delete**. Codedb's snapshot is a snapshot of the current source tree; "pruning" old code paths doesn't make sense.
+- **The `remember / link / recall` API verbatim**. Codedb doesn't write user-authored facts; the agent doesn't author code memories via codedb. Skip.
+
+## Concrete v0.2.5818 candidates (ranked by ROI)
+
+1. **Lifecycle hooks installer** — `codedb hooks install` writes `.claude/hooks.json` with SessionStart + Stop checks. Closes I06. ~50 LOC + a tiny JSON template. **High value, low risk.**
+2. **`docs/guideline.md`** — separate from skills.md, teaches when/why. **Pure docs.**
+3. **Tech-dictionary keyword extraction** — augment `extractContextCandidates` with a 100-entry dict for terms regex misses. **~30 LOC.**
+4. **Decay-style reader.md staleness** — design only for now; the binary hash protocol is fine for v0. **Design doc, no code.**
+
+None of these are urgent. Tracking them here so the option stays open.
diff --git a/docs/design/perf-context-token-cut-eval.md b/docs/design/perf-context-token-cut-eval.md
@@ -0,0 +1,90 @@
+# perf(context) token-cut — eval against v0.2.5817
+
+**Date:** 2026-05-21 (after token-cut commit `1276bd4` + mnemon doc `858a8d5`)
+**Branch:** `perf/codedb-context-token-cut`
+**Question:** Does the deterministic 49% byte reduction on T1-shape `codedb_context` output translate to fewer agent calls in end-to-end use?
+
+## Deterministic byte count
+
+Same task, same corpus, both binaries:
+
+```
+$ codedb_context "find before_request decorator" /Users/.../flask
+```
+
+| | bytes | approx tokens |
+|---|---:|---:|
+| v0.2.5817 release | 2993 | ~750 |
+| this branch (token-opt) | **1525** | **~380** |
+| Δ | **−1468 B** | **−49%** |
+
+Where the bytes came from: the entire "## Top sites (with ±2 lines of context)" section + 2 entries from "## Most-relevant files." Verified byte-level — the change is deterministic.
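+
+The table's arithmetic can be reproduced as follows. The 4-bytes-per-token ratio is an assumption that happens to match the approximate token column; the doc does not state how tokens were estimated.
+
+```python
+BYTES_PER_TOKEN = 4  # assumed heuristic, not a measured tokenizer count
+
+baseline_bytes = 2993  # v0.2.5817 release
+branch_bytes = 1525    # this branch (token-opt)
+
+
+def approx_tokens(num_bytes: int) -> int:
+    return round(num_bytes / BYTES_PER_TOKEN)
+
+
+saved_bytes = baseline_bytes - branch_bytes
+reduction_pct = 100 * saved_bytes / baseline_bytes
+saved_tokens = approx_tokens(baseline_bytes) - approx_tokens(branch_bytes)
+```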
+
+## Agent eval (n=3 per task, Sonnet 4.6)
+
+### T1 flask "find before_request decorator" — *gate fires* (3 sym_refs)
+
+| sample | token-opt | main_baseline (earlier eval) |
+|---|---:|---:|
+| A | **5** | 4 |
+| B | **6** | 5 |
+| C | **5** | 5 |
+| **mean** | **5.33** | 4.67 |
+| **median** | **5** | 5 |
+| **best** | **5** | 4 |
+| **worst** | **6** | 5 |
+| **spread (max−min)** | **1** | 1 |
+
+**Reading:** the mean is 0.67 calls worse than main, but the spread is identical (5/6/5 vs 4/5/5, max−min of 1 in both) and the median ties. The 49% byte reduction produced no measurable increase in calls at n=3: every sample landed at 5±1. This is **at-parity-or-noise**, with a real byte saving.
+
+The earlier "post-callers" eval on `experiment/reader-md` had 4/4/7 (mean 5.0, one wild 7) — the token-opt has tighter variance, which is a positive sign.
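+
+The summary rows and the comparison with the earlier post-callers run can be recomputed from the raw call counts. This is a small Python sketch using only the samples quoted in this section; the dictionary keys are labels chosen for illustration.
+
+```python
+from statistics import mean, median
+
+# Agent call counts per sample, copied from this section.
+samples = {
+    "token-opt": [5, 6, 5],
+    "main_baseline": [4, 5, 5],
+    "post-callers": [4, 4, 7],
+}
+
+
+def summarize(calls: list[int]) -> dict:
+    return {
+        "mean": round(mean(calls), 2),
+        "median": median(calls),
+        "spread": max(calls) - min(calls),
+    }
+
+
+summary = {name: summarize(calls) for name, calls in samples.items()}
+mean_gap = round(mean(samples["token-opt"]) - mean(samples["main_baseline"]), 2)
+```
+
+At n=3 a single sample moves the mean by a third of a call, which is why the median and spread are reported alongside it.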
+
+### T2 regex "where is a pattern compiled" — *gate does NOT fire* (6+ sym_refs from NFA/DFA matches)
+
+| sample | token-opt |
+|---|---:|
+| A | 19 |
+| C | 16 |
+| mean (n=2) | 17.5 |
+
+Gate doesn't fire (verified by inspecting codedb_context output for T2: sym_refs.items.len = 6, > 3 threshold). Output is byte-identical to v0.2.5817 here, so any variance is pure agent noise. Comparable to v0.2.5817 baseline.
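+
+For reference, the gate condition can be restated in a few lines of Python. This mirrors the `have_inline_bodies` / `display_top_n` logic in `src/mcp.zig` as a sketch; the function names are hypothetical and the real implementation is Zig.
+
+```python
+INLINE_BODY_LIMIT = 3  # gate threshold on the number of symbol refs
+
+
+def gate_fires(sym_ref_count: int) -> bool:
+    # Fires only when there is at least one symbol ref and at most three.
+    return 0 < sym_ref_count <= INLINE_BODY_LIMIT
+
+
+def plan_output(sym_ref_count: int, top_n: int) -> tuple[int, bool]:
+    """Return (files listed under Most-relevant, whether Top sites is emitted)."""
+    fires = gate_fires(sym_ref_count)
+    display_top_n = min(top_n, 3) if fires else top_n
+    return display_top_n, not fires
+```
+
+With 3 symbol refs (T1) the plan is 3 files and no Top sites section; with 6 (T2) the output is unchanged.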
+
+### T3 react "passive effects flush" — *gate does NOT fire* (many useEffect/useLayoutEffect matches)
+
+| sample | token-opt |
+|---|---:|
+| A | 7 |
+| B | 15 |
+| C | 16 |
+| mean | 12.67 |
+| median | 15 |
+
+Same situation as T2 — gate doesn't fire, output identical to v0.2.5817. The wide spread (7 to 16) is the same agent-variance pattern we've seen on T3 across all branches.
+
+## Conclusion
+
+The **−49% byte saving on T1-shape tasks is real and deterministic** (same input → same shorter output). The end-to-end agent eval shows:
+
+- **T1 (where the gate fires)**: at-parity-or-noise with main. Median ties (5=5); the mean is 0.67 worse with the same spread (max−min of 1). The cut byte content was redundant on this task shape: the agent didn't need it.
+- **T2/T3 (gate doesn't fire)**: byte-identical to v0.2.5817 → only sampling noise differentiates. Numbers vary as before.
+
+The token cut is a free win on narrow-symbol tasks. For agents on small-context models (Haiku, Sonnet on tight context), this matters more than the n=3 agent-call eval can show — the saved tokens stay in the agent's context window for the rest of the session.
+
+## Correctness
+
+9/9 runs across the matrix returned correct answers (decorator name, file, execution site, function — all matched across every sample). No quality regression.
+
+## Threats to validity
+
+- n=3 is still small. Confidence interval on T1 mean is ±~1 call.
+- Sonnet 4.6 only; no Haiku or Opus comparison.
+- The T2/T3 numbers are essentially measuring agent variance, not the branch — they're "doesn't get worse" sanity checks, not headlines.
+- The 49% byte figure was measured on a single T1 task; other T1-shape tasks in real workloads may see different ratios depending on how many Top sites snippets the composer would have emitted.
+
+## Recommendation
+
+Ship the token cut. It's a deterministic, opt-out-free improvement that:
+- Cuts 49% of output bytes on the measured narrow-lookup (T1) task shape
+- Causes no measurable harm at n=3 on the same task
+- Cannot affect wider tasks (gate is symbol-count-conditional)
+- Pairs with the mnemon-takeaways doc to round out PR #491
diff --git a/src/mcp.zig b/src/mcp.zig
@@ -1952,14 +1952,27 @@ fn handleContext(io: std.Io, alloc: std.mem.Allocator, args: *const std.json.Obj
         out.appendSlice(alloc, "\n(no content matches — try codedb_search or codedb_word for narrower queries)\n") catch {};
         return;
     }
+    // Token-efficiency gate: when symbol_definitions already inlined 1-3 bodies
+    // (the inline_bodies branch in the Symbol definitions section), the agent
+    // has the function bodies in-band. "Top sites" snippets then duplicate
+    // information at high token cost. Trim Most-relevant to 3 entries and
+    // skip the snippet body entirely in that case.
+    const have_inline_bodies = sym_refs.items.len > 0 and sym_refs.items.len <= 3;
+    const display_top_n = if (have_inline_bodies) @min(top_n, @as(usize, 3)) else top_n;
     w.print("\n## Most-relevant files\n", .{}) catch {};
-    for (ranked.items[0..top_n]) |f| {
+    for (ranked.items[0..display_top_n]) |f| {
         w.print("- {s}  ({d} matches)\n", .{ f.path, f.hits }) catch {};
     }
+    if (have_inline_bodies) {
+        // Symbol definitions + Callers already give the agent enough — skip
+        // the snippet body to save tokens (~370 per call measured on the
+        // T1 flask shape: 3 symbol defs, all with inline bodies).
+        return;
+    }
     w.print("\n## Top sites (with ±2 lines of context)\n", .{}) catch {};
     explorer.mu.lockShared();
     defer explorer.mu.unlockShared();
-    for (ranked.items[0..top_n]) |f| {
+    for (ranked.items[0..display_top_n]) |f| {
         // Fetch full file content once per file, then slice ±2 lines around
         // each hit. Indexed cache hits common files in ~µs; arena owns the
         // dupe so we don't leak.