
feat(recall): session-scoped re-injection cooldown (ADR 0023) #6

Merged: dakl merged 4 commits into main from claude/adr-0023-recall-cooldown on Jun 23, 2026

dakl (Owner) commented Jun 23, 2026

What

Stops the recall hook from re-injecting the same memory on every on-topic
prompt within a session. A memory that recall already injected earlier in the
same session, within the last 30 minutes, is suppressed. Suppression is keyed
off a new session_id carried on each retrieval row. See ADR 0023.
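
The suppression rule described above can be sketched roughly as follows. This is an illustrative Python model, not the PR's Swift code: the `Retrieval` shape, the function names, the inclusive 30-minute boundary, and the choice to skip the cooldown when no session ID is available are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

COOLDOWN_SECONDS = 30 * 60  # the 30-minute window mentioned in the description


@dataclass(frozen=True)
class Retrieval:
    """Hypothetical stand-in for one retrieval row."""
    memory_id: str
    session_id: Optional[str]  # None for rows recorded without a session
    injected_at: float         # seconds on any monotonic clock


def recently_injected(rows: Iterable[Retrieval], session_id: str, now: float,
                      cooldown: float = COOLDOWN_SECONDS) -> Set[str]:
    """Memory IDs injected in this session within the cooldown window."""
    return {
        r.memory_id
        for r in rows
        if r.session_id == session_id and 0 <= now - r.injected_at <= cooldown
    }


def filter_hits(hit_ids: List[str], rows: Iterable[Retrieval],
                session_id: Optional[str], now: float) -> List[str]:
    """Drop confident hits that are still cooling down in this session."""
    if session_id is None:
        # Assumption: without a session ID there is nothing to scope to.
        return list(hit_ids)
    suppressed = recently_injected(rows, session_id, now)
    return [m for m in hit_ids if m not in suppressed]
```

In this model a memory injected in one session does not suppress the same memory in another session, and it becomes eligible again once the window has elapsed.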

Changes

  • MemoryStore — additive session_id column + index;
    recentlyInjectedInSession, recallReinjectionCooldown; recordRetrieval
    gains an optional sessionID.
  • Recall hook (engram CLI) — filters confident hits through the cooldown.
  • engram-eval — replays Resources/sessions.json through the gate + the
    real cooldown and reports redundant re-injection rate (with/without) and
    first-touch coverage (RetrievalMetrics.SessionInjectionReport).
  • Migration fix — the session_id index is created in
    addMissingRetrievalColumns, not createSchema, so existing DBs upgrade
    cleanly instead of failing on open with no such column: session_id.
    Regression test migratesPre0023RetrievalsTableMissingSessionID added.
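
As a rough illustration of the session metric listed above, the sketch below replays one session with and without a cooldown and computes two ratios. The definitions are assumptions, since the PR does not spell them out: "redundant" here means an injection of a memory already injected earlier in the same session, and "first-touch coverage" means the share of gate-passed memories that are still injected at least once. The session data is synthetic and does not come from Resources/sessions.json.

```python
def replay(turns, cooldown_minutes=None):
    """Replay one session.

    turns: list of (minute, [memory_id, ...]) holding the gate-passed hits
    for each prompt. Returns the injected memory IDs in order.
    """
    last_injected = {}
    injected = []
    for minute, hits in turns:
        for mem in hits:
            prev = last_injected.get(mem)
            cooling = (
                cooldown_minutes is not None
                and prev is not None
                and minute - prev <= cooldown_minutes
            )
            if cooling:
                continue  # suppressed; the cooldown clock is not refreshed
            last_injected[mem] = minute
            injected.append(mem)
    return injected


def redundant_rate(injected):
    """Share of injections that repeat a memory already injected (assumed definition)."""
    if not injected:
        return 0.0
    return (len(injected) - len(set(injected))) / len(injected)


def first_touch_coverage(turns, injected):
    """Share of gate-passed memories injected at least once (assumed definition)."""
    wanted = {m for _, hits in turns for m in hits}
    if not wanted:
        return 1.0
    return len(wanted & set(injected)) / len(wanted)


# Synthetic session: memory "a" matches four prompts, "b" matches one.
session = [(0, ["a"]), (5, ["a", "b"]), (10, ["a"]), (50, ["a"])]
without = replay(session)
with_cooldown = replay(session, cooldown_minutes=30)
```

On this toy session the cooldown removes the repeats at minutes 5 and 10 but lets "a" back in at minute 50, while every memory is still injected at least once.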

Also in this PR — precision retune (gate 0.10 → 0.09)

Gate precision was reviewed while building the session metric, and the
contextual distance ceiling is tightened 0.10 → 0.09. On the eval, that drops
the negative false-positive rate 13% → 0% and lifts injection precision
0.47 → 0.54 with gate recall unchanged at 93%: the lexical (≥2-token) leg
holds recall while the tighter distance leg sheds off-topic injections. The
recall hook is precision-first (it runs on every prompt, so a false positive
bloats context repeatedly while a miss is recoverable), so the operating point
is 0.09, the knee where neg-FP reaches 0 before gate recall starts to drop.

  • RecallGate.proposed maxDistance 0.10 → 0.09 (+ rationale); ADR 0021
    gets a dated addendum (decision text unchanged).
  • engram-eval — finer calib-0.09/0.08 rows + calib-0.08-lex0 (shows
    recall collapsing without the lexical floor); --dump-scores for ROC/PR.
  • scripts/plot_threshold.py — ROC + precision/recall-vs-threshold plot
    marking the shipped + legacy thresholds (regenerable; dumps gitignored).
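
The kind of threshold sweep behind these calibration rows can be sketched as below. This is a simplified model under stated assumptions: it covers only the distance leg (no lexical leg), scores one best distance per prompt, and uses toy data, so it will not reproduce the PR's numbers. The function and field names are hypothetical and are not taken from engram-eval or plot_threshold.py.

```python
import math


def sweep(cases, thresholds):
    """Compute precision, recall, and negative FP rate per distance ceiling.

    cases: list of (best_distance, should_inject) pairs, one per eval prompt.
    A prompt counts as injected when best_distance <= threshold.
    """
    positives = sum(1 for _, pos in cases if pos)
    negatives = len(cases) - positives
    rows = []
    for t in thresholds:
        tp = sum(1 for d, pos in cases if pos and d <= t)
        fp = sum(1 for d, pos in cases if not pos and d <= t)
        injected = tp + fp
        rows.append({
            "threshold": t,
            # Precision is undefined when nothing is injected, so leave it NaN.
            "precision": tp / injected if injected else math.nan,
            "recall": tp / positives if positives else math.nan,
            "neg_fp_rate": fp / negatives if negatives else math.nan,
        })
    return rows


# Synthetic cases: two on-topic prompts, two off-topic prompts.
toy_cases = [(0.05, True), (0.08, True), (0.095, False), (0.20, False)]
results = sweep(toy_cases, [0.10, 0.09, 0.01])
```

In the toy data, lowering the ceiling from 0.10 to 0.09 removes the one off-topic injection without losing an on-topic one, which is the shape of trade-off the retune describes. Lowering it far enough injects nothing and leaves precision undefined.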

Verification

  • swift build (all products incl. engram-eval) clean; make test green (100
    tests, incl. cooldown, session-metric, and the migration regression test).

Notes

  • Includes eval/runs/*.json A/B run records (per the repo's --record
    convention).
  • The migration fix is the piece that was breaking app/CLI startup on existing
    stores; it's required for this feature to land safely.
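
The ordering problem behind the migration fix can be reproduced with SQLite directly. The sketch below uses Python's standard `sqlite3` module. The table, column types, and index name are hypothetical, and the two functions only borrow the names createSchema and addMissingRetrievalColumns to mirror the structure described above.

```python
import sqlite3


def create_schema(db):
    # Fresh-store DDL. CREATE TABLE IF NOT EXISTS is a no-op when the table
    # already exists, so it never adds session_id to an older store.
    db.execute(
        """CREATE TABLE IF NOT EXISTS retrievals (
               id INTEGER PRIMARY KEY,
               memory_id TEXT NOT NULL,
               retrieved_at REAL NOT NULL,
               session_id TEXT
           )"""
    )


def add_missing_retrieval_columns(db):
    # Add the column first, then build the index that depends on it.
    cols = {row[1] for row in db.execute("PRAGMA table_info(retrievals)")}
    if "session_id" not in cols:
        db.execute("ALTER TABLE retrievals ADD COLUMN session_id TEXT")
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_retrievals_session "
        "ON retrievals(session_id)"
    )


def open_store(db):
    create_schema(db)
    add_missing_retrieval_columns(db)
    return db


# Simulate a store created before the session_id column existed.
legacy = sqlite3.connect(":memory:")
legacy.execute(
    "CREATE TABLE retrievals ("
    "id INTEGER PRIMARY KEY, memory_id TEXT NOT NULL, retrieved_at REAL NOT NULL)"
)
open_store(legacy)
```

If the index were created inside `create_schema` instead, the legacy store would raise `sqlite3.OperationalError` before the column had been added. Running both steps twice is safe because each one checks for existing state.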

🤖 Generated with Claude Code

dakl and others added 4 commits June 23, 2026 21:41
Stop the recall hook re-injecting the same memory on every on-topic prompt
within a session. A memory injected via recall earlier in the session (within
30 min) is now suppressed, keyed off a new session_id carried on each
retrieval row.

- MemoryStore: additive session_id column + index, recentlyInjectedInSession,
  recallReinjectionCooldown; recordRetrieval takes an optional sessionID.
- Recall hook (engram main) filters confident hits through the cooldown.
- engram-eval: replays Resources/sessions.json through the gate + real cooldown
  and reports redundant re-injection rate (with/without) + first-touch coverage
  (RetrievalMetrics.SessionInjectionReport).
- Migration fix: build the session_id index in addMissingRetrievalColumns (not
  createSchema) so existing DBs upgrade cleanly instead of failing with
  "no such column: session_id"; regression test added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Consequences bullet called the session-aware eval metric future work, but
this change implements it. Describe what ships (redundant re-injection rate
with/without cooldown + first-touch coverage via SessionInjectionReport) and
note that concrete numbers live in eval/runs, not the ADR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ling

The eval sweep shows 0.10 → 0.09 drops the negative false-positive rate 13% → 0%
and lifts injection precision 0.47 → 0.54 with unchanged gate recall (93%): the
lexical (≥2-token) leg holds recall while the tighter distance leg sheds off-topic
injections. Engram's recall is precision-first (the hook runs every prompt, so a
false positive bloats context repeatedly while a miss is recoverable), so 0.09 —
the knee, where neg-FP hits 0 before recall gives — is the right operating point.

- RecallGate.proposed: maxDistance 0.10 → 0.09 (+ rationale comment)
- ADR 0021: dated addendum recording the recalibration (decision text unchanged)
- engram-eval: finer calib-0.09/0.08 rows + calib-0.08-lex0 (shows recall
  collapse without the lexical floor); --dump-scores for the ROC/PR analysis
- scripts/plot_threshold.py: ROC + precision/recall-vs-threshold plot marking the
  shipped + legacy thresholds; precision left NaN where nothing is injected
- gitignore the regenerable score/threshold dumps + plot png

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rrent

The session-aware metric replayed sessions through .current (the loose legacy
gate), inflating the redundancy baseline and contradicting its own 'shipped gate'
comment. Use config(forEmbedderSignature:) — the gate the recall hook actually
runs — so the numbers reflect production (33→21 injections, redundant 36%→0%).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dakl merged commit 84f25d9 into main on Jun 23, 2026
1 check passed
dakl deleted the claude/adr-0023-recall-cooldown branch on June 23, 2026 20:16