feat(recall): session-scoped re-injection cooldown (ADR 0023) #6
Merged
Stop the recall hook re-injecting the same memory on every on-topic prompt within a session. A memory injected via recall earlier in the session (within 30 min) is now suppressed, keyed off a new session_id carried on each retrieval row.

- MemoryStore: additive session_id column + index, recentlyInjectedInSession, recallReinjectionCooldown; recordRetrieval takes an optional sessionID.
- Recall hook (engram main) filters confident hits through the cooldown.
- engram-eval: replays Resources/sessions.json through the gate + real cooldown and reports redundant re-injection rate (with/without) + first-touch coverage (RetrievalMetrics.SessionInjectionReport).
- Migration fix: build the session_id index in addMissingRetrievalColumns (not createSchema) so existing DBs upgrade cleanly instead of failing with "no such column: session_id"; regression test added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Consequences bullet called the session-aware eval metric future work, but this change implements it. Describe what ships (redundant re-injection rate with/without cooldown + first-touch coverage via SessionInjectionReport) and note that concrete numbers live in eval/runs, not the ADR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ling

The eval sweep shows 0.10 → 0.09 drops the negative false-positive rate 13% → 0% and lifts injection precision 0.47 → 0.54 with unchanged gate recall (93%): the lexical (≥2-token) leg holds recall while the tighter distance leg sheds off-topic injections. Engram's recall is precision-first (the hook runs every prompt, so a false positive bloats context repeatedly while a miss is recoverable), so 0.09 (the knee, where neg-FP hits 0 before recall gives) is the right operating point.

- RecallGate.proposed: maxDistance 0.10 → 0.09 (+ rationale comment)
- ADR 0021: dated addendum recording the recalibration (decision text unchanged)
- engram-eval: finer calib-0.09/0.08 rows + calib-0.08-lex0 (shows recall collapse without the lexical floor); --dump-scores for the ROC/PR analysis
- scripts/plot_threshold.py: ROC + precision/recall-vs-threshold plot marking the shipped + legacy thresholds; precision left NaN where nothing is injected
- gitignore the regenerable score/threshold dumps + plot png

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rrent

The session-aware metric replayed sessions through .current (the loose legacy gate), inflating the redundancy baseline and contradicting its own 'shipped gate' comment. Use config(forEmbedderSignature:), the gate the recall hook actually runs, so the numbers reflect production (33→21 injections, redundant 36%→0%).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
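One way to read the redundancy figure in this commit is "injections of a memory the session has already received, divided by all injections". The sketch below replays gated hits with and without suppression under that reading. The types and the function are hypothetical and are not engram-eval's; the sketch also treats the whole session as falling inside the cooldown window, so it has no timestamps.

```swift
// Hypothetical replay input: memory IDs that passed the gate for one prompt.
struct ReplayTurn {
    let sessionID: String
    let gatedHits: [String]
}

struct InjectionSummary: Equatable {
    var injections = 0
    var redundant = 0
    // Share of injections that repeated a memory already injected in the session.
    var redundantRate: Double {
        injections == 0 ? 0 : Double(redundant) / Double(injections)
    }
}

// Replays turns in order; with the cooldown on, repeats within a session are dropped.
func replay(_ turns: [ReplayTurn], withCooldown: Bool) -> InjectionSummary {
    var seen: [String: Set<String>] = [:]   // session ID -> memories already injected
    var summary = InjectionSummary()
    for turn in turns {
        for id in turn.gatedHits {
            let alreadyInjected = seen[turn.sessionID, default: []].contains(id)
            if alreadyInjected && withCooldown { continue }
            summary.injections += 1
            if alreadyInjected { summary.redundant += 1 }
            seen[turn.sessionID, default: []].insert(id)
        }
    }
    return summary
}

// Toy data, not taken from Resources/sessions.json.
let turns = [
    ReplayTurn(sessionID: "s1", gatedHits: ["a", "b"]),
    ReplayTurn(sessionID: "s1", gatedHits: ["a"]),
    ReplayTurn(sessionID: "s1", gatedHits: ["a", "c"]),
    ReplayTurn(sessionID: "s2", gatedHits: ["a"]),
]
let withoutCooldown = replay(turns, withCooldown: false)
let withCooldown = replay(turns, withCooldown: true)
```

On the toy data the repeat of "a" in session s1 is counted twice without the cooldown and not at all with it, while the first injection of "a" in session s2 is kept in both runs.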
What
Stops the recall hook from re-injecting the same memory on every on-topic
prompt within a session. A memory injected via recall earlier in the session
(within 30 min) is suppressed, keyed off a new
session_id carried on each retrieval row. See ADR 0023.
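The suppression rule can be sketched as a filter over past retrieval rows. This is a minimal in-memory illustration, not the MemoryStore implementation: the row type, the function signatures, and the filter helper are assumptions; only the names recentlyInjectedInSession and recallReinjectionCooldown and the 30-minute window come from this PR.

```swift
import Foundation

// Hypothetical row shape: one record per memory injected by the recall hook.
struct RetrievalRow {
    let memoryID: String
    let sessionID: String?   // nil for rows recorded without a session
    let injectedAt: Date
}

// 30 minutes, per the description above.
let recallReinjectionCooldown: TimeInterval = 30 * 60

// IDs of memories injected in `sessionID` within the cooldown window ending at `now`.
func recentlyInjectedInSession(_ rows: [RetrievalRow], sessionID: String, now: Date,
                               cooldown: TimeInterval = recallReinjectionCooldown) -> Set<String> {
    let cutoff = now.addingTimeInterval(-cooldown)
    let recent = rows.filter {
        $0.sessionID == sessionID && $0.injectedAt >= cutoff && $0.injectedAt <= now
    }
    return Set(recent.map { $0.memoryID })
}

// Drops hits the cooldown suppresses; the order of the remaining hits is preserved.
func applyCooldown(hits: [String], rows: [RetrievalRow], sessionID: String, now: Date) -> [String] {
    let suppressed = recentlyInjectedInSession(rows, sessionID: sessionID, now: now)
    return hits.filter { !suppressed.contains($0) }
}

let now = Date(timeIntervalSince1970: 10_000)
let rows = [
    RetrievalRow(memoryID: "m1", sessionID: "s1", injectedAt: now.addingTimeInterval(-600)),   // 10 min ago, same session
    RetrievalRow(memoryID: "m2", sessionID: "s1", injectedAt: now.addingTimeInterval(-3600)),  // 60 min ago, outside the window
    RetrievalRow(memoryID: "m3", sessionID: "s2", injectedAt: now.addingTimeInterval(-60)),    // different session
]
let kept = applyCooldown(hits: ["m1", "m2", "m3"], rows: rows, sessionID: "s1", now: now)
```

Only m1 is dropped here: m2 was injected too long ago and m3 belongs to another session, so neither counts against the current prompt.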
Changes
- MemoryStore: additive session_id column + index; recentlyInjectedInSession, recallReinjectionCooldown; recordRetrieval gains an optional sessionID.
- Recall hook (engram CLI): filters confident hits through the cooldown.
- engram-eval: replays Resources/sessions.json through the gate + the real cooldown and reports redundant re-injection rate (with/without) and first-touch coverage (RetrievalMetrics.SessionInjectionReport).
- Migration fix: the session_id index is created in addMissingRetrievalColumns, not createSchema, so existing DBs upgrade cleanly instead of failing on open with "no such column: session_id". Regression test migratesPre0023RetrievalsTableMissingSessionID added.
Also in this PR: precision retune (gate 0.10 → 0.09)
This came out of reviewing gate precision while building the session metric:
the contextual distance ceiling is tightened 0.10 → 0.09. On the eval that drops
the negative false-positive rate 13% → 0% and lifts injection precision
0.47 → 0.54 with unchanged gate recall (93%): the lexical (≥2-token) leg
holds recall while the tighter distance leg sheds off-topic injections. Recall is
precision-first (the hook runs on every prompt, so a false positive bloats context
repeatedly while a miss is recoverable), so 0.09, the knee where neg-FP hits 0
before recall gives way, is the operating point.
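The precision-first choice can be written down as a small selection rule over sweep rows. The sketch below is an assumption about how one might encode that reasoning, not code from engram-eval; the type and function are hypothetical, and the two rows carry only the figures quoted in this description.

```swift
// Hypothetical sweep row; the values below are the ones quoted in this PR.
struct SweepRow {
    let maxDistance: Double
    let negativeFPRate: Double
    let injectionPrecision: Double
    let gateRecall: Double
}

// Precision-first rule: among rows that keep recall at or above the floor,
// take the lowest negative false-positive rate; on a tie, prefer the looser ceiling.
func pickOperatingPoint(_ rows: [SweepRow], recallFloor: Double) -> SweepRow? {
    rows.filter { $0.gateRecall >= recallFloor }
        .min { a, b in
            if a.negativeFPRate != b.negativeFPRate {
                return a.negativeFPRate < b.negativeFPRate
            }
            return a.maxDistance > b.maxDistance
        }
}

let sweep = [
    SweepRow(maxDistance: 0.10, negativeFPRate: 0.13, injectionPrecision: 0.47, gateRecall: 0.93),
    SweepRow(maxDistance: 0.09, negativeFPRate: 0.00, injectionPrecision: 0.54, gateRecall: 0.93),
]
let chosen = pickOperatingPoint(sweep, recallFloor: 0.93)
```

With the recall floor held at the current 93%, the rule lands on 0.09; a row that gave up recall would be filtered out before its false-positive rate was considered.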
- RecallGate.proposed: maxDistance 0.10 → 0.09 (+ rationale); ADR 0021 gets a dated addendum (decision text unchanged).
- engram-eval: finer calib-0.09/0.08 rows + calib-0.08-lex0 (shows recall collapsing without the lexical floor); --dump-scores for ROC/PR.
- scripts/plot_threshold.py: ROC + precision/recall-vs-threshold plot marking the shipped + legacy thresholds (regenerable; dumps gitignored).
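For readers who want to see how a threshold sweep turns dumped scores into these rates, here is a rough sketch. It models the distance leg only and ignores the lexical leg; the record shape and the metric definitions are assumptions rather than the --dump-scores format, and the data is a toy set. Precision is left as NaN when nothing is injected, matching the plotting note in the commit message.

```swift
// Hypothetical score record: the closest memory's distance for one prompt,
// plus whether that prompt should have triggered an injection.
struct ScoredPrompt {
    let bestDistance: Double
    let isPositive: Bool
}

struct ThresholdMetrics {
    let threshold: Double
    let precision: Double        // NaN when nothing is injected
    let recall: Double
    let negativeFPRate: Double
}

// Distance-only gate: a prompt gets an injection when bestDistance <= threshold.
func metrics(_ prompts: [ScoredPrompt], threshold: Double) -> ThresholdMetrics {
    let injected = prompts.filter { $0.bestDistance <= threshold }
    let truePositives = injected.filter { $0.isPositive }.count
    let falsePositives = injected.count - truePositives
    let positives = prompts.filter { $0.isPositive }.count
    let negatives = prompts.count - positives
    func ratio(_ n: Int, _ d: Int) -> Double { d == 0 ? .nan : Double(n) / Double(d) }
    return ThresholdMetrics(threshold: threshold,
                            precision: ratio(truePositives, injected.count),
                            recall: ratio(truePositives, positives),
                            negativeFPRate: ratio(falsePositives, negatives))
}

// Toy scores, not real eval output.
let toy = [
    ScoredPrompt(bestDistance: 0.05, isPositive: true),
    ScoredPrompt(bestDistance: 0.08, isPositive: true),
    ScoredPrompt(bestDistance: 0.095, isPositive: false),
    ScoredPrompt(bestDistance: 0.20, isPositive: false),
]
let rowsByThreshold = [0.10, 0.09, 0.01].map { metrics(toy, threshold: $0) }
```

On the toy set, moving the ceiling from 0.10 to 0.09 removes the one off-topic injection without losing either positive, which is the shape of trade-off the plot is meant to expose.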
Verification
- swift build (all products incl. engram-eval) clean; make test green (100 tests, incl. cooldown, session-metric, and the migration regression test).
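The ordering that the migration regression test guards can be illustrated without a database: the index statement must only run once the column is known to exist. The sketch below is an assumption about the shape of that logic; the function, the table name retrievals, the column type, and the index name are all hypothetical.

```swift
// Returns the SQL to run against a retrievals table, given the columns it already has.
// Table, index, and column-type names are illustrative assumptions.
func retrievalMigrationStatements(existingColumns: Set<String>) -> [String] {
    var sql: [String] = []
    // Step 1: add the column on databases created before it existed.
    if !existingColumns.contains("session_id") {
        sql.append("ALTER TABLE retrievals ADD COLUMN session_id TEXT")
    }
    // Step 2: build the index only after step 1, so it never references a missing column.
    sql.append("CREATE INDEX IF NOT EXISTS idx_retrievals_session_id ON retrievals(session_id)")
    return sql
}

let legacyPlan = retrievalMigrationStatements(existingColumns: ["id", "memory_id"])
let currentPlan = retrievalMigrationStatements(existingColumns: ["id", "memory_id", "session_id"])
```

Creating the index in the schema-creation step instead would run it against a legacy table before the ALTER TABLE, which is the "no such column: session_id" failure described above.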
Notes
eval/runs/*.json A/B run records (per the repo's --record convention).
stores; it's required for this feature to land safely.
🤖 Generated with Claude Code