Skip to content

Releases: cipher813/mnemon

0.7.0rc6

27 May 20:35

Choose a tag to compare

0.7.0rc6 Pre-release
Pre-release

[0.7.0rc6] - 2026-05-27

Phase 2 / 3 salience tier + Phase B / C capture-attention substrate

Substantial roadmap-closure release consolidating the 2026-05-27 sweep.
Every change is gated default-off or operator-explicit — Phase A
capture-attention auto-firing stays default-off pending the re-soak
gate (see "Re-soak prereq" below).

Salience tier

  • Phase 2 promotion signals (#178): new documents.correction_count
    • documents.contradiction_win_count columns + Store.salience_report
    • mnemon salience-report CLI. Bumps on operator correction_of=
      gestures and NLI contradiction-win events respectively.
  • Phase 3 observability (#179): new documents.last_injected_at
    column; Store.list_standing bumps it on every injection event;
    mnemon standing list renders aging table with ⚠ stale marker
    for members not injected in ≥90d.
  • Vault-derived auto-exemplars in scripts/build_standing_set.py
    (#168): --exemplar-source {hybrid, vault, hand-tuned}. Default
    hybrid; samples high-confidence preference/decision/antipattern
    as positives and recent handoffs as negatives.
  • LLM-judge opt-in (#175): --judge anthropic, requires
    ANTHROPIC_API_KEY + pip install anthropic. 4-dim rubric.
    Default --judge embedding unchanged.
  • memory_promote coherence check (#173): post-promote NLI
    bidirectional classification against existing standing-tier
    members; conflicts surface as a warning (not blocking — NLI
    false-neg on numeric updates is a known limitation).

Capture attention

  • Phase B access-count feedback loop (#177):
    documents.access_count now increments on every memory_search
    hit; new Store.attention_report ranks by access × recency;
    mnemon attention-report CLI.
  • Phase C operator-reviewed consolidation (#183):
    Store.find_clusters + mnemon consolidate [--apply <idx>] with
    y/N confirmation gate. Operator-review only per plan invariant.
  • Retroactive contradiction sweep (#180):
    contradiction.sweep_contradictions + mnemon sweep-contradictions [--max-pairs N] [--dry-run]. Closes the save-time miss gap.

memory_save / explicit supersession

  • correction_of is now a structural relation (#171). When set,
    Store.save inserts 'supersedes' (new → target) + bumps the
    target's correction_count. Raises ValueError on missing target.
    memory_save MCP tool exposes the parameter.

mnemon CLI

  • status / search / save honor remote mode (#176). When
    MNEMON_REMOTE_URL is set, routes through call_tool_sync to
    the remote vault. Closes the 2026-05-21 Layer-3 silent-fallback gap.
  • attention-status --strict (#167) exits 1 when boost-rate >
    ceiling — for periodic health-check wiring.

Operator tooling

  • scripts/mnemon_ops.sh (#172): cleanup-test-apps,
    recover-token, restart-machine, vault-stats,
    changelog-extract.
  • scripts/ smoke-test CI (#170): pytest parametrized over
    scripts/*.py --help.

Release engineering

  • TestPyPI integration (#181): mnemon upgrade web --testpypi
    • promote_stable.sh testpublish subcommand. Enables true
      pre-publish validation of candidate code rather than the
      latest-published proxy.
  • promote_stable.sh harness expansion (#182):
    MNEMON_VENV_BIN env-var override; trap destroy retries + stderr
    capture; step-2 remote_url isolation regression test.
  • Drop _fly_dump_vault inline-Python script (#169) —
    mnemon sync push is now the canonical primitive.

Polish + fixes

  • context_surfacing balances dangling ** mid-bold truncation
    (#167).
  • scripts/calibrate_capture_threshold.py --use-fixture falls back
    to .example.json on fresh clones (#167).
  • NLI cache resolution docs in Dockerfile + nli.py (#167).
  • cli.py coverage 62% → 85% (#174).
  • README cross-client recall guidance (#163).

Re-soak prereq

CAPTURE_ATTENTION_ENABLED stays default-off in this rc. Operator-
side workflow to start the Phase A re-soak:

  1. Publish rc6 to PyPI (twine upload).
  2. mnemon upgrade web --app-name mnemon-memory --mnemon-version 0.7.0rc6.
  3. Verify mnemon doctor 7/7 green.
  4. flyctl ssh console -a mnemon-memory -C 'mnemon attention-status'
    — confirm Flag enabled: False, baseline boost-rate.
  5. flyctl secrets set MNEMON_CAPTURE_ATTENTION_ENABLED=true
    starts the re-soak clock.
  6. Soak ≥1 week; pass condition boost_rate ≤ 0.25 per the ROADMAP
    gate.

Suite 875 → 996 across the sweep (+121 tests).

v0.7.0rc4

24 May 14:53
dfb18d2

Choose a tag to compare

v0.7.0rc4 Pre-release
Pre-release

Capture-attention Phase A — activation infrastructure

  • New MNEMON_CAPTURE_ATTENTION_ENABLED env-var override on the
    Phase A feature flag. Mirrors the standing-tier pattern
    (MNEMON_STANDING_TIER_ENABLED) — operators can flip activation on
    Fly via flyctl secrets set without a code change + redeploy, and
    the next save picks it up without restarting the server. Accepts
    1/true/yes/on (truthy) or 0/false/no/off (falsy);
    unset / unrecognized falls back to config.CAPTURE_ATTENTION_ENABLED
    (still default False through soak). New
    store._capture_attention_enabled() helper called at request time
    from Store.save and cli attention-status. 5 new tests.
  • mnemon attention-status now reports the effective flag value
    with the env-var override applied — a Fly secret flip shows up here
    immediately instead of misleading the operator with the unchanged
    config default.

Calibration fixture privacy hardening

  • tests/fixtures/capture_attention_pairs.json is now gitignored.
    PR #153 shipped this path tracked with a placeholder schema —
    intended as a seed, but every operator calibration run overwrites
    it with real vault titles + snippets (personal context, in-flight
    work, etc.) that must not land in a public-repo commit. The
    placeholder schema moves to
    tests/fixtures/capture_attention_pairs.example.json (tracked) so
    future contributors still see the format; the operator output stays
    local-only.

Calibration script fixes (scripts/calibrate_capture_threshold.py)

  • VecStore.get(vec_id) -> np.ndarray | None added — mirrors the
    has / delete single-id shape; returns a defensive copy. The
    calibration script's vs.get(vec_id) call site failed on first
    invocation because the method did not exist. 3 new tests (returns
    vector, missing → None, defensive-copy invariant).
  • Near-neighbor pair sampling replaces uniform-random. The previous
    random sample across a 2510-memory vault produced pair cosines
    clustered at 0.1-0.4 (clearly-different topics) — operator verdicts
    carried no information about whether the threshold cut should be
    0.80 or 0.85. New sampler picks anchors, takes each one's top
    non-self neighbor above cosine 0.55 (well below the lowest
    calibration threshold so edge-negatives survive), and sorts
    descending so the operator tags high-confidence near-dupes first.
    Verified against the 2026-05-24 prod snapshot: 20-pair sample spans
    cosine 0.751-0.999, entirely in the calibration-relevant range.
    Calibration on that snapshot recommended
    CAPTURE_ATTENTION_THRESHOLD = 0.85 — matches the existing default,
    so no config change needed.

v0.7.0rc3 — soak-substrate (test trio + coverage gate)

22 May 17:35
4e459bb

Choose a tag to compare

[0.7.0rc3] - 2026-05-22

Test coverage

  • CI now enforces ≥80% test coverage. pyproject.toml gains
    [tool.coverage.run] + [tool.coverage.report] config with
    fail_under = 80; ci.yml runs pytest --cov so a PR that drops
    coverage below the floor fails the build. Excluded modules
    (dashboard/*, __main__.py, upgrade.py, downgrade.py,
    llm.py) are under-testable-by-design and documented in the
    config — Streamlit UI / entry-point shim / release-engineering
    scripts requiring real Fly+AWS / deprecated optional-LLM module
    the deployed product doesn't use.
  • Current coverage: 86% (suite 850 → 855 passing).
  • README coverage badge added: coverage-86%-brightgreen.
    Static, manually updated on each release (matches the existing
    static-badge pattern for Status / Python / License / MCP).
  • New tests/test_nli.py additions cover: _ensure_loaded HF
    download failure → NLIUnavailableError; _ensure_loaded
    unexpected label-set rejection; prewarm() swallows
    unavailability per acceptable-secondary-observability category;
    classify_pair softmax + input-building path with stubbed session.

CI / release tooling

  • New .github/workflows/ci-server-extras.yml workflow. Installs
    mnemon-memory[server] ONLY (the production-equivalent install
    used by the Fly Docker image) plus pytest as a separate test
    runner, and runs the full suite under that minimal install. Catches
    the failure class that bit memory_check_contradictions on
    2026-05-22 — production code that imports something from [llm] /
    [ui] would pass ci.yml (full [dev] extras installed) but
    fail this workflow. Includes a guard assertion that
    llama-cpp-python is NOT installed under [server] — so a future
    PR can't accidentally move it across without flipping the
    intentional "mnemon is LLM-free by default" posture.

  • scripts/promote_stable.sh layer3 --exercise-all-tools. New
    opt-in flag that, after the test Fly app is up but before the
    downgrade step, iterates every registered MCP tool against the
    remote and asserts each returns cleanly (no opaque error envelope,
    no unhandled exception, no NLI/embedder/baked-model breakage).
    Composes with tests/test_tools_integration.py (PR #158, local-
    process Python-level canary): this Fly-level probe catches the
    failure modes the local canary can't see (missing baked models,
    Anthropic MCP proxy timeouts, transport regressions). Tool list
    resolved dynamically from mcp._tool_manager._tools so tools
    added in future PRs are exercised automatically. Adds ~30-60s to
    the layer3 run; opt-in so non-NLI-touching releases aren't taxed.

  • scripts/_layer3_remote_helper.py gains an exercise-all-tools
    subcommand wired through the FastMCP tool manager. Two regression-
    lock tests added to tests/test_promote_stable.sh harness (15
    passing, was 13) covering helper dispatch + flag plumbing through
    the bash dispatcher.

v0.7.0rc1 — salience tier Phase 1 + capture attention Phase A

22 May 15:15
cadd59b

Choose a tag to compare

[0.7.0rc1] - 2026-05-22

Fixes

  • build_standing_set.py exemplar bias — added declarative-posture
    patterns.
    The pre-fix CONSTRAINT_EXEMPLARS list leaned heavily
    imperative ("never," "always," "must," "default to"). Surfaced
    2026-05-22: against the real prod vault the auto-selected top-10
    was 100% engineering rules — career / lifestyle / posture
    constraints spanning multi-year load-bearing facts (runway,
    recruiter posture, start-date framing, job-search mode) did not
    surface despite being equally durable, because the user encodes
    them declaratively ("Brian's stance," "current preference,"
    "passive/selective mode") rather than imperatively. Added 10
    declarative-posture exemplars representing the same constraint
    class in declarative shape. Exemplar list 22 → 30; imperative /
    declarative split now roughly balanced. ROADMAP audit-finding
    follow-up per feedback_audit_findings_become_roadmap_followups.
    Operator should re-run scripts/salience_phase0.sh snapshot && scripts/salience_phase0.sh score to verify the bias fix surfaces
    career-context memories alongside the engineering rules in the
    top-10.

Features

  • Salience tier Phase 1 — first-class standing-context recall
    (default-off, soak-gated).
    Memories explicitly promoted via
    memory_promote are injected into every <mnemon-context>
    envelope on every prompt, regardless of query similarity. The cap
    is the contract: default 15, hard ceiling 20. Plan:
    private/mnemon-salience-tier-plan-260521.md.
    • Reframed validation gate (2026-05-22). Phase 1 IS the
      validation. Earlier plan called for a synthetic A/B against the
      Phase 0 env-var-flagged form before committing to schema +
      tooling. Reframed because the injection mechanism is identical
      between the two forms — an A/B of the gated env-var path
      carries no marginal information once Phase 1 ships gated behind
      STANDING_TIER_ENABLED=false. Operator promotes ~5 career-
      context memories, flips the flag, observes ≥1 week soak for
      runway-style under-weighting recurrence vs absence. Per
      feedback_phase_gated_soak_consumer_must_be_ready: ship the
      substrate gated, flip activation at a separate milestone.
    • Schema migration: documents.tier TEXT NOT NULL DEFAULT 'situational' via _migrate_tier(). Index idx_documents_tier
      on live rows for the cap-count probe + search exclusion filter.
      Additive + harmless if STANDING_TIER_ENABLED stays off.
    • Store.promote_to_standing(id) + demote_to_situational(id)
      • list_standing() + standing_tier_status(). Promote
        raises StandingTierCapReached at the runtime cap,
        StandingTierProvenanceRejected when source_client is in
        STANDING_TIER_BLOCKED_SOURCE_CLIENTS (Layer 4 composition —
        hook-sourced memories cannot be promoted; operator-explicit
        gesture only), and StandingTierError on missing /
        invalidated docs. Idempotent re-promote returns True; demote
        of a situational doc returns False (no-op).
    • Store.search_bm25 + Store.search_vector gain
      include_standing: bool = False keyword param. Standing-tier
      docs excluded from ranked retrieval by default — they're
      injected unconditionally already; ranking them too would
      double-count and crowd the situational signal. Threaded through
      search.search() so the higher-level entry respects the
      invariant.
    • MCP tools: memory_promote(id), memory_demote(id),
      memory_list_standing() — both stdio (server.py) and
      Streamable HTTP (server_remote.py reuses the same mcp
      object). 14 → 17 registered tools.
    • CLI: mnemon standing list / promote <id> / demote <id>.
      mnemon status gains a Standing tier: N/CAP line.
    • build_context integration: when STANDING_TIER_ENABLED
      (config constant OR MNEMON_STANDING_TIER_ENABLED env override,
      accepting 1/true/yes/on), build_context calls
      memory_list_standing via the remote client in a single
      round-trip and renders the result as the "Standing context"
      sub-section ahead of "Situational recall." Phase 0 env-var path
      (MNEMON_STANDING_TIER_FILE → standing.json → standing-rendered.md
      cache) is preserved as fallback so operators retain a
      per-session override mechanism.
    • Composability invariants (all preserved):
      • Layer 0 (is_well_shaped) — capture rejection runs before
        anything reaches the standing-tier promotion path
      • Layer 1 envelope — standing block sits inside the same
        <mnemon-context> data-marking + nonce as situational
      • Layer 4 (HOOK_SOURCE_CONFIDENCE_CEILING + provenance) —
        hook-sourced memories cannot be promoted; explicit
        StandingTierProvenanceRejected rejection
      • rc16 source_key upsert — unchanged; tier orthogonal
      • Capture attention Phase A — recurrence_count accretes
        against canonical situational memories; standing-tier
        promotion is operator-gated on top of that signal
    • Soak gates for flipping default-on: (a) ≥1 week with the
      flag on; (b) observed reduction in runway-style under-weighting
      recurrence on real career-strategy conversations; (c) zero
      spurious-injection complaints from operator review of every
      promoted memory.
    • 22 new tests in tests/test_standing_tier.py covering: promote
      success / cap-rejection (cap=2 in test, 3rd raises) /
      hook-sourced rejection / invalidated rejection / missing rejection
      / cap respects invalidated (freed slot reclaimable) / demote
      round-trip / demote idempotent on situational / demote frees cap
      slot / list_standing ordering + content / search excludes by
      default / search includes when requested / build_context
      flag-off no-fetch / flag-on memory_list_standing call /
      env-var truthy value parsing. Suite 814 → 836 passing
      (test_server_remote.py tool-count assertions bumped 14 → 17).

Schema

  • documents.tier TEXT NOT NULL DEFAULT 'situational'
    additive migration in _migrate_tier() after the existing
    _migrate_recurrence_count. Index idx_documents_tier scoped to
    live rows for cap-count + search-filter queries.

  • Capture attention Phase A — recurrence-weighted memory convergence
    (default-off, soak-gated).
    When a new save's content is semantically
    close to ≥2 prior memories spanning distinct sessions, capture
    attention preserves the new memory + inserts 'restates' relations
    to each cluster member + boosts the canonical neighbor's confidence

    • increments the canonical's new recurrence_count column. The
      cluster of restatements stays discoverable; the load-bearing signal
      accretes on the canonical; MMR diversity at recall naturally
      suppresses near-duplicates without us dropping them at capture.
      Plan: private/mnemon-capture-attention-plan-260522.md. Driver: the
      2026-05-22 finding that load-bearing facts stated across many
      sessions land as fragmented memories rather than a single canonical
      assertion (the operator was implicitly substituting for a missing
      mechanism).
    • SOTA invariant: preserve+relate+boost, never skip-the-save.
      Earlier draft considered "boost canonical + skip the new save"
      as the auto-apply path — rejected because each restatement
      carries different framing and discarding it throws away the very
      signal the recurrence detector is honoring. The institutional
      pattern is preserve the data, link via relations, accrete the
      importance signal — operator-reviewed merge is Phase C of the
      plan, not Phase A's job.
    • Embedding-only (no LLM dependency). Same SOTA-for-public-
      release-constraint logic that drove build_standing_set.py's
      embedding-based scorer (the roadmapped LLM-judge opt-in P2 item
      composes as an advanced mode but isn't required).
    • Feature flag CAPTURE_ATTENTION_ENABLED default-off through
      soak. Two acceptance criteria to flip default-on (per plan
      §"Soak acceptance criteria"): (1) boost_rate ≤ 0.25 over a 7-day
      window measured via mnemon attention-status; (2) ≥80% precision
      on a 20-canonical manual review.
    • correction_of parameter on Store.save() (forward-compat
      for salience-tier Phase 2 promotion signals). When set, capture
      attention is skipped — operator explicit gesture beats automated
      recurrence detection.
    • mnemon attention-status CLI — soak monitor: boost-rate
      ratio over 7 days, recurrence-count distribution, top-10
      canonicals, last-10 'restates' relations audit trail.
    • scripts/calibrate_capture_threshold.py — data-tuned
      threshold selection. Samples N pairs from the operator's vault
      snapshot, prompts for same/different tagging, computes
      precision-recall at {0.70, 0.75, 0.80, 0.85, 0.90}, recommends
      the precision-leaning sweet spot. Persists tagged pairs to
      tests/fixtures/capture_attention_pairs.json for regression
      locking.
    • Failure mode: named exception + WARN swallow. Embedder /
      vecstore unavailability raises CaptureAttentionUnavailableError
      from apply_capture_attention(); Store.save() catches +
      logger.warnings + continues (the new memory is saved; only the
      recurrence-boost side effect is skipped). Acceptable swallow per
      feedback_no_silent_fails category (b) — secondary observability
      hung off a primary save path that records the failure.
    • Composes with the existing layered defenses unchanged.
      Capture attention runs AFTER Layer 0 (is_well_shaped rejects
      scaffolding before the path is reached) + AFTER Layer 4 ceiling
      (HOOK_SOURCE_CONFIDENCE_CEILING clamp survives the boost).
      'restates' is a new relation type — doesn't collide with the
      existing 'supersedes' / 'contradicts' / 'related'.
    • 13 new tests in tests/test_capture_attention.py covering: the
      preserve-everything invariant, feature-flag-off no-behavio...
Read more

v0.6.0

21 May 22:43
ba6714c

Choose a tag to compare

[0.6.0] - 2026-05-21

Release

  • Promotion from 0.6.0rc18 to 0.6.0 stable. Closes the rc cycle
    that ran from 0.6.0rc1 (2026-04-21, the simplification arc →
    two-product split) through 0.6.0rc18 (2026-05-18, the layered
    stored-injection defense). 0.6.0 is rc18 plus several
    upgrade/downgrade-correctness fixes surfaced 2026-05-21 while
    exercising the pre-promote Layer-3 web test for the first time
    (see the Fixes section below).

Fixes

  • mnemon upgrade web now forwards MNEMON_S3_PREFIX and
    MNEMON_VAULT_NAME to the Fly container
    as secrets, so a
    non-default operator override propagates to the container's
    mnemon sync pull seed step. Previously the Fly side fell back
    to sync.S3_PREFIX_DEFAULT (mnemon/vaults) regardless of what
    the operator set locally — which broke the runbook's
    "test against an isolated S3 prefix" ritual. Not user-affecting
    for normal prod redeploys (both sides default identically); only
    affects ad-hoc test deploys where the operator overrides the
    prefix. Surfaced + fixed during the 0.6.0 Layer-3 attempt; 4
    regression tests cover the forwarding contract.

  • mnemon downgrade local now dumps the current Fly vault to S3
    before pulling.
    Previously, downgrade did S3 → local only and
    silently skipped the Fly → S3 step, so any memory added via
    remote between upgrade time and downgrade time was lost — the
    local vault was seeded from a stale S3 snapshot. For ad-hoc
    testing this manifested as "docs added post-upgrade missing
    after downgrade." For prod operators it would have been a quiet,
    severe data-loss bug: weeks of remote-added memories vanishing
    on the first mnemon downgrade local call. Now SSHes into the
    Fly machine and runs mnemon sync push before the local
    mnemon sync pull — mirror of upgrade._fly_seed_vault in the
    opposite direction. New --skip-fly-push flag as an operator
    escape hatch (e.g., when the Fly machine is unreachable);
    default behavior is to fail-loud if the dump SSH errors out,
    rather than silently fall through to the stale-pull data loss.
    5 regression tests cover the call order (fly_dump → s3_pull),
    the override flag, the fail-loud on SSH error, the
    custom-domain skip, and the SSH command shape.

  • mnemon sync push now uses SQLite's online-backup API as the
    canonical cross-host transfer primitive
    (replaces an earlier
    WAL-checkpoint approach that was wrong cross-process). Raw
    aws s3 cp default.sqlite uploads only the main sqlite file's
    bytes — for short-lived CLI processes that's fine because SQLite
    auto-checkpoints WAL on connection close, but for long-running
    mnemon serve-remote the WAL accumulates indefinitely (default
    auto-checkpoint at 1000 pages). The natural-seeming fix —
    PRAGMA wal_checkpoint(TRUNCATE) from a transient connection — is
    silently broken when another process holds the connection open:
    it returns (busy=0, total=0, checkpointed=0), reports success,
    flushes zero frames. Verified against a long-lived holder + 3
    commits: PRAGMA reported success, main file stayed at 8KB (just
    schema), no frames moved; Connection.backup() from the same
    position captured all 3 rows. push() now snapshots via the
    online-backup API to a transient .sqlite.snapshot file beside
    the source, aws s3 cp's the snapshot, then removes it. The
    online-backup API uses SQLite's WAL-aware backup protocol and
    produces a consistent atomic snapshot even with concurrent
    writers. Vec store (default.vec.npz) is a binary numpy file with
    no SQLite semantics — uploaded directly. 6 regression tests
    cover the snapshot helper, the cross-process write capture, the
    source-is-read-only contract, the error-string contract for
    invalid sources, the snapshot-before-cp call order, the transient
    cleanup, and the vec.npz direct-upload path.

  • mnemon downgrade local (_fly_dump_vault) uses SQLite's
    online-backup API directly
    for the version-skew bootstrap.
    When Layer-3 runs Pre-publish validation, the Fly container is
    pinned to the latest-published mnemon (e.g. 0.6.0rc18) which
    predates the sync.push backup-API fix above — so SSHing
    flyctl ssh -C "mnemon sync push" would invoke the older
    broken push. To handle this version-skew, _fly_dump_vault
    SSHes a stdlib Python script that does its own Connection.backup()

    • aws s3 cp of /data/default.sqlite (plus default.vec.npz
      best-effort), independent of installed mnemon version. Once the
      Fly side is reliably on 0.6.0+, this can simplify to
      flyctl ssh -C "mnemon sync push" and rely on the canonical
      primitive — tracked as a follow-up.

    The rc cycle delivered:

    • The simplification arc — mnemon local (stdio + single-file vault)
      and mnemon web (Fly + S3 backup) as one codebase, symmetric
      upgrade web / downgrade local, single source of truth
      invariant. (rc1rc7.)
    • Runtime hardening from rc11-deploy observations: fresh-session
      deadlock fix (json_response=True), _session_creation_lock
      narrowing, periodic expire_old() + decay sweeps in the lifespan
      task, OAuth refresh-token rotation grace, warm-keeper +
      persistent sessions. (rc8rc14.)
    • Auto-mirror discipline: shape gate (is_well_shaped) + confidence
      cap to keep transcript fragments from outranking deliberate
      user-authored memories; upsert-by-slug (source_key) to stop the
      multi-edit duplication pattern. (rc15, rc16.)
    • The five-layer stored-injection defense end-to-end: token defang
      allowlist (Layer 2), capture-time scaffolding rejection (Layer 0
      — root cause), provenance trust-tiering (Layer 4), spotlighting
      data envelope at recall (Layer 1, Claude Code path). (rc17,
      rc18.)

    0.7.0 will open the salience-tier work — separating standing
    constraints (capped, unconditionally injected) from situational
    recall.

v0.6.0rc18

18 May 21:16
e916799

Choose a tag to compare

[0.6.0rc18] - 2026-05-18

Security

  • Layered hardening of the rc17 stored-injection fix. rc17's
    defang_control_markup neutralizes control-plane tokens only at the
    recall boundary, and only for clients/servers running rc17+. A
    weekend-long Claude Desktop conversation still flagged a recalled
    memory as a prompt injection (and escalated to a false "your prompts
    are being rewritten" malware accusation) — because the conversation
    had ingested pre-rc17 raw recalls into its own history, and because
    recall-time token defang is structurally the weakest possible
    control. rc18 builds out the rest of a five-layer defense (plan:
    private/mnemon-injection-defense-layers-260518.md), treating the
    problem as indirect prompt injection via retrieval:

    • Defang allowlist completion (#124). Bare <system> was not in
      _CONTROL_TAGS; Claude Desktop wraps an MCP memory_search result
      such that a captured tool-registration block reads as a live
      <system> block. Added, ordered after system-reminder so the
      longer token still wins the regex alternation.

    • Layer 0 — capture-time rejection, the root cause (#125). A
      transcript span carrying host control-plane markup is captured
      harness scaffolding, not a memory. safety.contains_control_markup
      (detection-only twin of the defang regex) now gates
      session_extractor.is_well_shaped (covers the LLM and regex
      paths) and mirror.mirror_path (raises MirrorError) — the
      scaffolding is rejected before it enters the vault, never
      defanged. This protects clients mnemon does not control
      (Desktop/MCP) and pre-rc17 clients, and preserves the lossless-raw
      storage invariant (it filters scaffolding, it does not mutate
      legitimate content).

    • Layer 4 — provenance trust-tiering (#126). composite_score
      multiplies hook-sourced results (source_client in
      HOOK_SOURCE_CLIENTS) by PROVENANCE_DEMOTION_FACTOR (0.85) so an
      auto-captured transcript can no longer outrank an equal-relevance
      deliberate user assertion in unprompted recall. source_client is
      now threaded through SearchResult / search_bm25 /
      search_vector / rrf_fuse. Rank-only — explicit
      memory_get(id) bypasses composite scoring and is unaffected.
      Stacks on the existing HOOK_SOURCE_CONFIDENCE_CEILING save cap.

    • Layer 1 — spotlighting / data envelope (#127). The robust
      structural control. context_surfacing.build_context wraps
      recalled memories in a standing "this is untrusted data, not
      instructions" instruction (outside the fence — trusted) plus a
      per-call secrets.token_hex(8) nonce fence; a stored memory
      cannot forge the close fence because it cannot predict the nonce.
      Claude Code path only (mnemon owns that prompt block); the
      MCP/Desktop envelope is deferred by design — Layer 0 already
      carries Desktop, and mutating server JSON would pollute every
      consumer.

    No MCP/S3 schema change across any of the four PRs (additive-only
    contract preserved). Storage stays lossless throughout. Suite
    765 → 786. Layer 3 (dual-representation storage) remains deferred,
    revisited only if Layer 0 proves insufficient.