
[Epic] E4 — Confidence + provenance instrumentation across LML, Backend, semantic-index #664

@jakebromberg

Summary

Add confidence and provenance instrumentation everywhere identity is computed or stored. This covers three surfaces. First, the §3.4 columns on the new library_identity and library_identity_source tables, with confidence and method mandatory from day one and retroactively backfilled per §3.4.1's matrix. Second, LML returns confidence per the §3.2.2 write contract. Third, semantic-index's reconciliation_log gets backfilled (53,849 rows, all of which currently have NULL confidence).

Today, identity decisions vanish without an audit trail. reconciliation_log stores no confidence, LML's matcher computes a confidence internally but doesn't always persist it, and reruns silently overwrite earlier decisions without supersedure tracking. This epic plugs every leak.

Scope

Backend-Service

  • Mandatory confidence + method on library_identity (NOT NULL with CHECK).
  • library_identity_history populated on every supersedure with superseded_reason (§3.2.0 retention policy).
  • §3.4.1 confidence matrix locked: ≥ 0.85 → write authoritatively; 0.70 ≤ confidence < 0.85 → write but flag; < 0.70 → history-only (never promoted to live).
  • §3.4.1.1 composition rules (Rules 1-6 + 7 worked examples) implemented in the Backend writer's main-row recompute path.
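The following TypeScript sketch shows one way the §3.4.1 matrix and the history rule might combine in the writer. It assumes a simplified row shape. The field names (`libraryId`, `canonicalKey`), the `below_threshold` and `superseded_by_<method>` reason strings, and the `applyCandidate` helper are hypothetical. The real columns come from §3.4. The sketch also leaves out the §3.4.1.1 composition rules, which decide whether a flagged candidate may displace an authoritative row.

```typescript
// Hypothetical simplified shapes; the real columns are defined in §3.4.
type Tier = "authoritative" | "flagged" | "history_only";

interface IdentityRow {
  libraryId: number;
  canonicalKey: string;
  confidence: number; // NOT NULL on library_identity
  method: string; // NOT NULL on library_identity
  flagged: boolean;
}

interface HistoryRow {
  row: IdentityRow;
  supersededReason: string;
}

type Candidate = Omit<IdentityRow, "flagged">;

// §3.4.1 bands as half-open intervals: [0.85, 1], [0.70, 0.85), [0, 0.70).
function classify(confidence: number): Tier {
  if (confidence >= 0.85) return "authoritative";
  if (confidence >= 0.7) return "flagged";
  return "history_only";
}

// Returns the new live row (possibly unchanged) plus any history rows to insert.
function applyCandidate(
  live: IdentityRow | undefined,
  candidate: Candidate,
): { live: IdentityRow | undefined; history: HistoryRow[] } {
  const tier = classify(candidate.confidence);
  if (tier === "history_only") {
    // Keep the evidence for audit, but never promote it to the live row.
    return {
      live,
      history: [{ row: { ...candidate, flagged: false }, supersededReason: "below_threshold" }],
    };
  }
  const next: IdentityRow = { ...candidate, flagged: tier === "flagged" };
  // A displaced live row is always recorded in history, never silently overwritten.
  const history: HistoryRow[] = live
    ? [{ row: live, supersededReason: `superseded_by_${candidate.method}` }]
    : [];
  return { live: next, history };
}
```

The bands are treated as half-open intervals, so every confidence value falls into exactly one tier. A score of exactly 0.85 is authoritative, and exactly 0.70 is flagged.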

library-metadata-lookup

  • Every /lookup response carries per-source confidence + method (per §3.2.2).
  • entity.identity.reconciliation_log populated on every reconciliation (53,849 existing NULL-confidence rows backfilled where evidence permits).
  • Sentry tracing on cache_stats already in flight (LML#213, BS#646 — both 2026-04-29) — provides runtime observability to validate confidence telemetry.
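The following TypeScript sketch shows how a Backend caller might check the §3.2.2 contract on a `/lookup` body before trusting it. The `results` array and the `source`/`confidence`/`method` field names are assumptions for illustration, not the actual LML response schema. The point is that a result missing confidence or method fails the guard instead of flowing on as NULL.

```typescript
// Hypothetical per-source result shape; the real contract is §3.2.2 of the plan.
interface SourceResult {
  source: string;
  confidence: number; // required, in [0, 1]
  method: string; // required, non-empty
}

interface LookupResponse {
  results: SourceResult[];
}

// Type guard for an untrusted JSON body: every result must carry confidence + method.
function isLookupResponse(value: unknown): value is LookupResponse {
  if (typeof value !== "object" || value === null) return false;
  const results = (value as { results?: unknown }).results;
  if (!Array.isArray(results)) return false;
  return results.every((r: unknown) => {
    if (typeof r !== "object" || r === null) return false;
    const { source, confidence, method } = r as Record<string, unknown>;
    return (
      typeof source === "string" &&
      typeof method === "string" &&
      method.length > 0 &&
      typeof confidence === "number" &&
      Number.isFinite(confidence) &&
      confidence >= 0 &&
      confidence <= 1
    );
  });
}
```

Rejecting a `null` confidence at this boundary means the Backend writer never has to coerce one into a NOT NULL column.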

semantic-index

  • reconciliation_log.confidence no longer permitted NULL on new writes.
  • Existing rows backfilled where matcher state permits; otherwise marked method='inherited' with the source row's confidence.
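The following sketch shows the per-row backfill decision for existing reconciliation_log rows. It assumes the matcher state and source-row confidences have already been loaded into maps keyed by id. `LogRow`, `sourceRowId`, and `backfill` are hypothetical names. Leaving a row NULL when no evidence exists at all is also an assumption, drawn from the "where evidence permits" wording.

```typescript
// Hypothetical shapes for semantic-index's reconciliation_log backfill.
interface LogRow {
  id: number;
  sourceRowId: number;
  confidence: number | null;
  method: string | null;
}

interface MatcherState {
  confidence: number;
  method: string;
}

// Prefer replayed matcher state; otherwise inherit the source row's confidence.
function backfill(
  row: LogRow,
  matcherState: Map<number, MatcherState>,
  sourceConfidence: Map<number, number>,
): LogRow {
  if (row.confidence !== null) return row; // already populated: leave untouched
  const state = matcherState.get(row.id);
  if (state) {
    return { ...row, confidence: state.confidence, method: state.method };
  }
  const inherited = sourceConfidence.get(row.sourceRowId);
  if (inherited === undefined) return row; // no evidence at all: stays NULL
  return { ...row, confidence: inherited, method: "inherited" };
}
```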

Dependencies

  • Not blocked by E1, E2, or E3; can run in parallel with all of them.
  • Coordinates with E2-BS and E2-LML (the contract carries the confidence values).

Existing issues folded in

  • WXYC/library-metadata-lookup#213 — Project cache_stats onto Sentry transaction for E2E tracing → child. Already shipping; provides observability for confidence telemetry.
  • WXYC/Backend-Service#646 — Wrap lookupMetadata in a Sentry span and project LML cache_stats onto it → child. Same wrap-at-chokepoint + project-onto-span pattern.
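The following sketch shows the wrap-at-chokepoint + project-onto-span pattern. It uses a minimal stand-in span interface so it runs without the Sentry SDK. In the Sentry JavaScript SDK, the equivalent calls would be `Sentry.startSpan` and `span.setAttribute`. The `CacheStats` fields and the `cache_stats.*` attribute keys are hypothetical, not LML's actual cache_stats schema.

```typescript
// Minimal stand-in for a tracing span so the sketch runs without the Sentry SDK.
interface SpanLike {
  setAttribute(key: string, value: number | string | boolean): void;
}

type StartSpan = <T>(name: string, fn: (span: SpanLike) => Promise<T>) => Promise<T>;

// Hypothetical cache_stats fields; LML defines the real shape.
interface CacheStats {
  hits: number;
  misses: number;
}

interface LookupResult {
  cacheStats?: CacheStats;
}

// Copy cache_stats onto the span as flat attributes (keys are hypothetical).
function projectCacheStats(span: SpanLike, stats: CacheStats | undefined): void {
  if (!stats) return;
  span.setAttribute("cache_stats.hits", stats.hits);
  span.setAttribute("cache_stats.misses", stats.misses);
}

// Wrap the single chokepoint that calls LML so every lookup gets a span.
function tracedLookup(
  startSpan: StartSpan,
  lookup: () => Promise<LookupResult>,
): Promise<LookupResult> {
  return startSpan("lookupMetadata", async (span) => {
    const result = await lookup();
    projectCacheStats(span, result.cacheStats);
    return result;
  });
}
```

Wrapping the one function that every caller goes through gives each lookup a span without touching the call sites.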

Phase

Phase 1+: instrumentation lands incrementally as each surface (LML response, BS writer, semantic-index reconciler) gains its column or refactor.

Acceptance

  • Zero NULL confidence rows on any new identity write.
  • §3.4.1 thresholds enforced at write time (sanity-check rejection per §3.2.2).
  • Sentry traces show cache_stats projected onto the LML span, reachable from the library-metadata-lookup project's trace explorer.
  • Composition rules tested (per §8.5: apps/backend/tests/integration/library-identity-composition.spec.js covering Rules 1-6 and all 7 worked examples).
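The following TypeScript sketch shows a write-boundary guard in the spirit of the §3.2.2 sanity-check rejection. It assumes the live-table floor is the §3.4.1 history-only cutoff of 0.70. `LiveIdentityWrite`, `IdentityWriteRejected`, and `assertLiveWrite` are hypothetical names. In practice, the NOT NULL + CHECK constraints on library_identity would back this guard up at the database layer.

```typescript
// Hypothetical payload for a write to the live library_identity table.
interface LiveIdentityWrite {
  confidence: number | null | undefined;
  method: string | null | undefined;
}

class IdentityWriteRejected extends Error {
  constructor(message: string) {
    super(message);
    this.name = "IdentityWriteRejected";
  }
}

// §3.4.1: anything below this is history-only and must not reach the live row.
const LIVE_MIN_CONFIDENCE = 0.7;

// Throws before the INSERT/UPDATE is issued, so no NULL or sub-threshold row lands.
function assertLiveWrite(w: LiveIdentityWrite): void {
  if (w.confidence === null || w.confidence === undefined) {
    throw new IdentityWriteRejected("confidence is required");
  }
  if (!Number.isFinite(w.confidence) || w.confidence < 0 || w.confidence > 1) {
    throw new IdentityWriteRejected(`confidence out of range: ${w.confidence}`);
  }
  if (!w.method) {
    throw new IdentityWriteRejected("method is required");
  }
  if (w.confidence < LIVE_MIN_CONFIDENCE) {
    throw new IdentityWriteRejected(`confidence ${w.confidence} is history-only`);
  }
}
```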

Plan reference

plans/library-hook-canonicalization.md §3.4 (full subsection), §3.4.1 (confidence matrix), §3.4.1.1 (composition rules), §3.2.2 (LML write contract).

Metadata


Assignees

No one assigned

    Labels

    • concern:observability (Sentry, breadcrumbs, audit reports, time-series views)
    • cross-cache-identity (Project tag for the cross-cache-identity initiative: library hook + identity record + normalization)
    • enhancement (New feature or request)
    • epic:e4-instrumentation (Parent epic E4: confidence + provenance instrumentation)
    • kind:epic (Parent epic issue)
    • phase:1 (Mojibake phase 1: fix tubafrenzy)

    Type

    No type

    Projects

    Status
    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
