Goal
Replace the retired source-leg backfill job (jobs/library-identity-backfill/) with a thin consumer of LML's new POST /api/v1/identity/bulk-resolve-libraries endpoint. Per the architecture pivot (#800), Backend stops reading LML's discogs-cache PG directly; it caches LML's verdict via the HTTP contract.
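To make "thin consumer" concrete, here is a minimal sketch of building (not sending) the request to the endpoint named above. The host, the "libraries" wrapper key, and the per-item field names are assumptions for illustration; the actual body shape is whatever the LML contract defines.

```python
import json
import urllib.request

LML_BASE_URL = "https://lml.example.internal"  # hypothetical host, not from the issue
ENDPOINT = "/api/v1/identity/bulk-resolve-libraries"  # path as named in the Goal


def build_request(libraries: list[dict], base_url: str = LML_BASE_URL) -> urllib.request.Request:
    """Build a JSON POST for the bulk-resolve endpoint without sending it.

    The {"libraries": [...]} envelope is an assumed shape, not the real contract.
    """
    body = json.dumps({"libraries": libraries}).encode("utf-8")
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    [{"library_id": 1, "artist_name": "Stereolab", "album_title": "Dots and Loops"}]
)
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the consumer easy to unit-test without a live LML instance.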
Scope
Build a one-shot ETL job (in the same jobs/ shape as the existing artist-identity-etl) that:
- Selects libraries needing identity refresh: WHERE library.canonical_entity_id IS NOT NULL OR library.id IN (SELECT library_id FROM library_identity WHERE last_refreshed_at < NOW() - interval '7 days') (incremental, bounded).
- POSTs them to LML in batches of 500 with (artist_name, album_title) denormalized.
- UPSERTs the response into library_identity (one row per library_id) and library_identity_source (one row per provenance entry).
- Emits Sentry-traced metrics: rows_resolved / rows_unresolved / rows_skipped / lml_latency.
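The batching and metric-counting steps above can be sketched as follows. This is a rough illustration under stated assumptions: the "libraries" payload wrapper, the "library_id" key, and the per-result "status" field are hypothetical, since the response shape is not spelled out here. Only the batch size of 500, the denormalized (artist_name, album_title) pair, and the metric names come from the scope.

```python
from collections import Counter
from typing import Iterator

BATCH_SIZE = 500  # batch size stated in the scope


def chunked(rows: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_payload(batch: list[dict]) -> dict:
    """Denormalize each row to (artist_name, album_title) for the request body.

    The "libraries" wrapper and "library_id" key are assumed names.
    """
    return {
        "libraries": [
            {
                "library_id": row["id"],
                "artist_name": row["artist_name"],
                "album_title": row["album_title"],
            }
            for row in batch
        ]
    }


def tally(results: list[dict]) -> Counter:
    """Count outcomes under the metric names from the scope.

    Assumes each result carries a "status" of "resolved", "unresolved",
    or "skipped"; that field is hypothetical.
    """
    counts = Counter({"rows_resolved": 0, "rows_unresolved": 0, "rows_skipped": 0})
    for result in results:
        counts[f"rows_{result['status']}"] += 1
    return counts


rows = [
    {"id": i, "artist_name": f"Artist {i}", "album_title": f"Album {i}"}
    for i in range(1201)
]
print([len(batch) for batch in chunked(rows)])  # [500, 500, 201]
```

lml_latency is left out of the sketch; it would typically be measured around each HTTP call and reported alongside these counts.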
What we keep from the retired job
- The library_identity + library_identity_source schemas (Backend's cache of LML's verdict).
- The artists table mirror shape (for in-process reads from the request hot path).
- The DRY_RUN env var pattern (locked JSON output for stage/prod parity).
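One possible reading of the DRY_RUN pattern is sketched below: when the variable is set, the job skips its writes and still emits a JSON summary with a fixed key set, so stage and prod output can be diffed directly. The accepted truthy values, the summary keys, and the helper names (`is_dry_run`, `run_summary`, `finish`) are assumptions for illustration, not the retired job's actual code.

```python
import json
import os
from typing import Callable, Mapping, Optional


def is_dry_run(env: Optional[Mapping[str, str]] = None) -> bool:
    """Treat DRY_RUN=1/true/yes (case-insensitive) as enabled; values are assumed."""
    env = os.environ if env is None else env
    return env.get("DRY_RUN", "").strip().lower() in {"1", "true", "yes"}


def run_summary(counts: Mapping[str, int], dry_run: bool) -> str:
    """Serialize a summary with a fixed key set and sorted keys for stable diffs."""
    summary = {
        "dry_run": dry_run,
        "rows_resolved": counts.get("rows_resolved", 0),
        "rows_unresolved": counts.get("rows_unresolved", 0),
        "rows_skipped": counts.get("rows_skipped", 0),
    }
    return json.dumps(summary, sort_keys=True)


def finish(
    counts: Mapping[str, int],
    write_rows: Callable[[], None],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Run the write step only when not in dry-run mode, then return the summary."""
    dry_run = is_dry_run(env)
    if not dry_run:
        write_rows()
    return run_summary(counts, dry_run)


writes = []
print(finish({"rows_resolved": 2}, lambda: writes.append("upsert"), env={"DRY_RUN": "true"}))
# {"dry_run": true, "rows_resolved": 2, "rows_skipped": 0, "rows_unresolved": 0}
```

Passing the environment in as a mapping keeps the gate testable without mutating os.environ.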
What we remove
- jobs/library-identity-backfill/ and its source-leg readers.
- The BACKFILL_LEG dispatcher.
- All cross-DB connection setup (DATABASE_URL_DISCOGS).
Acceptance criteria
- jobs/library-identity-consumer/ (name TBD) wired into Manual Build & Deploy.
- POST /api/v1/identity/bulk-resolve-libraries per the v0.7 contract.
- jobs/library-identity-backfill/ deleted in the same PR (no tombstone scripts).
Sequencing
Blocked by:
Cannot ship until both are in place.
Related