You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Step 2 of the two-step plan in compilation-track-schema-design.md. Generalize compilation_track_artist → library_track so the track model covers all releases (not just V/A), supports a release-identity correction UI for compilations, and accommodates the librarian-artist link as a first-class concept.
Blocked by #792 (step 1 — library_track_identity_source sibling). Do not start until #792 ships and the sibling identity table is exercised in production.
Aligned with the cross-cache-identity architecture pivot (2026-05-09). Per BS#800, Backend doesn't read LML's PG; identity flows in via POST /api/v1/identity/bulk-resolve-libraries. The "non-V/A track data" deliverable below uses bulk-resolve-libraries' kind: single_artist track listings rather than a new direct-write LML script. The legacy match_compilations.py / merge_cta.py scripts retire under pivot Phase 4-5; the per-track matcher work moves into LML (library-metadata-lookup#271).
Why later, not now
Today's compilation_track_artist covers only V/A library rows (~6,300 of 64,676), was machine-populated by an offline LML backfill (not librarian-curated), and uses a unique key on (library_id, artist_name, track_title) — wrong for the future where track_position is the right key but is currently nullable in some rows.
The librarian directive (2026-05-09) opens onto a UI plan we can't reach without restructuring:
Tracklisting view for compilations.
Search artists + tracks across all releases, not just comps.
Release-identity correction UI for compilations — at the release level, not the track level. Two failure modes: (a) LML matched the comp to the wrong upstream release; (b) the correct release isn't captured in any upstream system at all.
Goals 1 and partial 2 work for V/A on current data once #792 lands. Goal 2 for non-V/A and goal 3 require this ticket.
Scope (tracking-level; sub-issues to be carved out closer to execution)
Schema
Rename compilation_track_artist → library_track, OR keep the V/A table and add a parallel library_track for non-V/A. Decide in the design phase.
Add track_artist_id integer REFERENCES wxyc_schema.artists(id) (nullable) for internal canonicalization — the "library-artist link" question from the design doc.
Add track_artist_link_confidence real CHECK (>= 0 AND <= 1) and track_artist_link_method text for the link itself.
Tighten track_position to NOT NULL where data permits; backfill or null-segregate the rest before adding the constraint.
Refresh the CDC trigger (currently cdc_compilation_track_artist from migration 0046) to cover the renamed/replacement table.
Population — non-V/A track data
Under the pivot, non-V/A per-track data flows through bulk-resolve-libraries: extend the response shape so kind: single_artist results carry tracks too (when the caller asks for them). LML matches per-track; Backend writes the response into library_track. The pre-pivot framing of "new direct-write LML script mirroring match_compilations.py" is obsolete.
Cardinality estimate: 64K library rows × N tracks ≈ 500K–1M new track rows.
Backend-side: a one-shot backfill job (cf. jobs/library-canonical-entity-backfill/ or jobs/library-artwork-url-backfill/ patterns) that pages through library rows, calls bulk-resolve-libraries, and UPSERTs the tracks block into library_track (and identity into library_track_identity_source).
library-etl reconciliation
Today's .onConflictDoNothing() keyed on (library_id, artist_name, track_title) (Backend-Service/jobs/library-etl/job.ts:735-743) creates orphan rows when upstream artist_name is corrected. Fine today (no upstream correction UI), real bug once corrections land.
Either grow real upsert semantics keyed on (library_id, track_position) (requires the NOT NULL change above), or detect upstream deletions and propagate.
Decide cascade semantics for library_track_identity_source when the underlying track is deleted vs. when an artist_name is corrected (true delete → cascade is right; correction → identity may still be valid, depends on whether the correction changed the artist).
Reader updates
Backend-Service/apps/backend/services/suggest.service.ts:30-100 reads compilation_track_artist as a secondary autocomplete source. Update to read library_track.
Catalog views in dj-site that display tracklists (don't exist yet; add as part of this work or as a downstream UI ticket).
Any flowsheet → library track resolution that needs the per-track artist link.
Release-identity correction UI
The librarian-facing correction is at the release level, not the track level. When LML matches a comp to a Discogs/MB release, the tracklist is downstream of that release pointer; if the pointer is wrong, the entire tracklist is wrong. Track-level corrections aren't the primary need.
Two failure modes the UI must handle:
Wrong upstream release matched. LML picked Discogs#X for this comp, but Discogs#Y (or a different MBID, or "none of these") is the correct one. The librarian re-points the comp's release-identity pointer; the tracklist + per-track external identity gets re-fetched and replaced wholesale (sidecar rows from [E2] Compilations / Various Artists treatment under library_identity (§3.2 design decision) #792 deleted and re-inserted against the new release).
No upstream release exists. The comp exists in WXYC's catalog but isn't captured in Discogs / MB / any upstream system. The librarian needs to declare this — a positive "looked, doesn't exist, won't exist" state — distinct from "LML hasn't tried yet" and from "LML tried and failed." For these cases the tracklist would have to be librarian-entered (or stay empty); they'd be the only librarian-curated rows in library_track.
Schema implications:
library_identity (the parent library_id's row) needs a state for case 2. Likely a library_identity_source row with method = 'manual_no_upstream' or similar — captures the librarian's positive declaration so the matcher won't keep retrying.
library_track_identity_source (the step-1 sidecar) stays empty for case-2 comps — there's no upstream identity to source.
The CDC trigger and sync interactions need to recognize "release pointer changed" as a wholesale-replace event for the dependent track + identity rows.
Open question: lives in tubafrenzy/webapps/wxycdb (extending the existing JSP catalog) or a new dj-site / BS-served surface? Either path has different sync implications. Decide before implementing.
Tubafrenzy schema
Open question: does step 2 require schema change in tubafrenzy MySQL, or BS-side-only?
BS-only is cleaner but means BS becomes the source of truth for track data, inverting today's tubafrenzy → BS direction.
Tubafrenzy-also keeps the source-of-truth direction but means tubafrenzy needs the same schema and the librarian UI gets the new fields.
The librarian invariants still apply — V/A filing conventions stay invariant from the librarian's perspective. See #792's "Hard constraint" section and the design doc. Specifically:
library.artist_name / library.album_artist rendering for V/A rows is unchanged.
Z-prefixed call letters file as today.
Catalog navigation surfaces bucket-specific shelves, not a conflated __VARIOUS__ page.
The step 1 sidecar (library_track_identity_source) keys on (library_id, track_position, source); step 2 must preserve that keying or migrate the sidecar in lockstep.
End state
library_track (renamed or parallel) covers all library rows.
Per-track artist link to artists, nullable + fuzzy + with confidence/method.
Non-V/A track data populated from a new LML pull.
library-etl handles upstream corrections without orphaning rows.
Release-identity correction UI for compilations exists somewhere (TBD which repo).
Step 1 sidecar continues to work unchanged or is migrated cleanly.
Plan §3.2 in the parent canonicalization doc updated to reflect the post-step-2 model.
Acceptance (tracking-level)
Per-deliverable acceptance lives with each sub-issue when this gets broken down. At the tracking-issue level:
All deliverables above either shipped or explicitly carved out with follow-ups.
No regression in librarian-facing V/A behavior (cf. project_librarian_va_invariant).
Track search across all releases works in dj-site or whichever consumer is the target.
The librarian-correction UI handles the wrong-release-match scenario the design doc names.
Open questions (must resolve before breaking into sub-issues)
Where does the librarian-correction UI live — tubafrenzy wxycdb or a new BS-served / dj-site surface?
Population strategy for non-V/A track data — fold into library-etl or run as a separate one-shot backfill?
library-etl upsert reconciliation: (library_id, track_position)-keyed upsert, or upstream-deletion propagation, or both?
Tubafrenzy MySQL schema impact: also-update or BS-only?
Rename vs parallel-table: structural choice for the schema migration.
Summary
Step 2 of the two-step plan in
compilation-track-schema-design.md. Generalizecompilation_track_artist→library_trackso the track model covers all releases (not just V/A), supports a release-identity correction UI for compilations, and accommodates the librarian-artist link as a first-class concept.Blocked by #792 (step 1 —
library_track_identity_sourcesibling). Do not start until #792 ships and the sibling identity table is exercised in production.Aligned with the cross-cache-identity architecture pivot (2026-05-09). Per BS#800, Backend doesn't read LML's PG; identity flows in via
POST /api/v1/identity/bulk-resolve-libraries. The "non-V/A track data" deliverable below uses bulk-resolve-libraries'kind: single_artisttrack listings rather than a new direct-write LML script. The legacymatch_compilations.py/merge_cta.pyscripts retire under pivot Phase 4-5; the per-track matcher work moves into LML (library-metadata-lookup#271).Why later, not now
Today's
compilation_track_artistcovers only V/A library rows (~6,300 of 64,676), was machine-populated by an offline LML backfill (not librarian-curated), and uses a unique key on(library_id, artist_name, track_title)— wrong for the future wheretrack_positionis the right key but is currently nullable in some rows.The librarian directive (2026-05-09) opens onto a UI plan we can't reach without restructuring:
Goals 1 and partial 2 work for V/A on current data once #792 lands. Goal 2 for non-V/A and goal 3 require this ticket.
Scope (tracking-level; sub-issues to be carved out closer to execution)
Schema
compilation_track_artist→library_track, OR keep the V/A table and add a parallellibrary_trackfor non-V/A. Decide in the design phase.track_artist_id integer REFERENCES wxyc_schema.artists(id)(nullable) for internal canonicalization — the "library-artist link" question from the design doc.track_artist_link_confidence real CHECK (>= 0 AND <= 1)andtrack_artist_link_method textfor the link itself.track_positionto NOT NULL where data permits; backfill or null-segregate the rest before adding the constraint.(library_id, artist_name, track_title). The forward-compatible key — and the one [E2] Compilations / Various Artists treatment under library_identity (§3.2 design decision) #792'slibrary_track_identity_sourcesibling uses — is(library_id, track_position).cdc_compilation_track_artistfrom migration 0046) to cover the renamed/replacement table.Population — non-V/A track data
kind: single_artistresults carrytrackstoo (when the caller asks for them). LML matches per-track; Backend writes the response intolibrary_track. The pre-pivot framing of "new direct-write LML script mirroringmatch_compilations.py" is obsolete.jobs/library-canonical-entity-backfill/orjobs/library-artwork-url-backfill/patterns) that pages through library rows, calls bulk-resolve-libraries, and UPSERTs thetracksblock intolibrary_track(and identity intolibrary_track_identity_source).library-etl reconciliation
.onConflictDoNothing()keyed on(library_id, artist_name, track_title)(Backend-Service/jobs/library-etl/job.ts:735-743) creates orphan rows when upstreamartist_nameis corrected. Fine today (no upstream correction UI), real bug once corrections land.(library_id, track_position)(requires the NOT NULL change above), or detect upstream deletions and propagate.library_track_identity_sourcewhen the underlying track is deleted vs. when anartist_nameis corrected (true delete → cascade is right; correction → identity may still be valid, depends on whether the correction changed the artist).Reader updates
Backend-Service/apps/backend/services/suggest.service.ts:30-100readscompilation_track_artistas a secondary autocomplete source. Update to readlibrary_track.Release-identity correction UI
The librarian-facing correction is at the release level, not the track level. When LML matches a comp to a Discogs/MB release, the tracklist is downstream of that release pointer; if the pointer is wrong, the entire tracklist is wrong. Track-level corrections aren't the primary need.
Two failure modes the UI must handle:
library_track.Schema implications:
library_identity(the parentlibrary_id's row) needs a state for case 2. Likely alibrary_identity_sourcerow withmethod = 'manual_no_upstream'or similar — captures the librarian's positive declaration so the matcher won't keep retrying.library_trackrows for case-2 comps would havetrack_artist_link_method = 'librarian'(vs.'lml_backfill'for the default machine-populated case). The dual-author concern that was framed away under [E2] Compilations / Various Artists treatment under library_identity (§3.2 design decision) #792 returns here, but only for a small minority of rows.library_track_identity_source(the step-1 sidecar) stays empty for case-2 comps — there's no upstream identity to source.Open question: lives in
tubafrenzy/webapps/wxycdb(extending the existing JSP catalog) or a new dj-site / BS-served surface? Either path has different sync implications. Decide before implementing.Tubafrenzy schema
Hard constraints (carried forward from #792)
The librarian invariants still apply — V/A filing conventions stay invariant from the librarian's perspective. See #792's "Hard constraint" section and the design doc. Specifically:
library.artist_name/library.album_artistrendering for V/A rows is unchanged.Z-prefixed call letters file as today.__VARIOUS__page.The step 1 sidecar (
library_track_identity_source) keys on(library_id, track_position, source); step 2 must preserve that keying or migrate the sidecar in lockstep.End state
library_track(renamed or parallel) covers all library rows.artists, nullable + fuzzy + with confidence/method.Acceptance (tracking-level)
Per-deliverable acceptance lives with each sub-issue when this gets broken down. At the tracking-issue level:
project_librarian_va_invariant).Open questions (must resolve before breaking into sub-issues)
wxycdbor a new BS-served / dj-site surface?(library_id, track_position)-keyed upsert, or upstream-deletion propagation, or both?Related
WXYC/wiki/plans/compilation-track-schema-design.md.library_track_identity_sourcesibling).library-hook-canonicalization.md§3.2.tubafrenzy/docs/database-schema.md(LIBRARY_CODE/LIBRARY_RELEASE/COMPILATION_TRACK_ARTIST).Backend-Service/jobs/library-etl/job.ts:701-748— etl.Backend-Service/apps/backend/services/suggest.service.ts:30-100— reader.Backend-Service/shared/database/src/migrations/0037_etl-schema-sync.sql— existing schema.Backend-Service/shared/database/src/migrations/0046_cdc_notify_triggers.sql— CDC trigger.library-metadata-lookup/scripts/match_compilations.pylibrary-metadata-lookup/scripts/merge_cta.py