feat(registry): unify community catalogs with publisher pages#5380
Conversation
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
There was a problem hiding this comment.
Clean. Projects the community-mirror catalog into the existing publishers read model instead of letting /api/registry/mirrors stay a parallel universe — the right architectural call: one provenance lattice (adagents_json > brand_json > community > discovered), not two.
Things I checked
- Self-host takeover protection holds.
publishersON CONFLICT (domain) DO UPDATE guards every column withWHEN publishers.source_type = 'adagents_json' THEN <existing> ELSE EXCLUDED(publisher-db.ts:1280-1313, mirrored in migration 508). The loop readsRETURNING source_typeandcontinues without writing properties when the row is already first-party (publisher-db.ts:1324). A community catalog cannot overwrite, retype, or re-provenance a publisher-attested row.security-reviewerandcode-reviewerboth confirmed isolation. - Migration 508 deletes are scoped. Every DELETE keys on
created_by LIKE 'community_adagents:%'/source_type='community'(508:980-1004). Non-community publisher rows are never touched. ON CONFLICT targets exist (discovered_propertiesUNIQUE(publisher_domain, name, property_type);catalog_identifiersUNIQUE(identifier_type, identifier_value)). No data-loss path. - Transactions are leak-free. community-mirrors.ts PUT (209-238) and DELETE (257-277):
getClient()outside thetry, every early-return (404/409) explicitly ROLLBACKs,finally { client.release() }covers all paths, catch usesROLLBACK().catch(()=>undefined)so it can't mask the original error. No double-release. hasValidAdagentscorrectly excludes community rows (federated-index-db.ts) — community catalogs are valid registry documents but not origin validation, so they must not suppress crawl-on-view or fliphosting.modetoself. Behavior change is deliberate and documented.- Auth gate change is stronger, not weaker. http.ts:2690 swaps an inline
isWebUserAAOAdmin403 for therequireAuth, requireAdminchain (imported http.ts:57, used on ~30 other admin routes).isWebUserAAOAdminis still referenced elsewhere — no orphaned import.security-reviewer: equivalent-or-stronger, no behavior gap. - No XSS surface in the new render path.
firstLogoUrlenforces^https://(registry-api.ts:1740),collectBrandColorsenforces^#[0-9A-Fa-f]{6}$(1731), and publisher-home.html runs brand name/description/industries, format chips, and the hex swatchstyle="background:"throughescapeHtml. Attacker-controlled adagents.jsonparamsreach the chips escaped. - Wire changes are additive. New enum members (
community_catalogondiscovery_method,communityon propertysourceandfiles.adagents_json.status) and new optional fields (registry_url,brand,formats) — no removals, renames, or required flips.ad-tech-protocol-expert: sound-with-caveats, zod/openapi parity clean across every named-schema field. - Test-plan honesty. Validation section runs the two integration suites,
test:migrations,build:openapi, and a Docker e2e with Puppeteer assertions covering the primary user-facing changes (community state, scoped format cards, raw registry URL, no-agent caveat, mobile overflow). No unchecked manual item on the path this PR changes.
Follow-ups (non-blocking — file as issues)
- Changeset declares nothing.
wide-chairs-agree.mdis---\n---— no package, no bump, no summary. It passes the bot (empty changesets are the supported escape hatch) and nostatic/schemas/source/**changed, so it doesn't block. Butad-tech-protocol-expertreads the registry-API additions as a textbookminor(new optional response fields + new response enum members). Confirm "no protocol-spec impact" is the intent; if the registry API counts as versioned surface, declare"adcontextprotocol": minorwith a one-line note. A changeset that declares nothing documents nothing. - Consumer caveat for the new enum members. Buy/sell-side agents that switch-case on
source/discovery_method/statusand treat unknown values as hard errors will choke oncommunity/community_catalog. Additive, but worth a release note so tolerant-reader behavior is the documented expectation. - identifier_type case parity. App insert writes
catalog_identifiers.identifier_typeasnorm.type(possibly mixed-case, publisher-db.ts:1543-1551); migration 508:1159 writeslower(ident->>'type'). The unique constraint lowercases onlyvalue, so a case split between the two write paths can dodge dedup. Lowercaseident.typein the app insert for parity. - Host-header fallback in
publicBaseUrl(registry-api.ts:1749) readsreq.get("host")to buildregistry_url.PUBLIC_BASE_URL/BASE_URLshort-circuits in prod and the handler sets noCache-Control, sosecurity-reviewergraded it Low/self-inflicted. Defense-in-depth: drop the Host branch or pin the canonical origin.
Minor nits (non-blocking)
- No-op self-assignment. migration 508:1135
last_validated = discovered_properties.last_validatedon conflict reads like a mistake; community rows intentionally keep NULL. Comment it or drop it. - Import split. registry-api.ts:1862 leaves
import { AAO_UA_COMPLIANCE }stranded below the new ~170-line helper/type block inserted at L128. Move the new code below the import region.
Approving on the strength of the takeover-protection guard plus leak-free transaction wrapping. Follow-ups noted.
There was a problem hiding this comment.
Approving. The architectural call is right: community catalogs become a source_type='community' projection into the existing publishers read model instead of a parallel /api/registry/mirrors concept, and first-party adagents_json always wins the merge. Fail-closed on first-party data, additive on the wire.
Things I checked
- First-party-wins is load-bearing and correct. Both the backfill (
508_backfill_community_mirror_publisher_projection.sql:1070-1104) and the app path (publisher-db.ts:1280-1313, guard +continueat:1324) preserve every column of asource_type='adagents_json'row viaCASE WHEN publishers.source_type='adagents_json' THEN publishers.X ELSE EXCLUDED.X. Retire (publisher-db.ts:1407-1412) deletes onlysource_type='community'publishers, so a self-host takeover survives. The takeover integration test confirms it (community-mirrors.test.ts:2260). hasValidAdagentsexcludes community rows (federated-index-db.ts:869-882) — now requiresadagents_json IS NOT NULL AND source_type='adagents_json' AND review_status='approved'. Community catalogs can't suppress crawl-on-view or fliphosting.modetoself. src_priority ranks community=2 vs adagents=0.- Transactions don't leak a client. PUT/DELETE in
community-mirrors.ts:209-279pairgetClient()withclient.release()infinallyon every path, ROLLBACK on the 404/409 early returns and the catch. - Dynamic
$NDELETE indexing is correct (publisher-db.ts:1461-1468) —params.length-2/-1/lengthmap to the just-pushed triple; no placeholder desync.security-reviewerindependently confirmed. - Auth change is a tightening.
/api/brands/discoveredswapping the inlineisWebUserAAOAdmincheck forrequireAdmin(http.ts:2566) is a strict superset — same admin assertion plus tenant-scoped cross-org refusal. Not a loosening. - Server is the validating boundary for the new publisher-page data.
firstLogoUrlis https-only (registry-api.ts:1740),collectBrandColorsenforces^#[0-9A-Fa-f]{6}$, andpublisher-home.htmlescapeHtmls brand-controlled strings intosrc=/style=. No stored-XSS, nojavascript:logo. The<img onerror>fallback atpublisher-home.html:562is a code-controlled DOM listener, not an injected attribute. - OpenAPI matches the zod source. The 74-line
registry.yamldiff mirrors the 34-lineregistry.tsdiff field-for-field. Every change is an enum insertion (discovery_method+community_catalog,status+community,source+community) or a new optional field (registry_url,brand,formats[]). No removals, renames, or required↔optional flips — additive to response shapes, minor under tolerant-reader.
Follow-ups (non-blocking — file as issues)
- The changeset is empty.
.changeset/wide-chairs-agree.mdis---\n---— no package bump, no summary, auto-generated name. This isn't a block:static/openapi/registry.yamlis the registry service API, notstatic/schemas/source/**, and the change is purely additive. But the PR adds enum values and response objects to a published API, and the rest of the registry changesets in this repo carry content. Confirm the empty changeset is the intentional "no-bump" marker and not an oversight — if your changeset CI rejects empty entries, this needs a realminorline before merge. seller_preferencedrifts from the canonical format shape.PublisherFormatSummarySchema.seller_preferenceisz.string()(registry.ts), but the canonicalProductFormatDeclaration(static/schemas/source/core/product-format-declaration.json) types it as the enum["preferred","accepted","discouraged"]. It's a derived summary, so this is defensible, but a freeform string invents a looser contract than the wire it summarizes. Tighten to the canonical enum.- A community
catalog_propertiesrow can land with zero identifiers. When a community identifier collides with a verified publisher's,ON CONFLICT (identifier_type, identifier_value) DO NOTHING(508_...sql:1171,publisher-db.ts:1543-1551) correctly refuses the hijack — but thecatalog_propertiesinsert already ran, leaving a property with no resolvable identifier. Not orphaned (retire cleans bycreated_by), but it's dead weight. Consider gating the property insert on identifier insertability.
Minor nits (non-blocking)
last_validatedasymmetry between backfill and PUT. Migration 508 sets thediscovered_propertiesON CONFLICT tolast_validated = discovered_properties.last_validated(preserve) and insertsNULL, whileupsertCommunityCatalogProperty(publisher-db.ts:1501-1509) writesNOW(). Harmless, but first-deploy backfill and subsequent re-publishes will disagree on the timestamp for the same row. Align them or note the intent.
security-reviewer: clean. ad-tech-protocol-expert: sound-with-caveats (additive, no yaml/zod drift; seller_preference is the one shape drift). code-reviewer: first-party guard, transactions, and $N indexing all verified correct.
LGTM. Follow-ups noted above — confirm the empty changeset is deliberate.
There was a problem hiding this comment.
Right architecture — collapsing /api/registry/mirrors into the publishers read model instead of carrying a parallel concept is the correct call, and the source_type-preservation invariant (community writes never clobber an adagents_json row) holds in every CASE branch I checked. But two write paths share an unhandled constraint mismatch that 500s the publish endpoint and can abort the deploy migration. Blocking on that.
MUST FIX
-
Unhandled second unique index on
discovered_properties→ 500 / migration abort.discovered_propertieshas two unique constraints:UNIQUE(publisher_domain, name, property_type)and the partial indexidx_properties_unique_property_id ON (publisher_domain, property_id) WHERE property_id IS NOT NULL(026_discovered_properties.sql:33). Both new write paths —upsertCommunityCatalogProperty(server/src/db/publisher-db.ts:1495) and the backfill INSERT (508_backfill_community_mirror_publisher_projection.sql, theINSERT INTO discovered_properties) — only specifyON CONFLICT (publisher_domain, name, property_type). A community property whose(publisher_domain, property_id)collides with an existing row (a crawler-discovered property already on that domain, or a second platform's mirror naming the sameproperty_id) but whosenamediffers hits the other index, which theON CONFLICTclause does not catch → uncaughtunique_violation. In the PUT path that rolls back the whole transaction and returns 500; in migration 508 it aborts the entire backfill on deploy. Handle both conflict targets, or key the upsert onproperty_idwhen present. -
Backfill canonicalizes
publisher_domaindifferently from the live writer → zombie publisher rows. Migration 508 derives the domain withlower(trim(coalesce(...))). The live PUT/retire path usescanonicalizePublisherDomain(server/src/services/publisher-domain.ts:21), which additionally stripshttp(s)://and trailing//.. That function's own comment calls this exact divergence "a security boundary (writer-vs-validator parity) — two strings that refer to the same publisher MUST canonicalize identically." A manifest withpublisher_domain: "https://x.com/"backfills a row keyedhttps://x.com/; the next PUT/retire operates onx.comand never reconciles it — leaving a public, un-retireable community publisher page. Canonicalize identically in the migration.
Follow-ups (non-blocking — file as issues)
- Merge dedup keys on the first identifier only.
registry-api.tspropertyKeyusesproperty.identifiers?.[0], whose ordering isn't guaranteed across sources. Two distinct properties sharing a leading identifier collapse to one; the same property listed with a different leading identifier in brand.json vs community appears twice. Key onproperty_idwhen present, or on a stable min over all identifiers. - Orphaned
catalog_propertieson identifier collision.catalog_identifiershas a globalUNIQUE(identifier_type, identifier_value)(335_property_catalog.sql:44). The parentcatalog_propertiesrow is inserted unconditionally with a freshproperty_ridbefore theON CONFLICT ... DO NOTHINGidentifier insert, so a property whose identifier is already claimed lands a parent row with zero linked identifiers — unreachable by lookup. Skip the parent insert when the identifier would conflict. - Empty changeset.
.changeset/wide-chairs-agree.mdis---\n---with no summary line. The repo convention (e.g.5013-check-publisher-authorization-cache.md) keeps the empty frontmatter but adds a one-line description. Fill it — this PR addscommunity/community_catalogenum values to a response contract, and downstream maintainers with strict response-enum codegen want that regen signal (ad-tech-protocol-expert: additive-and-sound, but worth the note).
Minor nits (non-blocking)
- Community write overwrites a verified property's identifiers.
upsertCommunityCatalogProperty(publisher-db.ts:1495) preservessource_typeforadagents_json/aao_hostedon(publisher_domain, name, property_type)collision but still overwritesidentifiers,tags, andlast_validated. A moderator who matches the exact triple of a verified property could replace its identifiers (row still renders as publisher-attested). Trusted role + exact-triple knowledge, so low — but guard the identifier/tag overwrite behind the samesource_typeCASE. (security-reviewer: Low.) - Host-header fallback.
publicBaseUrl(registry-api.ts) falls back toreq.get('host')when neitherPUBLIC_BASE_URLnorBASE_URLis set. Production setsBASE_URL, so not exploitable today; drop the fallback so it can't regress into theregistry_urlhref if the env var is ever removed. (security-reviewer: Low.)
Things I checked
hasValidAdagentstightening (AND review_status = 'approved') does not regress existing publisher-attested rows:publishers.review_statusdefaults to'approved'(432_publishers_overlay.sql:37). Confirmed.- source_type-preservation invariant: every
ON CONFLICT (domain) DO UPDATEbranch in bothreplaceCommunityAdagentsCatalogWithClientand migration 508 keeps an existingadagents_jsonrow's columns; community data only fillscommunity/null rows. Holds. - DELETE ordering in migration 508 (catalog_identifiers → catalog_properties → discovered_properties → publishers) respects the only real FK (
catalog_identifiers.property_rid → catalog_properties). /api/brands/discoveredauth change is equivalent-or-stronger:requireAuth, requireAdmin(http.ts:2693) subsumes the old inlineisWebUserAAOAdmin403 (auth.ts:1466). (security-reviewer.)- Public-page XSS surface:
escapeHtmlon every interpolated sink inpublisher-home.html;logo_urlis https-only (firstLogoUrl), colors strict#RRGGBB(collectBrandColors) before thestyle="background:"swatch. (security-reviewer: no unescaped sink.) - Wire shape:
static/openapi/registry.yamlis regenerated in the diff (CItest:openapigate satisfied); enum additions on response fields are forward-compatible. (ad-tech-protocol-expert: additive-and-sound.)
The test plan runs a Docker e2e plus four integration suites; none of them feed a manifest with a repeated property_id, which is the single input that trips both write paths above.
Fix the two MUST FIX items and this is a clean merge.
There was a problem hiding this comment.
Approving. Community catalogs fold into the publishers read model instead of living as a parallel /api/registry/mirrors concept — and first-party attestation strictly wins on every column, which is the invariant this whole change rests on.
Things I checked
- Transaction safety in
community-mirrors.ts(PUT and DELETE): single client acquired once,BEGIN/COMMITon the happy path,ROLLBACKon every early-return (404/409) and in the catch,client.release()infinally. No leaked client, no double-release. (code-reviewer) - First-party clobber guard holds on every column. Both the app-code
publishersupsert (publisher-db.ts:577-611) and migration 508's ON CONFLICT preserve each column whenpublishers.source_type = 'adagents_json'; thediscovered_propertiesupserts guard onsource_type IN ('adagents_json','aao_hosted').publishers.source_typeonly admitsadagents_json|community|enriched(migration 432), soadagents_jsonis the only first-party value and it is protected. Reverse direction is also safe: the crawler write (publisher-db.ts:399) unconditionally promotes a matching row back toadagents_json, so a real origin always reclaims a community-held domain. (security-reviewer) hasValidAdagentstightened to requiresource_type='adagents_json' AND review_status='approved'(federated-index-db.ts) — a community row can never fliphosting.modetoselfor suppress crawl-on-view. Right shape: a registry document is not origin validation.- Migration 508 domain derivation converges with the app path. The SQL
regexp_replace(...)is functionally identical tocanonicalizePublisherDomain(publisher-domain.ts), so an app-code re-publish lands on the samepublishers(domain)rows. DELETE-then-INSERT scoped tocreated_by LIKE 'community_adagents:%',jsonb_agg ... ORDER BY— idempotent and deterministic. - Auth swap on
POST /api/brands/discovered:requireAdminis imported (http.ts:57) and is a superset of the removed inlineisWebUserAAOAdmincheck (auth.ts:1321— same check plusADMIN_EMAILSand the static/WorkOS admin paths). Tightening, not loosening. - zod ↔ OpenAPI agree.
registry.ts(source,status,registry_url,discovery_method, the newbrand/formatsschemas) matchstatic/openapi/registry.yamlfield-by-field and enum-by-enum after regeneration. Every wire change is additive — new optional fields, widened enums — so non-breaking/minor for existing consumers. (ad-tech-protocol-expert: sound-with-caveats) - Frontend injection surface. Community-derived brand
logo_urlis gated to^https://and colors to#RRGGBB, withescapeHtmlon the rendered strings (registry-api.ts:1820+,publisher-home.htmlprofile render) — no XSS via the swatchstyleattr or logosrc.
Follow-ups (non-blocking — file as issues)
- Empty changeset frontmatter.
.changeset/wide-chairs-agree.mdhas an empty---\n---block, so it bumps nothing. Repo convention tags even server-only changes —.changeset/2368(a CSRF middleware fix, no schema touch) carries"adcontextprotocol": patch. This is a server feature that adds optional fields and enum values to the published registry OpenAPI. Either tag itpatch/minorto match convention or confirm the empty changeset is deliberate. - Residual
discovery_methodenum drift. This PR fixes the component schema to all five values, but two hand-written inline path schemas inregistry.yaml(~3881, ~4033) and the inline zod atregistry-api.ts:1299still enumerate only the three legacy values. Pre-existing divergence, only partially closed here. Reconcile or annotate as scoped to endpoints that cannot emit the new values. - Orphan
catalog_propertieson identifier collision.upsertCommunityCatalogProperty(publisher-db.ts:~1647) inserts thecatalog_propertiesrow unconditionally, then the identifier insert usesON CONFLICT (identifier_type, identifier_value) DO NOTHING. If an identifier is already owned by anotherproperty_rid, the new row keeps zero identifiers — an identifier-less orphan (bounded; cleaned on retire). The first-partyprojectPropertyToCatalogpath pre-checks ownership; mirror that, or skip thecatalog_propertiesinsert when all identifiers conflict.
Minor nits (non-blocking)
- DELETE early-return ROLLBACK is unguarded.
community-mirrors.ts:~1767and:~1775doawait client.query('ROLLBACK')bare on the 404/409 branches; if ROLLBACK itself threw, control would fall to the catch and return 500 instead of the intended status. The catch already uses.catch(() => undefined)— mirror it here. Practically harmless.
The AAO → AgenticAdvertising.org find-replace rides along through the templates and tests; an interesting amount of surface area for one PR, but it is a pure string swap and the embed-route test was updated to match.
Approving on the strength of three clean expert verdicts plus the first-party-attestation guard holding on every column and in both write directions. Follow-ups noted above.
Summary
Unifies approved community adagents catalogs with the publisher registry read model instead of treating
/api/registry/mirrorsas a parallel concept.publishers,discovered_properties, and catalog identifier tables transactionally.community_catalogdiscovery provenance and backfills existing community mirrors./api/registry/publisherso community adagents files surface asfiles.adagents_json.status = community, with an absolute registry URL, brand summary, scoped format summaries, and community property provenance.Expert review follow-up
Addressed product/UX/CSS review items:
Checking nowjust because auto-crawl fires.Validation
npm run typecheck -- --pretty falsenpx vitest run server/tests/integration/registry-publisher-brand-json-hydration.test.ts --config server/vitest.config.tsnpx vitest run server/tests/integration/community-mirrors.test.ts --config server/vitest.config.tsnpm run test:migrationsnpm run build:openapigit diff --checkCONDUCTOR_PORT=55020 docker compose -p adcp_kelowna_e2e up --build -de2e-social.example, with Puppeteer assertions for community state, scoped format cards, raw registry URL, no-agent caveat, and mobile overflow.