Skip to content

feat(registry): unify community catalogs with publisher pages#5380

Merged
bokelley merged 5 commits into
mainfrom
aao-key-rbac-403
Jun 6, 2026
Merged

feat(registry): unify community catalogs with publisher pages#5380
bokelley merged 5 commits into
mainfrom
aao-key-rbac-403

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented Jun 6, 2026

Summary

Unifies approved community adagents catalogs with the publisher registry read model instead of treating /api/registry/mirrors as a parallel concept.

  • Projects community mirror PUT/DELETE into publishers, discovered_properties, and catalog identifier tables transactionally.
  • Adds migrations for community_catalog discovery provenance and backfills existing community mirrors.
  • Updates /api/registry/publisher so community adagents files surface as files.adagents_json.status = community, with an absolute registry URL, brand summary, scoped format summaries, and community property provenance.
  • Makes the publisher page a richer profile: brand identity, stats, no-agent caveat, creative format preview cards, properties, agents, and clearer community copy.
  • Keeps community formats domain-scoped so a platform catalog does not overclaim formats for a property/page that does not support them.

Expert review follow-up

Addressed product/UX/CSS review items:

  • Community pages no longer appear as Checking now just because auto-crawl fires.
  • Community hero uses the registry-served raw adagents URL and adds a builder CTA.
  • Community hero styling works in full and embed pages.
  • No-agent catalog pages show a visible caveat before the format previews.
  • Raw enum labels/IDs are humanized where visible on the publisher page.
  • Mobile layout stacks cleanly and avoids horizontal overflow.
  • Logo rendering only uses HTTPS logos and falls back safely.

Validation

  • npm run typecheck -- --pretty false
  • npx vitest run server/tests/integration/registry-publisher-brand-json-hydration.test.ts --config server/vitest.config.ts
  • npx vitest run server/tests/integration/community-mirrors.test.ts --config server/vitest.config.ts
  • npm run test:migrations
  • npm run build:openapi
  • git diff --check
  • CONDUCTOR_PORT=55020 docker compose -p adcp_kelowna_e2e up --build -d
  • Docker API/browser seed for e2e-social.example, with Puppeteer assertions for community state, scoped format cards, raw registry URL, no-agent caveat, and mobile overflow.
  • Precommit hook on both commits: 51 unit test files / 942 tests, dynamic import lint, callapi state-change lint, typecheck.

@mintlify
Copy link
Copy Markdown

mintlify Bot commented Jun 6, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
adcp 🟢 Ready View Preview Jun 6, 2026, 1:55 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Comment thread server/src/db/publisher-db.ts Fixed
@bokelley bokelley marked this pull request as ready for review June 6, 2026 14:32
aao-release-bot[bot]
aao-release-bot Bot previously approved these changes Jun 6, 2026
Copy link
Copy Markdown
Contributor

@aao-release-bot aao-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean. Projects the community-mirror catalog into the existing publishers read model instead of letting /api/registry/mirrors stay a parallel universe — the right architectural call: one provenance lattice (adagents_json > brand_json > community > discovered), not two.

Things I checked

  • Self-host takeover protection holds. publishers ON CONFLICT (domain) DO UPDATE guards every column with WHEN publishers.source_type = 'adagents_json' THEN <existing> ELSE EXCLUDED (publisher-db.ts:1280-1313, mirrored in migration 508). The loop reads RETURNING source_type and continues without writing properties when the row is already first-party (publisher-db.ts:1324). A community catalog cannot overwrite, retype, or re-provenance a publisher-attested row. security-reviewer and code-reviewer both confirmed isolation.
  • Migration 508 deletes are scoped. Every DELETE keys on created_by LIKE 'community_adagents:%' / source_type='community' (508:980-1004). Non-community publisher rows are never touched. ON CONFLICT targets exist (discovered_properties UNIQUE(publisher_domain, name, property_type); catalog_identifiers UNIQUE(identifier_type, identifier_value)). No data-loss path.
  • Transactions are leak-free. community-mirrors.ts PUT (209-238) and DELETE (257-277): getClient() outside the try, every early-return (404/409) explicitly ROLLBACKs, finally { client.release() } covers all paths, catch uses ROLLBACK().catch(()=>undefined) so it can't mask the original error. No double-release.
  • hasValidAdagents correctly excludes community rows (federated-index-db.ts) — community catalogs are valid registry documents but not origin validation, so they must not suppress crawl-on-view or flip hosting.mode to self. Behavior change is deliberate and documented.
  • Auth gate change is stronger, not weaker. http.ts:2690 swaps an inline isWebUserAAOAdmin 403 for the requireAuth, requireAdmin chain (imported http.ts:57, used on ~30 other admin routes). isWebUserAAOAdmin is still referenced elsewhere — no orphaned import. security-reviewer: equivalent-or-stronger, no behavior gap.
  • No XSS surface in the new render path. firstLogoUrl enforces ^https:// (registry-api.ts:1740), collectBrandColors enforces ^#[0-9A-Fa-f]{6}$ (1731), and publisher-home.html runs brand name/description/industries, format chips, and the hex swatch style="background:" through escapeHtml. Attacker-controlled adagents.json params reach the chips escaped.
  • Wire changes are additive. New enum members (community_catalog on discovery_method, community on property source and files.adagents_json.status) and new optional fields (registry_url, brand, formats) — no removals, renames, or required flips. ad-tech-protocol-expert: sound-with-caveats, zod/openapi parity clean across every named-schema field.
  • Test-plan honesty. Validation section runs the two integration suites, test:migrations, build:openapi, and a Docker e2e with Puppeteer assertions covering the primary user-facing changes (community state, scoped format cards, raw registry URL, no-agent caveat, mobile overflow). No unchecked manual item on the path this PR changes.

Follow-ups (non-blocking — file as issues)

  • Changeset declares nothing. wide-chairs-agree.md is ---\n--- — no package, no bump, no summary. It passes the bot (empty changesets are the supported escape hatch) and no static/schemas/source/** changed, so it doesn't block. But ad-tech-protocol-expert reads the registry-API additions as a textbook minor (new optional response fields + new response enum members). Confirm "no protocol-spec impact" is the intent; if the registry API counts as versioned surface, declare "adcontextprotocol": minor with a one-line note. A changeset that declares nothing documents nothing.
  • Consumer caveat for the new enum members. Buy/sell-side agents that switch-case on source / discovery_method / status and treat unknown values as hard errors will choke on community / community_catalog. Additive, but worth a release note so tolerant-reader behavior is the documented expectation.
  • identifier_type case parity. App insert writes catalog_identifiers.identifier_type as norm.type (possibly mixed-case, publisher-db.ts:1543-1551); migration 508:1159 writes lower(ident->>'type'). The unique constraint lowercases only value, so a case split between the two write paths can dodge dedup. Lowercase ident.type in the app insert for parity.
  • Host-header fallback in publicBaseUrl (registry-api.ts:1749) reads req.get("host") to build registry_url. PUBLIC_BASE_URL/BASE_URL short-circuits in prod and the handler sets no Cache-Control, so security-reviewer graded it Low/self-inflicted. Defense-in-depth: drop the Host branch or pin the canonical origin.

Minor nits (non-blocking)

  1. No-op self-assignment. migration 508:1135 last_validated = discovered_properties.last_validated on conflict reads like a mistake; community rows intentionally keep NULL. Comment it or drop it.
  2. Import split. registry-api.ts:1862 leaves import { AAO_UA_COMPLIANCE } stranded below the new ~170-line helper/type block inserted at L128. Move the new code below the import region.

Approving on the strength of the takeover-protection guard plus leak-free transaction wrapping. Follow-ups noted.

@bokelley bokelley changed the title [codex] Unify community catalogs with publisher registry feat(registry): unify community catalogs with publisher pages Jun 6, 2026
aao-release-bot[bot]
aao-release-bot Bot previously approved these changes Jun 6, 2026
Copy link
Copy Markdown
Contributor

@aao-release-bot aao-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. The architectural call is right: community catalogs become a source_type='community' projection into the existing publishers read model instead of a parallel /api/registry/mirrors concept, and first-party adagents_json always wins the merge. Fail-closed on first-party data, additive on the wire.

Things I checked

  • First-party-wins is load-bearing and correct. Both the backfill (508_backfill_community_mirror_publisher_projection.sql:1070-1104) and the app path (publisher-db.ts:1280-1313, guard + continue at :1324) preserve every column of a source_type='adagents_json' row via CASE WHEN publishers.source_type='adagents_json' THEN publishers.X ELSE EXCLUDED.X. Retire (publisher-db.ts:1407-1412) deletes only source_type='community' publishers, so a self-host takeover survives. The takeover integration test confirms it (community-mirrors.test.ts:2260).
  • hasValidAdagents excludes community rows (federated-index-db.ts:869-882) — now requires adagents_json IS NOT NULL AND source_type='adagents_json' AND review_status='approved'. Community catalogs can't suppress crawl-on-view or flip hosting.mode to self. src_priority ranks community=2 vs adagents=0.
  • Transactions don't leak a client. PUT/DELETE in community-mirrors.ts:209-279 pair getClient() with client.release() in finally on every path, ROLLBACK on the 404/409 early returns and the catch.
  • Dynamic $N DELETE indexing is correct (publisher-db.ts:1461-1468) — params.length-2/-1/length map to the just-pushed triple; no placeholder desync. security-reviewer independently confirmed.
  • Auth change is a tightening. /api/brands/discovered swapping the inline isWebUserAAOAdmin check for requireAdmin (http.ts:2566) is a strict superset — same admin assertion plus tenant-scoped cross-org refusal. Not a loosening.
  • Server is the validating boundary for the new publisher-page data. firstLogoUrl is https-only (registry-api.ts:1740), collectBrandColors enforces ^#[0-9A-Fa-f]{6}$, and publisher-home.html escapeHtmls brand-controlled strings into src=/style=. No stored-XSS, no javascript: logo. The <img onerror> fallback at publisher-home.html:562 is a code-controlled DOM listener, not an injected attribute.
  • OpenAPI matches the zod source. The 74-line registry.yaml diff mirrors the 34-line registry.ts diff field-for-field. Every change is an enum insertion (discovery_method+community_catalog, status+community, source+community) or a new optional field (registry_url, brand, formats[]). No removals, renames, or required↔optional flips — additive to response shapes, minor under tolerant-reader.

Follow-ups (non-blocking — file as issues)

  • The changeset is empty. .changeset/wide-chairs-agree.md is ---\n--- — no package bump, no summary, auto-generated name. This isn't a block: static/openapi/registry.yaml is the registry service API, not static/schemas/source/**, and the change is purely additive. But the PR adds enum values and response objects to a published API, and the rest of the registry changesets in this repo carry content. Confirm the empty changeset is the intentional "no-bump" marker and not an oversight — if your changeset CI rejects empty entries, this needs a real minor line before merge.
  • seller_preference drifts from the canonical format shape. PublisherFormatSummarySchema.seller_preference is z.string() (registry.ts), but the canonical ProductFormatDeclaration (static/schemas/source/core/product-format-declaration.json) types it as the enum ["preferred","accepted","discouraged"]. It's a derived summary, so this is defensible, but a freeform string invents a looser contract than the wire it summarizes. Tighten to the canonical enum.
  • A community catalog_properties row can land with zero identifiers. When a community identifier collides with a verified publisher's, ON CONFLICT (identifier_type, identifier_value) DO NOTHING (508_...sql:1171, publisher-db.ts:1543-1551) correctly refuses the hijack — but the catalog_properties insert already ran, leaving a property with no resolvable identifier. Not orphaned (retire cleans by created_by), but it's dead weight. Consider gating the property insert on identifier insertability.

Minor nits (non-blocking)

  1. last_validated asymmetry between backfill and PUT. Migration 508 sets the discovered_properties ON CONFLICT to last_validated = discovered_properties.last_validated (preserve) and inserts NULL, while upsertCommunityCatalogProperty (publisher-db.ts:1501-1509) writes NOW(). Harmless, but first-deploy backfill and subsequent re-publishes will disagree on the timestamp for the same row. Align them or note the intent.

security-reviewer: clean. ad-tech-protocol-expert: sound-with-caveats (additive, no yaml/zod drift; seller_preference is the one shape drift). code-reviewer: first-party guard, transactions, and $N indexing all verified correct.

LGTM. Follow-ups noted above — confirm the empty changeset is deliberate.

Copy link
Copy Markdown
Contributor

@aao-release-bot aao-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right architecture — collapsing /api/registry/mirrors into the publishers read model instead of carrying a parallel concept is the correct call, and the source_type-preservation invariant (community writes never clobber an adagents_json row) holds in every CASE branch I checked. But two write paths share an unhandled constraint mismatch that 500s the publish endpoint and can abort the deploy migration. Blocking on that.

MUST FIX

  1. Unhandled second unique index on discovered_properties → 500 / migration abort. discovered_properties has two unique constraints: UNIQUE(publisher_domain, name, property_type) and the partial index idx_properties_unique_property_id ON (publisher_domain, property_id) WHERE property_id IS NOT NULL (026_discovered_properties.sql:33). Both new write paths — upsertCommunityCatalogProperty (server/src/db/publisher-db.ts:1495) and the backfill INSERT (508_backfill_community_mirror_publisher_projection.sql, the INSERT INTO discovered_properties) — only specify ON CONFLICT (publisher_domain, name, property_type). A community property whose (publisher_domain, property_id) collides with an existing row (a crawler-discovered property already on that domain, or a second platform's mirror naming the same property_id) but whose name differs hits the other index, which the ON CONFLICT clause does not catch → uncaught unique_violation. In the PUT path that rolls back the whole transaction and returns 500; in migration 508 it aborts the entire backfill on deploy. Handle both conflict targets, or key the upsert on property_id when present.

  2. Backfill canonicalizes publisher_domain differently from the live writer → zombie publisher rows. Migration 508 derives the domain with lower(trim(coalesce(...))). The live PUT/retire path uses canonicalizePublisherDomain (server/src/services/publisher-domain.ts:21), which additionally strips http(s):// and trailing //.. That function's own comment calls this exact divergence "a security boundary (writer-vs-validator parity) — two strings that refer to the same publisher MUST canonicalize identically." A manifest with publisher_domain: "https://x.com/" backfills a row keyed https://x.com/; the next PUT/retire operates on x.com and never reconciles it — leaving a public, un-retireable community publisher page. Canonicalize identically in the migration.

Follow-ups (non-blocking — file as issues)

  • Merge dedup keys on the first identifier only. registry-api.ts propertyKey uses property.identifiers?.[0], whose ordering isn't guaranteed across sources. Two distinct properties sharing a leading identifier collapse to one; the same property listed with a different leading identifier in brand.json vs community appears twice. Key on property_id when present, or on a stable min over all identifiers.
  • Orphaned catalog_properties on identifier collision. catalog_identifiers has a global UNIQUE(identifier_type, identifier_value) (335_property_catalog.sql:44). The parent catalog_properties row is inserted unconditionally with a fresh property_rid before the ON CONFLICT ... DO NOTHING identifier insert, so a property whose identifier is already claimed lands a parent row with zero linked identifiers — unreachable by lookup. Skip the parent insert when the identifier would conflict.
  • Empty changeset. .changeset/wide-chairs-agree.md is ---\n--- with no summary line. The repo convention (e.g. 5013-check-publisher-authorization-cache.md) keeps the empty frontmatter but adds a one-line description. Fill it — this PR adds community / community_catalog enum values to a response contract, and downstream maintainers with strict response-enum codegen want that regen signal (ad-tech-protocol-expert: additive-and-sound, but worth the note).

Minor nits (non-blocking)

  1. Community write overwrites a verified property's identifiers. upsertCommunityCatalogProperty (publisher-db.ts:1495) preserves source_type for adagents_json/aao_hosted on (publisher_domain, name, property_type) collision but still overwrites identifiers, tags, and last_validated. A moderator who matches the exact triple of a verified property could replace its identifiers (row still renders as publisher-attested). Trusted role + exact-triple knowledge, so low — but guard the identifier/tag overwrite behind the same source_type CASE. (security-reviewer: Low.)
  2. Host-header fallback. publicBaseUrl (registry-api.ts) falls back to req.get('host') when neither PUBLIC_BASE_URL nor BASE_URL is set. Production sets BASE_URL, so not exploitable today; drop the fallback so it can't regress into the registry_url href if the env var is ever removed. (security-reviewer: Low.)

Things I checked

  • hasValidAdagents tightening (AND review_status = 'approved') does not regress existing publisher-attested rows: publishers.review_status defaults to 'approved' (432_publishers_overlay.sql:37). Confirmed.
  • source_type-preservation invariant: every ON CONFLICT (domain) DO UPDATE branch in both replaceCommunityAdagentsCatalogWithClient and migration 508 keeps an existing adagents_json row's columns; community data only fills community/null rows. Holds.
  • DELETE ordering in migration 508 (catalog_identifiers → catalog_properties → discovered_properties → publishers) respects the only real FK (catalog_identifiers.property_rid → catalog_properties).
  • /api/brands/discovered auth change is equivalent-or-stronger: requireAuth, requireAdmin (http.ts:2693) subsumes the old inline isWebUserAAOAdmin 403 (auth.ts:1466). (security-reviewer.)
  • Public-page XSS surface: escapeHtml on every interpolated sink in publisher-home.html; logo_url is https-only (firstLogoUrl), colors strict #RRGGBB (collectBrandColors) before the style="background:" swatch. (security-reviewer: no unescaped sink.)
  • Wire shape: static/openapi/registry.yaml is regenerated in the diff (CI test:openapi gate satisfied); enum additions on response fields are forward-compatible. (ad-tech-protocol-expert: additive-and-sound.)

The test plan runs a Docker e2e plus four integration suites; none of them feed a manifest with a repeated property_id, which is the single input that trips both write paths above.

Fix the two MUST FIX items and this is a clean merge.

Copy link
Copy Markdown
Contributor

@aao-release-bot aao-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Community catalogs fold into the publishers read model instead of living as a parallel /api/registry/mirrors concept — and first-party attestation strictly wins on every column, which is the invariant this whole change rests on.

Things I checked

  • Transaction safety in community-mirrors.ts (PUT and DELETE): single client acquired once, BEGIN/COMMIT on the happy path, ROLLBACK on every early-return (404/409) and in the catch, client.release() in finally. No leaked client, no double-release. (code-reviewer)
  • First-party clobber guard holds on every column. Both the app-code publishers upsert (publisher-db.ts:577-611) and migration 508's ON CONFLICT preserve each column when publishers.source_type = 'adagents_json'; the discovered_properties upserts guard on source_type IN ('adagents_json','aao_hosted'). publishers.source_type only admits adagents_json|community|enriched (migration 432), so adagents_json is the only first-party value and it is protected. Reverse direction is also safe: the crawler write (publisher-db.ts:399) unconditionally promotes a matching row back to adagents_json, so a real origin always reclaims a community-held domain. (security-reviewer)
  • hasValidAdagents tightened to require source_type='adagents_json' AND review_status='approved' (federated-index-db.ts) — a community row can never flip hosting.mode to self or suppress crawl-on-view. Right shape: a registry document is not origin validation.
  • Migration 508 domain derivation converges with the app path. The SQL regexp_replace(...) is functionally identical to canonicalizePublisherDomain (publisher-domain.ts), so an app-code re-publish lands on the same publishers(domain) rows. DELETE-then-INSERT scoped to created_by LIKE 'community_adagents:%', jsonb_agg ... ORDER BY — idempotent and deterministic.
  • Auth swap on POST /api/brands/discovered: requireAdmin is imported (http.ts:57) and is a superset of the removed inline isWebUserAAOAdmin check (auth.ts:1321 — same check plus ADMIN_EMAILS and the static/WorkOS admin paths). Tightening, not loosening.
  • zod ↔ OpenAPI agree. registry.ts (source, status, registry_url, discovery_method, the new brand/formats schemas) match static/openapi/registry.yaml field-by-field and enum-by-enum after regeneration. Every wire change is additive — new optional fields, widened enums — so non-breaking/minor for existing consumers. (ad-tech-protocol-expert: sound-with-caveats)
  • Frontend injection surface. Community-derived brand logo_url is gated to ^https:// and colors to #RRGGBB, with escapeHtml on the rendered strings (registry-api.ts:1820+, publisher-home.html profile render) — no XSS via the swatch style attr or logo src.

Follow-ups (non-blocking — file as issues)

  • Empty changeset frontmatter. .changeset/wide-chairs-agree.md has an empty ---\n--- block, so it bumps nothing. Repo convention tags even server-only changes — .changeset/2368 (a CSRF middleware fix, no schema touch) carries "adcontextprotocol": patch. This is a server feature that adds optional fields and enum values to the published registry OpenAPI. Either tag it patch/minor to match convention or confirm the empty changeset is deliberate.
  • Residual discovery_method enum drift. This PR fixes the component schema to all five values, but two hand-written inline path schemas in registry.yaml (~3881, ~4033) and the inline zod at registry-api.ts:1299 still enumerate only the three legacy values. Pre-existing divergence, only partially closed here. Reconcile or annotate as scoped to endpoints that cannot emit the new values.
  • Orphan catalog_properties on identifier collision. upsertCommunityCatalogProperty (publisher-db.ts:~1647) inserts the catalog_properties row unconditionally, then the identifier insert uses ON CONFLICT (identifier_type, identifier_value) DO NOTHING. If an identifier is already owned by another property_rid, the new row keeps zero identifiers — an identifier-less orphan (bounded; cleaned on retire). The first-party projectPropertyToCatalog path pre-checks ownership; mirror that, or skip the catalog_properties insert when all identifiers conflict.

Minor nits (non-blocking)

  1. DELETE early-return ROLLBACK is unguarded. community-mirrors.ts:~1767 and :~1775 do await client.query('ROLLBACK') bare on the 404/409 branches; if ROLLBACK itself threw, control would fall to the catch and return 500 instead of the intended status. The catch already uses .catch(() => undefined) — mirror it here. Practically harmless.

The AAOAgenticAdvertising.org find-replace rides along through the templates and tests; an interesting amount of surface area for one PR, but it is a pure string swap and the embed-route test was updated to match.

Approving on the strength of three clean expert verdicts plus the first-party-attestation guard holding on every column and in both write directions. Follow-ups noted above.

@bokelley bokelley merged commit 549c38f into main Jun 6, 2026
16 checks passed
@bokelley bokelley deleted the aao-key-rbac-403 branch June 6, 2026 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant