diff --git a/.changeset/identitymatch-fcap-architecture-spec.md b/.changeset/identitymatch-fcap-architecture-spec.md new file mode 100644 index 0000000000..fe04a4b888 --- /dev/null +++ b/.changeset/identitymatch-fcap-architecture-spec.md @@ -0,0 +1,36 @@ +--- +"adcontextprotocol": patch +--- + +IdentityMatch & frequency capping architecture, with the wire-spec change and the data-flow boundary contract landing as authoritative protocol docs. Counting and policy live in the buyer's impression tracker; the IdentityMatch service consumes only cap-fire events at the boundary. + +**Wire spec changes** (`identity-match-response.json`): +- Adds `serve_window_sec` (integer, 1–300, default 60) — per-package single-shot fcap window. After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. Not a router response cache TTL. +- Removes `ttl_sec`. Originally documented as a router cache TTL but operationally functioned as a per-package serve throttle. TMP is pre-launch (experimental, pre-3.0.0 GA) and not subject to deprecation cycles, so the field is removed outright. + +**Doc updates:** +- `docs/trusted-match/specification.mdx` — adds `serve_window_sec` field, removes `ttl_sec`, adds normative conformance invariants for IdentityMatch eligibility (audience intersection; cap-state presence check; active state; audience freshness). Updates the caching section for the new contract. +- `docs/trusted-match/identity-match-implementation.mdx` (new page) — frequency-cap data flow (boundary contract): the cap-fire event the impression tracker writes into the IdentityMatch cap-state store, and how the IdentityMatch service consumes it at query time. The protocol does not constrain how the impression tracker counts impressions, evaluates windows, or decides when a cap fires — those concerns live entirely in the buyer's impression-tracking pipeline. 
+- `docs/trusted-match/buyer-guide.mdx` — updates frequency-cap management to reflect the impression-tracker / IdentityMatch split, and the serve-window contract section. +- `docs/trusted-match/migration-from-axe.mdx` — adds OpenRTB 2.6 `User.eids[]` cross-walk for buyers bridging from OpenRTB-shaped pipelines. + +**Three-layer model:** +- Wire spec (normative) — what crosses an agent boundary. +- Conformance invariants (normative) — backend-agnostic eligibility logic, including a presence check against cap-state. +- Boundary contract (normative for the cap-state store API) — what events flow from the impression tracker into the IdentityMatch cap-state store. Storage backend is implementer choice; the reference store ships in `adcp-go/targeting/fcap` (Valkey 9 hashes with HSETEX). + +**Cap-state store surface:** `RecordCap(userIdentity, fields, expireAt)` and `IsCapped(userIdentity, field)`, where `field` is `{seller_agent_url, package_id}`. v1 keys cap-state at `(user_identity, seller_agent_url, package_id)`; broader-dimension caps (advertiser, campaign, creative, line item) are a future extension to the boundary contract. + +**Architecture history** preserved at `specs/identitymatch-fcap-architecture.md` — captures design decisions, deferred security/privacy follow-ups, the rollout plan, and consolidated Slack/PR-review threads. Earlier iterations of the design (counter-based exposure tracking, log-based tracking with `impression_id` dedup, `fcap_keys` label model) were unwound — counting, dedup, and policy evaluation depend on buyer-internal concerns the protocol shouldn't constrain. + +All TMP surfaces remain `x-status: experimental`. Per the experimental-status contract, fields on this surface are not subject to deprecation cycles until 3.0.0 GA. 
+ +**Tracked deferred follow-ups** (not in this PR): +- TMPX harvest → competitor-suppression attack +- Eligibility-as-audience-membership oracle (honeypot package_ids) +- Consent revocation between IdentityMatch and impression +- Side-channel via eligibility deltas +- `hashed_email` in TMPX leak surface +- DoS amplification via large `package_ids[]` +- Cap-state extensions for advertiser/campaign/creative dimensions +- Identity-graph plug-point in the impression tracker diff --git a/.changeset/tmp-data-protection-roles-doc.md b/.changeset/tmp-data-protection-roles-doc.md new file mode 100644 index 0000000000..93bd2b6a6a --- /dev/null +++ b/.changeset/tmp-data-protection-roles-doc.md @@ -0,0 +1,4 @@ +--- +--- + +Add `docs/trusted-match/data-protection-roles.mdx` mapping TMP architecture to GDPR controller/processor roles. Covers router (processor), buyer agent (processor conditional on operational discipline), publisher (controller, may delegate join to SSP), pre-negotiated pricing model, and explicit out-of-scope statement for post-impression flows. Includes risks requiring DPA scrutiny (cross-publisher exposure histories, retargeting audience construction, proprietary scoring) and publisher configuration choices with data protection implications (identity provider selection, cache semantics). diff --git a/docs.json b/docs.json index f079034a16..ab9d14d1f4 100644 --- a/docs.json +++ b/docs.json @@ -474,6 +474,7 @@ "docs/trusted-match/specification", "docs/trusted-match/router-architecture", "docs/trusted-match/privacy-architecture", + "docs/trusted-match/data-protection-roles", "docs/trusted-match/migration-from-axe", { "group": "Surface Guides", diff --git a/docs/reference/privacy-considerations.mdx b/docs/reference/privacy-considerations.mdx index 6a306e813f..5fb45b71de 100644 --- a/docs/reference/privacy-considerations.mdx +++ b/docs/reference/privacy-considerations.mdx @@ -80,6 +80,8 @@ Who is a controller and who is a processor depends on the deployment. 
AdCP does - A seller may be a controller for its own inventory data and a processor for buyer-scoped campaign data. - A TMP Router operator is typically a processor for both sides, operating under the separation guarantees described in [TMP privacy architecture](/docs/trusted-match/privacy-architecture). +For TMP-specific deployments, see [TMP Data Protection Roles](/docs/trusted-match/data-protection-roles) — a deeper analysis covering the buyer agent's conditional processor position, the SSP's role when the context+identity join is delegated, identity provider risk shapes, and post-impression flows that fall outside TMP's separation guarantees. + Operators MUST document their role for each data flow and carry a DPA with each counterparty that reflects it. ### Subprocessors and LLM providers diff --git a/docs/trusted-match/buyer-guide.mdx b/docs/trusted-match/buyer-guide.mdx index e0534f059c..1dc0b7fa6b 100644 --- a/docs/trusted-match/buyer-guide.mdx +++ b/docs/trusted-match/buyer-guide.mdx @@ -16,7 +16,7 @@ A buyer agent exposes two HTTP/2 endpoints under a single base URL — `POST /co | Message type | Receives | Returns | |---|---|---| | `context_match_request` | Page/content signals, placement, geo | Offers with creative manifests | -| `identity_match_request` | Seller agent URL, identity tokens, optional package ID list | Eligible package IDs + TTL | +| `identity_match_request` | Seller agent URL, identity tokens, optional package ID list | Eligible package IDs + `serve_window_sec` | Each endpoint handles one message type. Both must respond in under 50ms. The router enforces this budget and will skip slow providers. @@ -121,11 +121,11 @@ The router sends you the seller's `seller_agent_url` and one or more identity to "type": "identity_match_response", "request_id": "id-9c4e", "eligible_package_ids": ["acme-outdoor-q2", "acme-loyalty-retarget"], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` -Return only the package IDs that pass your eligibility checks. 
Packages not in the list are treated as ineligible. The `ttl_sec` tells the router how long to cache this response — during that window, the router returns cached eligibility without re-querying you. The publisher uses cached eligibility to allocate across whatever placements exist. Set the TTL based on how quickly your eligibility state changes (frequency caps, audience updates, etc.). +Return only the package IDs that pass your eligibility checks. Packages not in the list are treated as ineligible. The `serve_window_sec` is a **per-package single-shot fcap**: after the publisher serves the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. Default 60s, max 300s. This is not a router response cache TTL — see [The serve-window contract](#the-serve-window-contract). **What you never receive** in Identity Match: page URLs, content topics, keywords, article text, or any content signal. You cannot determine what the user is looking at. @@ -144,21 +144,25 @@ You have no role in this step. The publisher controls activation. ## Frequency Cap Management -Cross-publisher frequency capping is the primary use case for Identity Match. Your agent maintains frequency state per user token: +Cross-publisher frequency capping is the primary use case for Identity Match. Cap policy and counting live in your **impression tracker**; the Identity Match service consumes only cap-fire signals at query time. The split: -- **Count impressions** by user token + package ID -- **Track recency** — when was the last impression for this token? 
-- **Apply caps** from the media buy: `max_impressions` per `window`, minimum `recency` between exposures -- **Exclude the package** from `eligible_package_ids` when a cap is hit -- **Set `ttl_sec`** to reflect how long this eligibility is valid — a shorter TTL means the router re-checks sooner, which is useful when a cap is close to being reached +- **Impression tracker** receives pixel fires, decodes the TMPX token, and applies whatever fcap policies you maintain — counting impressions across whatever dimensions you cap on (package, campaign, advertiser, creative, line item) for each resolved user identity, with whatever windowing and dedup logic your policy engine uses. +- **On the impression that exhausts a cap**, the impression tracker writes a cap-fire entry — `(user_identity, package) capped until ` — into the Identity Match cap-state store. +- **Identity Match service** at query time excludes any package with a cap-fire entry against any of the request's identities from `eligible_package_ids`. + +The protocol does not constrain how you count impressions, where policies live, or how you dedup across identities. It only defines the boundary: cap-fire events flow into the cap-state store; the IdentityMatch service checks presence at query time. See [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation) for the boundary contract and the reference cap-state store. + +When an fcap rule changes — a window shortens or lengthens, a `max_count` rises or falls, a policy is paused or removed, a package is reassigned — you MUST re-evaluate the affected `(user_identity, package)` cap-state entries against the new policy and push the appropriate updates: **delete** entries for users no longer over-cap, **extend** (overwrite with a new `expire_at`) entries that are still over-cap but whose window changed. The cap-state store doesn't store counts and can't re-evaluate on its own; the buyer's policy owner is the source of truth. 
See [Policy updates and cap-state re-evaluation](/docs/trusted-match/identity-match-implementation#policy-updates-and-cap-state-re-evaluation) for the event shapes. Because Identity Match runs across all publishers using TMP, a user who saw your ad on Publisher A will correctly show as over-frequency on Publisher B — even though you can't see which publisher sent the request. ### How Buyers Learn About Exposures -The `tmpx` field on the Identity Match response carries a TMPX token — an HPKE-encrypted blob containing the user's resolved identity tokens. The publisher substitutes `{TMPX}` into creative tracking URLs. When the ad serves, your impression pixel receives the encrypted token. Your cluster master decrypts it, logs the exposure against the user, and replicates updated frequency state to read replicas. This gives you real-time per-user exposure signals without the publisher seeing user identity. +The `tmpx` field on the Identity Match response carries a TMPX token — an HPKE-encrypted blob containing the user's resolved identity tokens. The publisher substitutes `{TMPX}` into creative tracking URLs. When the ad serves, your impression pixel receives the encrypted token. Your impression tracker decrypts it, applies your fcap policy logic against the resolved identities, and (when a cap fires) writes a cap-fire entry to the Identity Match cap-state store. Most production deployments separate decode (synchronous, at intake) from policy evaluation and cap-state writes (asynchronous, behind a queue) for buffering. + +This gives you real-time per-user exposure signals without the publisher seeing user identity. -See [TMPX Exposure Tokens](/docs/trusted-match/specification#tmpx-exposure-tokens) for the encryption format and binary token structure. 
+See [TMPX Exposure Tokens](/docs/trusted-match/specification#tmpx-exposure-tokens) for the encryption format and binary token structure, and [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation) for the cap-state store boundary contract. ## Provider Registration @@ -201,16 +205,18 @@ Common scenarios: - **Internal failure**: Return an error response. The router skips your provider and proceeds with other providers. - **Timeout**: If you can't respond within the latency budget, the router skips you. No error response needed — the router handles this. -## The TTL Caching Contract +## The serve-window contract + +The `serve_window_sec` field on Identity Match responses is a **per-package single-shot fcap** between the buyer and the publisher: + +- For each package in `eligible_package_ids`, the publisher MAY serve the user **at most one impression** on that package within `serve_window_sec` seconds. +- After the publisher has served one impression on each eligible package, the publisher MUST re-query Identity Match before serving any of those packages to the same user again. +- Multi-impression frequency capping (5/day, 100/month, etc.) is separate. It lives in your buyer-side state and is updated out-of-band via TMPX impression callbacks regardless of `serve_window_sec`. The serve window is the protocol-level throttle; multi-impression caps are buyer-internal policy. -The `ttl_sec` field on Identity Match responses is a caching contract between the buyer and the router: +The router MAY apply an internal deduplication cache keyed by `{identities_hash, provider_id, package_ids_hash, consent_hash}` (see spec for canonical bytes), but the publisher's binding contract is the serve-window throttle, not the router's cache window. -- The router caches the response for `ttl_sec` seconds, keyed by `{identities_hash, provider_id, package_ids_hash, consent_hash}` (see spec for canonical bytes). 
`identities_hash` is computed over the per-provider filtered subset you received — your cache partition is scoped to the identity types you resolve. -- During that window, the router returns cached eligibility without re-querying the buyer -- The publisher uses cached eligibility to allocate across whatever placements exist — a single pre-roll, a CTV ad pod, or a web page with multiple ad units -- The buyer doesn't need to know how many placements exist or how the publisher allocates +**Choosing a `serve_window_sec` value**: Default 60 seconds; range 1–300. Windows longer than 300 seconds would make per-package fcap too coarse for typical campaigns. A window shorter than your IdentityMatch round-trip just adds load. 60 is a good default; tune downward if eligibility state shifts faster (a user close to a cap, an audience that just changed) or upward (max 300) if your IdentityMatch service is under load and the campaigns tolerate coarser fcap. -**Choosing a TTL**: Set the TTL based on how quickly your eligibility state changes. If frequency caps reset hourly, a 300-second TTL is reasonable. If a user is close to a cap limit, return a shorter TTL (e.g., 30 seconds) so the router re-checks sooner. ## Performance Requirements @@ -234,7 +240,7 @@ Buyers receive real-time per-user exposure signals via the `{TMPX}` macro. 
The I | | OpenRTB | TMP | |---|---|---| | **You receive** | Full bid request (user + content + device) | Either content OR identity, never both | -| **You return** | Bid price | Offer (creative manifest) or eligible package IDs + TTL | +| **You return** | Bid price | Offer (creative manifest) or eligible package IDs + serve window | | **Auction** | Exchange runs auction | No auction — publisher joins locally | | **Frequency** | Per-DSP only | Cross-publisher via Identity Match | | **Integration** | Per-exchange SSP adapter | Two endpoints (context + identity), any surface | diff --git a/docs/trusted-match/context-and-identity.mdx b/docs/trusted-match/context-and-identity.mdx index d8d4ee2c38..54ea279453 100644 --- a/docs/trusted-match/context-and-identity.mdx +++ b/docs/trusted-match/context-and-identity.mdx @@ -149,14 +149,14 @@ Note what is absent: no URL, no search query, no content signals, no topic IDs. "type": "identity_match_response", "request_id": "id-9b2c", "eligible_package_ids": ["pkg-A", "pkg-B"], - "ttl_sec": 60, + "serve_window_sec": 60, "tmpx": "k1.dG1weC1leGFtcGxlLWVuY3J5cHRlZC10b2tlbi4uLg" } ``` The buyer reports that this user is eligible for packages A and B. Package C is absent — the user is not eligible. The publisher does not need to know why — frequency capping, audience mismatch, and other disqualification reasons are buyer-internal. -The `ttl_sec: 60` tells the router: "Cache this for 60 seconds." The router uses this cached eligibility to fill whatever placements exist — a single slot, a CTV ad pod, or a page with multiple ad units — without re-querying the buyer. The publisher decides how to allocate across placements. +The `serve_window_sec: 60` tells the router: "Cache this for 60 seconds." The router uses this cached eligibility to fill whatever placements exist — a single slot, a CTV ad pod, or a page with multiple ad units — without re-querying the buyer. The publisher decides how to allocate across placements. 
### What Identity Match never carries diff --git a/docs/trusted-match/data-protection-roles.mdx b/docs/trusted-match/data-protection-roles.mdx new file mode 100644 index 0000000000..44e24c36c8 --- /dev/null +++ b/docs/trusted-match/data-protection-roles.mdx @@ -0,0 +1,239 @@ +--- +title: Data Protection Roles +description: How TMP's architecture maps to GDPR controller/processor roles — what each party determines, what data each party holds, and where the risks are. +"og:title": "AdCP TMP Data Protection Roles" +--- + +# Data Protection Roles + +This page maps TMP's architecture to GDPR data protection roles (controller, processor, joint controller). It explains what each participant can and cannot determine about individuals, where the architectural boundaries support a processor position, and where they don't. + +This is an architectural analysis, not legal advice. Organizations should consult qualified data protection counsel for their specific circumstances. Familiarity with the [TMP privacy architecture](/docs/trusted-match/privacy-architecture) is assumed — concepts like structural separation, TEE attestation, package set decorrelation, and temporal decorrelation are defined there. + +## Verdict at a Glance + +| Participant | Role | Confidence | Where the risk sits | +|---|---|---|---| +| **TMP Router** | Processor | High (architectural) | Operator deployment integrity. Mitigated by TEE attestation. | +| **Buyer agent** | Processor | Conditional (operational) | Proprietary scoring, cross-advertiser data combination, audience construction, cross-publisher exposure accumulation. | +| **Publisher** | Controller | Unchanged | Consent collection; configuring data processors; the final serve decision. | +| **SSP / wrapper performing the join** | Joint controller or processor for the publisher | Conditional | Whoever performs the context+identity join inherits controller responsibility for that step. 
| +| **Identity provider** | Out of scope for TMP; controller for token issuance | Varies by provider | Token scope and graph behavior (publisher-first-party vs. deterministic cross-site vs. probabilistic graph) materially changes the publisher's risk. | +| **Measurement / attribution** | Out of scope for TMP | n/a | Post-impression flows reintroduce controller analysis that TMP does not address. See [Out of Scope](#out-of-scope-post-impression-flows). | + +## Background: Controller vs. Processor + +Under GDPR, a **controller** determines the purposes and means of processing personal data. A **processor** processes personal data on behalf of a controller. A **joint controller** jointly determines purposes and means with one or more other controllers. + +The distinction matters because controllers carry heavier obligations: legal basis for processing, data subject rights, DPIAs, and direct liability. The key question is not "who touches the data" but "who decides what happens with it." + +In advertising, the risk of being deemed a controller increases when an intermediary: + +- Decides to show an ad to a specific person based on something it knows or infers about them +- Builds or populates a segment by determining that an identifier has a characteristic +- Combines third-party segment data with other data to create a new profile +- Uses the data for any purpose beyond the specific campaign instruction + +The risk decreases when the intermediary: + +- Processes data under instructions from a party that has a direct relationship with the data subject or data provider +- Does not hold title to any segment or audience data +- Does not use the data independently or for its own purposes +- Does not combine data across sources to create new mappings + +## Why TMP's Posture Is Unusual + +The buyer agent's processor position is unusual relative to how the ad tech industry typically operates. 
Most DSPs function as joint controllers or independent controllers for their own optimization purposes, even when their MSAs claim processor status. The IAB Europe TCF framework reflects this by assigning separate purposes and legal bases to each vendor in the chain. + +TMP's architecture *enables* a buyer agent processor position that traditional DSPs cannot credibly claim, because: + +- The buyer never receives user identity and page context together (DSPs receive both in every bid request) +- The buyer does not set per-impression prices based on identity (DSPs submit bid prices informed by user data) +- The buyer returns binary eligibility, not a scored bid (DSPs return a price that encodes their valuation of the user) + +Whether a buyer agent operates within that envelope is a contractual and operational question, not an architectural one. Most buyer-side organizations will need to make explicit choices to stay inside it. A buyer agent that introduces proprietary scoring, cross-advertiser data combination, or independent audience construction erodes the architectural advantage and may need to assess its role as a joint controller regardless of what the protocol enables. + +## The TMP Router: Processor + +The TMP Router is infrastructure. It does not make targeting decisions, build profiles, or evaluate users. It receives requests from the publisher and fans them out to buyer agents. It receives responses and merges them. It returns the merged result to the publisher. + +**What the router sees on the Context Match path:** + +- Content signals (topics, keywords, sentiment) +- Placement identifiers +- Geographic context (coarse) +- No user identity of any kind + +**What the router sees on the Identity Match path:** + +- Opaque user tokens +- Package identifiers +- Consent signals +- No page context of any kind + +The two paths are structurally separate: no shared memory, no shared state, no communication channel. 
The router cannot associate a user token with a page URL because no single code path ever holds both. See [Privacy Architecture](/docs/trusted-match/privacy-architecture) for the enforcement mechanism. + +**What the router does not do:** + +- Evaluate whether a user should see an ad +- Check frequency caps or audience membership +- Build or store user profiles +- Combine context data with identity data +- Make pricing decisions +- Retain data beyond the request/response lifecycle + +The router is a processor acting on the publisher's instructions (which providers to call, which properties to serve). It processes personal data (opaque user tokens) solely to deliver them to buyer agents and return the result. + +> **Bottom line:** The router can credibly claim processor status. With TEE attestation, this is independently verifiable; without TEE, it depends on operator integrity and code audit. + +## The Buyer Agent: Processor Conditional on Operational Discipline + +TMP architecturally constrains what the buyer receives (no context with identity) and what it returns (eligible package IDs, nothing more). It does not constrain what the buyer does internally with the tokens it sees, the exposure histories it builds, or the proprietary models it runs. + +The processor position is therefore conditional, not architectural. It depends on the buyer agent's operational choices and the DPAs that govern them. Most buyer-side organizations will need to make deliberate choices to stay inside the envelope TMP enables. + +**What the architecture provides:** + +- The buyer never receives page context with identity. Identity Match requests carry no page URL, no content signals, no topic IDs. +- The buyer does not set price per impression. The Identity Match response carries eligible package IDs and a serve window (`serve_window_sec`): no price, no bid, no scored response. +- The buyer does not make the serve decision. 
The publisher performs the join (or delegates it — see [the SSP question](#the-publisher-and-the-ssp-join)). + +**Where the processor position erodes:** + +- **Proprietary eligibility scoring.** A buyer agent that uses ML models to score eligibility — even models trained only on advertiser data — is determining means of processing. The line between "applying advertiser criteria" and "operating an optimization engine" is the line between processor and joint controller. A buyer agent that does *not* run a proprietary optimizer is uncompetitive against existing DSPs. This is the base case, not the edge case. +- **Cross-advertiser data combination.** A buyer agent serving multiple advertisers must keep their data isolated. Combining segment membership across advertisers to enrich profiles is controller behavior. +- **Audience construction from observation.** Applying an advertiser-provided audience list (processor) is different from constructing an audience by observing behavior (controller). See [Risks requiring DPA scrutiny](#risks-requiring-dpa-scrutiny). +- **Cross-publisher exposure histories.** Cross-publisher frequency capping is the headline TMP use case *and* the headline data protection exposure. Even without context, an exposure history tied to a user token across many publishers constitutes a behavioral profile under CJEU jurisprudence. The protocol does not eliminate this — it isolates it. + +> **Bottom line:** TMP enables a buyer-agent processor position; it does not enforce one. The DPA between the advertiser and the buyer agent must specify what the buyer may do with the tokens it sees, how exposure histories are bounded, and how proprietary models interact with advertiser data. + +## The Publisher and the SSP Join + +The publisher is the first party. They have a direct relationship with the user. They collect consent. They hold both context (what the user is viewing) and identity (who the user is). TMP does not change the publisher's controller status. 
+ +The publisher's controller responsibilities include: + +- Collecting and transmitting consent signals in Identity Match requests +- Ensuring user tokens are opaque and not reversible to PII by buyer agents +- Performing (or delegating) the join between context and identity locally +- Applying consent logic before serving +- Configuring which providers the router calls (data processor selection) +- Selecting which identity provider's tokens to use (a controller-level decision — see [Publisher configuration choices](#publisher-configuration-choices)) + +**The join in practice.** "The publisher performs the join locally" is correct in principle and incomplete in practice. Most publishers operate ad servers (Google Ad Manager, Kevel, Equativ) that do not natively expose primitives for "join two real-time API responses with consent logic before serving." Publishers using GAM will typically need a header bidder wrapper, a Prebid module, or an SSP shim to perform the join. Many will outsource it to their SSP (Magnite, PubMatic, Index Exchange, OpenX). + +When the join is delegated, the SSP or wrapper inherits controller responsibility for the join step itself. The publisher's DPA with the SSP must reflect this: the SSP becomes a joint controller for the join (or processor specifically scoped to the join) depending on how their broader services are characterized. A publisher who assumes "TMP made me a processor" by virtue of delegating the join has misread the architecture. + +> **Bottom line:** Publisher remains controller. If the join is delegated to an SSP or wrapper, that party becomes a joint controller for the join step and must be addressed in the DPA chain. + +## Pricing and Real-Time Decisions + +A critical factor in controller/processor analysis is whether an intermediary makes real-time pricing or bidding decisions based on user identity. 
+ +**TMP does not include real-time pricing based on identity.** Here is where pricing occurs in the protocol: + +| Decision | When | Based on | Where | +|---|---|---|---| +| Package price | Media buy negotiation (offline) | Product catalog, volume, terms | Buyer agent and publisher, before any user is evaluated | +| Context Match offer price | Request time | Content context only — no user identity | Context Match path (no identity data available) | +| Identity Match eligibility | Request time | Frequency caps, audience membership | Identity Match path (no context data available) | +| Final serve decision | After both responses return | Publisher (or SSP) joins context + identity + consent | Publisher infrastructure or delegated to SSP | + +No participant in TMP sets a price based on "this specific user on this specific page." The Context Match path may include variable pricing (a buyer might value hiking content more than general content), but this is based on content, not identity. The Identity Match path determines eligibility, not price. + +The pre-negotiated pricing model reduces the link between identity and economic outcomes, but does not eliminate it entirely. When the publisher (or SSP) joins context match offers with identity match eligibility and activates a package, the economic result is that this user on this page saw this ad at this price. A regulator may assess the system holistically rather than examining individual protocol messages. The architectural distinction is that no single intermediary makes a combined user-plus-context pricing decision. + +This is distinct from OpenRTB, where a bidder receives user identity and page context together and submits a per-impression bid price. In that model, the bidder *can* make a real-time pricing decision based on who the user is combined with what they are viewing. 
Whether it does depends on the campaign — many bidders price primarily on context and use identity only for frequency capping, which is closer to a processor pattern. The structural concern is that OpenRTB *enables* this combination, and the processor position depends on contractual constraints rather than architectural ones. TMP separates these concerns at the protocol level. + +## Out of Scope: Post-Impression Flows + +**TMP covers real-time decisioning only.** It does not specify how delivery reports, conversion events, attribution, or measurement data flow after an impression is served. + +This matters because every campaign needs conversion tracking, view-through attribution, MMM inputs, and incrementality measurement. These flows route impression-level data — typically including user tokens, creative IDs, timestamps, and conversion events — to attribution systems (CM360, Innovid, Flashtalking), DSP-native attribution stacks, and measurement vendors (DV, IAS, iSpot, Nielsen). These vendors are typically controllers or joint controllers for the data they receive. + +If operators plug their existing impression-log pipelines into TMP-decisioned campaigns, they will have built privacy-preserving real-time decisioning attached to a wide-open post-impression pipe. The architectural protection in front does not reach the back. + +Two paths address this: + +1. **Scope-limit deployment.** Treat post-impression flows as a separate data protection question, governed by the existing DPAs between the advertiser, the measurement vendors, and any clean room operators. TMP is not the lever for solving attribution privacy; existing frameworks (clean rooms, measurement APIs, aggregated reporting) are. +2. **Adopt compatible attribution.** Use buyer-blind conversion APIs, publisher-side conversion logs joined in a clean room, or aggregated measurement systems that maintain the same separation discipline as TMP itself. 
This is an emerging area; AdCP does not yet specify a "TMP-compatible attribution" pattern, though one may follow. + +DPOs evaluating TMP adoption should treat post-impression flows as a distinct workstream and not assume the protocol's separation properties extend to them. + +## Comparison: AXE (Deprecated) vs. TMP + +[AXE](/docs/media-buy/advanced-topics/agentic-execution-engine), TMP's predecessor, had a weaker data protection posture: + +| | AXE | TMP | +|---|---|---| +| What the real-time endpoint received | Full OpenRTB-style request: user identity + page context + device signals | Separate requests: context OR identity, never both | +| Who saw user + context together | The AXE endpoint operator | Only the publisher (or delegated SSP), as first party | +| Profile construction risk | Operator could theoretically build browsing profiles | Architecturally constrained for the router (no code path holds both signals); buyer-side correlation impeded by [decorrelation mechanisms](/docs/trusted-match/privacy-architecture) but depends on publisher compliance with SHOULD-level requirements | +| Pricing model | Opaque segment decisions fed ad server targeting | Pre-negotiated package prices, no per-impression bidding | +| Separation enforcement | Trust and contract | Code structure (auditable) or TEE attestation (verifiable) | + +AXE's design meant the endpoint operator (typically the orchestrator) received user identity and page context in the same request. The operator's processor position depended on contractual commitments not to misuse the combined data. TMP replaces this with structural separation — the router cannot misuse data it never holds together. + +## Comparison: RTD Modules (Prebid Real-Time Data) + +RTD modules are vendor-specific Prebid extensions that enrich bid requests at auction time. 
Each module sends the full OpenRTB BidRequest (2-10KB) to a vendor endpoint, which returns enrichment data (audience segments, contextual classifications, brand safety scores). + +**Data protection concern:** RTD modules send user identity and page context together to each vendor. The vendor's processor position depends on contract, not architecture. The cumulative exposure is significant: a publisher using 5 RTD modules sends the full user-plus-context payload to 5 different vendor endpoints per impression. Each vendor's processor position is independently contractual. + +TMP replaces vendor-specific RTD modules with a standardized protocol that enforces separation. Instead of sending everything to every vendor, TMP sends context to the context path and identity to the identity path. The result is the same (packages activate or don't), but the data exposure is structurally minimized. + +## Risks Requiring DPA Scrutiny + +These are operational risks that DPAs must address. The architecture does not constrain them. + +**1. Cross-publisher exposure histories** (the headline buyer-agent risk). + +Buyer agents that track cross-publisher frequency maintain exposure histories tied to user tokens. Even without page context, this constitutes a behavioral profile: how many properties this user appears on, how frequently, across which publisher categories. The CJEU's *Meta Platforms* decision (Case C-252/21) established that combining data across services can constitute controller-level processing even without deep profiling. + +This is not a footnote risk — it is the headline data protection exposure of the headline TMP use case. The protocol does not eliminate it; it isolates it. DPAs between advertisers and buyer agents must specify the legal basis, retention period, purpose limitation, and erasure flow (Article 17 rights apply: a user exercising erasure means the buyer's exposure history tied to that token must be deletable). + +**2. 
Retargeting audience construction.** + +There is a distinction between *applying* an audience (checking a user token against an advertiser-provided list — processor pattern) and *building* an audience (determining that a user token has a characteristic based on observed behavior — controller pattern). If a buyer agent receives conversion events or site visit signals and constructs a retargeting pool, it is building an audience. + +How retargeting audiences enter the system matters. Advertiser-provided lists that the buyer checks mechanically support a processor position. Buyer-constructed audiences from observed behavior do not. + +**3. Proprietary eligibility scoring.** + +A buyer agent that uses ML models to score eligibility — even models trained only on advertiser data — is determining means of processing. DPAs should specify what models the buyer may run, what data trains them, and how their outputs are constrained. + +**4. Measurement and attribution flows.** + +Covered in [Out of Scope](#out-of-scope-post-impression-flows). Treat as a distinct workstream from TMP itself. + +## Publisher Configuration Choices + +These are publisher-side configuration decisions with data protection implications. Each is a controller-level decision the publisher makes. + +**1. Identity provider selection.** + +TMP consumes tokens from identity providers, but the providers are not interchangeable. Different graph behaviors create fundamentally different risk shapes: + +| Token type | Risk shape | Examples | +|---|---|---| +| Publisher-first-party | No cross-site linkage. Lowest risk profile. | `publisher_first_party` (per-publisher hashed identifiers) | +| Deterministic cross-site | Same user resolves to same token across sites and devices. Enables cross-site profiling. | UID2 (operated by The Trade Desk, primarily for buy-side use), ID5 | +| Probabilistic / commercial graph | Provider operates an identity graph that resolves tokens to PII inside the provider's walls. 
| RampID (LiveRamp) | + +The choice of identity provider is itself a controller-level decision. Selecting a deterministic cross-site token expands the buyer agent's cross-publisher correlation surface. Publisher DPAs and consent flows should reflect the provider's specific posture, not treat all `uid_type` values equivalently. + +**2. Context Match with full artifacts.** + +When a publisher sends full content (`artifact` field) rather than classified signals (`context_signals`), the buyer agent receives the actual content. `context_signals` (pre-classified topics, sentiment, keywords) is the privacy-preserving baseline. Full artifacts exist for cases where the buyer needs to evaluate content directly (e.g., AI assistant conversations where classification alone is insufficient). + +**3. Cache semantics.** + +Identity Match responses include a `ttl_sec` caching contract. During the cache window, the router returns cached eligibility without re-querying the buyer. Cached eligibility is personal data (it's tied to a user token). The [specification](/docs/trusted-match/specification) allows TTLs up to 86,400 seconds (24 hours) with a recommended clamp at 3,600 seconds. Routers should enforce short TTLs, must not retain cached data beyond expiry, and must not use cached eligibility for any purpose other than responding to subsequent Identity Match requests for the same token. + +**4. Variable pricing on context.** + +The Context Match offer can include an `OfferPrice`. Because Context Match carries no identity, this is contextual pricing — not per-user pricing. However, if a publisher's `context_signals` are specific enough to identify an individual (e.g., a unique AI conversation summary), the contextual path could carry de facto identity. Publishers should ensure `context_signals` do not contain PII or uniquely identifying content. + +**5. 
Join delegation.** + +If the publisher delegates the context+identity join to an SSP, header bidder wrapper, or third-party module, that party becomes a joint controller for the join step. The publisher remains controller for the broader serve decision but must address the delegate's role in the DPA chain. diff --git a/docs/trusted-match/identity-match-implementation.mdx b/docs/trusted-match/identity-match-implementation.mdx new file mode 100644 index 0000000000..d333d80162 --- /dev/null +++ b/docs/trusted-match/identity-match-implementation.mdx @@ -0,0 +1,116 @@ +--- +title: Identity Match Frequency-Cap Data Flow +sidebarTitle: Frequency-Cap Data Flow +description: "Boundary contract between the impression tracker and the Identity Match service for frequency capping — the data flow only. Internal counting, policy evaluation, and storage layout are buyer-internal concerns." +"og:title": "AdCP TMP Identity Match Frequency-Cap Data Flow" +--- + +# Identity Match Frequency-Cap Data Flow + +This page describes how frequency-cap state reaches the Identity Match service and how Identity Match consumes it at eligibility time. It defines **the data flow only** — what crosses the boundary between the impression tracker and the Identity Match service. Internal mechanics (how the impression tracker counts impressions, where policies live, what storage layout the Identity Match service uses, how identities are deduplicated upstream) are buyer-internal concerns and are out of scope here. + +The wire spec lives in the [TMP specification](/docs/trusted-match/specification); the conformance invariants the Identity Match service must satisfy are also normative there. The reference implementation of the Identity Match cap-state store ships in [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap). 
+ +## Roles + +| Component | Responsibility | +|---|---| +| **Identity Match service** | At query time, returns `eligible_package_ids` — the subset of requested packages the user is not currently capped on (and that pass other eligibility checks). It does not count impressions and does not own fcap policies. | +| **Impression tracker** | Receives pixel fires, decodes TMPX, applies the buyer's fcap policies (counting, windowing, multi-identity dedup, whatever the buyer's policy logic does), and signals "cap fired" to the Identity Match cap-state store on the impression that exhausts a cap. | +| **Identity Match cap-state store** | Records `(user_identity, package) → cap-until` entries with TTL. Queried by the Identity Match service at eligibility time. Written by the impression tracker (or a downstream service in its pipeline). | + +The split is deliberate: counting impressions, evaluating windows, and deciding when a cap fires are buyer-internal policy concerns that vary across buyers and across campaigns. The Identity Match service stays narrow — it answers "is this user currently capped on this package?" and nothing more. New cap dimensions (advertiser, campaign, creative — see [extensions](#future-extensions)) plug into the same boundary contract without changing the service. + +## End-to-end flow + +``` +1. Identity Match query + publisher → router → Identity Match service + Identity Match looks up cap state for each (identity, package) pair + returns eligible_package_ids + tmpx (HPKE-encrypted resolved identities) + +2. Ad serves; creative tracking URL fires pixel with {TMPX} + publisher's player/page → impression tracker + +3. Impression tracker decodes TMPX + → resolved identities + signed package context (seller_agent_url, package_id) + +4. 
Impression tracker applies the buyer's fcap policies + → counts this exposure against whatever dimensions the buyer caps on + (package, campaign, advertiser, creative, line item, …) for each + resolved identity, using whatever policy logic and storage the buyer + runs internally + +5. If this impression exhausts a cap (i.e., it is the last allowed exposure + under one of the buyer's policies), the impression tracker (or a + downstream service in its pipeline) writes a cap-fire entry to the + Identity Match cap-state store: + (user_identity, package) capped until expire_at + +6. Subsequent Identity Match queries for that user see the cap-state entry + and exclude the package from eligible_package_ids until the entry expires +``` + +Steps 1, 2, and 6 cross the wire and are normatively defined in the [TMP specification](/docs/trusted-match/specification). Steps 3 and 5 cross the impression-tracker → cap-state-store boundary and are defined on this page. Step 4 is buyer-internal — the protocol does not constrain it. + +## The cap-fire event + +When a buyer's policy evaluation determines that an impression has exhausted a cap, the impression tracker writes a cap-fire entry to the Identity Match cap-state store. Each entry consists of: + +| Field | Description | +|---|---| +| `user_identity` | The resolved identity token (e.g., `rampid:abc`, `id5:def`, `maid:ghi`) the cap fired on. If a single impression resolved to multiple identities and the policy fired on all of them, the impression tracker writes one entry per identity. | +| `seller_agent_url` | The seller agent the package belongs to. Disambiguates identical `package_id` strings across sellers. | +| `package_id` | The package the cap fired on. | +| `expire_at` | Wall-clock time at which the cap expires. The cap-state store enforces this as a TTL — entries are absent after `expire_at`.
| + +A single cap-fire event typically corresponds to one entry; a cap that fires on multiple resolved identities or multiple packages produces one entry per `(identity, package)` pair, all sharing the same `expire_at` if the buyer's policy is the same. + +The cap-state store does not record per-impression counts, policy definitions, or window configurations. Its only job is to answer "is this `(user_identity, package)` currently capped?" The buyer's policy logic — counting, windowing, choosing dimensions to cap on, deciding when to fire — lives entirely in the impression tracker. + +## The eligibility query + +At query time, the Identity Match service receives a list of identities and a list of candidate packages. For each candidate package, it checks the cap-state store for any matching `(identity, package)` entry across the user's identities. If any entry exists, the package is excluded from `eligible_package_ids`. This is a presence check, not a count. + +Cap state is one input to eligibility. The Identity Match service also evaluates audience membership, package active state, audience freshness, and any other inputs the buyer cares about — see the [conformance invariants](/docs/trusted-match/specification#conformance-invariants-for-identitymatch-eligibility). The cap-state portion of that evaluation is the part this page defines. + +## Policy updates and cap-state re-evaluation + +Cap-state entries are written under whatever fcap policy was in force at cap-fire time. When the buyer's fcap policies change — a window shortens or lengthens, a `max_count` rises or falls, a policy is paused or removed, a package is reassigned to a different policy — the existing cap-state entries written under the old policy can become stale. Stale entries either suppress users who should now be eligible (over-suppression) or fail to suppress users who should now be capped (under-suppression). 
+ +When a fcap rule changes, the buyer's policy owner (typically the impression tracker or a service in its pipeline) MUST re-evaluate every cap-state entry the rule applied to and push the appropriate update to the IdentityMatch cap-state store. Two event shapes cover the cases: + +| Event | When to push | Effect on cap-state | +|---|---|---| +| **Delete cap-state** | A user's exposure count under the new policy is below the new `max_count`, or the policy was removed/disabled, or the package was reassigned away from the policy. | Remove the `(user_identity, package)` entry — the user is no longer suppressed on that package. | +| **Extend cap-state** | A user is still over-cap under the new policy, but the new `expire_at` differs from the existing entry — for example, the window was lengthened (push a later `expire_at`) or shortened (push an earlier `expire_at`). | Overwrite the entry with the new `expire_at`. | + +Re-evaluation runs over the buyer's own counting state (where impression history lives), not over the cap-state store — the cap-state store doesn't carry counts. The output is the set of delete-or-extend events to apply. + +The reference store in [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap) implements extend natively (a second `RecordCap` for the same `(user_identity, field)` overwrites the prior `expire_at` via `HSETEX`). Delete is a future extension — today, the simplest workaround is to extend with an `expire_at` already in the past, which causes the entry to be treated as absent at the next query and to be reaped by the backend's TTL machinery. + +Re-evaluation can be expensive when a policy applies to many users. Buyers typically run it asynchronously: enqueue the policy-change event, sweep the affected user population in batches, push delete/extend events incrementally. 
The protocol does not constrain the cadence — only the eventual consistency requirement that cap-state must converge to what the current policies imply. + +## Reference implementation + +The cap-state store API in [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap) is the reference shape. It exposes two operations: + +```go +RecordCap(ctx, userIdentity string, fields []Field, expireAt time.Time) error +IsCapped(ctx, userIdentity string, field Field) (bool, error) +``` + +— plus batch variants for both. `Field` is `{SellerAgentURL, PackageID}`. The reference store is backed by Valkey 9 hashes, hashed by user identity, with one hash field per `(seller_agent_url, package_id)` tuple and a TTL set to `expire_at`. Other backends (Aerospike, DynamoDB, in-memory, anything) are conformant if they satisfy the boundary contract above. + +## Future extensions + +Today the cap-state store is keyed at `(user_identity, seller_agent_url, package_id)`. Future protocol versions may extend the field to additional dimensions — advertiser, campaign, creative, line item — so a buyer can express caps that span multiple packages without writing N entries on every cap-fire. The boundary contract on this page is unchanged by such extensions: the impression tracker writes cap-fire entries; the Identity Match service checks presence at query time. 
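The two-operation surface under "Reference implementation" and the presence check under "The eligibility query" can be sketched with an in-memory backend. This is a minimal illustration under stated assumptions, not the `adcp-go/targeting/fcap` code: `MemStore`, `NewMemStore`, `EligiblePackageIDs`, and the injectable clock are hypothetical names introduced here, and only the `RecordCap` / `IsCapped` shapes and the `Field` tuple come from the text above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Field mirrors the {SellerAgentURL, PackageID} tuple named above.
type Field struct {
	SellerAgentURL string
	PackageID      string
}

// MemStore is a hypothetical in-memory cap-state store, for illustration only.
type MemStore struct {
	mu   sync.RWMutex
	now  func() time.Time               // injectable clock so expiry is testable
	caps map[string]map[Field]time.Time // user identity -> field -> expire_at
}

// NewMemStore builds an empty store; a nil clock falls back to time.Now.
func NewMemStore(now func() time.Time) *MemStore {
	if now == nil {
		now = time.Now
	}
	return &MemStore{now: now, caps: make(map[string]map[Field]time.Time)}
}

// RecordCap writes one entry per field. A second call for the same
// (userIdentity, field) overwrites the earlier expireAt.
func (s *MemStore) RecordCap(_ context.Context, userIdentity string, fields []Field, expireAt time.Time) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	entries := s.caps[userIdentity]
	if entries == nil {
		entries = make(map[Field]time.Time)
		s.caps[userIdentity] = entries
	}
	for _, f := range fields {
		entries[f] = expireAt
	}
	return nil
}

// IsCapped is a presence check: an entry counts only until its expireAt.
func (s *MemStore) IsCapped(_ context.Context, userIdentity string, field Field) (bool, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	expireAt, ok := s.caps[userIdentity][field]
	return ok && s.now().Before(expireAt), nil
}

// EligiblePackageIDs keeps each candidate that has no live cap-state entry
// under any of the user's identities. Audience and active-state checks are
// deliberately left out of this sketch.
func EligiblePackageIDs(ctx context.Context, s *MemStore, identities []string, candidates []Field) ([]string, error) {
	eligible := []string{}
	for _, pkg := range candidates {
		capped := false
		for _, id := range identities {
			c, err := s.IsCapped(ctx, id, pkg)
			if err != nil {
				return nil, err
			}
			if c {
				capped = true
				break
			}
		}
		if !capped {
			eligible = append(eligible, pkg.PackageID)
		}
	}
	return eligible, nil
}

func main() {
	ctx := context.Background()
	clock := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	store := NewMemStore(func() time.Time { return clock })

	pkgA := Field{SellerAgentURL: "https://seller.example", PackageID: "pkg-a"}
	pkgB := Field{SellerAgentURL: "https://seller.example", PackageID: "pkg-b"}
	ids := []string{"rampid:abc", "id5:def"}

	// Cap fires on pkg-a for one resolved identity, for one hour.
	_ = store.RecordCap(ctx, "rampid:abc", []Field{pkgA}, clock.Add(time.Hour))
	got, _ := EligiblePackageIDs(ctx, store, ids, []Field{pkgA, pkgB})
	fmt.Println("while capped:", got)

	// Delete workaround from the text: overwrite with a past expire_at.
	_ = store.RecordCap(ctx, "rampid:abc", []Field{pkgA}, clock.Add(-time.Second))
	got, _ = EligiblePackageIDs(ctx, store, ids, []Field{pkgA, pkgB})
	fmt.Println("after past-dated overwrite:", got)
}
```

The sketch keeps expired entries in the map and filters them at read time. A production backend would instead rely on native TTL expiry, as the reference store does with Valkey hash-field TTLs.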
+ +## See also + +- [TMP Specification](/docs/trusted-match/specification) — wire spec, TMPX format, conformance invariants +- [Buyer Guide](/docs/trusted-match/buyer-guide) — buyer agent integration, Context Match + Identity Match flows +- [Migration from AXE](/docs/trusted-match/migration-from-axe) — for buyers transitioning from AXE-shaped pipelines, including the OpenRTB User.eids cross-walk +- [Privacy architecture](/docs/trusted-match/privacy-architecture) — what each party learns +- [Router architecture](/docs/trusted-match/router-architecture) — provider registration, fan-out, latency +- [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap) — reference cap-state store in Go diff --git a/docs/trusted-match/index.mdx b/docs/trusted-match/index.mdx index c495b58ac6..5be186a531 100644 --- a/docs/trusted-match/index.mdx +++ b/docs/trusted-match/index.mdx @@ -161,11 +161,11 @@ Response from Sam's buyer agent: "type": "identity_match_response", "request_id": "id-7c9e1d", "eligible_package_ids": ["pkg-outdoor-audio"], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` -Only eligible packages are listed — `pkg-outdoor-audio` passes the buyer's checks. The `serve_window_sec: 60` is a per-package serve throttle, not a router cache TTL: after serving the user one impression from `pkg-outdoor-audio` within the 60-second window, the publisher must re-query Identity Match before serving from that package again. The example sends `package_ids` explicitly, but the publisher MAY omit it — Sam's identity-match service resolves the active package set from `seller_agent_url`. When `package_ids` IS sent, its composition MUST be independent of the current page — either all-active (every Sam package at StreamHaus) or fuzzed (a random sample padded with synthetic IDs that Sam will silently drop). A page-specific subset is forbidden; it would let the buyer correlate package sets across Context Match and Identity Match, breaking the structural separation.
@@ -226,6 +226,9 @@ The same TMP Router handles StreamHaus's website, their mobile app, their CTV ap Structural separation, temporal decorrelation, and TEE attestation. + + Controller vs. processor analysis for each TMP participant. + Deployment, fan-out, and provider configuration. diff --git a/docs/trusted-match/migration-from-axe.mdx b/docs/trusted-match/migration-from-axe.mdx index 673cdbd3f6..829ad4b04a 100644 --- a/docs/trusted-match/migration-from-axe.mdx +++ b/docs/trusted-match/migration-from-axe.mdx @@ -85,3 +85,21 @@ New media buys should omit AXE fields entirely. The buyer agent's Context Match - **`sync_creatives`** — Same creative sync - **GAM as the ad server** — TMP still sets key-values that GAM evaluates - **Geographic and other targeting overlays** — These are media buy fields, not execution-layer concerns + +## OpenRTB User.eids cross-walk + +For buyers bridging from OpenRTB-shaped pipelines, the TMP Identity Match `identities[]` shape maps to OpenRTB 2.6 `User.eids[]` as follows: + +| AdCP TMP `identities[].uid_type` | OpenRTB 2.6 `User.eids[].source` | Notes | +|---|---|---| +| `rampid` / `rampid_derived` | `liveramp.com` | `atype: 3` (person-based, per [IAB AdCOM Agent Types](https://github.com/InteractiveAdvertisingBureau/AdCOM/blob/main/AdCOM%20v1.0%20FINAL.md#list_agenttypes)) | +| `id5` | `id5-sync.com` | | +| `uid2` | `uidapi.com` | `atype: 3` | +| `euid` | `euid.eu` | | +| `pairid` | `iabtechlab.com/pair` | | +| `maid` | `adid` (Android) / `idfa` (iOS) | Typically carried on `Device.ifa` rather than `User.eids` in OpenRTB | +| `hashed_email` | `liveintent.com` or buyer-specific | `atype: 3` | +| `publisher_first_party` | publisher-defined `source` URL | | +| `other` | buyer-defined `source` URL | | + +The TMP `user_token` field corresponds to `User.eids[].uids[].id`.
AdCP carries up to 3 identities per Identity Match request (HPKE size budget — see [TMPX size budget](/docs/trusted-match/specification#size-budget)); OpenRTB has no such limit, so a buyer bridging from OpenRTB into TMP must apply a buyer-configured priority order to truncate (typically: deterministic graphs first — UID2, RampID — then probabilistic or publisher-scoped IDs). diff --git a/docs/trusted-match/privacy-architecture.mdx b/docs/trusted-match/privacy-architecture.mdx index 1a606f5baa..c34290abb0 100644 --- a/docs/trusted-match/privacy-architecture.mdx +++ b/docs/trusted-match/privacy-architecture.mdx @@ -178,3 +178,5 @@ TMP's structural separation aligns with the data minimization principle required - With TEE attestation, the separation is independently verifiable, providing auditable evidence for regulators. This is an architectural observation, not legal advice. Publishers and buyer agents should consult their own legal counsel regarding regulatory compliance. + +For a detailed analysis of how TMP's architecture maps to GDPR controller and processor roles, see [Data Protection Roles](/docs/trusted-match/data-protection-roles). For cross-protocol privacy guidance, see [Privacy Considerations](/docs/reference/privacy-considerations). diff --git a/docs/trusted-match/router-architecture.mdx b/docs/trusted-match/router-architecture.mdx index 3c8c3722cd..c054a9e78f 100644 --- a/docs/trusted-match/router-architecture.mdx +++ b/docs/trusted-match/router-architecture.mdx @@ -156,7 +156,7 @@ The router filters Identity Match providers by country and identity type: 7. Because the per-provider payload differs from the inbound request, the router **re-signs** each per-provider forward using the canonical `identities_hash` of the filtered set. Providers verify signatures against the router's public key. 8. It fans out to all matching providers in parallel, merges eligibility results, and returns a unified response. 
-Duplicate `package_id` across providers is a configuration error — packages come from media buys and are provider-specific. If it occurs, the router applies conservative merging: the package is only eligible if it appears in `eligible_package_ids` from both providers. The router uses the minimum `ttl_sec` across providers and SHOULD log a warning. +Duplicate `package_id` across providers is a configuration error — packages come from media buys and are provider-specific. If it occurs, the router applies conservative merging: the package is only eligible if it appears in `eligible_package_ids` from both providers. The router uses the minimum `serve_window_sec` across providers and SHOULD log a warning. ### Timeout handling diff --git a/docs/trusted-match/specification.mdx b/docs/trusted-match/specification.mdx index 472e633ab3..138b0c1837 100644 --- a/docs/trusted-match/specification.mdx +++ b/docs/trusted-match/specification.mdx @@ -7,7 +7,7 @@ description: Authoritative message type definitions, field tables, privacy requi # Trusted Match Protocol Specification -**Experimental.** The Trusted Match Protocol is part of AdCP 3.0 as an experimental surface — it may change between 3.x releases with at least 6 weeks' notice. Sellers implementing TMP MUST declare `trusted_match.core` in `experimental_features`. See [experimental status](/docs/reference/experimental-status) for the full contract. +**Experimental.** The Trusted Match Protocol is part of AdCP 3.0 as an experimental surface — it may change between 3.x releases with at least 6 weeks' notice. Sellers implementing TMP MUST declare `trusted_match.core` in `experimental_features`. See [experimental status](/docs/reference/experimental-status) for the full contract. Fields on this surface are not subject to deprecation cycles until 3.0.0 GA. This is the authoritative reference for the Trusted Match Protocol (TMP). 
For conceptual introductions, see the [overview](/docs/trusted-match/) and [core concepts](/docs/trusted-match/context-and-identity). @@ -24,7 +24,7 @@ Specific areas expected to evolve include TMPX exposure tokens, country-partitio | **Offer** | A buyer's response to a context match request. Ranges from simple activation (package_id only) to rich proposals with brand, price, summary, and creative manifest. | | **Available package** | A package from an active media buy that is eligible for evaluation on a given placement. Package metadata — including the originating seller agent — is synced at media buy time. See [Package Sync](#package-sync). | | **Seller agent** | The buyer-side agent that sold the package into a publisher. Identified by the agent URL declared in the publisher's `adagents.json` `authorized_agents[].url`. Every `AvailablePackage` is bound to exactly one seller agent at sync time. | -| **Eligibility** | List of eligible package IDs returned by Identity Match, plus a TTL caching contract. The buyer computes eligibility from frequency caps, audience membership, and other signals; the reasons are opaque to the publisher. | +| **Eligibility** | List of eligible package IDs returned by Identity Match, plus a serve-window throttle. The buyer computes eligibility from frequency caps, audience membership, and other signals; the reasons are opaque to the publisher. | | **Artifact** | A typed content reference associated with a publisher property (article URL, episode EIDR, show Gracenote ID, music ISRC, product GTIN, conversation turn). Each artifact has a `type` and `value`. Referenced in context match requests. | | **Temporal decorrelation** | Random delay and random ordering between Context Match and Identity Match requests, preventing timing- and order-based correlation. | @@ -196,19 +196,34 @@ Each entry in `identities` is an `{user_token, uid_type}` pair: ### IdentityMatchResponse -Returned by the buyer agent. 
A list of eligible package IDs with a caching TTL. +Returned by the buyer agent. A list of eligible package IDs with a serve-window throttle. | Field | Type | Required | Description | |---|---|---|---| | `type` | string | Yes | `"identity_match_response"`. Message type discriminator for deserialization. | | `request_id` | string | Yes | Echo of the request's `request_id`. | | `eligible_package_ids` | List\ | Yes | Package IDs the user is eligible for. Packages not listed are ineligible. | -| `ttl_sec` | integer | Yes | How long the router should cache this response, in seconds. A value of `0` means do not cache — re-query on every request. | +| `serve_window_sec` | integer | Yes | Per-package single-shot fcap window, in seconds. Range: 1–300. Default: 60. After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. This is **not** a router response cache TTL — it is a buyer-asserted serve throttle. Multi-impression frequency caps are handled separately by the buyer's impression tracker, which writes cap-fire events to the IdentityMatch cap-state store at the boundary regardless of this window — see [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation). | | `tmpx` | string | No | HPKE-encrypted exposure token containing resolved user identity tokens. The publisher substitutes this into creative tracking URLs as `{TMPX}`. The buyer's impression pixel receives the token, enabling real-time per-user frequency state updates. Wire format: `kid.base64url_nopad(ciphertext)` (unpadded, no `=` characters). Publishers MUST treat this value as opaque pass-through data. | -The response includes eligible package IDs, a TTL, and an optional `tmpx` field. 
The TMPX token is an HPKE-encrypted exposure token that flows through creative tracking URLs to the buyer's impression pixel, enabling real-time per-user frequency state updates without exposing user identity to the publisher. The buyer computes eligibility from whatever identity signals they have (frequency caps, audience membership, purchase history) and returns only the packages that pass. The publisher does not need to know why a package was excluded — just which packages are eligible. +The response includes eligible package IDs, a serve-window throttle, and an optional `tmpx` field. The TMPX token is an HPKE-encrypted exposure token that flows through creative tracking URLs to the buyer's impression pixel, enabling real-time per-user frequency state updates without exposing user identity to the publisher. The buyer computes eligibility from whatever identity signals they have (frequency caps, audience membership, purchase history) and returns only the packages that pass. The publisher does not need to know why a package was excluded — just which packages are eligible. -The `ttl_sec` field is a caching contract. The buyer is saying: "Cache this for N seconds." The router caches the `eligible_package_ids` list and returns it for subsequent requests during the window — it does not track which packages have been served. The publisher enforces allocation rules (at most one ad per package, competitive separation, pod composition) using the cached eligibility as input. This eliminates the need for pod-specific or batch-specific protocol semantics — the router has cached eligibility and the publisher allocates across whatever placements exist during the TTL window (a CTV ad pod, a web page with 20 slots, a single pre-roll). The buyer doesn't need to know the allocation details. +The `serve_window_sec` field is a **per-package single-shot fcap**, not a router cache TTL. 
The buyer is saying: "After you serve the user one impression on each eligible package, re-query me before serving from those packages again." The router MAY still cache the response for an internal deduplication/cost-saving window, but the binding contract on the publisher side is "one impression per eligible package per window." Multi-impression frequency caps (5 per day per campaign, 100 per month per advertiser, etc.) live in the buyer's impression tracker and surface to the IdentityMatch service as cap-fire events at the boundary regardless of `serve_window_sec`. + +The publisher enforces allocation rules (competitive separation, pod composition) using the eligibility list as input. This eliminates the need for pod-specific or batch-specific protocol semantics — the publisher allocates across whatever placements exist during the serve window (a CTV ad pod, a web page with 20 slots, a single pre-roll), honoring the one-impression-per-package contract. + +#### Conformance invariants for IdentityMatch eligibility + +A conformant IdentityMatch service MUST compute `eligible_package_ids` such that, for each `package_id ∈ request.package_ids`, the package is included in `eligible_package_ids` if and only if **all** of the following hold: + +1. **Audience eligibility.** Either the package has no audience requirement, OR there exists at least one audience identifier `a` such that `a` is in the package's required audience set AND `a` is in the audience-membership of at least one identity `i ∈ request.identities` (the union across the user's resolved identities intersects the package's required audiences). +2. **Frequency cap eligibility.** No `(identity, package)` cap-state entry exists for any identity `i ∈ request.identities` against the package. Cap-state entries are written by the buyer's impression tracker when it determines an impression has exhausted a cap and carry an expiration timestamp; an entry is "present" until that timestamp. 
The protocol does not constrain how the impression tracker counts impressions, evaluates windows, or decides when a cap fires; it constrains only the boundary contract (cap-fire entries flow into the cap-state store; the IdentityMatch service checks presence at query time). See [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation) for the boundary contract. +3. **Active state.** Packages or policies marked inactive MUST be treated as if absent. +4. **Audience freshness.** If the buyer's audience pipeline publishes a freshness deadline and the current time is past it, that audience-membership entry MUST NOT contribute to invariant (1). + +The TMPX returned with the response MUST encode the resolved identities so the out-of-band impression tracker can update fcap policy state and signal cap-fire events to the IdentityMatch cap-state store — see § TMPX tokens and [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation). + +The storage backend (Valkey, Aerospike, DynamoDB, in-memory, or anything else) is an implementation choice. Two services with different storage backends that satisfy these invariants for the same inputs MUST return the same eligibility output. #### Consent @@ -599,9 +614,9 @@ The 8-byte random nonce enables deduplication at the master. The master stores n ### Caching behavior -The TMPX token is generated once per Identity Match evaluation and cached alongside the eligibility response for `ttl_sec` seconds. All impressions within the TTL window share the same TMPX value (same nonce, same tokens). -The buyer's master MUST NOT deduplicate by TMPX value or nonce within a TTL window — each pixel fire is one impression.
Multiple ads served to the same user in a CTV pod or a web page with multiple ad units all produce distinct pixel fires with the same TMPX token. The nonce deduplication only prevents replay of the same TMPX token *after* the TTL window expires — if the same nonce appears outside its original TTL window, it is a replay and MUST be rejected. +The buyer's master MUST NOT deduplicate by TMPX value or nonce within a serve window — each pixel fire is one impression. Multiple ads served to the same user in a CTV pod or a web page with multiple ad units all produce distinct pixel fires with the same TMPX token. The nonce deduplication only prevents replay of the same TMPX token *after* the serve window expires — if the same nonce appears outside its original window, it is a replay and MUST be rejected. ### Publisher obligations @@ -648,9 +663,9 @@ Context Match responses are cacheable because the same packages are evaluated fo - Routers SHOULD cache Context Match responses with a TTL of **5 minutes**. - Providers MAY include a `cache_ttl` field (integer, seconds) in Context Match responses to override the default. Routers MUST respect this value when present. -- Identity Match responses are cached per the `ttl_sec` value in the response. Cache key: `{identities_hash, provider_id, package_ids_hash, consent_hash}`, where `identities_hash` is the SHA-256 of the canonical `identities` bytes defined in [Identity Match signed fields](#identity-match-signed-fields) (computed over the per-provider filtered subset); `package_ids_hash` is SHA-256 over the JCS serialization of the sorted `package_ids` array; `consent_hash` is SHA-256 over the JCS serialization of the request's `consent` object (or JCS `null` when the field is absent — this distinguishes "consent unknown" from an explicit-empty consent object). JCS framing prevents delimiter-injection: raw consent strings or package IDs containing `|`, `,`, or `\n` cannot collide two distinct inputs. 
Including the identity set ensures that adding or removing tokens produces a distinct cache entry. Including the package list hash ensures cached responses are invalidated when the active package set changes (e.g., a new media buy activates). Including the consent hash prevents eligibility decisions taken under one consent state from being served under another. -- When a provider's targeting configuration changes (new packages, updated targeting rules), the provider SHOULD return `"cache_ttl": 0` until the change has propagated, then resume normal caching. -- Both `ttl_sec` and `cache_ttl` have a schema-enforced maximum of 86400 seconds (24 hours). Routers SHOULD clamp buyer-provided values to a configured maximum (recommended: 3600 seconds) to limit the blast radius of stale caches. +- Identity Match responses are bound by `serve_window_sec` (per-package single-shot fcap, max 300s, default 60s). Routers MAY apply an internal deduplication cache keyed on `{identities_hash, provider_id, package_ids_hash, consent_hash}`, where `identities_hash` is the SHA-256 of the canonical `identities` bytes defined in [Identity Match signed fields](#identity-match-signed-fields) (computed over the per-provider filtered subset); `package_ids_hash` is SHA-256 over the JCS serialization of the sorted `package_ids` array; `consent_hash` is SHA-256 over the JCS serialization of the request's `consent` object (or JCS `null` when the field is absent — this distinguishes "consent unknown" from an explicit-empty consent object). JCS framing prevents delimiter-injection: raw consent strings or package IDs containing `|`, `,`, or `\n` cannot collide two distinct inputs. Including the identity set ensures that adding or removing tokens produces a distinct cache entry. Including the package list hash ensures cached responses are invalidated when the active package set changes (e.g., a new media buy activates). 
Including the consent hash prevents eligibility decisions taken under one consent state from being served under another. The publisher's binding contract is the serve-window throttle, not the router's internal cache window. +- When a provider's targeting configuration changes (new packages, updated targeting rules), the provider SHOULD return `"cache_ttl": 0` (Context Match) or `"serve_window_sec": 1` (Identity Match) until the change has propagated, then resume normal values. +- `cache_ttl` (Context Match) has a schema-enforced maximum of 86400 seconds (24 hours). `serve_window_sec` is bounded to 1–300 seconds: longer windows make per-package fcap too coarse for typical campaigns, and windows shorter than the IdentityMatch round-trip waste the throttle. ## Conformance Levels diff --git a/docs/trusted-match/surfaces/ai-assistants.mdx b/docs/trusted-match/surfaces/ai-assistants.mdx index 4e9d6950e8..018c1c2b7c 100644 --- a/docs/trusted-match/surfaces/ai-assistants.mdx +++ b/docs/trusted-match/surfaces/ai-assistants.mdx @@ -113,7 +113,7 @@ The buyer responds with the IDs of eligible packages and a TTL. The buyer comput "pkg-outdoor-gear", "pkg-seasonal-sale" ], - "ttl_sec": 120 + "serve_window_sec": 120 } ``` @@ -151,7 +151,7 @@ User message: "What are the best sneakers for spring?"
→ Router returns merged response → (300ms later) Platform sends Identity Match with ALL buyer's active packages - → Response: eligible_package_ids includes pkg-sneaker-reco, ttl_sec: 120 + → Response: eligible_package_ids includes pkg-sneaker-reco, serve_window_sec: 120 → Router caches eligibility → Platform joins: pkg-sneaker-reco offer is eligible diff --git a/docs/trusted-match/surfaces/ctv.mdx b/docs/trusted-match/surfaces/ctv.mdx index 8fec75bf15..8c27df9d6a 100644 --- a/docs/trusted-match/surfaces/ctv.mdx +++ b/docs/trusted-match/surfaces/ctv.mdx @@ -134,11 +134,11 @@ Each buyer agent evaluates the household token against its own data (frequency c "pkg-vaultline-audio", "pkg-driftmoto-30s" ], - "ttl_sec": 90 + "serve_window_sec": 90 } ``` -The response covers all packages, not just CTV ones. The `ttl_sec: 90` covers the duration of the ad break — the router uses cached eligibility to fill all pod slots without re-querying. The publisher extracts only the package IDs relevant to the current pod. +The response covers all packages, not just CTV ones. The `serve_window_sec: 90` covers the duration of the ad break — the router uses cached eligibility to fill all pod slots without re-querying. The publisher extracts only the package IDs relevant to the current pod. 
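The one-impression-per-package contract described for the ad break can be sketched from the publisher's side. This is a minimal illustration, not an AdCP API: the `ServeWindow` class and its method names are hypothetical, and treating window expiry as "re-query before serving anything" is an assumption rather than something the spec text states outright.

```javascript
// Hypothetical publisher-side helper; names are illustrative, not an AdCP API.
class ServeWindow {
  // response: parsed Identity Match response; nowMs: time of receipt in ms.
  constructor(response, nowMs) {
    this.expiresAtMs = nowMs + response.serve_window_sec * 1000;
    // Packages that have not yet had their single impression in this window.
    this.unserved = new Set(response.eligible_package_ids);
  }

  // Returns true (and marks the package as served) if it may serve now.
  tryServe(packageId, nowMs) {
    if (nowMs >= this.expiresAtMs) return false; // assumed: expired means re-query
    return this.unserved.delete(packageId); // false if ineligible or already served
  }

  // True once nothing more can be served without a fresh Identity Match query.
  needsRequery(nowMs) {
    return nowMs >= this.expiresAtMs || this.unserved.size === 0;
  }
}

// Fill a three-slot pod 10 seconds into a 90-second serve window.
const win = new ServeWindow(
  { eligible_package_ids: ["pkg-vaultline-audio", "pkg-driftmoto-30s"], serve_window_sec: 90 },
  0
);
const candidates = ["pkg-vaultline-audio", "pkg-vaultline-audio", "pkg-driftmoto-30s"];
const served = candidates.filter((id) => win.tryServe(id, 10000));
console.log(served); // [ 'pkg-vaultline-audio', 'pkg-driftmoto-30s' ]
console.log(win.needsRequery(10000)); // true
```

The second attempt on `pkg-vaultline-audio` is refused because that package already had its impression in this window. Multi-impression caps are not modeled here; per the spec they live in the buyer's impression tracker.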
## Pod Composition @@ -198,7 +198,7 @@ Mid-roll break in "The Night Kitchen" S02E07 Identity Match (after temporal decorrelation — random delay + random order) --> Broadcaster sends request: all 9 active packages across all buyers --> Response: eligible_package_ids includes Sparklean, Greenleaf, Vaultline, Driftmoto-30s - --> ttl_sec: 90 (covers the ad break) + --> serve_window_sec: 90 (covers the ad break) Pod Assembly (broadcaster's ad server) --> Join: Sparklean and Greenleaf both activated and eligible diff --git a/docs/trusted-match/surfaces/mobile.mdx b/docs/trusted-match/surfaces/mobile.mdx index c5870d3e6e..841a817328 100644 --- a/docs/trusted-match/surfaces/mobile.mdx +++ b/docs/trusted-match/surfaces/mobile.mdx @@ -194,13 +194,13 @@ Seven package IDs — the example uses all-active mode (every active package for "pkg-sports-banner-05", "pkg-sports-rewarded-07" ], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` Only eligible packages are listed. The buyer computes eligibility from frequency caps, audience membership, purchase history, and any other identity-based signals. The reasons are opaque to the publisher. The publisher does not learn why `pkg-telecom-inter-03` is ineligible — just that it is absent from the list. -The `ttl_sec` tells the router how long to cache this response. During the TTL window, the router uses cached eligibility to fill interstitials, banners, and rewarded ads without re-querying the buyer. +The `serve_window_sec` field sets the serve window; it is not a router cache TTL. Within the window, the publisher may fill interstitials, banners, and rewarded ads with at most one impression per eligible package, and it MUST re-query the buyer before serving from those packages again.
## Joining and Activation diff --git a/docs/trusted-match/surfaces/retail-media.mdx b/docs/trusted-match/surfaces/retail-media.mdx index 0e088c1032..4bee694a7e 100644 --- a/docs/trusted-match/surfaces/retail-media.mdx +++ b/docs/trusted-match/surfaces/retail-media.mdx @@ -99,7 +99,7 @@ The buyer responds with the IDs of eligible packages and a TTL: "pkg-bakery-seasonal", "pkg-frozen-meals" ], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` @@ -122,7 +122,7 @@ Shopper searches "cold brew" → Buyer: offer with creative manifest (cold brew + iced latte items, promo banner, badges) → (fuzzed) Retailer sends Identity Match: loyalty token + all active package IDs - → Buyer: eligible_package_ids includes pkg-coffee-sponsored, ttl_sec: 60 + → Buyer: eligible_package_ids includes pkg-coffee-sponsored, serve_window_sec: 60 → Retailer joins: accept offer, render items from creative manifest → Render sponsored carousel in search results diff --git a/docs/trusted-match/surfaces/web.mdx b/docs/trusted-match/surfaces/web.mdx index f7bcd38f0c..45717d78de 100644 --- a/docs/trusted-match/surfaces/web.mdx +++ b/docs/trusted-match/surfaces/web.mdx @@ -119,14 +119,14 @@ The buyer evaluates the user against all requested packages and returns the IDs "pkg-native-0079", "pkg-display-0104" ], - "ttl_sec": 60 + "serve_window_sec": 60 } ``` Key points: - Only eligible packages are listed. Packages absent from the list (e.g., `pkg-display-0043`, `pkg-display-0103`, `pkg-video-0201`) are ineligible. The buyer computes eligibility from frequency caps, audience membership, purchase history, and any other identity-based signals. The reasons are opaque to the publisher. -- `ttl_sec` tells the router how long to cache this response. During that window, the router returns cached eligibility without re-querying the buyer. The publisher uses cached eligibility to allocate across all placements on the page. +- `serve_window_sec` is the per-package serve window: the publisher serves at most one impression per eligible package before re-querying Identity Match.
During that window, the router returns cached eligibility without re-querying the buyer. The publisher uses cached eligibility to allocate across all placements on the page. - There is no `frequency_capped`, `audience_match`, or `recency` field. The buyer's internal reasons stay with the buyer. ## Activation: Joining Context and Identity diff --git a/specs/identitymatch-fcap-architecture.md b/specs/identitymatch-fcap-architecture.md new file mode 100644 index 0000000000..b73c84a93b --- /dev/null +++ b/specs/identitymatch-fcap-architecture.md @@ -0,0 +1,143 @@ +# IdentityMatch & Frequency Capping — Architecture Spec + +**Status**: landed (architecture decisions). +**Target release**: 3.0.1 (additive wire change). + +This spec captures the architecture decisions behind the buyer-side IdentityMatch surface in TMP. It is a **design-history document**, not an implementation reference — the authoritative spec lives in: + +- [`docs/trusted-match/specification.mdx`](../docs/trusted-match/specification.mdx) — wire spec (normative): `serve_window_sec` field, conformance invariants for IdentityMatch eligibility, TMPX binary format. +- [`docs/trusted-match/identity-match-implementation.mdx`](../docs/trusted-match/identity-match-implementation.mdx) — frequency-cap data flow (boundary contract): the cap-fire event the impression tracker writes into the IdentityMatch cap-state store, and how the IdentityMatch service consumes it at query time. Internal counting / policy / storage layout are buyer-internal and out of scope. +- [`docs/trusted-match/buyer-guide.mdx`](../docs/trusted-match/buyer-guide.mdx) — buyer-agent integration walkthrough; updated for `serve_window_sec` semantic. +- [`docs/trusted-match/migration-from-axe.mdx`](../docs/trusted-match/migration-from-axe.mdx) — adds OpenRTB 2.6 `User.eids` cross-walk for buyers bridging from OpenRTB-shaped pipelines. + +Read this doc when you want to understand **why** the design landed where it did. 
Read the docs above when you want to **implement** against it. + +## Problem + +The TMP IdentityMatch wire spec defines what flows on the wire: identity tokens in, eligible package IDs and an HPKE-encrypted exposure token (`tmpx`) out. It did not previously define: + +1. **Where fcap policy and counting live.** Originally implied to be inside the IdentityMatch service. Settled here as buyer-internal in the impression tracker; the IdentityMatch service consumes only cap-fire events at the boundary. +2. **Boundary contract between impression tracker and IdentityMatch service** — what events flow from the impression-tracking pipeline into the IdentityMatch cap-state store. +3. **Audience freshness vs. response throttle** — `ttl_sec` was documented as a router cache TTL but operationally functioned as a per-package serve throttle, conflating two distinct concerns. +4. **Conformance** — how a third party validates that an IdentityMatch implementation is correct. + +Without these decisions, the open-source IdentityMatch reference impl risked shipping with Go-shaped assumptions baked into wire-adjacent surfaces, or with policy logic baked into the service that should sit in the buyer's impression-tracking pipeline. + +## Architectural decisions + +### 1. Three layers, with explicit normative status + +| Layer | Status | What it covers | +|---|---|---| +| **Wire spec** | Normative | HTTP JSON, `serve_window_sec` semantic, TMPX binary format. Anything crossing an agent boundary. | +| **Conformance invariants** | Normative | The eligibility logic an IdentityMatch service MUST compute, expressed in terms of inputs (identities, packages, audiences, cap-state) and outputs (eligible_package_ids). Storage-agnostic. | +| **Boundary contract for cap-fire events** | Normative for the cap-state store API | What events flow from the impression tracker into the IdentityMatch cap-state store, and what state IdentityMatch consumes at query time. The store interface (e.g. 
`RecordCap` / `IsCapped` in `adcp-go/targeting/fcap`) is the reference shape. Storage backend is implementer choice. | + +The protocol describes **what** the service must compute and **what** events flow into it, not how the impression tracker counts impressions or where its policy state lives. + +### 2. Counting and policy live in the impression tracker, not in IdentityMatch + +The IdentityMatch service does not count impressions. It does not own fcap policies. It does not evaluate windows. Those concerns live entirely in the buyer's impression-tracking pipeline, where they vary across buyers and across campaigns. + +The IdentityMatch service maintains a narrow **cap-state store** keyed at `(user_identity, seller_agent_url, package_id)` with a TTL-bound expiration. The impression tracker writes a cap-fire entry on the impression that exhausts a cap; the IdentityMatch service checks presence at query time and excludes the package from `eligible_package_ids` while the entry is live. + +This split keeps the IdentityMatch service narrow and makes new cap dimensions (advertiser, campaign, creative, line item — see [Future extensions](#future-extensions)) extensions of the boundary contract rather than rewrites of the service. Earlier iterations of this design proposed an exposure-log model inside the IdentityMatch service, with cross-identity dedup via `impression_id`, label-model fcap keys, and the IdentityMatch service evaluating windows at read time. That design was unwound — counting, dedup, and policy evaluation all depend on buyer-internal concerns the protocol shouldn't constrain. The reference store in [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap) implements the simpler boundary contract. + +### 3. 
Cross-identity dedup is a buyer-internal concern + +A single impression resolved to multiple identity tokens may produce multiple cap-fire entries — one per `(identity, package)` pair the cap fired on — but how the impression tracker decides "this is one impression vs. three" is buyer-internal. Buyers running their own identity graph can canonicalize before counting; buyers that don't get whatever counting their impression tracker is configured to do. The protocol does not require an `impression_id` and does not constrain dedup logic. + +### 4. `serve_window_sec` replaces `ttl_sec` + +The original `ttl_sec` field was documented as a router cache TTL but operationally functioned as a per-package single-shot fcap. Two distinct concerns sharing one knob meant tuning for cost (long cache) silently broke fcap, and tuning for fcap (short cache) wasted IdentityMatch round-trips. + +Replacement: `serve_window_sec` (1–300, default 60) with the corrected semantic — *after serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again.* + +`ttl_sec` is removed. No deprecation window: TMP is pre-launch (experimental, pre-3.0.0 GA) and not subject to deprecation cycles. The field is not present in the 3.0.1 schema. + +### 5. Cap-fire events as the impression-handling primitive + +The impression tracker decodes TMPX, applies the buyer's policy logic, and (when a cap fires) writes a cap-fire entry to the IdentityMatch cap-state store. The cap-state store API ([`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap)) exposes: + +``` +RecordCap(ctx, userIdentity, fields[]Field, expireAt) // write cap-fire +IsCapped(ctx, userIdentity, field Field) (bool) // query cap-state +``` + +— plus batch variants. `Field` is `{SellerAgentURL, PackageID}`. 
Production deployments separate decode (synchronous, at intake) from policy evaluation and cap-state writes (asynchronous, behind a queue) for buffering — bundling would force synchronous topology and break the pattern. + +### 6. TMP IdentityMatch service is a downstream consumer of cap-state + +The IdentityMatch service reads cap-state on each `/identity` call. Writes come from the impression tracker (or a downstream service in its pipeline) on cap-fire. No new wire endpoints for impressions or policies. The IdentityMatch service stays narrow. + +### 7. Policy updates trigger cap-state re-evaluation at the buyer + +Cap-state entries are written under whatever fcap policy was in force at cap-fire time. When policies change (window length, `max_count`, activation, package reassignment), the buyer's policy owner MUST re-evaluate every affected `(user_identity, package)` entry against the new policy and push delete-or-extend events to the cap-state store. The cap-state store carries no counts and can't re-evaluate on its own — the buyer's counting state is the source of truth. The protocol does not constrain re-evaluation cadence; only that cap-state must converge to what the current policies imply. See [docs/trusted-match/identity-match-implementation.mdx § Policy updates and cap-state re-evaluation](../docs/trusted-match/identity-match-implementation.mdx#policy-updates-and-cap-state-re-evaluation) for the event shapes. + +### 8. `sync_audiences` is the audience on-ramp + +The existing wire `sync_audiences` task has `add[]`/`remove[]` deltas of audience-member objects — exactly the CRUD shape the IdentityMatch backend needs for the audience side of eligibility. No schema extension required. + +## Future extensions + +Today the cap-state store is keyed at `(user_identity, seller_agent_url, package_id)`. 
Future protocol versions may extend the field to additional dimensions — advertiser, campaign, creative, line item — so a buyer can express caps that span multiple packages without writing N entries on every cap-fire. The boundary contract is unchanged by such extensions: the impression tracker writes cap-fire entries; the IdentityMatch service checks presence at query time. + +## Open questions + +1. **Cap-state extensions for advertiser/campaign/creative.** v1 keys at `(user_identity, seller_agent_url, package_id)`. Extending to broader cap dimensions without forcing the impression tracker to write N entries on each cap-fire is a follow-up workstream. +2. **Explicit delete primitive on the cap-state store.** The reference impl exposes `RecordCap` (write/extend) and `IsCapped` (presence) but no explicit delete. Re-evaluation today expresses "delete" as "extend with an `expire_at` already in the past." A first-class `DeleteCap` operation is a candidate primitive, especially as policy-change re-evaluation becomes a hot path. +3. **Identity-graph plug-point.** Whether the impression tracker canonicalizes identities before writing cap-state, or writes per-resolved-identity, is buyer-internal. The protocol does not require the IdentityMatch service to know about identity graphs. +4. **Audience strength scores.** Per-segment scores are an open extension on the audience side of eligibility, separate from cap-state. +5. **Production-deployment perf benchmarks.** Cap-state lookups are hash-field presence checks (HEXISTS), but real-world latency depends on backend choice, network co-location, and cluster sharding under load. Tracked as a rollout-plan deliverable. + +## Deferred security & privacy issues (follow-up) + +These came out of pre-merge review. Each warrants a focused follow-up rather than blocking this design landing. + +1. **TMPX harvest → competitor-suppression attack.** TMPX in publisher creative URLs is harvestable. 
Without per-impression binding (creative_id, slot_id, ts) inside the AEAD AAD, an attacker fires harvested tokens at the buyer's impression endpoint to drive cap-fire signals and starve a target user out of a campaign. Mitigation: bind TMPX to per-impression context, or rate-limit-per-token at the impression handler. +2. **Eligibility-as-audience-membership oracle.** A malicious publisher submits honeypot `package_ids` and observes which return eligible to reconstruct the user's audience profile. The "publishers don't see audience records" privacy claim is wire-correct but functionally false. Mitigation: package-ownership check at IdentityMatch ingress, or k-anonymity floor on eligibility responses. +3. **Consent revocation between IdentityMatch and impression.** TMPX has no consent fingerprint; if consent is revoked during the serve window, the impression tracker may still process the exposure. GDPR/TCF problem. +4. **Side-channel via eligibility deltas.** A router observing two responses for the same user 30s apart sees `eligible_package_ids` shrink as caps trip — fingerprinting fcap state per-user. +5. **`hashed_email` in TMPX widens identity-leak surface.** Putting unsalted SHA-256 email inside a creative URL macro re-identifies on token leak. Either prohibit `hashed_email` in TMPX plaintext or require salting. +6. **DoS amplification via large `package_ids[]`.** Per-IdentityMatch cap-state reads scale `O(|identities| × |candidate_packages|)` — at 25k packages from a busy publisher, this is an amplification primitive. Cap candidate_packages at IdentityMatch ingress. +7. **Rollout work plan ownership gaps.** No named owner for the eligibility-evaluator hot path, observability/SLO, key-rotation drill, or load testing. Address before SDK ships. + +## Rollout plan + +### What this PR landed + +- Wire spec change (additive): `serve_window_sec` field on `identity-match-response.json`. `ttl_sec` removed (pre-launch, no deprecation cycle needed). 
+- Doc updates to `docs/trusted-match/specification.mdx`, `buyer-guide.mdx`, `migration-from-axe.mdx`. +- New page: `docs/trusted-match/identity-match-implementation.mdx` — frequency-cap data flow (boundary contract). +- This architecture-rationale doc. + +### Next workstreams (not in this PR) + +1. **`adcp-go/targeting/fcap` cap-state store** — landed upstream as the reference cap-state store backed by Valkey 9 hashes (`fcap:{hash}` keys, one HSETEX field per `(seller_agent_url, package_id)`). +2. **`@adcp/client` (TS) and `adcp` (Python) parity** — same `RecordCap` / `IsCapped` boundary in TS and Python. +3. **`adcp-go/identitymatch` reference TMP server** — open-source read path for `POST /identity` over the cap-state store. +4. **Scope3 hosted IdentityMatch** — public deployment for buyers who don't want to host their own service. +5. **Training agent integration** — hosts both AdCP MCP/A2A and TMP `/identity` surfaces, sharing the cap-state store internally. End-to-end IdentityMatch demo. +6. **Conformance harness** — runner script that seeds cap-state directly, runs `/identity` queries against the TMP server, and asserts eligibility responses. Lives as integration tests inside `adcp-go` and `@adcp/client`. +7. **TMP graduation (target: 3.1.0)** — TMP enters `supported_protocols` (currently in `experimental_features` as `trusted_match.core`). At that point AdCP storyboards can wrap the harness if cross-protocol integration testing becomes useful. + +## Threads consolidated from Slack 2026-04-26 + +- **Thread 1 (exposure struct location):** resolved by the three-layer model. Cross-language interop is at the cap-state store API level (`RecordCap` / `IsCapped`); no proto, no JSON Schema for buyer-internal records. TMPX wire format stays as published in `docs/trusted-match/specification.mdx`. +- **Thread 2 (campaign isn't AdCP):** resolved — cap dimensions live in the impression tracker, not in the wire protocol. 
v1 cap-state keys at `(user_identity, seller_agent_url, package_id)`. Seller agent + package_id remains the seller-side identifier per `core/seller-agent-ref.json`. +- **Thread 3 (campaign logic in IdentityMatch):** resolved — counting and policy live in the impression tracker; IdentityMatch consumes cap-fire events at the boundary. +- **Thread 4 (campaign sync via Cerberus):** resolved — cap-fire events are written directly to the cap-state store from the impression tracker; no Cerberus. + +## Threads consolidated from Slack 2026-04-30 (impression handling) + +Per discussion with @bhuo (Scope3 impression-tracker owner) and Brian: + +- Production deployments separate decode at intake (synchronous) from policy evaluation and cap-state writes (asynchronous, behind a queue) for buffering. The cap-state store API exposes the write-side primitive (`RecordCap`); the impression tracker decides when to call it. +- "JS for writers, Go for reader" framing was wrong — Brian's "JS" was shorthand for "the language the impression tracker runs in," currently Go at Scope3. Spec/SDK is language-neutral; the cap-state API ships in `adcp-go`, with TS and Python parity tracked as a follow-up. +- Pub/sub buffering, retries, dedup, observability, abuse protection are deployment concerns, not protocol concerns. The cap-state store ships the boundary primitives; topology is the implementer's choice. + +## Threads consolidated from PR #3359 review + +- **@oleksandr's normative/reference layering question:** the original spec called the buyer-side valkey schema "normative" while leaving an open question for a pluggable FrequencyStore interface. Inconsistent. Resolved by the three-layer model — wire spec + conformance invariants are normative; cap-state store interface is the boundary contract; storage backend is implementer choice. 
+- **Counter-vs-log debate (Brian):** earlier iterations explored a counter-based exposure model and a log-based exposure-log model with `impression_id` dedup, both inside the IdentityMatch service. Both unwound — counting and dedup are buyer-internal concerns the protocol shouldn't constrain. The IdentityMatch service consumes cap-fire events; whatever counting the impression tracker does to decide "this is the cap-firing impression" is up to the buyer. +- **Cap dimensions:** earlier iterations debated how the protocol should express advertiser/campaign/creative caps (label model, hierarchy, etc.). Resolved — the protocol does not enumerate cap dimensions at all. The cap-state store v1 keys at `(user_identity, seller_agent_url, package_id)`; broader-dimension caps are a follow-up extension to the boundary contract. diff --git a/static/schemas/source/index.json b/static/schemas/source/index.json index 3bcd8c6bd0..a967605f2a 100644 --- a/static/schemas/source/index.json +++ b/static/schemas/source/index.json @@ -1555,7 +1555,8 @@ "description": "Per-package eligibility — boolean eligible plus optional intent score" } } - } + }, + "implementation-guidance": "Conformance invariants and the boundary contract between the impression tracker and the IdentityMatch cap-state store are documented in specs/identitymatch-fcap-architecture.md and docs/trusted-match/identity-match-implementation.mdx. Storage backend is an implementation choice; conformant services may use any store that satisfies the invariants." 
}, "brand-protocol": { "description": "Brand protocol for identity retrieval, rights discovery, acquisition, and lifecycle management", diff --git a/static/schemas/source/tmp/identity-match-response.json b/static/schemas/source/tmp/identity-match-response.json index 39e83c6946..244b259482 100644 --- a/static/schemas/source/tmp/identity-match-response.json +++ b/static/schemas/source/tmp/identity-match-response.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/tmp/identity-match-response.json", "title": "Identity Match Response", - "description": "Response indicating which packages the user is eligible for. The ttl_sec field defines a caching contract: the router caches this response and returns cached eligibility without re-querying the buyer during the TTL window. Extension fields (ext, context) are intentionally omitted to prevent data leakage across the identity privacy boundary.", + "description": "Response indicating which packages the user is eligible for. The serve_window_sec field defines a per-package single-shot fcap: after serving the user one impression on each eligible package, the publisher MUST re-query Identity Match before serving from those packages again. Extension fields (ext, context) are intentionally omitted to prevent data leakage across the identity privacy boundary.", "x-status": "experimental", "type": "object", "properties": { @@ -22,11 +22,11 @@ "type": "string" } }, - "ttl_sec": { + "serve_window_sec": { "type": "integer", - "description": "How long the router should cache this response, in seconds. The router returns cached eligibility without re-querying the buyer during this window. A value of 0 means do not cache.", - "minimum": 0, - "maximum": 86400 + "description": "Per-package single-shot fcap window, in seconds. After serving the user one impression on each eligible package within this window, the publisher MUST re-query Identity Match before serving from those packages again. 
This is NOT a router response cache TTL — it is a buyer-asserted serve throttle. Multi-impression frequency caps are handled separately by the buyer's impression tracker, which writes cap-fire events to the IdentityMatch cap-state store at the boundary regardless of this window. Maximum 300 — longer windows reduce IdentityMatch load but coarsen fcap granularity below what most campaigns require.", + "minimum": 1, + "maximum": 300 }, "tmpx": { "type": "string", @@ -37,7 +37,7 @@ "type", "request_id", "eligible_package_ids", - "ttl_sec" + "serve_window_sec" ], "additionalProperties": true } diff --git a/tests/example-validation-simple.test.cjs b/tests/example-validation-simple.test.cjs index aefb2391bd..752ae303f5 100644 --- a/tests/example-validation-simple.test.cjs +++ b/tests/example-validation-simple.test.cjs @@ -581,7 +581,7 @@ async function runTests() { "type": "identity_match_response", "request_id": "id-7c9e1d", "eligible_package_ids": ["pkg-outdoor-audio"], - "ttl_sec": 60 + "serve_window_sec": 60 }, '/schemas/tmp/identity-match-response.json', 'TMP Identity Match response — web (overview walkthrough)'
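The schema constraints and the updated fixture in this diff can be exercised with a small standalone check. This sketch is not the repository's validator; `checkIdentityMatchResponse` is a hypothetical helper that mirrors only the constraints visible above (the four required fields, and `serve_window_sec` as an integer between 1 and 300).

```javascript
// Hypothetical standalone check; mirrors only the constraints shown in this diff.
const REQUIRED_FIELDS = ["type", "request_id", "eligible_package_ids", "serve_window_sec"];

function checkIdentityMatchResponse(resp) {
  const errors = [];
  for (const key of REQUIRED_FIELDS) {
    if (!(key in resp)) errors.push(`missing required field: ${key}`);
  }
  const windowSec = resp.serve_window_sec;
  // Schema: integer, minimum 1, maximum 300.
  if (windowSec !== undefined &&
      !(Number.isInteger(windowSec) && windowSec >= 1 && windowSec <= 300)) {
    errors.push("serve_window_sec must be an integer in [1, 300]");
  }
  return errors;
}

// The fixture from the updated test passes.
const current = {
  type: "identity_match_response",
  request_id: "id-7c9e1d",
  eligible_package_ids: ["pkg-outdoor-audio"],
  serve_window_sec: 60,
};
console.log(checkIdentityMatchResponse(current)); // []

// A response still using the removed ttl_sec field fails the required check.
const legacy = {
  type: "identity_match_response",
  request_id: "id-7c9e1d",
  eligible_package_ids: ["pkg-outdoor-audio"],
  ttl_sec: 60,
};
console.log(checkIdentityMatchResponse(legacy)); // [ 'missing required field: serve_window_sec' ]
```

Because the schema keeps `additionalProperties: true`, a stray `ttl_sec` alongside a valid `serve_window_sec` would not be rejected by these constraints; the sketch follows that behavior.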