Integrate UID2 with ttd-databricks #47

Open
adithyasamavedhi-ttd wants to merge 1 commit into main from Integrate-UID2-with-ttd-databricks

adithyasamavedhi-ttd commented May 19, 2026

What does this MR do?

  • Integrate UID2 resolution from ttd-data into ttd-databricks.
  • Add a uid2_resolutions column to the output table. This is a breaking change, since every output table must now include this column. The SDK currently has no users.

Tests

NOTE: All screenshots use fabricated test emails, phones, hashed emails, and hashed phones; the converted UID2s come from the staging/test operator.

  1. Advertiser Data with UID2 Config

push_data:
Screenshot 2026-05-22 at 9 51 43 AM

batch_process:

Screenshot 2026-05-22 at 9 53 39 AM

Retrieve the Mapping:
Screenshot 2026-05-22 at 10 30 21 AM

  2. Offline Conversion with UID2 Config

push_data:
Screenshot 2026-05-22 at 9 57 11 AM

batch_process:
Screenshot 2026-05-22 at 1 53 19 PM

Retrieve Mapping:
Screenshot 2026-05-22 at 2 17 31 PM

  3. Advertiser Data without UID2 Config

push_data:
Screenshot 2026-05-22 at 1 54 35 PM

batch_process:
Screenshot 2026-05-22 at 2 02 22 PM

  4. Offline Conversion without UID2 Config

push_data:
Screenshot 2026-05-22 at 2 06 36 PM

batch_process:

Screenshot 2026-05-22 at 2 13 45 PM

adithyasamavedhi-ttd requested a review from a team on May 19, 2026 21:26
adithyasamavedhi-ttd force-pushed the Integrate-UID2-with-ttd-databricks branch 16 times, most recently from 6a9d400 to 6d00880, on May 22, 2026 21:42
adithyasamavedhi-ttd force-pushed the Integrate-UID2-with-ttd-databricks branch from 6d00880 to bb1fe13 on May 22, 2026 21:55


def _resolution_to_dict(resolution: Any, submitted_id: Optional[str]) -> dict[str, Any]:
    return {

There's a basically identical struct in ttd_databricks_python/ttd_databricks/schemas/__init__.py.
Would it make sense to introduce a DTO?

  @dataclass
  class Uid2ResolutionRecord:
      submitted_id: Optional[str]
      current_uid2: Optional[str]
      previous_uid2: Optional[str]
      refresh_from: Optional[datetime]
      unmapped_reason: Optional[str]
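
A minimal sketch of how such a DTO might stand in for the hand-built dict, assuming the five fields listed above. The `to_dict` helper, the `frozen=True` setting, the default values, and the sample data are hypothetical and not taken from this PR.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Uid2ResolutionRecord:
    """One entry of the uid2_resolutions column (fields as suggested in the review)."""

    submitted_id: Optional[str]
    current_uid2: Optional[str] = None
    previous_uid2: Optional[str] = None
    refresh_from: Optional[datetime] = None
    unmapped_reason: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        # Hypothetical helper: asdict() keeps declaration order, so the dict
        # lines up with a struct schema declared in the same order.
        return asdict(self)


# Illustrative values only.
record = Uid2ResolutionRecord(
    submitted_id="user@example.com",
    current_uid2="example-uid2",
    refresh_from=datetime(2026, 6, 1, tzinfo=timezone.utc),
)
row = record.to_dict()
```

With a single record type, the schema module and `_resolution_to_dict` could both derive their field names from one place instead of repeating them.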

Comment thread pyproject.toml

[[tool.mypy.overrides]]
# ttd_data re-exports DataClient/UserIdType/etc. from submodules without `__all__`,
# which mypy's strict mode flags as not explicitly exported. Treat them as exported.

Does it make sense to fix this at the source, in ttd_data?

Comment thread pyproject.toml

dependencies = [
-    "ttd-data>=0.1.7",
+    "ttd-data>=0.2.0",

Perhaps worth capping the upper version too, especially with breaking changes on the way to the dataserver.

self._uid2_config,
)

output_df.write.format("delta").mode("append").saveAsTable(output_table)

This will throw because of the missing uid2_resolutions column if the client's output table is not on the latest schema when they run this.

It might be worth adding a check after process_partitions and before output_df.write, so that the schema is validated before the client makes any API calls.

Maybe even construct an ALTER TABLE ... command for them as a quick fix.
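
A rough sketch of such a pre-flight check, kept free of Spark so it runs on its own. The function names, the value of `UID2_RESOLUTIONS_COLUMN`, and the column type string are assumptions for illustration; the real struct definition lives in the SDK's schemas module.

```python
from typing import Iterable, Optional

# Assumed value of the SDK constant.
UID2_RESOLUTIONS_COLUMN = "uid2_resolutions"

# Assumed column type, modelled on the fields discussed in this review.
UID2_RESOLUTIONS_TYPE = (
    "ARRAY<STRUCT<submitted_id: STRING, current_uid2: STRING, "
    "previous_uid2: STRING, refresh_from: TIMESTAMP, unmapped_reason: STRING>>"
)


def missing_column_fix(table: str, existing_columns: Iterable[str]) -> Optional[str]:
    """Return an ALTER TABLE statement if the column is missing, else None."""
    if UID2_RESOLUTIONS_COLUMN in set(existing_columns):
        return None
    return (
        f"ALTER TABLE {table} "
        f"ADD COLUMNS ({UID2_RESOLUTIONS_COLUMN} {UID2_RESOLUTIONS_TYPE})"
    )


def validate_output_schema(table: str, existing_columns: Iterable[str]) -> None:
    """Fail fast, with a suggested fix, before any rows are processed."""
    fix = missing_column_fix(table, existing_columns)
    if fix is not None:
        raise ValueError(
            f"Output table {table} has no {UID2_RESOLUTIONS_COLUMN} column. "
            f"To upgrade it, run:\n{fix}"
        )
```

In a Spark session the existing column names could come from `spark.table(output_table).columns`, so the check costs one metadata lookup and can run before any partition is sent to the API.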


# Empty `uid2_resolutions` value for failure paths. Read-only: callers spread into
# new dicts (`**EMPTY_RESOLUTION_VALUE`), never mutate.
EMPTY_RESOLUTION_VALUE: dict[str, list[Any]] = {UID2_RESOLUTIONS_COLUMN: []}

It might be worth adding a function that returns a fresh copy, to avoid the mutation risk:

  def empty_resolution_value() -> dict[str, list[Any]]:
      return {UID2_RESOLUTIONS_COLUMN: []}
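
To illustrate the risk, here is a small self-contained sketch. Spreading the constant with `**` creates a new dict, but the list inside it is still the same object, so an append on one row would leak into every other row and into the constant itself. The value of `UID2_RESOLUTIONS_COLUMN` and the row contents are assumed for the example.

```python
from typing import Any

UID2_RESOLUTIONS_COLUMN = "uid2_resolutions"  # assumed value of the constant

EMPTY_RESOLUTION_VALUE: dict[str, list[Any]] = {UID2_RESOLUTIONS_COLUMN: []}


def empty_resolution_value() -> dict[str, list[Any]]:
    # Builds a new dict and a new list on every call.
    return {UID2_RESOLUTIONS_COLUMN: []}


# Spreading copies the dict, not the list it holds.
row_a = {"id": 1, **EMPTY_RESOLUTION_VALUE}
row_b = {"id": 2, **EMPTY_RESOLUTION_VALUE}
shared = row_a[UID2_RESOLUTIONS_COLUMN] is row_b[UID2_RESOLUTIONS_COLUMN]

# With the factory, each row owns its list.
row_c = {"id": 3, **empty_resolution_value()}
row_d = {"id": 4, **empty_resolution_value()}
independent = row_c[UID2_RESOLUTIONS_COLUMN] is not row_d[UID2_RESOLUTIONS_COLUMN]
```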

}


def attach_resolutions(

This should either mutate its inputs and return None, or return a copy and leave the inputs untouched.
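
The two conventions could look roughly like this. The signatures are hypothetical, since the real parameters of attach_resolutions are not visible in this thread, and the column name is assumed.

```python
from typing import Any

Row = dict[str, Any]


def attach_resolutions_in_place(rows: list[Row], resolutions: list[list[Any]]) -> None:
    """Convention A: mutate the rows and return None, like list.sort()."""
    for row, resolved in zip(rows, resolutions):
        row["uid2_resolutions"] = resolved


def with_resolutions(rows: list[Row], resolutions: list[list[Any]]) -> list[Row]:
    """Convention B: leave the inputs untouched and return new rows, like sorted()."""
    return [
        {**row, "uid2_resolutions": list(resolved)}
        for row, resolved in zip(rows, resolutions)
    ]


original = [{"id": 1}, {"id": 2}]
resolved = [["uid2-a"], []]

copied = with_resolutions(original, resolved)          # original is unchanged
result = attach_resolutions_in_place(original, resolved)  # original now has the column
```

Mixing the two (mutating and also returning the same object) tends to hide aliasing bugs, because callers cannot tell from the call site whether the input is still safe to reuse.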

return [[d["id_value"]] if is_raw_pii_id_type(d["id_type"]) else [] for d in items_data]


def extract_response_data(response: Any, server_response_attr: str) -> tuple[list[Any], dict[str, UID2Resolution]]:

It looks like server_response_attr could be an enum or a constant. Passing it as a plain string is a bit fragile and typo-prone.
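
One possible shape for this, as a sketch only. The enum name, its members, and the attribute names on the response object are all made up for illustration; the real attribute names come from the ttd-data response types.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any


class ServerResponseAttr(str, Enum):
    # Hypothetical members; the values stand in for real response attribute names.
    ADVERTISER_DATA = "advertiser_data_response"
    OFFLINE_CONVERSION = "offline_conversion_response"


def read_server_response(response: Any, attr: ServerResponseAttr) -> Any:
    # A misspelled member fails at the call site (AttributeError on the enum)
    # rather than deep inside getattr on the response.
    return getattr(response, attr.value)


# Stand-in response object for the example.
response = SimpleNamespace(advertiser_data_response={"failed_lines": []})
payload = read_server_response(response, ServerResponseAttr.ADVERTISER_DATA)
```

A type checker can then reject any argument that is not a `ServerResponseAttr`, which a bare `str` parameter cannot do.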

Email, Phone, HashedEmail, HashedPhone.
Email/Phone/HashedEmail/HashedPhone are resolved to a UID2/EUID
client-side by ttd-data sdk and mapping is stored under
`uid2_resolutions` column of output table.

It might be worth adding that Email/Phone/HashedEmail/HashedPhone are also uid2_config-dependent.
