integrate uid2 with ttd-databricks#47
Conversation
6a9d400 to
6d00880
Compare
6d00880 to
bb1fe13
Compare
|
|
||
|
|
||
| def _resolution_to_dict(resolution: Any, submitted_id: Optional[str]) -> dict[str, Any]: | ||
| return { |
There was a problem hiding this comment.
There's basically identical struct in ttd_databricks_python/ttd_databricks/schemas/init.py
Does it make sense to introduce a DTO?
@dataclass
class Uid2ResolutionRecord:
submitted_id: Optional[str]
current_uid2: Optional[str]
previous_uid2: Optional[str]
refresh_from: Optional[datetime]
unmapped_reason: Optional[str]
|
|
||
| [[tool.mypy.overrides]] | ||
| # ttd_data re-exports DataClient/UserIdType/etc. from submodules without `__all__`, | ||
| # which mypy's strict mode flags as not explicitly exported. Treat them as exported. |
There was a problem hiding this comment.
does it make sense to fix it at source? in ttd_data?
|
|
||
| dependencies = [ | ||
| "ttd-data>=0.1.7", | ||
| "ttd-data>=0.2.0", |
There was a problem hiding this comment.
Perhaps worth capping the upper version too, especially with breaking changes on the way to the dataserver.
| self._uid2_config, | ||
| ) | ||
|
|
||
| output_df.write.format("delta").mode("append").saveAsTable(output_table) |
There was a problem hiding this comment.
This will throw due to missing uid2_resolutions column if client is not on the latest schema when they run this.
It might be worth doing a check after process_partitions before output_df.write to validate the schema before the client makes any API calls.
Maybe even constructing them an ALTER TABLE ... command for a quick fix.
|
|
||
| # Empty `uid2_resolutions` value for failure paths. Read-only: callers spread into | ||
| # new dicts (`**EMPTY_RESOLUTION_VALUE`), never mutate. | ||
| EMPTY_RESOLUTION_VALUE: dict[str, list[Any]] = {UID2_RESOLUTIONS_COLUMN: []} |
There was a problem hiding this comment.
Might be worth to add a function that returns a copy, to avoid mutation risk
def empty_resolution_value() -> dict[str, list[Any]]:
return {UID2_RESOLUTIONS_COLUMN: []}
| } | ||
|
|
||
|
|
||
| def attach_resolutions( |
There was a problem hiding this comment.
This should either mutate and return none or return a copy and do not mutate the inputs.
| return [[d["id_value"]] if is_raw_pii_id_type(d["id_type"]) else [] for d in items_data] | ||
|
|
||
|
|
||
| def extract_response_data(response: Any, server_response_attr: str) -> tuple[list[Any], dict[str, UID2Resolution]]: |
There was a problem hiding this comment.
It looks like server_response_attr could be an enum or a constant. String typing appears to be a bit fragile/typo prone.
| Email, Phone, HashedEmail, HashedPhone. | ||
| Email/Phone/HashedEmail/HashedPhone are resolved to a UID2/EUID | ||
| client-side by ttd-data sdk and mapping is stored under | ||
| `uid2_resolutions` column of output table. |
There was a problem hiding this comment.
it might be worth adding that Email/Phone/HashedEmail/HashedPhone are also uid2_config dependent.
What does this MR do?
ttd-dataintottd-databricksuid2_resolutionscolumn to output table, this is a breaking change since now all output columns need to have this column. Currently no users of the sdk.Tests
NOTE: All screenshots include fabricated/test emails, phones, hashed emails, hashed phones and the converted UID2s are from the staging/test operator
push_data:

batch_process:
Retrieve the Mapping:

push_data:

batch_process:

Retrieve Mapping:

push_data:

batch_process:

push_data:

batch_process: