feat(eap): Co-occurring attrs v2 — merge master + add last_seen field#8095

Closed
phacops wants to merge 7 commits into master from claude/friendly-ride-hs4g31

phacops commented Jun 23, 2026

Contributor

Summary

Builds on top of the co-occurring attributes v2 work from #7801 (branch phacops/eap-co-occurring-attrs-v2), brings it up to date with master, and adds a last_seen field for these attributes.

What's included

  1. Merged master into the v2 work and resolved the conflict. The PR's migration was numbered 0059, but master now has 0059_add_array_attribute_map_columns.py and 0060_add_conversation_id_and_session_id.py. Migration numbers must be strictly increasing with no duplicates (enforced by DirectoryLoader), so the migration was renumbered to 0061_add_count_to_co_occurring_attrs.py.

  2. Added a last_seen field for the co-occurring attributes. A new last_seen column tracks the most recent timestamp at which a set of attributes was seen:

    • Type is SimpleAggregateFunction(max, DateTime). The v2 table uses SummingMergeTree, which applies the max aggregate on merge, so the latest timestamp is preserved as rows collapse.
    • The materialized view from eap_items_1_local populates it via timestamp AS last_seen.
    • The column is also exposed in the eap_item_co_occurring_attrs_v2 readable storage config.
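
To illustrate the merge behavior described in item 2, here is a minimal Python sketch of how rows that share a sorting key would collapse: `count` is summed and `last_seen` keeps the maximum. This is a simulation, not ClickHouse code; `Row`, `collapse`, and the simplified key tuple are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    # Simplified stand-in for the table's sorting key (hypothetical).
    key: tuple
    count: int
    last_seen: datetime


def collapse(rows):
    """Combine rows with the same key: sum `count`, keep the max `last_seen`."""
    merged = {}
    for row in rows:
        prev = merged.get(row.key)
        if prev is None:
            merged[row.key] = row
        else:
            merged[row.key] = Row(
                key=row.key,
                count=prev.count + row.count,
                last_seen=max(prev.last_seen, row.last_seen),
            )
    return list(merged.values())


rows = [
    Row(key=(1, 10, "2026-06-22", 1, 42), count=1, last_seen=datetime(2026, 6, 22, 9, 0)),
    Row(key=(1, 10, "2026-06-22", 1, 42), count=1, last_seen=datetime(2026, 6, 23, 18, 46)),
    Row(key=(1, 10, "2026-06-22", 1, 99), count=1, last_seen=datetime(2026, 6, 22, 12, 0)),
]
collapsed = collapse(rows)
```

Because ClickHouse merges parts in the background at an unspecified time, a reader of the v2 table would likely still need to aggregate at query time (for example `sum(count)` and `max(last_seen)` with a `GROUP BY`) rather than assume rows are already collapsed.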

Validation

  • EventsAnalyticsPlatformLoader loads all 60 EAP migrations with no duplicate/gap errors (latest is 0061).
  • The migration renders valid ClickHouse DDL — the table and MV both include last_seen SimpleAggregateFunction(max, DateTime).
  • snuba/validate_configs.py reports all configs valid, including the updated v2 storage.
  • ruff check and ruff format --check pass on the migration.
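
The duplicate-number conflict that the renumbering resolves can be sketched with a small check like the one below. This is not the actual `DirectoryLoader` logic; `find_number_conflicts` is a hypothetical helper, and the pre-renumber filename `0059_add_count_to_co_occurring_attrs.py` is assumed from the description above.

```python
import re
from collections import Counter

# Matches names such as "0061_add_count_to_co_occurring_attrs.py".
MIGRATION_RE = re.compile(r"^(\d{4})_\w+\.py$")


def find_number_conflicts(filenames):
    """Return the migration numbers that appear more than once."""
    numbers = []
    for name in filenames:
        match = MIGRATION_RE.match(name)
        if match:
            numbers.append(int(match.group(1)))
    counts = Counter(numbers)
    return sorted(n for n, c in counts.items() if c > 1)


before = [
    "0059_add_array_attribute_map_columns.py",
    "0060_add_conversation_id_and_session_id.py",
    "0059_add_count_to_co_occurring_attrs.py",  # assumed pre-renumber name
]
after = [
    "0059_add_array_attribute_map_columns.py",
    "0060_add_conversation_id_and_session_id.py",
    "0061_add_count_to_co_occurring_attrs.py",
]
conflicts_before = find_number_conflicts(before)
conflicts_after = find_number_conflicts(after)
```

With these inputs, `conflicts_before` reports the clash on 59 and `conflicts_after` is empty.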

🤖 Generated with Claude Code



phacops and others added 7 commits March 5, 2026 12:04
Add a new SummingMergeTree-based storage for co-occurring attributes
that includes a count column for proper deduplication via key_hash.
The v2 storage is gated behind a `use_co_occurring_attrs_v2` feature
flag. Also simplify result row parsing in the attribute names endpoint.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/yM8dAMnfR-nHQ6Z7BKDQd12ih3FsVPMAzgudpbFlskw

Master picked up 0054_fix_bools_in_autocomplete; bump this one to 0055
to resolve the duplicate migration number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/3bKJJo4cpTu-irMjftAcw6rYLjZEJsxUtHC2hucYt6s

Bring the branch up to date with master and narrow it to just the new
co-occurring attributes storage:

- Renumber the migration 0055 -> 0059 (0055-0058 are now taken on master).
- Drop the endpoint changes (the `use_co_occurring_attrs_v2` flag and the
  storage switch). The v2 SummingMergeTree table with the `count` column is
  landed as groundwork only; the attribute-names endpoint continues to read
  the existing storage. Wiring the endpoint to read v2 (and sort by
  sum(count)) will be a follow-up.

Refs EAP-432

…on number

Resolve the conflict from merging master into the co-occurring attrs v2
work by renumbering the migration from 0059 to 0061 (0059 and 0060 are
now taken on master), which keeps migration numbers strictly increasing.

Add a `last_seen` column to the v2 co-occurring attributes storage so we
can track the most recent time a set of attributes was seen. It is a
SimpleAggregateFunction(max, DateTime), which the SummingMergeTree engine
collapses with `max` during merges, and the materialized view populates it
from the item `timestamp`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SQHFWAZS2wQBJ2GTCGCoax
phacops requested review from a team as code owners June 23, 2026 18:46

phacops commented Jun 23, 2026

Contributor Author

Closing — these changes (master merge + migration renumber to 0061 + the last_seen field) have been pushed directly to #7801 instead.



phacops closed this Jun 23, 2026
@github-actions


This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0061_add_count_to_co_occurring_attrs
Local op: CREATE TABLE IF NOT EXISTS eap_item_co_occurring_attrs_2_local ON CLUSTER 'cluster_one_sh' (organization_id UInt64, project_id UInt64, item_type UInt8, date Date CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attribute_keys_hash Array(UInt64) MATERIALIZED arrayMap(k -> cityHash64(k), arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool))), attributes_string Array(String), attributes_float Array(String), attributes_bool Array(String), key_hash UInt64 MATERIALIZED cityHash64(arraySort(arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool)))), count UInt64, last_seen SimpleAggregateFunction(max, DateTime)) ENGINE ReplicatedSummingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/eap_item_co_occurring_attrs_2_local', '{replica}') PRIMARY KEY (organization_id, project_id, date, item_type, key_hash) ORDER BY (organization_id, project_id, date, item_type, key_hash, retention_days) PARTITION BY (retention_days, toMonday(date)) TTL date + toIntervalDay(retention_days);
Distributed op: CREATE TABLE IF NOT EXISTS eap_item_co_occurring_attrs_2_dist ON CLUSTER 'cluster_one_sh' (organization_id UInt64, project_id UInt64, item_type UInt8, date Date CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attribute_keys_hash Array(UInt64) MATERIALIZED arrayMap(k -> cityHash64(k), arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool))), attributes_string Array(String), attributes_float Array(String), attributes_bool Array(String), key_hash UInt64 MATERIALIZED cityHash64(arraySort(arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool)))), count UInt64, last_seen SimpleAggregateFunction(max, DateTime)) ENGINE Distributed(`cluster_one_sh`, default, eap_item_co_occurring_attrs_2_local);
Local op: ALTER TABLE eap_item_co_occurring_attrs_2_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_attribute_keys_hash attribute_keys_hash TYPE bloom_filter GRANULARITY 1;
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS eap_item_co_occurring_attrs_3_mv ON CLUSTER 'cluster_one_sh' TO eap_item_co_occurring_attrs_2_local (organization_id UInt64, project_id UInt64, item_type UInt8, date Date CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attribute_keys_hash Array(UInt64) MATERIALIZED arrayMap(k -> cityHash64(k), arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool))), attributes_string Array(String), attributes_float Array(String), attributes_bool Array(String), key_hash UInt64 MATERIALIZED cityHash64(arraySort(arrayDistinct(arrayConcat(attributes_string, attributes_float, attributes_bool)))), count UInt64, last_seen SimpleAggregateFunction(max, DateTime)) AS 
SELECT
    organization_id AS organization_id,
    project_id AS project_id,
    item_type as item_type,
    toMonday(timestamp) AS date,
    retention_days as retention_days,
    arrayConcat(mapKeys(attributes_string_0), mapKeys(attributes_string_1), mapKeys(attributes_string_2), mapKeys(attributes_string_3), mapKeys(attributes_string_4), mapKeys(attributes_string_5), mapKeys(attributes_string_6), mapKeys(attributes_string_7), mapKeys(attributes_string_8), mapKeys(attributes_string_9), mapKeys(attributes_string_10), mapKeys(attributes_string_11), mapKeys(attributes_string_12), mapKeys(attributes_string_13), mapKeys(attributes_string_14), mapKeys(attributes_string_15), mapKeys(attributes_string_16), mapKeys(attributes_string_17), mapKeys(attributes_string_18), mapKeys(attributes_string_19), mapKeys(attributes_string_20), mapKeys(attributes_string_21), mapKeys(attributes_string_22), mapKeys(attributes_string_23), mapKeys(attributes_string_24), mapKeys(attributes_string_25), mapKeys(attributes_string_26), mapKeys(attributes_string_27), mapKeys(attributes_string_28), mapKeys(attributes_string_29), mapKeys(attributes_string_30), mapKeys(attributes_string_31), mapKeys(attributes_string_32), mapKeys(attributes_string_33), mapKeys(attributes_string_34), mapKeys(attributes_string_35), mapKeys(attributes_string_36), mapKeys(attributes_string_37), mapKeys(attributes_string_38), mapKeys(attributes_string_39)) AS attributes_string,
    mapKeys(attributes_bool) AS attributes_bool,
    arrayConcat(mapKeys(attributes_float_0), mapKeys(attributes_float_1), mapKeys(attributes_float_2), mapKeys(attributes_float_3), mapKeys(attributes_float_4), mapKeys(attributes_float_5), mapKeys(attributes_float_6), mapKeys(attributes_float_7), mapKeys(attributes_float_8), mapKeys(attributes_float_9), mapKeys(attributes_float_10), mapKeys(attributes_float_11), mapKeys(attributes_float_12), mapKeys(attributes_float_13), mapKeys(attributes_float_14), mapKeys(attributes_float_15), mapKeys(attributes_float_16), mapKeys(attributes_float_17), mapKeys(attributes_float_18), mapKeys(attributes_float_19), mapKeys(attributes_float_20), mapKeys(attributes_float_21), mapKeys(attributes_float_22), mapKeys(attributes_float_23), mapKeys(attributes_float_24), mapKeys(attributes_float_25), mapKeys(attributes_float_26), mapKeys(attributes_float_27), mapKeys(attributes_float_28), mapKeys(attributes_float_29), mapKeys(attributes_float_30), mapKeys(attributes_float_31), mapKeys(attributes_float_32), mapKeys(attributes_float_33), mapKeys(attributes_float_34), mapKeys(attributes_float_35), mapKeys(attributes_float_36), mapKeys(attributes_float_37), mapKeys(attributes_float_38), mapKeys(attributes_float_39)) AS attributes_float,
    1 AS count,
    timestamp AS last_seen
FROM eap_items_1_local
;
-- end forward migration events_analytics_platform : 0061_add_count_to_co_occurring_attrs




-- backward migration events_analytics_platform : 0061_add_count_to_co_occurring_attrs
Local op: DROP TABLE IF EXISTS eap_item_co_occurring_attrs_3_mv ON CLUSTER 'cluster_one_sh' SYNC;
Distributed op: DROP TABLE IF EXISTS eap_item_co_occurring_attrs_2_dist ON CLUSTER 'cluster_one_sh' SYNC;
Local op: DROP TABLE IF EXISTS eap_item_co_occurring_attrs_2_local ON CLUSTER 'cluster_one_sh' SYNC;
-- end backward migration events_analytics_platform : 0061_add_count_to_co_occurring_attrs
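
The `key_hash` column in the DDL above hashes the sorted, de-duplicated union of the three attribute-key arrays, so the same set of keys yields the same hash regardless of order or repetition. A rough Python sketch of that idea follows. It uses BLAKE2b from the standard library as a stand-in for `cityHash64`, so the values will not match ClickHouse's, and the function name and length-prefix encoding are assumptions made for illustration.

```python
import hashlib


def key_hash(attributes_string, attributes_float, attributes_bool):
    """Order-independent 64-bit hash of the distinct attribute keys.

    Mirrors the shape of
    cityHash64(arraySort(arrayDistinct(arrayConcat(...)))),
    but with BLAKE2b standing in for cityHash64.
    """
    keys = sorted(set(attributes_string + attributes_float + attributes_bool))
    digest = hashlib.blake2b(digest_size=8)
    for key in keys:
        encoded = key.encode("utf-8")
        # Length-prefix each key so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(encoded).to_bytes(4, "little"))
        digest.update(encoded)
    return int.from_bytes(digest.digest(), "little")


same_a = key_hash(["http.method", "user.id"], ["duration"], [])
same_b = key_hash(["user.id", "http.method", "user.id"], ["duration"], [])
different = key_hash(["http.method"], ["duration"], [])
```

Here `same_a` and `same_b` are equal because both calls describe the same set of keys, which is what lets rows for the same attribute combination share a sorting key and collapse into one.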
