diff --git a/doc/plan/database-optimization.md b/doc/plan/database-optimization.md
index 80eb056a..843f2644 100644
--- a/doc/plan/database-optimization.md
+++ b/doc/plan/database-optimization.md
@@ -11,7 +11,7 @@ Status definitions:
 - `Not started`: no clear implementation has been found yet.
 - `Deferred`: not recommended for the current business stage; only trigger conditions are retained.
 
-Current overall progress: about `77%`. This number is manually estimated by phase weight and can be adjusted later according to actual completed items.
+Current overall progress: about `80%`. This number is manually estimated by phase weight and can be adjusted later according to actual completed items.
 
 | Phase | Weight | Current completion | Status | Completed | Not done / next steps |
 | ----- | ------ | ------------------ | ------ | --------- | --------------------- |
@@ -20,7 +20,7 @@ Current overall progress: about `77%`. This number is manually estimated by phas
 | Phase 2: Read models and cache first | 15% | 100% | Done | Redis and Asynq dependencies are reusable; admin dashboard stats, admin project list, and dashboard account summary have short-TTL Redis cache; stats/project list/account cache misses are merged with singleflight; project, prepublish, publish, and account write paths invalidate the related dashboard cache; `workspace_dashboard_stats` and `project_list_summaries` read models are in place, and APIs prefer read models when coverage is complete; async refresh triggers after project save, platform sync, publish completion, and member changes; admin rebuild API, Asynq queue, and worker support full read-model rebuild | None |
 | Phase 3: Read/write splitting | 15% | 100% | Done | Optional `DB_READER_*` connection, application-level DB Router, signed sticky writer, consistency routing for project/stats/workspace/platform_account/publish/prepublish/mediaasset/browser_session/extension, consistency-level inventories for dashboard/publish/collab-service, self-hosted PostgreSQL read replica, managed `postgres-reader` entry point, PgBouncer reader pool, replica lag monitoring and automatic fallback to writer when over threshold | None; Phase 4 continues partitioning, archiving, and recovery flows |
 | Phase 4: Single-database partitioning, archiving, and hot/cold tiering | 15% | 100% | Done | Collaborative editing already has state + update batch + compaction foundation; `collab_document_update_batches` has PostgreSQL `document_id` hash partition target schema; event and terminal-session history already have row-level R2/S3 archive worker; `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and `remote_browser_sessions` have PostgreSQL monthly partition target schema; the archive worker exports whole cold monthly partitions to R2/S3 before detaching and dropping them; archive recovery procedure is documented | None |
-| Phase 5: Citus preparation | 20% | 35% | In progress | Workspace model, `projects.workspace_id`, personal workspace ID, and explicit or derived `workspace_id` coverage for project-domain tables | Citus distribution column/colocation design, unique constraint and foreign-key review, worker payload routing |
+| Phase 5: Citus preparation | 20% | 50% | In progress | Workspace model, `projects.workspace_id`, personal workspace ID, explicit or derived `workspace_id` coverage for project-domain tables, and Citus distributed/reference/colocated table-group design | Unique constraint and foreign-key review, worker payload routing |
 | Phase 6: Citus distributed PostgreSQL operation | 10% | 0% | Deferred | None | Future Citus cluster design, worker/coordinator monitoring and backup, large-tenant isolation strategy |
 
 ### 0.1 Progress Update Rules
@@ -60,14 +60,14 @@ Atomic commit guidance:
 | Application connection pools | Done | backend/publish-worker support `DB_MAX_OPEN_CONNS`, `DB_MAX_IDLE_CONNS`, `DB_CONN_MAX_LIFETIME`, and `DB_CONN_MAX_IDLE_TIME`; collab-service node-postgres pool supports `DB_MAX_OPEN_CONNS`, `DB_CONN_MAX_LIFETIME`, and `DB_CONN_MAX_IDLE_TIME`; Redis client supports `REDIS_POOL_SIZE`, `REDIS_MIN_IDLE_CONNS`, `REDIS_MAX_IDLE_CONNS`, `REDIS_CONN_MAX_IDLE_TIME`, and `REDIS_CONN_MAX_LIFETIME`; Docker Compose and self-hosted Kubernetes reuse PostgreSQL connections through PgBouncer writer pool; GORM PostgreSQL driver uses simple protocol to remain compatible with transaction pooling; self-hosted Kubernetes added PgBouncer reader pool and points app `DB_READER_HOST` to the reader pool | None | `backend/internal/db/db.go`, `backend/internal/redisclient/redisclient.go`, `collab-service/src/config.ts`, `collab-service/src/persistence/document-persistence.ts`, `collab-service/src/persistence/document-persistence.test.ts`, `deploy/docker/docker-compose.yml`, `deploy/kubernetes/data-services/self-hosted/pgbouncer.yaml`, `deploy/kubernetes/app-baseline/app-config.yaml` |
 | Query observability | Done | GORM QueryObserver, slow-query logs, `mpp_db_queries_total`, `mpp_db_query_duration_seconds`, `mpp_db_slow_queries_total`, self-hosted PostgreSQL `pg_stat_statements`, PostgreSQL exporter table-level health metrics, Grafana table row count / 24h growth / table size / index size / dead tuples / vacuum panels | Threshold alerts can continue to be added by production baseline in Phase 1/4 | `backend/internal/db/query_observer.go`, `backend/internal/observability/observability.go`, `script/db/audit_database_baseline.sql`, `deploy/docker/observability/postgres-exporter/queries.yml`, `deploy/docker/observability/grafana/dashboards/mpp-observability-baseline.json` |
 | Dashboard query audit | Done | Existing audit script covers dashboard Count, list, platform filter, publication preload, account query, and active session query plans | Not yet turned into a periodic CI/ops gate | `script/db/audit_dashboard_query_plans.sql` |
-| Tenant boundary | In progress | Existing `workspaces`, `workspace_members`, `projects.workspace_id`, and personal workspace rules; project-domain publishing, activity, comment, version, share-link, media, AI, extension, and collaboration rows now carry `workspace_id` directly, with model hooks or service write paths deriving it from the owning project, document, or media asset for new writes | Citus colocation groups, distributed-table constraints, and worker payload ownership are still open | `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts` |
+| Tenant boundary | In progress | Existing `workspaces`, `workspace_members`, `projects.workspace_id`, and personal workspace rules; project-domain publishing, activity, comment, version, share-link, media, AI, extension, and collaboration rows now carry `workspace_id` directly, with model hooks or service write paths deriving it from the owning project, document, or media asset for new writes; Citus target table groups are mapped to the `tenant_workspace` colocation group | Distributed-table constraints and worker payload ownership are still open | `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts`, `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb` |
 | Dashboard read models | Done | Added `workspace_dashboard_stats` and `project_list_summaries` read models, idempotently recomputed from fact tables by a centralized readmodel service; async refresh is triggered after project save, platform sync, publish completion, and member changes; admin stats and admin project list prefer read models when coverage is complete; admin rebuild API enqueues through Asynq, and API/worker processes can start readmodel workers for full rebuild from fact tables | None | `backend/internal/models/models.go`, `backend/internal/services/readmodel/service.go`, `backend/internal/services/readmodel/queue.go`, `backend/internal/services/readmodel/service_test.go`, `backend/internal/services/readmodel/queue_test.go`, `backend/internal/services/stats/overview.go`, `backend/internal/services/project/lifecycle.go`, `backend/internal/handlers/dashboard.go`, `backend/cmd/api/main.go`, `backend/cmd/publish-worker/main.go` |
 | Redis read cache | Done | Redis is already used for queues, locks, OAuth, browser sessions, and short-term coordination; admin dashboard stats, admin project list, and dashboard account summary use 15s TTL cache and bypass scoped/sticky-writer strong-consistency paths; stats/project list/account cache misses use singleflight to prevent process-local stampede; stats and account caches use versioned payloads and semantic validation, and Redis read-error fallback is also merged into one DB computation per key; project create/edit/platform save, prepublish sync/draft update, publish queue/execute/fail, and platform account write paths invalidate the related dashboard cache; full read-model rebuild reuses the Redis/Asynq queue | None | `backend/internal/services/stats/overview.go`, `backend/internal/services/stats/overview_test.go`, `backend/internal/services/project/list_cache.go`, `backend/internal/services/project/list_cache_test.go`, `backend/internal/services/prepublish/drafts.go`, `backend/internal/services/publish/service.go`, `backend/internal/services/publish/queue.go`, `backend/internal/services/publish/publication_flow_test.go`, `backend/internal/services/publish/queue_test.go`, `backend/internal/services/platform_account/account_cache.go`, `backend/internal/services/platform_account/account_cache_test.go`, `backend/internal/services/browser_session/complete.go`, `backend/internal/services/browser_session/service_test.go`, `backend/internal/services/readmodel/queue.go` |
 | Read/write splitting | Done | Supports optional `DB_READER_*` read-replica connection, `DefaultRouter`, and signed sticky writer; project/stats/workspace/platform_account/publish/prepublish/mediaasset/browser_session/extension are wired to strong/eventual/writer routing; dashboard, publish, and collab-service consistency-level inventories are complete, with collab-service online path kept writer-only; writer/reader pools are in self-hosted Kubernetes, and managed overlay provides a `postgres-reader` ExternalName entry point; `DB_READER_MAX_REPLICA_LAG` configures the replica lag threshold, eventual/analytics reads automatically fall back to writer when over threshold or lag is unknown, and `mpp_db_replica_lag_seconds` and `mpp_db_replica_healthy` metrics are exposed | None | `backend/internal/db/db.go`, `backend/internal/db/router.go`, `backend/internal/db/replica_lag.go`, `backend/internal/services/publish/service.go`, `backend/internal/services/prepublish/service.go`, `backend/internal/services/mediaasset/service.go`, `backend/internal/services/browser_session/service.go`, `backend/internal/services/extension/service.go`, `backend/internal/app/runtime.go`, `deploy/kubernetes/data-services/self-hosted/postgres.yaml`, `deploy/kubernetes/data-services/self-hosted/pgbouncer.yaml`, `deploy/kubernetes/data-services/managed/services.yaml`, `script/kubernetes/validation/data_services.rb` |
 | Event-table partitioning and archiving | Done | `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and terminal `remote_browser_sessions` have default retention periods; the `archive` worker can batch-export JSONL to R2/S3 and delete old hot-table rows after successful upload; PostgreSQL schema initialization now creates monthly `created_at` partitions for `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and `remote_browser_sessions`, with partition-compatible `(id, created_at)` primary keys and rolling partition creation; the archive worker exports whole cold monthly partitions as JSONL to R2/S3, then detaches and drops the partition after successful upload; PostgreSQL browser-session active-row fallback uses a scoped advisory transaction lock because partitioned unique constraints must include the partition key; the archive recovery procedure defines inspection, staging restore, optional hot-table reinsertion, and audit checks | None | `backend/internal/db/monthly_partitions.go`, `backend/internal/db/db.go`, `backend/internal/models/models.go`, `backend/internal/db/db_test.go`, `backend/internal/services/browser_session/start.go`, `backend/internal/services/browser_session/cleanup.go`, `backend/internal/services/archive/worker.go`, `backend/internal/services/archive/partitions.go`, `backend/internal/services/archive/worker_test.go`, `backend/internal/services/archive/partitions_test.go`, Phase 4 archive recovery procedure in this document |
 | Collaboration batch governance | In progress | `collab_document_states`, `collab_document_update_batches`, and compaction/retention foundations exist; PostgreSQL schema initialization creates a 16-way `document_id` hash-partitioned `collab_document_update_batches` target table and migrates existing regular-table rows into it | Cold archiving is not implemented | `backend/internal/db/hash_partitions.go`, `backend/internal/db/db.go`, `backend/internal/models/collab.go`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.ts` |
 | Outbox/CDC/event stream | In progress | The publishing queue path has a transactional Outbox: `EnqueuePublishProject` writes `outbox_events` in the same transaction and dispatches immediately after commit; publish worker starts an outbox dispatcher and supports retries for failed/stale processing records; Asynq continues to serve as the task-execution queue, and `PublishEvent` continues to serve as publishing audit | Currently covers only `publish.job_requested`; general business-event outbox, Debezium, and Redpanda/Kafka CDC are not implemented | `backend/internal/services/publish/queue.go`, `backend/internal/services/publish/outbox.go`, `backend/internal/models/models.go` |
-| Citus target state | In progress | Confirmed `workspace_id` as the most suitable distribution-column direction; project-domain tables now have explicit or stable derived tenant routing coverage, including direct-insert fallbacks for scheduled publishing, media, and AI rows | Citus distributed tables, reference tables, colocation, unique-constraint review, and worker payload routing are not implemented | Phase 5 checklist, `backend/internal/models/ai_hooks.go`, and `backend/internal/db/db_test.go` |
+| Citus target state | In progress | Confirmed `workspace_id` as the most suitable distribution-column direction; project-domain tables now have explicit or stable derived tenant routing coverage, including direct-insert fallbacks for scheduled publishing, media, and AI rows; distributed tables, reference-table candidates, control-domain tables, and deferred colocation gaps are mapped in a machine-readable design manifest | Unique-constraint / foreign-key review and worker payload routing are not implemented | Phase 5 checklist, `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb`, `backend/internal/models/ai_hooks.go`, and `backend/internal/db/db_test.go` |
 
 ### 0.3 Phase Checklist
 
@@ -128,7 +128,7 @@ Atomic commit guidance:
 - [x] Confirm Workspace as the long-term tenant boundary.
 - [x] Confirm `workspace_id` as the preferred Citus distribution column.
 - [x] Complete `workspace_id` or a stable derivation path for project-domain tables. Verification entry point: `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts`.
-- [ ] Design Citus distributed tables, reference tables, and colocated table groups.
+- [x] Design Citus distributed tables, reference tables, and colocated table groups. Verification entry point: `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb`, and Section 7.2.1 Citus table group design.
 - [ ] Review unique constraints, foreign keys, and cross-tenant joins.
 - [ ] Add `workspace_id` to worker payloads.
 
@@ -576,6 +576,49 @@ Pre-deployment modeling strategy:
 - New project-domain tables should include `workspace_id` from the beginning when they need tenant-local querying.
 - `platform_accounts` is currently user + platform scoped and is better kept in the control domain first; if accounts later become workspace assets, model them in a colocated table group distributed by `workspace_id`.
 
+#### 7.2.1 Citus Table Group Design
+
+The Phase 5 target design is captured in `script/db/citus_table_groups.yml` and checked by `script/db/test_citus_table_groups.rb` so the table grouping can be reviewed without requiring a live Citus cluster during normal local development.
+
+Physical colocation group:
+
+| Group | Distribution value | Tables | Notes |
+| ----- | ------------------ | ------ | ----- |
+| `tenant_workspace` | Workspace UUID; `workspaces.id` for the root table and `workspace_id` for child tables | `workspaces`, workspace membership/invite/activity/read-model rows, project rows, project publishing rows, project read models, media metadata, collaboration state, AI project context, extension events, notifications, and workspace metering | Use `workspaces` as the first distributed table, then distribute child tables with `colocate_with => 'workspaces'` after constraint review |
+
+Reference-table candidates:
+
+| Table | Decision | Notes |
+| ----- | -------- | ----- |
+| `users` | Use as a reference-table candidate for the first validation cluster | Keeps permission, author, and actor joins local to shards; revisit when user cardinality or PII replication makes this undesirable |
+| System content templates | Split from `content_templates` before Citus and treat as reference data | Avoids nullable tenant routing for global template rows while workspace templates remain tenant-local |
+
+Control-domain tables for the first Citus validation:
+
+| Table | Why it stays outside the colocated group first |
+| ----- | --------------------------------------------- |
+| `platform_accounts`, `platform_account_grants` | Credential ownership is still user/platform scoped, and `platform_accounts.workspace_id` is nullable |
+| `remote_browser_sessions` | Short-lived runtime state is governed by Redis and the existing monthly archive path |
+| `outbox_events` | Queue dispatch remains coordinator/control-domain state until worker payloads carry `workspace_id` |
+| `extension_execution_event_claims` | Global idempotency claims do not carry tenant routing yet |
+
+Deferred colocation gaps:
+
+| Table | Required follow-up before distribution |
+| ----- | -------------------------------------- |
+| `project_collaborators`, `collab_document_collaborators` | Add `workspace_id` and review composite primary keys |
+| `publish_attempts` | Add `workspace_id` or make attempts dependent on a colocated scheduled-publication key |
+| `content_templates` | Split system templates from workspace templates or make tenant-scoped rows non-null by design |
+| `ai_drafting_messages`, `ai_tool_calls`, `ai_drafting_session_summaries`, `ai_session_events` | Add `workspace_id` derived from `ai_drafting_sessions` and review session-local uniqueness |
+
+Initial Citus validation DDL shape:
+
+```sql
+SELECT create_distributed_table('workspaces', 'id');
+SELECT create_distributed_table('', 'workspace_id', colocate_with => 'workspaces');
+SELECT create_reference_table('');
+```
+
 ### 7.3 Table Partitioning Recommendations
 
 | Table | Partitioning method | Reason |
diff --git a/script/db/citus_table_groups.yml b/script/db/citus_table_groups.yml
new file mode 100644
index 00000000..8b53d628
--- /dev/null
+++ b/script/db/citus_table_groups.yml
@@ -0,0 +1,220 @@
+version: 1
+distribution_key: workspace_id
+physical_colocation_groups:
+  tenant_workspace:
+    distribution_type: hash
+    distribution_value: workspace_uuid
+    colocate_with: workspaces
+    description: Tenant-owned tables that should share shards by workspace UUID once Citus validation begins.
+    tables:
+      - table: workspaces
+        domain: tenant_core
+        distribution_column: id
+        readiness: ready_after_constraint_review
+        ddl: "create_distributed_table('workspaces', 'id')"
+        notes: Root tenant table; the distributed value is the workspace UUID used by child workspace_id columns.
+      - table: workspace_members
+        domain: tenant_core
+        distribution_column: workspace_id
+        readiness: ready_after_constraint_review
+        ddl: "create_distributed_table('workspace_members', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Permission checks should remain shard-local with the workspace row.
+      - table: workspace_invites
+        domain: tenant_core
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('workspace_invites', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Token uniqueness must be reviewed before distribution.
+      - table: workspace_activities
+        domain: tenant_core
+        distribution_column: workspace_id
+        readiness: target_after_partition_constraint_review
+        ddl: "create_distributed_table('workspace_activities', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Monthly partition primary key currently follows id and created_at.
+      - table: workspace_dashboard_stats
+        domain: read_model
+        distribution_column: workspace_id
+        readiness: ready_after_constraint_review
+        ddl: "create_distributed_table('workspace_dashboard_stats', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Workspace-scoped aggregate read model.
+      - table: notifications
+        domain: tenant_core
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('notifications', 'workspace_id', colocate_with => 'workspaces')"
+        notes: User notification fan-out stays tenant-local; cross-workspace inboxes need read-model aggregation.
+      - table: workspace_quota_aggregates
+        domain: metering
+        distribution_column: workspace_id
+        readiness: ready_after_constraint_review
+        ddl: "create_distributed_table('workspace_quota_aggregates', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Workspace quota row is naturally keyed by workspace_id.
+      - table: ai_usage_records
+        domain: metering
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('ai_usage_records', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Usage history is tenant-local and can aggregate into workspace_quota_aggregates.
+      - table: projects
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('projects', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Primary project row; project_id-only uniqueness must be reviewed before distribution.
+      - table: project_platform_publications
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('project_platform_publications', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Draft/status rows should colocate with projects; project/platform uniqueness needs workspace_id.
+      - table: project_list_summaries
+        domain: read_model
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('project_list_summaries', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Dashboard list read model follows the project workspace.
+      - table: project_activities
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_partition_constraint_review
+        ddl: "create_distributed_table('project_activities', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Monthly partition primary key currently follows id and created_at.
+      - table: project_comments
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('project_comments', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Comment reads and writes should remain local to the project workspace.
+      - table: project_versions
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('project_versions', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Version history should colocate with projects and collaboration state.
+      - table: project_share_links
+        domain: project
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('project_share_links', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Token uniqueness must be reviewed before distribution.
+      - table: scheduled_publications
+        domain: publishing
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('scheduled_publications', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Scheduled publishing workers should route by workspace_id.
+      - table: publish_events
+        domain: publishing
+        distribution_column: workspace_id
+        readiness: target_after_partition_constraint_review
+        ddl: "create_distributed_table('publish_events', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Monthly partition primary key currently follows id and created_at.
+      - table: extension_callback_tokens
+        domain: extension
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('extension_callback_tokens', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Token uniqueness must be reviewed before distribution.
+      - table: extension_execution_events
+        domain: extension
+        distribution_column: workspace_id
+        readiness: target_after_partition_constraint_review
+        ddl: "create_distributed_table('extension_execution_events', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Monthly partition primary key currently follows id and created_at.
+      - table: media_assets
+        domain: media
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('media_assets', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Object key uniqueness must be reviewed; object bytes stay in R2/S3.
+      - table: media_asset_usages
+        domain: media
+        distribution_column: workspace_id
+        readiness: target_after_unique_constraint_review
+        ddl: "create_distributed_table('media_asset_usages', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Asset/resource uniqueness must include workspace routing before distribution.
+      - table: brand_profiles
+        domain: content_setup
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('brand_profiles', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Workspace-owned brand profiles should colocate with projects and AI context.
+      - table: collab_documents
+        domain: collaboration
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('collab_documents', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Collaboration document metadata follows the workspace tenant.
+      - table: collab_document_states
+        domain: collaboration
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('collab_document_states', 'workspace_id', colocate_with => 'workspaces')"
+        notes: State snapshots should colocate with their document metadata.
+      - table: collab_document_update_batches
+        domain: collaboration
+        distribution_column: workspace_id
+        readiness: target_after_partition_constraint_review
+        ddl: "create_distributed_table('collab_document_update_batches', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Existing document_id hash partitioning must be rehearsed with Citus partitioned-table support.
+      - table: ai_context_snapshots
+        domain: ai
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('ai_context_snapshots', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Context snapshots follow the owning project workspace.
+      - table: ai_growth_optimization_runs
+        domain: ai
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('ai_growth_optimization_runs', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Growth runs should colocate with snapshots and proposals.
+      - table: ai_proposals
+        domain: ai
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('ai_proposals', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Proposal decisions should stay shard-local with the project.
+      - table: ai_drafting_sessions
+        domain: ai
+        distribution_column: workspace_id
+        readiness: target_after_constraint_review
+        ddl: "create_distributed_table('ai_drafting_sessions', 'workspace_id', colocate_with => 'workspaces')"
+        notes: Drafting sessions are the parent route for AI chat child rows.
+reference_tables:
+  - table: users
+    readiness: reference_candidate_for_validation
+    ddl: "create_reference_table('users')"
+    notes: Needed for distributed permission and authorship joins; revisit when user volume or PII replication becomes a concern.
+  - table: content_templates_system
+    readiness: split_from_content_templates_first
+    ddl: "create_reference_table('content_templates_system')"
+    notes: System templates should become a separate small reference table before Citus.
+control_domain_tables:
+  - table: platform_accounts
+    reason: Credential ownership is user/platform scoped today and workspace_id is nullable.
+  - table: platform_account_grants
+    reason: Grant rows depend on platform_accounts staying in the control domain until accounts become workspace assets.
+  - table: remote_browser_sessions
+    reason: Short-lived runtime state is governed by Redis and monthly archive paths before Citus.
+  - table: outbox_events
+    reason: Queue dispatch remains coordinator/control-domain state until worker payloads carry workspace_id.
+  - table: extension_execution_event_claims
+    reason: Global idempotency claim table does not have workspace routing today.
+deferred_colocation_tables:
+  - table: project_collaborators
+    required_change: Add workspace_id and review the primary key before distribution.
+  - table: collab_document_collaborators
+    required_change: Add workspace_id and review the document/user primary key before distribution.
+  - table: publish_attempts
+    required_change: Add workspace_id or make attempts dependent on a colocated scheduled_publications key.
+  - table: content_templates
+    required_change: Split global system templates from workspace templates or make tenant-scoped rows non-null by design.
+  - table: ai_drafting_messages
+    required_change: Add workspace_id derived from ai_drafting_sessions for distributed AI history.
+  - table: ai_tool_calls
+    required_change: Add workspace_id derived from ai_drafting_sessions for distributed AI history.
+  - table: ai_drafting_session_summaries
+    required_change: Add workspace_id derived from ai_drafting_sessions and review session uniqueness.
+  - table: ai_session_events
+    required_change: Add workspace_id derived from ai_drafting_sessions for distributed event history.
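
Reviewer note on `script/db/citus_table_groups.yml`: the manifest stores one `ddl` string per table, so the "Initial Citus validation DDL shape" from Section 7.2.1 could in principle be generated from it rather than written by hand. The sketch below is an assumption, not part of this change: `validation_ddl` is a hypothetical helper, and `MANIFEST_EXCERPT` is a trimmed stand-in for the real file with only three entries. It emits the root table named by `colocate_with` first, then the colocated children, then the reference tables, mirroring the order used in the doc's SQL block.

```ruby
# frozen_string_literal: true

require "yaml"

# Trimmed stand-in for citus_table_groups.yml; the real manifest lists many more tables.
MANIFEST_EXCERPT = <<~MANIFEST
  physical_colocation_groups:
    tenant_workspace:
      colocate_with: workspaces
      tables:
        - table: workspace_members
          ddl: "create_distributed_table('workspace_members', 'workspace_id', colocate_with => 'workspaces')"
        - table: workspaces
          ddl: "create_distributed_table('workspaces', 'id')"
  reference_tables:
    - table: users
      ddl: "create_reference_table('users')"
MANIFEST

# Hypothetical helper (not in this PR): turns manifest entries into SELECT statements.
# The root table of each group goes first so children can colocate with it.
def validation_ddl(design)
  statements = []
  design.fetch("physical_colocation_groups").each_value do |group|
    root = group.fetch("colocate_with")
    roots, children = group.fetch("tables").partition { |entry| entry.fetch("table") == root }
    (roots + children).each { |entry| statements << "SELECT #{entry.fetch("ddl")};" }
  end
  design.fetch("reference_tables").each { |entry| statements << "SELECT #{entry.fetch("ddl")};" }
  statements
end

DDL_STATEMENTS = validation_ddl(YAML.safe_load(MANIFEST_EXCERPT))
puts DDL_STATEMENTS
```

Pointing the same helper at the full manifest would only require swapping the excerpt for the parsed file; whether reference tables should instead be created before distributed tables depends on the foreign-key review that Phase 5 still lists as open.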
diff --git a/script/db/test_citus_table_groups.rb b/script/db/test_citus_table_groups.rb
new file mode 100644
index 00000000..7d58cb15
--- /dev/null
+++ b/script/db/test_citus_table_groups.rb
@@ -0,0 +1,134 @@
+# frozen_string_literal: true
+
+require "minitest/autorun"
+require "yaml"
+
+class CitusTableGroupsTest < Minitest::Test
+  DESIGN_PATH = File.expand_path("citus_table_groups.yml", __dir__)
+
+  REQUIRED_COLOCATED_TABLES = %w[
+    workspaces
+    workspace_members
+    projects
+    project_platform_publications
+    project_list_summaries
+    project_activities
+    project_comments
+    project_versions
+    project_share_links
+    scheduled_publications
+    publish_events
+    extension_callback_tokens
+    extension_execution_events
+    media_assets
+    media_asset_usages
+    brand_profiles
+    collab_documents
+    collab_document_states
+    collab_document_update_batches
+    ai_context_snapshots
+    ai_growth_optimization_runs
+    ai_proposals
+    ai_drafting_sessions
+  ].freeze
+
+  REQUIRED_CONTROL_TABLES = %w[
+    platform_accounts
+    platform_account_grants
+    remote_browser_sessions
+    outbox_events
+    extension_execution_event_claims
+  ].freeze
+
+  REQUIRED_DEFERRED_TABLES = %w[
+    project_collaborators
+    collab_document_collaborators
+    publish_attempts
+    content_templates
+    ai_drafting_messages
+    ai_tool_calls
+    ai_drafting_session_summaries
+    ai_session_events
+  ].freeze
+
+  def setup
+    @design = YAML.safe_load_file(DESIGN_PATH, aliases: false)
+  end
+
+  def test_manifest_declares_workspace_distribution
+    assert_equal 1, @design.fetch("version")
+    assert_equal "workspace_id", @design.fetch("distribution_key")
+
+    group = tenant_workspace_group
+    assert_equal "hash", group.fetch("distribution_type")
+    assert_equal "workspace_uuid", group.fetch("distribution_value")
+    assert_equal "workspaces", group.fetch("colocate_with")
+  end
+
+  def test_colocated_tables_use_workspace_distribution
+    colocated_tables.each do |entry|
+      table = entry.fetch("table")
+      expected_column = table == "workspaces" ? "id" : "workspace_id"
+
+      assert_equal expected_column, entry.fetch("distribution_column"), table
+      assert_includes entry.fetch("ddl"), "create_distributed_table", table
+      assert entry.fetch("domain").length.positive?, table
+      assert entry.fetch("readiness").length.positive?, table
+      assert entry.fetch("notes").length.positive?, table
+    end
+  end
+
+  def test_design_covers_phase_five_anchor_tables
+    missing_colocated = REQUIRED_COLOCATED_TABLES - table_names(colocated_tables)
+    missing_control = REQUIRED_CONTROL_TABLES - table_names(@design.fetch("control_domain_tables"))
+    missing_deferred = REQUIRED_DEFERRED_TABLES - table_names(@design.fetch("deferred_colocation_tables"))
+
+    assert_empty missing_colocated, "missing colocated tables: #{missing_colocated.join(", ")}"
+    assert_empty missing_control, "missing control-domain tables: #{missing_control.join(", ")}"
+    assert_empty missing_deferred, "missing deferred tables: #{missing_deferred.join(", ")}"
+  end
+
+  def test_reference_tables_use_reference_ddl
+    reference_tables = @design.fetch("reference_tables")
+
+    refute_empty reference_tables
+    reference_tables.each do |entry|
+      assert_includes entry.fetch("ddl"), "create_reference_table", entry.fetch("table")
+      assert entry.fetch("notes").length.positive?, entry.fetch("table")
+    end
+  end
+
+  def test_every_table_has_one_classification
+    all_tables = table_names(colocated_tables) +
+      table_names(@design.fetch("reference_tables")) +
+      table_names(@design.fetch("control_domain_tables")) +
+      table_names(@design.fetch("deferred_colocation_tables"))
+
+    duplicates = all_tables.tally.select { |_table, count| count > 1 }.keys
+
+    assert_empty duplicates, "tables classified more than once: #{duplicates.join(", ")}"
+  end
+
+  def test_non_distributed_entries_explain_why_they_are_not_colocated
+    @design.fetch("control_domain_tables").each do |entry|
+      assert entry.fetch("reason").length.positive?, entry.fetch("table")
+    end
+    @design.fetch("deferred_colocation_tables").each do |entry|
+      assert entry.fetch("required_change").length.positive?, entry.fetch("table")
+    end
+  end
+
+  private
+
+  def tenant_workspace_group
+    @design.fetch("physical_colocation_groups").fetch("tenant_workspace")
+  end
+
+  def colocated_tables
+    tenant_workspace_group.fetch("tables")
+  end
+
+  def table_names(entries)
+    entries.map { |entry| entry.fetch("table") }
+  end
+end
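Reviewer note: the manifest stores each table's DDL as a string, so a small script could later turn the colocated group into an ordered list of statements. The sketch below is only an illustration under stated assumptions: `SAMPLE_MANIFEST` is a cut-down, hypothetical manifest (the `workspaces` DDL string in particular is assumed, not taken from `citus_table_groups.yml`), and `ordered_colocated_ddl` is a hypothetical helper that is not part of this change. It places the `colocate_with` anchor first, since the other statements name `workspaces` in `colocate_with` and that table has to be distributed before they run.

```ruby
require "yaml"

# Hypothetical, cut-down manifest that mirrors the key layout the test reads.
# The 'workspaces' DDL string is an assumption made for this sketch.
SAMPLE_MANIFEST = <<~MANIFEST
  version: 1
  distribution_key: workspace_id
  physical_colocation_groups:
    tenant_workspace:
      colocate_with: workspaces
      tables:
        - table: ai_proposals
          distribution_column: workspace_id
          ddl: "create_distributed_table('ai_proposals', 'workspace_id', colocate_with => 'workspaces')"
        - table: workspaces
          distribution_column: id
          ddl: "create_distributed_table('workspaces', 'id')"
MANIFEST

# Hypothetical helper: returns the colocated group's DDL as SQL statements,
# with the colocation anchor table first and the rest in manifest order.
def ordered_colocated_ddl(design)
  group = design.fetch("physical_colocation_groups").fetch("tenant_workspace")
  anchor = group.fetch("colocate_with")

  # partition keeps relative order inside each half.
  anchor_entries, other_entries = group.fetch("tables").partition do |entry|
    entry.fetch("table") == anchor
  end

  (anchor_entries + other_entries).map { |entry| "SELECT #{entry.fetch("ddl")};" }
end

if __FILE__ == $PROGRAM_NAME
  puts ordered_colocated_ddl(YAML.safe_load(SAMPLE_MANIFEST))
end
```

Keeping the ordering in a helper rather than in the YAML lets the manifest stay grouped by domain while the generated statements still run in a valid sequence.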