Merged
53 changes: 48 additions & 5 deletions doc/plan/database-optimization.md
@@ -11,7 +11,7 @@ Status definitions:
- `Not started`: no clear implementation has been found yet.
- `Deferred`: not recommended for the current business stage; only trigger conditions are retained.

-Current overall progress: about `77%`. This number is manually estimated by phase weight and can be adjusted later according to actual completed items.
+Current overall progress: about `80%`. This number is manually estimated by phase weight and can be adjusted later according to actual completed items.
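Assuming the overall figure is the weight-sum of phase completions, $\sum_i w_i c_i$, the three-point change in this hunk is fully explained by the Phase 5 row, which is the only phase row this hunk changes (weight 20%, completion 35% to 50%):

$$
\Delta = w_5\left(c_5^{\text{new}} - c_5^{\text{old}}\right) = 0.20 \times (50\% - 35\%) = 3\%, \qquad 77\% + 3\% = 80\%
$$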

| Phase | Weight | Current completion | Status | Completed | Not done / next steps |
| ----- | ------ | ------------------ | ------ | --------- | --------------------- |
@@ -20,7 +20,7 @@ Current overall progress: about `77%`. This number is manually estimated by phas
| Phase 2: Read models and cache first | 15% | 100% | Done | Redis and Asynq dependencies are reusable; admin dashboard stats, admin project list, and dashboard account summary have short-TTL Redis cache; stats/project list/account cache misses are merged with singleflight; project, prepublish, publish, and account write paths invalidate the related dashboard cache; `workspace_dashboard_stats` and `project_list_summaries` read models are in place, and APIs prefer read models when coverage is complete; async refresh triggers after project save, platform sync, publish completion, and member changes; admin rebuild API, Asynq queue, and worker support full read-model rebuild | None |
| Phase 3: Read/write splitting | 15% | 100% | Done | Optional `DB_READER_*` connection, application-level DB Router, signed sticky writer, consistency routing for project/stats/workspace/platform_account/publish/prepublish/mediaasset/browser_session/extension, consistency-level inventories for dashboard/publish/collab-service, self-hosted PostgreSQL read replica, managed `postgres-reader` entry point, PgBouncer reader pool, replica lag monitoring and automatic fallback to writer when over threshold | None; Phase 4 continues partitioning, archiving, and recovery flows |
| Phase 4: Single-database partitioning, archiving, and hot/cold tiering | 15% | 100% | Done | Collaborative editing already has state + update batch + compaction foundation; `collab_document_update_batches` has PostgreSQL `document_id` hash partition target schema; event and terminal-session history already have row-level R2/S3 archive worker; `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and `remote_browser_sessions` have PostgreSQL monthly partition target schema; the archive worker exports whole cold monthly partitions to R2/S3 before detaching and dropping them; archive recovery procedure is documented | None |
-| Phase 5: Citus preparation | 20% | 35% | In progress | Workspace model, `projects.workspace_id`, personal workspace ID, and explicit or derived `workspace_id` coverage for project-domain tables | Citus distribution column/colocation design, unique constraint and foreign-key review, worker payload routing |
+| Phase 5: Citus preparation | 20% | 50% | In progress | Workspace model, `projects.workspace_id`, personal workspace ID, explicit or derived `workspace_id` coverage for project-domain tables, and Citus distributed/reference/colocated table-group design | Unique constraint and foreign-key review, worker payload routing |
| Phase 6: Citus distributed PostgreSQL operation | 10% | 0% | Deferred | None | Future Citus cluster design, worker/coordinator monitoring and backup, large-tenant isolation strategy |

### 0.1 Progress Update Rules
@@ -60,14 +60,14 @@ Atomic commit guidance:
| Application connection pools | Done | backend/publish-worker support `DB_MAX_OPEN_CONNS`, `DB_MAX_IDLE_CONNS`, `DB_CONN_MAX_LIFETIME`, and `DB_CONN_MAX_IDLE_TIME`; collab-service node-postgres pool supports `DB_MAX_OPEN_CONNS`, `DB_CONN_MAX_LIFETIME`, and `DB_CONN_MAX_IDLE_TIME`; Redis client supports `REDIS_POOL_SIZE`, `REDIS_MIN_IDLE_CONNS`, `REDIS_MAX_IDLE_CONNS`, `REDIS_CONN_MAX_IDLE_TIME`, and `REDIS_CONN_MAX_LIFETIME`; Docker Compose and self-hosted Kubernetes reuse PostgreSQL connections through the PgBouncer writer pool; the GORM PostgreSQL driver uses the simple query protocol to stay compatible with transaction pooling; self-hosted Kubernetes also runs a PgBouncer reader pool, and the app's `DB_READER_HOST` points to it | None | `backend/internal/db/db.go`, `backend/internal/redisclient/redisclient.go`, `collab-service/src/config.ts`, `collab-service/src/persistence/document-persistence.ts`, `collab-service/src/persistence/document-persistence.test.ts`, `deploy/docker/docker-compose.yml`, `deploy/kubernetes/data-services/self-hosted/pgbouncer.yaml`, `deploy/kubernetes/app-baseline/app-config.yaml` |
| Query observability | Done | GORM QueryObserver, slow-query logs, `mpp_db_queries_total`, `mpp_db_query_duration_seconds`, `mpp_db_slow_queries_total`, self-hosted PostgreSQL `pg_stat_statements`, PostgreSQL exporter table-level health metrics, Grafana table row count / 24h growth / table size / index size / dead tuples / vacuum panels | Threshold alerts can be added incrementally once production baselines are established in Phase 1/4 | `backend/internal/db/query_observer.go`, `backend/internal/observability/observability.go`, `script/db/audit_database_baseline.sql`, `deploy/docker/observability/postgres-exporter/queries.yml`, `deploy/docker/observability/grafana/dashboards/mpp-observability-baseline.json` |
| Dashboard query audit | Done | Existing audit script covers dashboard Count, list, platform filter, publication preload, account query, and active session query plans | Not yet turned into a periodic CI/ops gate | `script/db/audit_dashboard_query_plans.sql` |
-| Tenant boundary | In progress | Existing `workspaces`, `workspace_members`, `projects.workspace_id`, and personal workspace rules; project-domain publishing, activity, comment, version, share-link, media, AI, extension, and collaboration rows now carry `workspace_id` directly, with model hooks or service write paths deriving it from the owning project, document, or media asset for new writes | Citus colocation groups, distributed-table constraints, and worker payload ownership are still open | `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts` |
+| Tenant boundary | In progress | Existing `workspaces`, `workspace_members`, `projects.workspace_id`, and personal workspace rules; project-domain publishing, activity, comment, version, share-link, media, AI, extension, and collaboration rows now carry `workspace_id` directly, with model hooks or service write paths deriving it from the owning project, document, or media asset for new writes; Citus target table groups are mapped to the `tenant_workspace` colocation group | Distributed-table constraints and worker payload ownership are still open | `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts`, `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb` |
| Dashboard read models | Done | Added `workspace_dashboard_stats` and `project_list_summaries` read models, idempotently recomputed from fact tables by a centralized readmodel service; async refresh is triggered after project save, platform sync, publish completion, and member changes; admin stats and admin project list prefer read models when coverage is complete; admin rebuild API enqueues through Asynq, and API/worker processes can start readmodel workers for full rebuild from fact tables | None | `backend/internal/models/models.go`, `backend/internal/services/readmodel/service.go`, `backend/internal/services/readmodel/queue.go`, `backend/internal/services/readmodel/service_test.go`, `backend/internal/services/readmodel/queue_test.go`, `backend/internal/services/stats/overview.go`, `backend/internal/services/project/lifecycle.go`, `backend/internal/handlers/dashboard.go`, `backend/cmd/api/main.go`, `backend/cmd/publish-worker/main.go` |
| Redis read cache | Done | Redis is already used for queues, locks, OAuth, browser sessions, and short-term coordination; admin dashboard stats, admin project list, and dashboard account summary use a 15s TTL cache that is bypassed on scoped/sticky-writer strong-consistency paths; stats/project list/account cache misses use singleflight to prevent process-local cache stampedes; stats and account caches use versioned payloads and semantic validation, and Redis read-error fallback is also merged into one DB computation per key; project create/edit/platform save, prepublish sync/draft update, publish queue/execute/fail, and platform account write paths invalidate the related dashboard cache; full read-model rebuild reuses the Redis/Asynq queue | None | `backend/internal/services/stats/overview.go`, `backend/internal/services/stats/overview_test.go`, `backend/internal/services/project/list_cache.go`, `backend/internal/services/project/list_cache_test.go`, `backend/internal/services/prepublish/drafts.go`, `backend/internal/services/publish/service.go`, `backend/internal/services/publish/queue.go`, `backend/internal/services/publish/publication_flow_test.go`, `backend/internal/services/publish/queue_test.go`, `backend/internal/services/platform_account/account_cache.go`, `backend/internal/services/platform_account/account_cache_test.go`, `backend/internal/services/browser_session/complete.go`, `backend/internal/services/browser_session/service_test.go`, `backend/internal/services/readmodel/queue.go` |
| Read/write splitting | Done | Supports optional `DB_READER_*` read-replica connection, `DefaultRouter`, and signed sticky writer; project/stats/workspace/platform_account/publish/prepublish/mediaasset/browser_session/extension are wired to strong/eventual/writer routing; dashboard, publish, and collab-service consistency-level inventories are complete, with collab-service online path kept writer-only; writer/reader pools are in self-hosted Kubernetes, and managed overlay provides a `postgres-reader` ExternalName entry point; `DB_READER_MAX_REPLICA_LAG` configures the replica lag threshold, eventual/analytics reads automatically fall back to writer when over threshold or lag is unknown, and `mpp_db_replica_lag_seconds` and `mpp_db_replica_healthy` metrics are exposed | None | `backend/internal/db/db.go`, `backend/internal/db/router.go`, `backend/internal/db/replica_lag.go`, `backend/internal/services/publish/service.go`, `backend/internal/services/prepublish/service.go`, `backend/internal/services/mediaasset/service.go`, `backend/internal/services/browser_session/service.go`, `backend/internal/services/extension/service.go`, `backend/internal/app/runtime.go`, `deploy/kubernetes/data-services/self-hosted/postgres.yaml`, `deploy/kubernetes/data-services/self-hosted/pgbouncer.yaml`, `deploy/kubernetes/data-services/managed/services.yaml`, `script/kubernetes/validation/data_services.rb` |
| Event-table partitioning and archiving | Done | `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and terminal `remote_browser_sessions` have default retention periods; the `archive` worker can batch-export JSONL to R2/S3 and delete old hot-table rows after successful upload; PostgreSQL schema initialization now creates monthly `created_at` partitions for `publish_events`, `extension_execution_events`, `project_activities`, `workspace_activities`, and `remote_browser_sessions`, with partition-compatible `(id, created_at)` primary keys and rolling partition creation; the archive worker exports whole cold monthly partitions as JSONL to R2/S3, then detaches and drops the partition after successful upload; PostgreSQL browser-session active-row fallback uses a scoped advisory transaction lock because partitioned unique constraints must include the partition key; the archive recovery procedure defines inspection, staging restore, optional hot-table reinsertion, and audit checks | None | `backend/internal/db/monthly_partitions.go`, `backend/internal/db/db.go`, `backend/internal/models/models.go`, `backend/internal/db/db_test.go`, `backend/internal/services/browser_session/start.go`, `backend/internal/services/browser_session/cleanup.go`, `backend/internal/services/archive/worker.go`, `backend/internal/services/archive/partitions.go`, `backend/internal/services/archive/worker_test.go`, `backend/internal/services/archive/partitions_test.go`, Phase 4 archive recovery procedure in this document |
| Collaboration batch governance | In progress | `collab_document_states`, `collab_document_update_batches`, and compaction/retention foundations exist; PostgreSQL schema initialization creates a 16-way `document_id` hash-partitioned `collab_document_update_batches` target table and migrates existing regular-table rows into it | Cold archiving is not implemented | `backend/internal/db/hash_partitions.go`, `backend/internal/db/db.go`, `backend/internal/models/collab.go`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.ts` |
| Outbox/CDC/event stream | In progress | The publishing queue path has a transactional Outbox: `EnqueuePublishProject` writes `outbox_events` in the same transaction and dispatches immediately after commit; publish worker starts an outbox dispatcher and supports retries for failed/stale processing records; Asynq continues to serve as the task-execution queue, and `PublishEvent` continues to serve as publishing audit | Currently covers only `publish.job_requested`; general business-event outbox, Debezium, and Redpanda/Kafka CDC are not implemented | `backend/internal/services/publish/queue.go`, `backend/internal/services/publish/outbox.go`, `backend/internal/models/models.go` |
-| Citus target state | In progress | Confirmed `workspace_id` as the most suitable distribution-column direction; project-domain tables now have explicit or stable derived tenant routing coverage, including direct-insert fallbacks for scheduled publishing, media, and AI rows | Citus distributed tables, reference tables, colocation, unique-constraint review, and worker payload routing are not implemented | Phase 5 checklist, `backend/internal/models/ai_hooks.go`, and `backend/internal/db/db_test.go` |
+| Citus target state | In progress | Confirmed `workspace_id` as the most suitable distribution-column direction; project-domain tables now have explicit or stable derived tenant routing coverage, including direct-insert fallbacks for scheduled publishing, media, and AI rows; distributed tables, reference-table candidates, control-domain tables, and deferred colocation gaps are mapped in a machine-readable design manifest | Unique-constraint / foreign-key review and worker payload routing are not implemented | Phase 5 checklist, `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb`, `backend/internal/models/ai_hooks.go`, and `backend/internal/db/db_test.go` |

### 0.3 Phase Checklist

@@ -128,7 +128,7 @@ Atomic commit guidance:
- [x] Confirm Workspace as the long-term tenant boundary.
- [x] Confirm `workspace_id` as the preferred Citus distribution column.
- [x] Complete `workspace_id` or a stable derivation path for project-domain tables. Verification entry point: `backend/internal/models/hooks.go`, `backend/internal/models/ai_hooks.go`, `backend/internal/models/collab.go`, `backend/internal/db/monthly_partitions.go`, `backend/internal/db/hash_partitions.go`, `collab-service/src/persistence/document-persistence.ts`, `backend/internal/db/db_test.go`, `collab-service/src/persistence/document-persistence.test.ts`.
-- [ ] Design Citus distributed tables, reference tables, and colocated table groups.
+- [x] Design Citus distributed tables, reference tables, and colocated table groups. Verification entry point: `script/db/citus_table_groups.yml`, `script/db/test_citus_table_groups.rb`, and Section 7.2.1 Citus table group design.
- [ ] Review unique constraints, foreign keys, and cross-tenant joins.
- [ ] Add `workspace_id` to worker payloads.

@@ -576,6 +576,49 @@ Pre-deployment modeling strategy:
- New project-domain tables should include `workspace_id` from the beginning when they need tenant-local querying.
- `platform_accounts` is currently user + platform scoped and is better kept in the control domain first; if accounts later become workspace assets, model them in a colocated table group distributed by `workspace_id`.

#### 7.2.1 Citus Table Group Design

The Phase 5 target design is captured in `script/db/citus_table_groups.yml` and checked by `script/db/test_citus_table_groups.rb` so the table grouping can be reviewed without requiring a live Citus cluster during normal local development.
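The kind of invariant such a manifest check can enforce (no table classified twice, no schema table left unclassified) is sketched below in Go. The actual check is the Ruby script named above; the `TableGroups` map shape and the category names are hypothetical and do not reproduce the YAML schema.

```go
package main

import (
	"fmt"
	"sort"
)

// TableGroups is a hypothetical in-memory form of the manifest:
// category name -> table names.
type TableGroups map[string][]string

// duplicateAssignments returns, sorted, every table listed under more than
// one category, formatted as "table: [catA catB]".
func duplicateAssignments(g TableGroups) []string {
	seen := map[string][]string{}
	for cat, tables := range g {
		for _, t := range tables {
			seen[t] = append(seen[t], cat)
		}
	}
	var out []string
	for t, cats := range seen {
		if len(cats) > 1 {
			sort.Strings(cats) // map iteration order is random; sort for stable output
			out = append(out, fmt.Sprintf("%s: %v", t, cats))
		}
	}
	sort.Strings(out)
	return out
}

// unassigned returns schema tables that no category mentions.
func unassigned(g TableGroups, schema []string) []string {
	known := map[string]bool{}
	for _, tables := range g {
		for _, t := range tables {
			known[t] = true
		}
	}
	var out []string
	for _, t := range schema {
		if !known[t] {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}
```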

Physical colocation group:

| Group | Distribution value | Tables | Notes |
| ----- | ------------------ | ------ | ----- |
| `tenant_workspace` | Workspace UUID; `workspaces.id` for the root table and `workspace_id` for child tables | `workspaces`, workspace membership/invite/activity/read-model rows, project rows, project publishing rows, project read models, media metadata, collaboration state, AI project context, extension events, notifications, and workspace metering | Use `workspaces` as the first distributed table, then distribute child tables with `colocate_with => 'workspaces'` after constraint review |
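Colocation works because the shard is chosen from the distribution value alone. The sketch below illustrates that property with FNV-1a and a modulo; it is only a stand-in, since Citus hashes with PostgreSQL's type-specific hash functions and assigns hash ranges to shards. `Row`, `shardFor`, and `colocated` are hypothetical names.

```go
package main

import "hash/fnv"

// shardFor maps a workspace UUID to one of shardCount shards. FNV-1a stands
// in for the hash Citus actually uses; the point is that the result depends
// on the workspace ID alone, never on the table name.
func shardFor(workspaceID string, shardCount uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(workspaceID)) // hash.Hash.Write never returns an error
	return h.Sum32() % shardCount
}

// Row is a hypothetical tenant-local row: workspaces contributes its own id,
// child tables contribute their workspace_id.
type Row struct {
	Table       string
	WorkspaceID string
}

// colocated reports whether every row routes to the same shard index.
func colocated(rows []Row, shardCount uint32) bool {
	if len(rows) == 0 {
		return true
	}
	first := shardFor(rows[0].WorkspaceID, shardCount)
	for _, r := range rows[1:] {
		if shardFor(r.WorkspaceID, shardCount) != first {
			return false
		}
	}
	return true
}
```

Because the table name never enters the hash, a join between `projects` and another tenant table filtered to one `workspace_id` can be answered from colocated shards without moving rows between nodes.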

Reference-table candidates:

| Table | Decision | Notes |
| ----- | -------- | ----- |
| `users` | Use as a reference-table candidate for the first validation cluster | Keeps permission, author, and actor joins local to shards; revisit when user cardinality or PII replication makes this undesirable |
| System content templates | Split from `content_templates` before Citus and treat as reference data | Avoids nullable tenant routing for global template rows while workspace templates remain tenant-local |

Control-domain tables for the first Citus validation:

| Table | Why it stays outside the colocated group first |
| ----- | --------------------------------------------- |
| `platform_accounts`, `platform_account_grants` | Credential ownership is still user/platform scoped, and `platform_accounts.workspace_id` is nullable |
| `remote_browser_sessions` | Short-lived runtime state is governed by Redis and the existing monthly archive path |
| `outbox_events` | Queue dispatch remains coordinator/control-domain state until worker payloads carry `workspace_id` |
| `extension_execution_event_claims` | Global idempotency claims do not carry tenant routing yet |

Deferred colocation gaps:

| Table | Required follow-up before distribution |
| ----- | -------------------------------------- |
| `project_collaborators`, `collab_document_collaborators` | Add `workspace_id` and review composite primary keys |
| `publish_attempts` | Add `workspace_id` or make attempts dependent on a colocated scheduled-publication key |
| `content_templates` | Split system templates from workspace templates or make tenant-scoped rows non-null by design |
| `ai_drafting_messages`, `ai_tool_calls`, `ai_drafting_session_summaries`, `ai_session_events` | Add `workspace_id` derived from `ai_drafting_sessions` and review session-local uniqueness |
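For the AI child tables in the last row, the derivation can follow the same pattern as the existing model hooks: copy `workspace_id` from the parent when it is absent, and refuse an explicit value that disagrees with the parent. The type and function names below are hypothetical; `SessionLookup` stands in for a query against `ai_drafting_sessions`.

```go
package main

import "fmt"

// AIDraftingMessage is a hypothetical child row of ai_drafting_sessions.
type AIDraftingMessage struct {
	SessionID   string
	WorkspaceID string
}

// SessionLookup stands in for a query such as
// SELECT workspace_id FROM ai_drafting_sessions WHERE id = $1.
type SessionLookup func(sessionID string) (workspaceID string, ok bool)

// deriveWorkspaceID fills WorkspaceID from the owning session when unset and
// rejects a mismatching explicit value, which would otherwise place parent
// and child rows on different shards.
func deriveWorkspaceID(m *AIDraftingMessage, lookup SessionLookup) error {
	parent, ok := lookup(m.SessionID)
	if !ok {
		return fmt.Errorf("ai_drafting_session %q not found", m.SessionID)
	}
	if m.WorkspaceID == "" {
		m.WorkspaceID = parent
		return nil
	}
	if m.WorkspaceID != parent {
		return fmt.Errorf("workspace_id %q does not match session workspace %q", m.WorkspaceID, parent)
	}
	return nil
}
```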

Initial Citus validation DDL shape:

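```sql
-- Root of the tenant_workspace colocation group, sharded by its own UUID.
SELECT create_distributed_table('workspaces', 'id');
-- Each tenant-local table: shard by workspace_id and colocate with workspaces.
-- Citus requires PRIMARY KEY / UNIQUE constraints here to include workspace_id.
SELECT create_distributed_table('<tenant_table>', 'workspace_id', colocate_with => 'workspaces');
-- Small global dimensions (for example users) are replicated to every node.
SELECT create_reference_table('<small_global_dimension_table>');
```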
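If the manifest drives the validation cluster setup, the DDL above can be rendered from a table-group spec instead of being written by hand. A minimal sketch with a hypothetical `GroupSpec` type; table names are interpolated without quoting, which assumes trusted manifest input, and the child statements only succeed once each table's primary key and unique constraints include `workspace_id`.

```go
package main

import "fmt"

// GroupSpec is a hypothetical compact form of one colocation group plus the
// reference tables used alongside it.
type GroupSpec struct {
	Root            string   // distributed by its own key, e.g. workspaces
	RootKey         string   // e.g. id
	Children        []string // distributed by DistributionKey, colocated with Root
	DistributionKey string   // e.g. workspace_id
	ReferenceTables []string // replicated to every node
}

// validationDDL renders the statements in the same order as the DDL shape
// above: root table, colocated children, then reference tables.
func validationDDL(s GroupSpec) []string {
	out := []string{fmt.Sprintf("SELECT create_distributed_table('%s', '%s');", s.Root, s.RootKey)}
	for _, t := range s.Children {
		out = append(out, fmt.Sprintf(
			"SELECT create_distributed_table('%s', '%s', colocate_with => '%s');",
			t, s.DistributionKey, s.Root))
	}
	for _, t := range s.ReferenceTables {
		out = append(out, fmt.Sprintf("SELECT create_reference_table('%s');", t))
	}
	return out
}
```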

### 7.3 Table Partitioning Recommendations

| Table | Partitioning method | Reason |