-
Notifications
You must be signed in to change notification settings - Fork 125
Trt 1989 migration queries #3542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
4db054c
9681e27
aa8bb16
009ed46
70f67bf
39e0a06
18f3b7f
35565d2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| # TRT-1989 Phase 2: Composite Indexes on Denormalized Columns | ||
|
|
||
| **Date:** 2026-05-19 | ||
| **JIRA:** [TRT-1989](https://redhat.atlassian.net/browse/TRT-1989) | ||
| **Depends on:** Phase 1 — column prep (`trt-1989-partitioning-prep.md`) | ||
|
|
||
| ## Purpose | ||
|
|
||
| Phase 1 added denormalized `release` and `timestamp` columns to every table | ||
| that will be partitioned or holds a FK into a partitioned table. Phase 2 | ||
| adds composite indexes on those columns so the query planner can use them | ||
| immediately — before partitioning is applied. | ||
|
|
||
| These indexes mirror the future partition key `(release, timestamp)`. Once | ||
| the tables are partitioned, each partition inherits a local copy of the | ||
| index, and the planner uses partition pruning instead. The indexes added | ||
| here serve two purposes: | ||
|
|
||
| 1. **Immediate benefit** — queries migrated in Phase 3 to filter on the | ||
| denormalized columns will use these indexes on the current | ||
| non-partitioned tables. | ||
| 2. **Validation** — exercising the indexes under production workload | ||
| confirms the column data is correct before committing to partitioning. | ||
|
|
||
| ## Changes | ||
|
|
||
| All changes are GORM index tags on model structs in | ||
| `pkg/db/models/prow.go`. GORM `AutoMigrate` creates the indexes | ||
| automatically on the next migration run. | ||
|
|
||
| ### prow_job_runs | ||
|
|
||
| Added composite index `idx_prow_job_runs_release_timestamp` across | ||
| `ProwJobRelease` and `Timestamp`. | ||
|
|
||
| Also added a standalone index on `ProwJobRunTest.ProwJobID` to support | ||
| variant queries that previously required joining through `prow_job_runs`. | ||
|
Comment on lines
+31
to
+37
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Move the standalone index documentation to the correct section. Lines 36-37 describe an index on 📝 Proposed reorganizationMove lines 36-37 to appear under the The SQL at lines 79-80 correctly shows this index on 🤖 Prompt for AI Agents |
||
|
|
||
| ### prow_job_run_tests | ||
|
|
||
| Added composite index `idx_prow_job_run_tests_release_timestamp` across | ||
| `ProwJobRunTimestamp` and `ProwJobRunRelease`. | ||
|
|
||
| ### prow_job_run_test_outputs | ||
|
|
||
| Added composite index `idx_prow_job_run_test_outputs_release_timestamp` | ||
| across `ProwJobRunTestTimestamp` and `ProwJobRunTestRelease`. | ||
|
|
||
| ### prow_job_run_prow_pull_requests | ||
|
|
||
| Added composite index | ||
| `idx_prow_job_run_prow_pull_requests_release_timestamp` across | ||
| `ProwJobRunRelease` and `ProwJobRunTimestamp`. | ||
|
|
||
| ### prow_job_run_annotations | ||
|
|
||
| Added composite index `idx_prow_job_run_annotations_release_timestamp` | ||
| across `ProwJobRunRelease` and `ProwJobRunTimestamp`. | ||
|
|
||
| ## Explicit SQL | ||
|
|
||
| GORM `AutoMigrate` will create these indexes on the next migration run. | ||
| If you prefer to create them manually — for example, using `CONCURRENTLY` | ||
| to avoid locking production tables — run these statements directly. | ||
|
|
||
| `CREATE INDEX CONCURRENTLY` cannot run inside a transaction, so each | ||
| statement must be executed individually (not wrapped in `BEGIN`/`COMMIT`). | ||
|
|
||
| ### prow_job_runs | ||
|
|
||
| ```sql | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_runs_release_timestamp | ||
| ON prow_job_runs (prow_job_release, "timestamp"); | ||
| ``` | ||
|
|
||
| ### prow_job_run_tests | ||
|
|
||
| ```sql | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_run_tests_prow_job_id | ||
| ON prow_job_run_tests (prow_job_id); | ||
|
|
||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_run_tests_release_timestamp | ||
| ON prow_job_run_tests (prow_job_run_timestamp, prow_job_run_release); | ||
| ``` | ||
|
|
||
| ### prow_job_run_test_outputs | ||
|
|
||
| ```sql | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_run_test_outputs_release_timestamp | ||
| ON prow_job_run_test_outputs (prow_job_run_test_timestamp, prow_job_run_test_release); | ||
| ``` | ||
|
|
||
| ### prow_job_run_prow_pull_requests | ||
|
|
||
| ```sql | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_run_prow_pull_requests_release_timestamp | ||
| ON prow_job_run_prow_pull_requests (prow_job_run_release, prow_job_run_timestamp); | ||
| ``` | ||
|
|
||
| ### prow_job_run_annotations | ||
|
|
||
| ```sql | ||
| CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_prow_job_run_annotations_release_timestamp | ||
| ON prow_job_run_annotations (prow_job_run_release, prow_job_run_timestamp); | ||
| ``` | ||
|
|
||
| ## Notes | ||
|
|
||
| - **Safe to create before deploying model updates.** GORM `AutoMigrate` | ||
| only adds — it never drops indexes, columns, or tables it doesn't | ||
| recognize. Indexes created manually will persist through any number of | ||
| `AutoMigrate` runs on the old model. Once the updated model with index | ||
| tags is deployed, `AutoMigrate` sees the indexes already exist and | ||
| skips them. There is no rollback risk. | ||
| - `CONCURRENTLY` avoids taking an exclusive lock on the table, allowing | ||
| reads and writes to continue during index creation. It is slower but | ||
| safe for production use. | ||
| - If the index already exists (e.g., GORM created it during a prior | ||
| migration), `IF NOT EXISTS` makes the statement a no-op. | ||
| - GORM `AutoMigrate` does **not** use `CONCURRENTLY` — it takes a brief | ||
| lock. On large tables this can block writes for the duration of the | ||
| index build. For production deployments, prefer creating the indexes | ||
| manually with the SQL above ahead of the code deploy, so that | ||
| `AutoMigrate` finds them already in place. | ||
| - Column order in the index matches the expected query pattern: most | ||
| queries filter on release first (equality), then timestamp (range). | ||
| The `prow_job_run_tests` index leads with timestamp because the | ||
| materialized view queries filter primarily on timestamp ranges. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,172 @@ | ||
| # TRT-1989 Phase 3: Query Optimization Using Denormalized Columns | ||
|
|
||
| **Date:** 2026-05-19 | ||
| **JIRA:** [TRT-1989](https://redhat.atlassian.net/browse/TRT-1989) | ||
| **Depends on:** Phase 1 (column prep), Phase 2 (indexes) | ||
|
|
||
| ## Purpose | ||
|
|
||
| Phase 1 added denormalized `release` and `timestamp` columns to child | ||
| tables (`prow_job_run_tests`, `prow_job_run_test_outputs`, | ||
| `prow_job_run_prow_pull_requests`, `prow_job_run_annotations`). Phase 2 | ||
| added composite indexes on those columns. | ||
|
|
||
| Nearly every significant query in sippy filters on | ||
| `prow_job_runs.timestamp` and/or `prow_jobs.release` via joins. Once these | ||
| tables are partitioned, those join-based filters **won't help the planner | ||
| prune child table partitions** — the planner needs WHERE clauses on each | ||
| partitioned table's own partition key columns. | ||
|
|
||
| This phase adds filters on the denormalized columns and drops joins where | ||
| all referenced columns have local replacements. This is safe to ship | ||
| before partitioning — the extra WHERE clauses let the planner use the | ||
| composite indexes from Phase 2. After partitioning, they become the | ||
| primary mechanism for partition pruning. | ||
|
|
||
| ## Guiding Principles | ||
|
|
||
| 1. **Add filters first, then replace when validated** — keep existing | ||
| join-based filters alongside new local filters during rollout. | ||
| After local denormalized columns are validated, replace old filters | ||
| and drop no-longer-needed joins where safe. | ||
|
|
||
| 2. **Drop joins only when safe** — a join can be dropped only if *every* | ||
| column it provides (in SELECT, WHERE, GROUP BY, ORDER BY, FILTER) has | ||
| a local replacement. | ||
|
|
||
| 3. **Materialized views use `|||TIMENOW|||` templates** — filters added | ||
| to mat views must use the same template tokens, not `$1`-style params. | ||
|
|
||
| 4. **SQL functions use `$N` params** — new WHERE clauses reuse existing | ||
| params from the function signature. | ||
|
|
||
| 5. **GORM queries use `?` placeholders** — pass the same Go variables | ||
| already available in the function scope. | ||
|
|
||
| ## Changes by Query | ||
|
|
||
| ### Group A: Queries starting from `prow_job_run_tests` | ||
|
|
||
| #### A1. `prowJobFailedTestsMatView` — `pkg/db/views.go` | ||
|
|
||
| Rewritten to start from `prow_job_run_tests`. Replaced | ||
| `prow_job_runs."timestamp"` with `pjrt.prow_job_run_timestamp` and | ||
| `prow_job_runs.prow_job_id` with `pjrt.prow_job_id`. **Dropped JOIN | ||
| `prow_job_runs`**. | ||
|
|
||
| #### A2. `testAnalysisByJobMatView` — `pkg/db/views.go` | ||
|
|
||
| Replaced all `prow_job_runs."timestamp"` references with | ||
| `prow_job_run_tests.prow_job_run_timestamp`. Replaced `prow_jobs.release` | ||
| with `prow_job_run_tests.prow_job_run_release`. **Dropped JOIN | ||
| `prow_job_runs`**. Rewired JOIN `prow_jobs` via | ||
| `prow_job_run_tests.prow_job_id`. | ||
|
|
||
| #### A3. `testReportMatView` — `pkg/db/views.go` | ||
|
|
||
| Replaced all `prow_job_runs."timestamp"` in WHERE and FILTER clauses with | ||
| `prow_job_run_tests.prow_job_run_timestamp`. Replaced `prow_jobs.release` | ||
| with `prow_job_run_tests.prow_job_run_release`. **Dropped JOIN | ||
| `prow_job_runs`**. Rewired JOIN `prow_jobs` via | ||
| `prow_job_run_tests.prow_job_id`. JOIN `prow_jobs` kept for | ||
| `prow_jobs.variants`. | ||
|
|
||
| #### A4. `test_results()` function — `pkg/db/functions.go` | ||
|
|
||
| Added `WHERE prow_job_run_tests.prow_job_run_timestamp BETWEEN $1 AND $3` | ||
| to limit the scan. Replaced `timestamp` in all CASE expressions with | ||
| `prow_job_run_tests.prow_job_run_timestamp`. Replaced `prow_jobs.release` | ||
| with `prow_job_run_tests.prow_job_run_release`. **Dropped JOINs | ||
| `prow_job_runs` and `prow_jobs`**. | ||
|
|
||
| #### A5. `ProwJobHistoricalTestCounts` — `pkg/db/query/job_queries.go` | ||
|
|
||
| Replaced `prow_job_runs.prow_job_id` with | ||
| `prow_job_run_tests.prow_job_id` and `prow_job_runs.timestamp` with | ||
| `prow_job_run_tests.prow_job_run_timestamp`. **Dropped JOIN | ||
| `prow_job_runs`**. | ||
|
|
||
| #### A6. `GetRecentTestFailures` — `pkg/api/recent_test_failures.go` | ||
|
|
||
| Added redundant local filters (`prow_job_run_tests.prow_job_run_timestamp` | ||
| and `prow_job_run_tests.prow_job_run_release`) to all four queries: | ||
| main query, NOT EXISTS subquery, last-pass lookback, and failure outputs. | ||
| Joins kept — `prow_job_runs` still needed for `timestamp` in SELECT and | ||
| `url`; `prow_jobs` still needed for `name`. | ||
|
|
||
| #### A7. `testStatusQuery` (CR) — `pkg/api/componentreadiness/.../provider.go` | ||
|
|
||
| Added `pjrt.prow_job_run_release = ?` and `pjrt.prow_job_run_timestamp` | ||
| range filters to the CTE WHERE clause. Joins kept — `prow_job_runs` | ||
| needed for `labels`, `prow_jobs` needed for variant lookup. | ||
|
|
||
| #### A8. `testDetailQuery` (CR) — `pkg/api/componentreadiness/.../provider.go` | ||
|
|
||
| Same pattern as A7 — added local release and timestamp filters. Joins | ||
| kept — `pjr.url`, `pjr.timestamp`, `pjr.labels`, `pj.name`, `pj.id` | ||
| still needed in SELECT. | ||
|
|
||
| #### A9. `payloadTestFailuresMatView` — `pkg/db/views.go` | ||
|
|
||
| Added `pjrt.prow_job_run_timestamp > (|||TIMENOW||| - '14 days'::interval)` | ||
| to WHERE. Joins kept — `release_tags`, `release_job_runs`, `prow_jobs`, | ||
| `prow_job_runs` still needed for other columns. | ||
|
|
||
| ### Group B: Queries starting from `prow_job_run_test_outputs` | ||
|
|
||
| #### B1. `TestOutputs` — `pkg/db/query/test_queries.go` | ||
|
|
||
| Added `prow_job_run_test_outputs.prow_job_run_test_timestamp`, | ||
| `prow_job_run_test_outputs.prow_job_run_test_release`, | ||
| `prow_job_run_tests.prow_job_run_timestamp`, and | ||
| `prow_job_run_tests.prow_job_run_release` filters. Joins kept — | ||
| `prow_job_runs` for URL, `prow_jobs` for variants. | ||
|
|
||
| #### B2. `TestDurations` — `pkg/db/query/test_queries.go` | ||
|
|
||
| Replaced `prow_job_runs.timestamp` filter with | ||
| `prow_job_run_tests.prow_job_run_timestamp`. Replaced ambiguous | ||
| `"timestamp"` in SELECT/GROUP BY/ORDER BY with explicit | ||
| `prow_job_run_tests.prow_job_run_timestamp`. Replaced `prow_jobs.release` | ||
| with `prow_job_run_tests.prow_job_run_release`. **Dropped JOIN | ||
| `prow_job_runs`**. JOIN `prow_jobs` rewired via | ||
| `prow_job_run_tests.prow_job_id` (needed for variants). | ||
|
|
||
| ### Group C: Queries on `prow_job_runs` directly | ||
|
|
||
| #### C1. `BuildClusterHealth` — `pkg/db/query/build_clusters.go` | ||
|
|
||
| Added `WHERE prow_job_runs.timestamp BETWEEN @start AND @end` so the | ||
| planner can use the timestamp index to limit the scan. The `@start` and | ||
| `@end` params already exist in the function signature. | ||
|
|
||
| #### C2-C4. No changes | ||
|
|
||
| - `BuildClusterAnalysis` — already has timestamp in WHERE, cross-release | ||
| - `HasBuildClusterData` — existence check, timestamp bound would be wrong | ||
| - `ProwJobRunIDs` — simple lookup, already indexed | ||
|
|
||
| ### Group D: SQL functions with PR join tables | ||
|
|
||
| #### D1. `job_results()` — `pkg/db/functions.go` | ||
|
|
||
| - **`repo_org_jobs` CTE**: Added `WHERE prow_job_runs.prow_job_release = $1` | ||
| - **`merged_prs` CTE**: Added `AND prow_job_runs.timestamp BETWEEN $2 AND $4` | ||
| - **`results` CTE**: Added `WHERE prow_job_runs.prow_job_release = $1` | ||
| - **`last_pass` CTE**: No change — intentionally cross-release | ||
|
|
||
| #### D2. `jobRunsReportMatView` — No change | ||
|
|
||
| The CTEs materialize all data. Adding filters would require | ||
| parameterizing the mat view. Deferred to a future change. | ||
|
|
||
| ## Joins Dropped Summary | ||
|
|
||
| | Query | Join dropped | Columns replaced | | ||
| |-------|-------------|-----------------| | ||
| | `prowJobFailedTestsMatView` | `prow_job_runs` | `timestamp` → `pjrt.prow_job_run_timestamp`, `prow_job_id` → `pjrt.prow_job_id` | | ||
| | `testAnalysisByJobMatView` | `prow_job_runs` | `timestamp` → local, `prow_job_id` → local; `prow_jobs` rewired via `prow_job_run_tests.prow_job_id` | | ||
| | `testReportMatView` | `prow_job_runs` | `timestamp` → local; `prow_jobs` rewired via `prow_job_run_tests.prow_job_id` | | ||
| | `test_results()` | `prow_job_runs` + `prow_jobs` | `timestamp` → local, `release` → local | | ||
| | `ProwJobHistoricalTestCounts` | `prow_job_runs` | `prow_job_id` → local, `timestamp` → local | | ||
| | `TestDurations` | `prow_job_runs` | `timestamp` → local; `prow_jobs` rewired via `prow_job_run_tests.prow_job_id` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the model/table reference for the standalone index.
The text points to
ProwJobRunTest.ProwJobID, but this index is onprow_job_run_tests(ProwJobRunTestsmodel field), notprow_job_runs.Proposed doc fix
📝 Committable suggestion
🤖 Prompt for AI Agents