Optimize brain-region filter queries with a materialized CTE #631

Open
DriesVerachtert wants to merge 1 commit into main from brain_region_filters_with_cte

@DriesVerachtert

Contributor

When filtering by within_brain_region, the PostgreSQL planner previously chose to scan all public entities via the ix_entity_public_creation_date_id partial index (a choice driven by ORDER BY creation_date DESC combined with LIMIT) and then join each row against the brain-region recursive CTE, resulting in O(all_public_entities × CTE_rows) comparisons.
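
To make the cost difference concrete, here is a toy Python model of the two strategies. It is only an illustration under simplifying assumptions: entities are `(id, creation_date, region_id)` tuples, the rows of the region CTE are a plain list of ids, and both function names are made up for this sketch. It does not reproduce what the PostgreSQL planner actually does.

```python
def scan_then_probe(entities, region_ids, limit):
    """Walk entities newest-first and compare each one against every region id."""
    comparisons = 0
    hits = []
    for entity_id, _created, region in sorted(entities, key=lambda e: e[1], reverse=True):
        matched = False
        for rid in region_ids:
            comparisons += 1
            if region == rid:
                matched = True
                break
        if matched:
            hits.append(entity_id)
            if len(hits) == limit:
                break
    return hits, comparisons


def prefilter_then_sort(entities, region_ids, limit):
    """Collect the matching entities first, then sort only that small set."""
    wanted = set(region_ids)
    candidates = [e for e in entities if e[2] in wanted]
    candidates.sort(key=lambda e: e[1], reverse=True)
    # The second value is the number of rows that had to be sorted.
    return [e[0] for e in candidates[:limit]], len(candidates)


# 1000 recent entities outside the region, plus one old entity inside it.
entities = [(i, i, 999) for i in range(1000)]
entities.append((1000, -1, 5))
region_ids = list(range(10))

slow_hits, slow_cost = scan_then_probe(entities, region_ids, limit=1)
fast_hits, fast_cost = prefilter_then_sort(entities, region_ids, limit=1)
```

Both functions return the same entity, but when matches are rare the first one has to walk almost the entire date-ordered list before LIMIT is satisfied.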

The fix introduces a MATERIALIZED CTE (candidate_artifacts) that pre-computes the set of matching entity IDs and their creation_date values before the outer query runs. Because a materialized CTE acts as an optimization fence, the planner cannot inline it into the outer query and has to compute its result set as a separate step.

When the primary sort is creation_date DESC (the default for all affected endpoints), the ORDER BY is redirected to reference candidate_artifacts.creation_date instead of entity.creation_date. This makes the planner start from the small pre-filtered CTE rather than from the full entity index.
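
A rough sketch of the resulting query shape, written as a small Python helper that returns the SQL text. The table and column names other than candidate_artifacts and creation_date (`brain_region`, `parent_id`, `brain_region_id`), the `region_tree` CTE name, the `build_region_query` helper, and the psycopg-style `%(region_id)s` placeholder are assumptions for illustration; the SQL that SQLAlchemy actually emits for this PR will differ in detail.

```python
def build_region_query(limit: int = 30) -> str:
    """Return illustrative PostgreSQL text for the CTE-first query shape."""
    limit = int(limit)  # keep the interpolated value numeric
    return f"""
WITH RECURSIVE region_tree AS (
    -- hypothetical hierarchy walk: the requested region and all descendants
    SELECT id FROM brain_region WHERE id = %(region_id)s
    UNION ALL
    SELECT child.id
    FROM brain_region AS child
    JOIN region_tree AS parent ON child.parent_id = parent.id
),
candidate_artifacts AS MATERIALIZED (
    -- computed as a separate step; the planner may not inline this CTE
    SELECT e.id, e.creation_date
    FROM entity AS e
    JOIN region_tree AS r ON e.brain_region_id = r.id
)
SELECT e.*
FROM candidate_artifacts AS c
JOIN entity AS e ON e.id = c.id
-- ordering by the CTE column instead of entity.creation_date
ORDER BY c.creation_date DESC
LIMIT {limit}
"""


sql = build_region_query(30)
```

The point of the sketch is the last three lines of the query: the ORDER BY refers to the CTE's copy of creation_date, so the sort is applied to the pre-filtered rows.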

Changes:

  • filter_by_region now builds and returns the MATERIALIZED CTE alongside the updated query (previously it only added a bare recursive-CTE join).
  • InBrainRegionQuery stores the returned CTE via PrivateAttr and exposes it as candidate_cte so callers can reference it after the filter is applied.
  • router_read_many always applies filter_model.sort() first, then replaces the ORDER BY with CTE-based columns only when the primary sort is creation_date DESC.
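
The decision described in the last bullet can be sketched as a small predicate. This is a simplified stand-in, not the code in router_read_many: the function name `should_order_by_cte` is hypothetical, and it assumes the sort fields have already been resolved (defaults included) into a list that uses a leading `-` for descending order.

```python
from typing import Optional, Sequence


def should_order_by_cte(order_by: Optional[Sequence[str]], has_candidate_cte: bool) -> bool:
    """Return True only when the ORDER BY should be rewritten to use the CTE.

    The rewrite applies when a candidate CTE exists (the brain-region filter
    was used) and the primary sort key is creation_date descending.
    """
    if not has_candidate_cte or not order_by:
        return False
    # Only the first (primary) sort key matters for the redirect.
    return order_by[0] == "-creation_date"
```

In every other case the ORDER BY produced by filter_model.sort() would be left as it is.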

https://github.com/openbraininstitute/prod-platform-architecture/issues/216

@DriesVerachtert DriesVerachtert force-pushed the brain_region_filters_with_cte branch from 89fc777 to 535cd13 Compare June 11, 2026 07:22
@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| app/queries/common.py | 77.77% | 1 Missing and 1 partial ⚠️ |

| Flag | Coverage Δ |
|---|---|
| pytest | 97.68% <89.47%> (-0.02%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| app/dependencies/common.py | 100.00% <100.00%> (ø) |
| app/filters/brain_region.py | 100.00% <100.00%> (ø) |
| app/queries/common.py | 97.10% <77.77%> (-1.08%) ⬇️ |

@DriesVerachtert DriesVerachtert force-pushed the brain_region_filters_with_cte branch from 535cd13 to a81a220 Compare June 11, 2026 07:53
@DriesVerachtert DriesVerachtert force-pushed the brain_region_filters_with_cte branch from a81a220 to f3a9b4e Compare June 11, 2026 10:10