perf(storage): conditional MATERIALIZED CTE fence for ListTransactions#1409
perf(storage): conditional MATERIALIZED CTE fence for ListTransactions#1409sylr wants to merge 4 commits into
Conversation
A selective wallet-history ListTransactions attached ORDER BY id DESC LIMIT n to the same select as a JSONB @> filter, so the planner chose an abort-early transactions_id_desc walk that scanned ~2.3M rows (16.1s) to return 16. Wrapping the filtered dataset in a MATERIALIZED CTE and moving ORDER BY + LIMIT to the outer select lets the planner pick the GIN BitmapOr over the filtered set (~7.7ms, verified on prod read-replica). The fence is applied conditionally, only when the filter contains a "needle" containment predicate (account/source/destination/metadata), via the new opt-in DatasetFencer interface implemented by the transactions handler. Unfiltered/range-only lists keep the historical non-materialized shape, where abort-early is the faster plan. The Paginator contract is split into ApplyCursorPredicate (keyset/offset predicate that stays inside the fence) and ApplyWindow (LIMIT/OFFSET that move to the outer select); the existing outer ORDER BY remains the single order source and is applied before the window. Non-fence SQL is unchanged. Constraint: JSONB @> selectivity + value<->id correlation can't be fixed by statistics or extended stats; pg_hint_plan not installed Constraint: GIN indexes on sources/destinations/metadata are schema migrations (43/51/52), present on all migrated clusters Rejected: Blanket-wrap every list in a MATERIALIZED CTE | regresses broad/unfiltered lists (materializes millions then sorts) Rejected: Runtime cost probe to detect selectivity | static filter-shape decision is simpler and sufficient per findings doc Confidence: high Scope-risk: moderate Directive: ApplyWindow must NOT emit ORDER BY — the caller applies the qualified outer ORDER BY first; do not fold ordering back in Directive: The fenced dataset CTE must never carry an inner LIMIT/OFFSET, or the abort-early walk re-triggers Directive: Do not extend the moved-LIMIT pattern to a fan-out (non-1:1) expand without re-checking; effectiveVolumes is 1:1 per tx Not-tested: negated-only metadata filter (NOT metadata @>) and non-selective needle values (account="world") fence without benefit but never change results
WalkthroughSeparates cursor predicate application from LIMIT/OFFSET in the paginator contract, adds a DatasetFencer interface to optionally emit a MATERIALIZED dataset CTE, rewires repository pagination to choose fenced vs unfenced SQL shapes, implements transaction fencing heuristics, and adds tests and mocks validating behavior. ChangesDataset Fencing for Paginated Queries
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Paginate
participant resolveBuildContext
participant DatasetFencer
participant buildFilteredDataset
participant expand
participant Paginator
Caller->>Paginate: Paginate(query)
Paginate->>resolveBuildContext: resolve and validate filters
resolveBuildContext-->>Paginate: BuildContext
Paginate->>DatasetFencer: ShouldFenceDataset(context)?
alt Fenced Path
DatasetFencer-->>Paginate: true
Paginate->>buildFilteredDataset: build with context
buildFilteredDataset-->>Paginate: filtered dataset query
Paginate->>expand: expand(WITH MATERIALIZED)
expand-->>Paginate: materialized CTE
Paginate->>Paginator: ApplyCursorPredicate
Paginator-->>Paginate: cursor-filtered query
Paginate->>Paginator: ApplyWindow
Paginator-->>Paginate: windowed query
Paginate->>Paginate: Final qualified ORDER BY
else Unfenced Path
DatasetFencer-->>Paginate: false
Paginate->>buildFilteredDataset: build with context
buildFilteredDataset-->>Paginate: filtered dataset query
Paginate->>expand: expand(non-materialized)
expand-->>Paginate: CTE
Paginate->>Paginator: Paginate (ORDER BY + ApplyWindow)
Paginator-->>Paginate: ordered, windowed query
Paginate->>Paginate: Final qualified ORDER BY
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1409 +/- ##
==========================================
- Coverage 80.55% 78.79% -1.77%
==========================================
Files 206 206
Lines 11293 11382 +89
==========================================
- Hits 9097 8968 -129
- Misses 1646 1696 +50
- Partials 550 718 +168 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/storage/common/resource.go (1)
411-444: 💤 Low valueClarify the dual ORDER BY application in the unfenced path.
In the unfenced path (lines 419-426),
paginator.Paginateapplies cursor predicate + ORDER BY + LIMIT inside the dataset, then line 433-435 applies ORDER BY again on the outer select after expand joins. While this is intentional (per the comment on lines 431-432), the double ordering could be confusing or wasteful if the inner ORDER BY is redundant.The comment states "This is the single source of the final ORDER BY for both paths," but in the unfenced path there are actually two ORDER BY clauses. Consider clarifying whether the inner ORDER BY is required for abort-early optimization or if the outer one alone would suffice.
💡 Suggested clarification
Add a comment explaining why both ORDER BYs are needed in the unfenced path:
// cannot introduce an ambiguous column name (e.g. a future expand that also // selects "id"). No expand today conflicts, but this makes it safe by construction. + // NOTE: In the unfenced path, ORDER BY is applied twice: once inside the dataset CTE + // (by Paginate, combined with LIMIT for abort-early index walk) and once here on the + // outer select (to ensure correct ordering after any expand LEFT JOINs). The inner + // ORDER BY is required for the query planner's abort-early optimization. // This is the single source of the final ORDER BY for both paths; in the fenced path it🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/common/resource.go` around lines 411 - 444, The comment is unclear about why the unfenced path has two ORDER BYs (paginator.Paginate applies an inner ORDER BY and later finalQuery.Order applies an outer ORDER BY using paginator.OrderExpression); update the code comments to explicitly state why the inner ORDER BY from paginator.Paginate is necessary (e.g., for correct cursor predicate evaluation or abort-early optimization inside the dataset) and that the outer finalQuery.Order (using paginator.OrderExpression and applied after r.expand) is the canonical final ordering; reference the symbols paginator.Paginate, paginator.OrderExpression, finalQuery.Order, paginator.ApplyWindow and r.expand so reviewers can find the logic and make it clear that the inner ORDER BY is intentional and not redundant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/ledger/resource_transactions_test.go`:
- Around line 37-41: The test case named "metadata needle" uses a map value for
the "metadata" filter but ResolveFilter expects a string key for metadata (see
ResolveFilter and the metadata property check). Update the test's filters entry
from map[string]any{"wallet_id": "x"} to just the string "wallet_id" so the
filter value matches the expected semantics (i.e., filters:
map[string][]any{"metadata": {"wallet_id"}}) and keep the rest of the test logic
unchanged.
---
Nitpick comments:
In `@internal/storage/common/resource.go`:
- Around line 411-444: The comment is unclear about why the unfenced path has
two ORDER BYs (paginator.Paginate applies an inner ORDER BY and later
finalQuery.Order applies an outer ORDER BY using paginator.OrderExpression);
update the code comments to explicitly state why the inner ORDER BY from
paginator.Paginate is necessary (e.g., for correct cursor predicate evaluation
or abort-early optimization inside the dataset) and that the outer
finalQuery.Order (using paginator.OrderExpression and applied after r.expand) is
the canonical final ordering; reference the symbols paginator.Paginate,
paginator.OrderExpression, finalQuery.Order, paginator.ApplyWindow and r.expand
so reviewers can find the logic and make it clear that the inner ORDER BY is
intentional and not redundant.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c3766c71-10aa-448e-9b16-bf8936cd0b9f
📒 Files selected for processing (7)
internal/storage/common/paginator.gointernal/storage/common/paginator_column.gointernal/storage/common/paginator_offset.gointernal/storage/common/resource.gointernal/storage/ledger/resource_transactions.gointernal/storage/ledger/resource_transactions_fence_test.gointernal/storage/ledger/resource_transactions_test.go
The new DatasetFencer interface in storage/common/resource.go is picked up by `go generate` (mocks_test.go is generated from resource.go), so the generated mock must be committed or the `Dirty` CI check fails. Directive: mocks_test.go is generated from internal/storage/common/resource.go — re-run `just generate` (mockgen) after changing interfaces there, do not hand-edit
…case Addresses a CodeRabbit review note: the metadata filter value should be a string key, not a map. The fence decision only checks filter presence, so behavior is unchanged — this is for semantic accuracy of the test fixture.
Example: how a selective
|
Addresses @flemzord's review on #1409: in the fenced path the `dataset` CTE held the entire materialized filtered set, and the `effectiveVolumes` expand (`transactions_id in (select id from dataset)`) therefore aggregated over every matching transaction to return a single page — linear in account history, even for a selective (but busy) account, not just the documented account="world" case. Restructure the fenced shape so the MATERIALIZED fence and the page window are separate, nested CTEs: WITH dataset AS ( WITH filtered AS MATERIALIZED (<filter + PIT + keyset>) SELECT * FROM filtered ORDER BY id <dir> LIMIT pageSize+1 ) SELECT * FROM dataset LEFT JOIN <expands on "select id from dataset"> ... ORDER BY dataset.id <dir> The LIMIT stays outside the *materialized* CTE (so the planner still evaluates `filtered` once via the GIN BitmapOr and the page is a cheap top-N over it), but now lives inside the `dataset` CTE the expands reference — so expand work is bounded to the page again, restoring the implicit "dataset = rows returned" contract. The expand() helper drops its now-unused `materialized` parameter. Constraint: effectiveVolumes expand references "select id from dataset" — the page LIMIT must be inside that CTE or the expand scans the full filtered set Rejected: Keep LIMIT on the outermost select | expands then aggregate over the whole filtered set (the bug this fixes) Confidence: high Scope-risk: moderate Directive: do not move the page ORDER BY/LIMIT back to the outer select — it must stay inside the "dataset" CTE so expands see only the page
Addresses @flemzord's review on #1409: in the fenced path the `dataset` CTE held the entire materialized filtered set, and the `effectiveVolumes` expand (`transactions_id in (select id from dataset)`) therefore aggregated over every matching transaction to return a single page — linear in account history, even for a selective (but busy) account, not just the documented account="world" case. Restructure the fenced shape so the MATERIALIZED fence and the page window are separate, nested CTEs: WITH dataset AS ( WITH filtered AS MATERIALIZED (<filter + PIT + keyset>) SELECT * FROM filtered ORDER BY id <dir> LIMIT pageSize+1 ) SELECT * FROM dataset LEFT JOIN <expands on "select id from dataset"> ... ORDER BY dataset.id <dir> The LIMIT stays outside the *materialized* CTE (so the planner still evaluates `filtered` once via the GIN BitmapOr and the page is a cheap top-N over it), but now lives inside the `dataset` CTE the expands reference — so expand work is bounded to the page again, restoring the implicit "dataset = rows returned" contract. The expand() helper drops its now-unused `materialized` parameter. Constraint: effectiveVolumes expand references "select id from dataset" — the page LIMIT must be inside that CTE or the expand scans the full filtered set Rejected: Keep LIMIT on the outermost select | expands then aggregate over the whole filtered set (the bug this fixes) Confidence: high Scope-risk: moderate Directive: do not move the page ORDER BY/LIMIT back to the outer select — it must stay inside the "dataset" CTE so expands see only the page
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/storage/common/resource.go (1)
116-127: ⚡ Quick winUpdate the fencing docs to match the nested-CTE shape.
Lines 120-126 and Lines 393-395 still describe the older “ORDER BY + LIMIT on the outer select” plan. The code now pages in the intermediate
datasetselect afterfiltered AS MATERIALIZEDand beforeexpand(...), so these comments no longer match the contract that keeps expands page-bounded.Also applies to: 393-395
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/common/resource.go` around lines 116 - 127, Update the DatasetFencer docs to reflect the new nested-CTE paging shape: change the description around DatasetFencer and ShouldFenceDataset (currently describing "ORDER BY + LIMIT on the outer select") to explain that the code now materializes an inner filtered CTE (e.g., filtered AS MATERIALIZED), then pages from the intermediate dataset select before expands, so the ORDER BY + LIMIT apply inside the nested CTE that bounds the dataset prior to expand(...) rather than on the outermost select; update the explanatory example SQL and the text at the DatasetFencer comment (and the similar text near lines referenced for the other occurrence) to describe the nested-CTE form and that expands remain page-bounded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/storage/common/resource.go`:
- Around line 116-127: Update the DatasetFencer docs to reflect the new
nested-CTE paging shape: change the description around DatasetFencer and
ShouldFenceDataset (currently describing "ORDER BY + LIMIT on the outer select")
to explain that the code now materializes an inner filtered CTE (e.g., filtered
AS MATERIALIZED), then pages from the intermediate dataset select before
expands, so the ORDER BY + LIMIT apply inside the nested CTE that bounds the
dataset prior to expand(...) rather than on the outermost select; update the
explanatory example SQL and the text at the DatasetFencer comment (and the
similar text near lines referenced for the other occurrence) to describe the
nested-CTE form and that expands remain page-bounded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 78846a6d-dcd9-4239-8414-41062bc1cdc4
📒 Files selected for processing (2)
internal/storage/common/resource.gointernal/storage/ledger/resource_transactions_fence_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/ledger/resource_transactions_fence_test.go
Problem
A selective wallet-history
ListTransactionsattachedORDER BY id DESC LIMIT nto the same select as a JSONB@>filter, so the planner chose an abort-earlytransactions_id_descindex walk that scanned ~2.3M rows (16.1s) to return 16. Statistics can't fix this (JSONB@>selectivity over-estimate + a value↔id correlation no stats object can express).Fix
Wrap the filtered dataset in a
MATERIALIZEDCTE (optimization fence) and moveORDER BY id DESC LIMIT nto the outer select. The planner then uses the GINBitmapOrover the filtered set and theLIMITbecomes a trivial top-N — ~7.7ms (~2,000×), verified on a prod read-replica. No schema change; the required GIN indexes already exist (migrations 43/51/52).The fence is conditional — applied only when the filter contains a "needle" containment predicate (
account/source/destination/metadata), via a new opt-inDatasetFencerinterface implemented only by the transactions handler. Unfiltered/range-only lists keep the historical non-materialized shape, where abort-early is the faster plan.How
DatasetFencer[Opts]opt-in interface;transactionsResourceHandler.ShouldFenceDatasetimplements the static needle rule.Paginatorsplit intoApplyCursorPredicate(keyset/offset predicate stays inside the fenced CTE) andApplyWindow(LIMIT/OFFSETon the outer select, noORDER BY). The existing outerORDER BYremains the single order source and is applied before the window. Non-fence SQL is unchanged.expand()gains amaterialized bool(usesbun.NewWithQuery(...).Materialized());GetOne/Countare unaffected (passfalse).Tests
ShouldFenceDataset(needle → fence; id/timestamp/reference/reverted → no fence).bun.QueryHookassertingAS MATERIALIZEDpresent for account/metadata filters and absent for unfiltered/timestamp-only; multi-pagenext/previoustraversal under the fence;effectiveVolumesunder the fence. ExistingTestTransactionsList(every filtered case now runs through the fence) still passes.Known limitation (correctness-safe)
A negated containment (
NOT metadata @> {x}) or a non-selective needle value (account="world") may fence without benefit, but never changes the result. Deliberately deferred per a static-vs-cost-probe trade-off.