
perf(storage): conditional MATERIALIZED CTE fence for ListTransactions (backport v2.4) #1410

Draft
sylr wants to merge 4 commits into release/v2.4 from backport/v2.4/materialized-cte-fence-listtransactions

sylr commented Jun 11, 2026


Backport of #1409 to release/v2.4.

Problem

A selective wallet-history ListTransactions query attached ORDER BY id DESC LIMIT n to the same select as a JSONB @> filter, so the planner chose an abort-early walk of the transactions_id_desc index that scanned ~2.3M rows (16.1s) to return 16.

Fix

Wrap the filtered dataset in a MATERIALIZED CTE and move ORDER BY id DESC LIMIT n to the outer select, so the planner uses the GIN BitmapOr (~7.7ms, roughly 2,000× faster). The fence is conditional: it is applied only when the filter contains a "needle" containment predicate (account/source/destination/metadata), via the opt-in DatasetFencer interface; unfiltered and range-only lists keep the historical shape. No schema change is needed (the GIN indexes already exist).
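A minimal sketch of the conditional decision and the two query shapes, assuming a filter set that answers `UseFilter(key, matchers...)` like the `mockUseFilter` quoted later in this thread. `staticFilters`, `shouldFence`, `listSQL`, the filter key spellings, and the bare `transactions` table are hypothetical; the real handler implements `ShouldFenceDataset` on `transactionsResourceHandler` and builds its SQL with the repository's own query builder.

```go
package main

import "fmt"

// filterChecker mirrors the UseFilter(key, matchers...) signature quoted in the
// review thread; it is an assumption, not the real build-context type.
type filterChecker interface {
	UseFilter(key string, matchers ...func(any) bool) bool
}

// staticFilters is a hypothetical in-memory filter set: filter key -> values.
type staticFilters map[string][]any

// UseFilter reports whether key is present and, when matchers are given,
// whether at least one of its values satisfies all of them.
func (s staticFilters) UseFilter(key string, matchers ...func(any) bool) bool {
	values, present := s[key]
	if !present {
		return false
	}
	if len(matchers) == 0 {
		return true
	}
	for _, v := range values {
		matched := true
		for _, m := range matchers {
			if !m(v) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

// needleKeys are the containment filters that make the fence worthwhile
// (account/source/destination/metadata); the key spellings are hypothetical.
var needleKeys = []string{"account", "source", "destination", "metadata"}

// shouldFence is a hypothetical stand-in for shouldFenceTransactionsDataset:
// fence only when at least one needle predicate is present.
func shouldFence(f filterChecker) bool {
	for _, key := range needleKeys {
		if f.UseFilter(key) {
			return true
		}
	}
	return false
}

// listSQL renders the two shapes from a prebuilt predicate (no parameter binding).
func listSQL(where string, limit int, fenced bool) string {
	if !fenced {
		// Historical shape: ORDER BY/LIMIT share the select with the filter, so the
		// planner may walk the id index and stop once it has enough matches.
		return fmt.Sprintf("SELECT * FROM transactions WHERE %s ORDER BY id DESC LIMIT %d", where, limit)
	}
	// Fenced shape: the filter is evaluated once inside a MATERIALIZED CTE and
	// ORDER BY/LIMIT move to the outer select.
	return fmt.Sprintf("WITH dataset AS MATERIALIZED (SELECT * FROM transactions WHERE %s) "+
		"SELECT * FROM dataset ORDER BY id DESC LIMIT %d", where, limit)
}

func main() {
	wallet := staticFilters{"account": {"users:1234"}}
	rangeOnly := staticFilters{"timestamp": {"2026-01-01"}}

	needle := `sources @> '["users:1234"]' OR destinations @> '["users:1234"]'`
	fmt.Println(listSQL(needle, 16, shouldFence(wallet)))
	fmt.Println(listSQL("timestamp >= '2026-01-01'", 16, shouldFence(rangeOnly)))
}
```

The explicit `AS MATERIALIZED` keyword matters on PostgreSQL 12 and later: since 12, a side-effect-free, non-recursive CTE referenced only once is inlined into the outer query by default, which would let the planner fall back to the abort-early plan.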

Backport notes

  • Cherry-picked cleanly from main (#1409, "perf(storage): conditional MATERIALIZED CTE fence for ListTransactions", commit 801a5619); the only auto-merge was in context lines of paginator.go.
  • Verified on the v2.4 base: go build, go vet, and the fence integration tests (TestListTransactionsMaterializedFence, TestListTransactionsFencedPagination, TestListTransactionsFencedWithEffectiveVolumes, TestShouldFenceTransactionsDataset) all pass.

See #1409 for full details and the tri-model review.

A selective wallet-history ListTransactions attached ORDER BY id DESC
LIMIT n to the same select as a JSONB @> filter, so the planner chose an
abort-early transactions_id_desc walk that scanned ~2.3M rows (16.1s) to
return 16. Wrapping the filtered dataset in a MATERIALIZED CTE and moving
ORDER BY + LIMIT to the outer select lets the planner pick the GIN
BitmapOr over the filtered set (~7.7ms, verified on prod read-replica).

The fence is applied conditionally, only when the filter contains a
"needle" containment predicate (account/source/destination/metadata),
via the new opt-in DatasetFencer interface implemented by the
transactions handler. Unfiltered/range-only lists keep the historical
non-materialized shape, where abort-early is the faster plan.

The Paginator contract is split into ApplyCursorPredicate (keyset/offset
predicate that stays inside the fence) and ApplyWindow (LIMIT/OFFSET that
move to the outer select); the existing outer ORDER BY remains the single
order source and is applied before the window. Non-fence SQL is unchanged.

Constraint: JSONB @> selectivity + value<->id correlation can't be fixed by statistics or extended stats; pg_hint_plan not installed
Constraint: GIN indexes on sources/destinations/metadata come from schema migrations 43/51/52 and are present on all migrated clusters
Rejected: Blanket-wrap every list in a MATERIALIZED CTE | regresses broad/unfiltered lists (materializes millions then sorts)
Rejected: Runtime cost probe to detect selectivity | static filter-shape decision is simpler and sufficient per findings doc
Confidence: high
Scope-risk: moderate
Directive: ApplyWindow must NOT emit ORDER BY — the caller applies the qualified outer ORDER BY first; do not fold ordering back in
Directive: The fenced dataset CTE must never carry an inner LIMIT/OFFSET, or the abort-early walk re-triggers
Directive: Do not extend the moved-LIMIT pattern to a fan-out (non-1:1) expand without re-checking; effectiveVolumes is 1:1 per tx
Not-tested: negated-only metadata filter (NOT metadata @>) and non-selective needle values (account="world") fence without benefit but never change results
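The Paginator split described in this commit can be sketched with a toy string-based query type. `query`, `keysetPaginator`, and `page` are hypothetical and the method signatures are guesses; only the method names and the rule that ApplyWindow never emits ORDER BY come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// query is a hypothetical, string-based stand-in for the real query builder.
type query struct {
	where   []string
	orderBy string
	window  string
}

func (q query) String() string {
	var b strings.Builder
	b.WriteString("SELECT * FROM transactions")
	if len(q.where) > 0 {
		b.WriteString(" WHERE " + strings.Join(q.where, " AND "))
	}
	if q.orderBy != "" {
		b.WriteString(" ORDER BY " + q.orderBy)
	}
	if q.window != "" {
		b.WriteString(" " + q.window)
	}
	return b.String()
}

// paginator mirrors the two-method split; the signatures are assumptions.
type paginator interface {
	// ApplyCursorPredicate adds only the keyset/offset predicate.
	ApplyCursorPredicate(q query) query
	// ApplyWindow adds only LIMIT/OFFSET and never touches ORDER BY.
	ApplyWindow(q query) query
}

// keysetPaginator is a hypothetical descending-id keyset paginator.
type keysetPaginator struct {
	afterID  *int // last id of the previous page; nil on the first page
	pageSize int
}

func (p keysetPaginator) ApplyCursorPredicate(q query) query {
	if p.afterID != nil {
		// Copy before appending so the caller's slice is never mutated.
		q.where = append(append([]string(nil), q.where...), fmt.Sprintf("id < %d", *p.afterID))
	}
	return q
}

func (p keysetPaginator) ApplyWindow(q query) query {
	// One extra row lets the caller detect whether a next page exists.
	q.window = fmt.Sprintf("LIMIT %d", p.pageSize+1)
	return q
}

// page applies the cursor predicate, then the single outer ORDER BY, then the window.
func page(p paginator, base query) query {
	q := p.ApplyCursorPredicate(base)
	q.orderBy = "id DESC"
	return p.ApplyWindow(q)
}

func main() {
	last := 100
	p := keysetPaginator{afterID: &last, pageSize: 15}
	fmt.Println(page(p, query{where: []string{"TRUE"}}))
}
```

In a fenced flow, the caller would place the output of ApplyCursorPredicate inside the CTE, so the materialized set already excludes rows before the cursor, and apply ORDER BY and then ApplyWindow outside it.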
sylr requested a review from a team as a code owner on June 11, 2026 14:36
codecov Bot commented Jun 11, 2026


Codecov Report

❌ Patch coverage is 53.57143% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.80%. Comparing base (52a9943) to head (fc36949).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/storage/common/resource.go | 54.54% | 8 Missing and 12 partials ⚠️ |
| internal/storage/common/paginator_column.go | 51.85% | 3 Missing and 10 partials ⚠️ |
| internal/storage/ledger/resource_transactions.go | 55.55% | 2 Missing and 2 partials ⚠️ |
| internal/storage/common/paginator_offset.go | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@               Coverage Diff                @@
##           release/v2.4    #1410      +/-   ##
================================================
- Coverage         80.20%   79.80%   -0.40%     
================================================
  Files               206      206              
  Lines             11280    11324      +44     
================================================
- Hits               9047     9037      -10     
- Misses             1598     1603       +5     
- Partials            635      684      +49     


coderabbitai Bot commented Jun 11, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d474395-860c-4c71-870f-c3665fbb5fe2

📥 Commits

Reviewing files that changed from the base of the PR and between 7806522 and fc36949.

📒 Files selected for processing (2)
  • internal/storage/common/resource.go
  • internal/storage/ledger/resource_transactions_fence_test.go

Walkthrough

Splits pagination into cursor-keyset predicate vs LIMIT/OFFSET window, adds an opt-in DatasetFencer to emit MATERIALIZED dataset CTEs for fenced queries, refactors ResourceRepository to resolve build context and choose fenced vs unfenced pagination flows, implements transaction fencing heuristics, and adds tests and a gomock-generated mock.

Changes

Dataset fencing and pagination refactoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Paginator interface and method split | internal/storage/common/paginator.go, internal/storage/common/paginator_column.go, internal/storage/common/paginator_offset.go | Paginator interface extended with ApplyCursorPredicate (cursor-keyset filtering only) and ApplyWindow (LIMIT/OFFSET sizing only). columnPaginator and OffsetPaginator refactored to use these methods instead of building the whole pagination query in Paginate. |
| ResourceRepository fencing and query refactoring | internal/storage/common/resource.go | Adds the DatasetFencer interface and resolveBuildContext to validate filters once. buildFilteredDataset accepts the resolved context. Paginate now chooses between the fenced flow (cursor predicate inside the materialized dataset, outer window applied after ORDER BY) and the unfenced flow; GetOne and Count use the resolved build context. |
| Transaction-specific fencing logic | internal/storage/ledger/resource_transactions.go | Implements ShouldFenceDataset for transactionsResourceHandler using shouldFenceTransactionsDataset, which detects account/source/destination/metadata filters. |
| Fencing behavior tests and mocks | internal/storage/ledger/resource_transactions_fence_test.go, internal/storage/ledger/resource_transactions_test.go, internal/controller/ledger/mocks_test.go | Integration tests record emitted SQL to assert AS MATERIALIZED usage and validate fenced keyset pagination and expansion behavior; unit tests cover the fence heuristics; MockDatasetFencer is provided for tests. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • formancehq/ledger#1359: Both PRs modify the shared pagination plumbing in internal/storage/common/paginator*.go and how ordering is propagated/used by PaginatedResourceRepository.
  • formancehq/ledger#1351: Related pagination/ORDER BY changes that interact with the paginator ordering propagation.

Suggested reviewers

  • gfyrag

Poem

🐰 I hopped through queries, split cursor and frame,

MATERIALIZED fences hum my name.
Pages step forward, then back with delight,
Rows bounded and tidy, traversed through the night.
A rabbit cheers this tidy SQL sight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: a performance optimization adding a conditional MATERIALIZED CTE fence for ListTransactions queries, backported to v2.4. |
| Description check | ✅ Passed | The description comprehensively explains the problem (slow abort-early index scan), the fix (MATERIALIZED CTE wrapping), and provides backport notes confirming verification on v2.4. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



The new DatasetFencer interface in storage/common/resource.go is picked up
by `go generate` (mocks_test.go is generated from resource.go), so the
generated mock must be committed or the `Dirty` CI check fails.

Directive: mocks_test.go is generated from internal/storage/common/resource.go — re-run `just generate` (mockgen) after changing interfaces there, do not hand-edit

coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/ledger/resource_transactions_test.go (1)

1-80: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Missing mockUseFilter type definition causes compilation failure.

The test instantiates &mockUseFilter{filters: tc.filters} at line 75, but the file does not define the mockUseFilter type or its required UseFilter method. Since this is a new file shown in its entirety (lines 1–80 all annotated with ~), the missing definition will cause a compilation error.

🐛 Proposed fix: add mockUseFilter implementation

Insert the following definition before the test function:

 import (
 	"testing"
 
 	"github.com/stretchr/testify/assert"
 )
 
+type mockUseFilter struct {
+	filters map[string][]any
+}
+
+func (m *mockUseFilter) UseFilter(key string, matchers ...func(any) bool) bool {
+	values, ok := m.filters[key]
+	if !ok {
+		return false
+	}
+	if len(matchers) == 0 {
+		return true
+	}
+	for _, value := range values {
+		allMatch := true
+		for _, matcher := range matchers {
+			if !matcher(value) {
+				allMatch = false
+				break
+			}
+		}
+		if allMatch {
+			return true
+		}
+	}
+	return false
+}
+
 func TestShouldFenceTransactionsDataset(t *testing.T) {

This matches the RepositoryHandlerBuildContext.UseFilter semantics from internal/storage/common/resource.go:83-106.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/ledger/resource_transactions_test.go` around lines 1 - 80,
The test fails to compile because mockUseFilter is not defined; add a mock
implementation named mockUseFilter with a field filters map[string][]any and
implement the UseFilter(ctx context.Context) (map[string][]any, error) method
(matching the RepositoryHandlerBuildContext.UseFilter semantics) so
shouldFenceTransactionsDataset(mock) can call it; place this type and method
definition above TestShouldFenceTransactionsDataset and ensure it returns the
stored filters and nil error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91fa43b1-15a0-4962-9900-87511bf94400

📥 Commits

Reviewing files that changed from the base of the PR and between 52a9943 and 3e4b8d8.

📒 Files selected for processing (7)
  • internal/storage/common/paginator.go
  • internal/storage/common/paginator_column.go
  • internal/storage/common/paginator_offset.go
  • internal/storage/common/resource.go
  • internal/storage/ledger/resource_transactions.go
  • internal/storage/ledger/resource_transactions_fence_test.go
  • internal/storage/ledger/resource_transactions_test.go

…case

Addresses a CodeRabbit review note: the metadata filter value should be a
string key, not a map. The fence decision only checks filter presence, so
behavior is unchanged — this is for semantic accuracy of the test fixture.
sylr commented Jun 11, 2026


@coderabbitai — re: the "outside diff range" finding that mockUseFilter is missing and the test won't compile:

This is a false positive. mockUseFilter (and its UseFilter method) is already defined in the same package, in internal/storage/ledger/utils_test.go:191:

type mockUseFilter struct {
	filters map[string][]any
}
func (m *mockUseFilter) UseFilter(key string, matchers ...func(any) bool) bool { ... }

Both resource_transactions_test.go and utils_test.go are in package ledger, so the test files compile together and &mockUseFilter{...} resolves fine. The Tests CI job is green on this PR, which confirms compilation.

Applying the suggested fix (adding a second mockUseFilter definition to resource_transactions_test.go) would actually break the build with a redeclaration error in the ledger package — so I'm intentionally not applying it. Reusing the existing shared mock is the deliberate choice here (it's the same helper TestCollectAddressFilters already uses).

No change needed; resolving.

coderabbitai Bot commented Jun 11, 2026


@sylr You're absolutely right — thank you for the clear explanation. Go test files sharing the same package compile as a single unit, so mockUseFilter defined in utils_test.go is fully visible to resource_transactions_test.go without any additional declaration. My suggestion would have introduced a redeclaration compile error. I've noted this for future reviews.


✏️ Learnings added
Learnt from: sylr
URL: https://github.com/formancehq/ledger/pull/1410

Timestamp: 2026-06-11T15:32:30.551Z
Learning: In the `internal/storage/ledger` package (Go), `mockUseFilter` (with its `UseFilter` method) is defined in `utils_test.go` (around line 191) and is shared across all `package ledger` test files, including `resource_transactions_test.go`. Do not flag its use in other test files in the same package as a missing declaration — Go test files in the same package compile together.


Addresses @flemzord's review on #1409: in the fenced path the `dataset` CTE
held the entire materialized filtered set, and the `effectiveVolumes` expand
(`transactions_id in (select id from dataset)`) therefore aggregated over every
matching transaction to return a single page — linear in account history, even
for a selective (but busy) account, not just the documented account="world" case.

Restructure the fenced shape so the MATERIALIZED fence and the page window are
separate, nested CTEs:

  WITH dataset AS (
    WITH filtered AS MATERIALIZED (<filter + PIT + keyset>)
    SELECT * FROM filtered ORDER BY id <dir> LIMIT pageSize+1
  )
  SELECT * FROM dataset LEFT JOIN <expands on "select id from dataset"> ...
  ORDER BY dataset.id <dir>

The LIMIT stays outside the *materialized* CTE (so the planner still evaluates
`filtered` once via the GIN BitmapOr and the page is a cheap top-N over it), but
now lives inside the `dataset` CTE the expands reference — so expand work is
bounded to the page again, restoring the implicit "dataset = rows returned"
contract. The expand() helper drops its now-unused `materialized` parameter.

Constraint: effectiveVolumes expand references "select id from dataset" — the page LIMIT must be inside that CTE or the expand scans the full filtered set
Rejected: Keep LIMIT on the outermost select | expands then aggregate over the whole filtered set (the bug this fixes)
Confidence: high
Scope-risk: moderate
Directive: do not move the page ORDER BY/LIMIT back to the outer select — it must stay inside the "dataset" CTE so expands see only the page
sylr marked this pull request as draft on June 11, 2026 17:34