perf(storage): sparse-wallet query fix — functional indexes + adaptive fallback by ariel-formance · Pull Request #1408 · formancehq/ledger

ariel-formance · 2026-06-11T11:45:19Z

Background

Deriv experiences 20+ second transaction-list queries. Root cause: metadata @> '{"key":"val"}' gives the Postgres planner no selectivity signal — it falls back to a backward index scan on id DESC, scanning 2M+ rows to return ~50.

Part 1 — Adaptive fallback (interim mitigation)

Merged earlier on this branch. Probe-then-retry: run the query with a 5 s timeout; on SQLSTATE 57014 retry once with enable_indexscan=off forcing the GIN bitmap path. Dense wallets pay only the read-only transaction wrapper overhead. Full detail in the original description below.

Part 2 — Functional-index routing (proper fix)

What changed

INDEXED_METADATA_KEYS per-ledger feature flag (comma-separated key names).

When a key is listed, the query builder emits:

metadata ->> 'key' = 'value'   -- BTree equality, matches partial functional index

instead of:

metadata @> '{"key": "value"}' -- GIN-only, breaks ORDER BY id DESC planner path

Migrations 53 & 54 — partial functional indexes scoped to ledger = 'deriv':

CREATE INDEX ON transactions ((metadata->>'source_wallet_id'))      WHERE ledger = 'deriv';
CREATE INDEX ON transactions ((metadata->>'destination_wallet_id')) WHERE ledger = 'deriv';

Simulation — 1M rows, 0.005% selectivity (Deriv production scale)

plain  @>  path (no flag):        497 ms  — backward scan, 999 950 rows filtered
indexed ->> path (flag + index):    1 ms  — functional index, 50 rows fetched
speedup: 452×

Run it yourself:

DOCKER_HOST=unix:///~/.colima/default/docker.sock \
  go test -tags it -run TestDerivSparseWalletSimulation -v -timeout 600s \
  ./internal/storage/ledger/...
# DERIV_SIM_ROWS=20000 for a quick 6-second sanity run

Activating for Deriv

After migrations 53/54 have run on Deriv's bucket:

PATCH /ledgers/deriv
{"features": {"INDEXED_METADATA_KEYS": "source_wallet_id,destination_wallet_id"}}

The flag is per-ledger and opt-in. No other ledger is affected. The adaptive fallback remains active as a safety net in case the flag is not yet set.

Files changed (Part 2)

File	Change
`pkg/features/features.go`	`INDEXED_METADATA_KEYS` constant; open-ended (nil) validation
`internal/ledger.go`	`GetIndexedMetadataKeys() []string`
`internal/storage/ledger/resource_transactions.go`	Route metadata filter to `->>` for flagged keys
`migrations/53-…/up.sql`	Partial functional index on `source_wallet_id` for Deriv
`migrations/54-…/up.sql`	Partial functional index on `destination_wallet_id` for Deriv
`transactions_metadata_index_test.go`	5 integration tests (flag on/off, semantic equivalence)
`transactions_deriv_sim_test.go`	1M-row simulation with GIN drop/rebuild for fast seeding

Original adaptive-fallback description

What this is

Adaptive mitigation, not the real fix.

The real fix is a composite or denormalised index that serves both the wallet filter predicate and the id ORDER BY without forcing a sort step. This PR buys time until that index work is scheduled and deployed to prod-us-east-1-deriv.

Background

SELECT … ORDER BY id DESC LIMIT N with JSONB @> predicates leads Postgres to an Index Scan Backward on the id B-tree. For dense/recent wallets this plan is fast (matches cluster near the head of the index). For sparse wallets (few matching rows scattered across a large id range) the scan walks most of the table before accumulating N results — observed ~50 s at prod-us-east-1-deriv (ledger v2.4.9).

Forcing a GIN bitmap scan globally would hurt dense wallets, so a deployment-wide planner override is not viable.

Approach: probe-then-retry

Detect the pathological plan at runtime and recover within the same request.

Execute the transactions-list SELECT inside an explicit read-only transaction with SET LOCAL statement_timeout = <firstAttemptTimeout> (default 5 s).
If Postgres cancels with SQLSTATE 57014 AND the request context is still alive, retry once with SET LOCAL enable_indexscan = off and SET LOCAL statement_timeout = <retryTimeout> (default 40 s).
Any other error, or a client-context cancellation, is returned unchanged.

Transaction hygiene: SET LOCAL is strictly scoped to each explicit BEGIN/COMMIT block.

coderabbitai · 2026-06-11T11:45:32Z

Walkthrough

Adds an adaptive probe‑then‑retry transactions-list paginator with configurable timeouts and wires it through FX/CLI. Separately introduces an indexed-metadata-keys feature flag enabling query predicate rewriting, adds ledger-scoped functional indices on wallet IDs, and includes integration tests for both features.

Changes

Adaptive transactions-list pagination and configuration

Layer / File(s)	Summary
TransactionListConfig contract and adaptive paginator implementation `internal/storage/ledger/store.go`	Adds `TransactionListConfig` with enable flag and two statement-timeout durations; implements `transactionsAdaptivePaginator` that probes with a short timeout, detects Postgres statement timeouts, retries once with a longer timeout and `enable_indexscan=off`, scopes `SET LOCAL` in a read-only tx, and records fallback metrics. `GetOne`/`Count` delegate to base repo. Exposes `DefaultTransactionListConfig()` and `WithTransactionListConfig()`.
FX module and storage configuration wiring `internal/storage/module.go`, `internal/storage/driver/module.go`	Adds `TransactionListConfig` to `ModuleConfig`; changes `driver.NewFXModule()` to accept a `ModuleConfig` and forwards `TransactionListConfig` into ledgerstore factory options.
CLI flags and ServeCommandConfig `cmd/serve.go`	Adds `TxListAdaptiveFallback`, `TxListFirstAttemptTimeoutMs`, `TxListRetryTimeoutMs` to `ServeCommandConfig`; exports flag name constants; registers three Cobra flags; wires CLI values into `ledgerstore.TransactionListConfig` when constructing the storage FX module.
Integration tests for adaptive transactions-list pagination `internal/storage/ledger/transactions_planner_hints_test.go`	Adds integration tests validating: fast-path parity with baseline, fallback triggered by tight probe timeout, fallback results matching baseline, `SET LOCAL` no-leakage, retry timeout error propagation, disabled fallback path, pagination cursor integrity across fallback pages, `GetOne`/`Count` unaffected, and client-cancel causing no retry.

Indexed metadata keys feature

Layer / File(s)	Summary
Feature flag definition and validation semantics `pkg/features/features.go`	Adds `FeatureIndexedMetadataKeys` constant and `FeatureConfigurations` entry with nil value; changes `ValidateFeatureWithValue` to accept any value when `possibleConfigurations` is empty, enabling open-ended feature values.
Ledger feature accessor for indexed metadata keys `internal/ledger.go`	Adds `Ledger.GetIndexedMetadataKeys()` method that parses the comma-separated `FeatureIndexedMetadataKeys` value from ledger features and returns a slice of indexed key names.
Database migrations for metadata indices `internal/storage/bucket/migrations/53-add-metadata-source-wallet-id-index-deriv/up.sql`, `internal/storage/bucket/migrations/54-add-metadata-destination-wallet-id-index-deriv/up.sql`	Adds two migrations creating ledger-scoped (`ledger = 'deriv'`) partial functional indices on `metadata->>'source_wallet_id'` and `metadata->>'destination_wallet_id'` using `CREATE INDEX CONCURRENTLY IF NOT EXISTS`.
Query predicate rewriting for indexed keys `internal/storage/ledger/resource_transactions.go`	Updates `ResolveFilter` to branch on whether a metadata key is indexed: indexed keys use `metadata ->> ? = ?` equality predicates, while non-indexed keys fall back to `metadata @> ?` containment.
Integration tests for indexed metadata keys feature `internal/storage/ledger/transactions_metadata_index_test.go`	Adds tests validating indexed-key query correctness, non-indexed-key containment fallback, semantic equivalence between indexed and non-indexed stores, destination wallet ID indexing, and no-flag behavior.
Deriv sparse-wallet query planner simulation tests `internal/storage/ledger/transactions_deriv_sim_test.go`	Adds integration tests seeding Deriv sparse-wallet transaction patterns, comparing query plans between `metadata @>` and `metadata ->>` predicates, measuring pagination timing across both approaches, and validating semantic equivalence on deterministic datasets.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Store
  participant AdaptivePaginator
  participant PostgresDB
  Client->>Store: Transactions()
  Store->>AdaptivePaginator: instantiate with TransactionListConfig
  Client->>AdaptivePaginator: Paginate(query, cursor)
  AdaptivePaginator->>PostgresDB: SELECT with SET LOCAL statement_timeout=firstAttemptMs
  alt probe times out (SQLSTATE 57014)
    PostgresDB-->>AdaptivePaginator: query canceled
    AdaptivePaginator->>PostgresDB: SELECT with SET LOCAL statement_timeout=retryMs and SET LOCAL enable_indexscan=off
    PostgresDB-->>AdaptivePaginator: rows
    AdaptivePaginator->>AdaptivePaginator: record fallback metric
  else probe succeeds
    PostgresDB-->>AdaptivePaginator: rows
  end
  AdaptivePaginator-->>Client: page, nextCursor

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

flemzord

Poem

🐰 I probed the timeout, then retried with grace,
Indexed metadata keys found their rightful place,
Index scans off, predicates rewritten clean,
Wallet queries faster than ever seen,
Hops of joy echo through this PR's embrace.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly identifies the two main changes: adaptive fallback (an interim mitigation for sparse-wallet timeout issues) and functional indexes (the proper fix). It is concise, specific, and matches the core implementation across the changeset.
Description check	✅ Passed	The PR description thoroughly explains the background, root cause, and two-part solution (adaptive fallback + functional-index routing). It includes a simulation showing 452× speedup, deployment instructions, affected files, and original adaptive-fallback details. The description is directly related to all major changes in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/v2.4/tx-list-planner-hints

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

internal/storage/common/resource.go (1)
345-350: 💤 Low value

Consider validating the OrderExpression format.

The code assumes paginator.OrderExpression() always returns "column direction" format and uses strings.Cut to split on space. If a future Paginator implementation returns a different format (e.g., no space), the parsing silently produces invalid SQL. Consider checking the boolean return value from strings.Cut or adding a comment documenting the required format.
🛡️ Defensive validation
 orderExpr := paginator.OrderExpression()
 col, dir, _ := strings.Cut(orderExpr, " ")
+if dir == "" {
+    return nil, fmt.Errorf("OrderExpression must return 'column direction' format, got: %q", orderExpr)
+}
 finalQuery = finalQuery.Order(fmt.Sprintf("dataset.%s %s", col, dir))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/common/resource.go` around lines 345 - 350, Validate the
format returned by paginator.OrderExpression() before using strings.Cut: capture
the boolean ok from strings.Cut(orderExpr, " ") and if false, handle the invalid
format (e.g., return an error from the enclosing function or fall back to a safe
default order) instead of blindly using col/dir; then only call
finalQuery.Order(fmt.Sprintf("dataset.%s %s", col, dir)) when ok is true. Also
consider adding a brief comment on the expected "column direction" format next
to the paginator.OrderExpression() usage to document the requirement.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/storage/ledger/store.go`:
- Around line 123-128: The Transactions() fast-path check treats only 0 as "no
timeout", causing negative StatementTimeoutMs to still use the wrapper; change
the guard in Store.Transactions from checking == 0 to <= 0 so negative/zero
values disable the hints and return transactionsBase(), and ensure the code that
applies statement_timeout (currently gated by > 0) remains unchanged; update the
condition involving txListConfig.StatementTimeoutMs in Transactions() (and any
analogous checks) to use <= 0 so behavior is consistent with the timeout
application in the RunInTx wrapper (transactionsPaginationWithHints).
- Around line 196-206: The current bun.Tx branch calls issueSetLocal on the
caller-managed transaction which leaves planner/session settings active for the
rest of that outer transaction; instead, start a nested transaction
(savepoint-style) for the Paginate work so the SET LOCAL doesn't leak.
Concretely: in the case bun.Tx branch, begin a new transaction from the outer tx
(create a nested bun.Tx via BeginTx on the store/outer tx), call issueSetLocal
on that nested tx, invoke transactionsBase() against that nested tx and call
Paginate(ctx, q), then commit/rollback the nested tx and return the Paginate
result; keep using the existing symbols issueSetLocal, transactionsBase(),
Paginate and the bun.Tx branch to locate where to change.

---

Nitpick comments:
In `@internal/storage/common/resource.go`:
- Around line 345-350: Validate the format returned by
paginator.OrderExpression() before using strings.Cut: capture the boolean ok
from strings.Cut(orderExpr, " ") and if false, handle the invalid format (e.g.,
return an error from the enclosing function or fall back to a safe default
order) instead of blindly using col/dir; then only call
finalQuery.Order(fmt.Sprintf("dataset.%s %s", col, dir)) when ok is true. Also
consider adding a brief comment on the expected "column direction" format next
to the paginator.OrderExpression() usage to document the requirement.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de0298dd-34ef-4ba0-80be-243ec3dcfd96

📥 Commits

Reviewing files that changed from the base of the PR and between 52a9943 and 1012b5e.

📒 Files selected for processing (9)

cmd/serve.go
internal/storage/common/paginator.go
internal/storage/common/paginator_column.go
internal/storage/common/paginator_offset.go
internal/storage/common/resource.go
internal/storage/driver/module.go
internal/storage/ledger/store.go
internal/storage/ledger/transactions_planner_hints_test.go
internal/storage/module.go

coderabbitai

♻️ Duplicate comments (1)

internal/storage/ledger/store.go (1)
271-278: ⚠️ Potential issue | 🟡 Minor | ⚖️ Poor tradeoff

SET LOCAL leaks to subsequent queries when caller is already in a transaction.

When store.db is already a bun.Tx (lines 273-278), the code issues SET LOCAL directly on the caller's transaction. Those settings (enable_indexscan = off, statement_timeout) persist for all subsequent statements within that outer transaction—not just this Paginate call.

This means if a caller does:
store, tx, _ := baseStore.BeginTX(ctx, nil)
defer tx.Rollback()
store.Transactions().Paginate(ctx, q) // triggers fallback → enable_indexscan = off
store.Logs().Paginate(ctx, q2)        // now runs with enable_indexscan = off
The workaround documented in the past review (nested transaction via savepoint) would fully isolate the settings. Given this is a defensive branch for an uncommon path, consider whether the current behavior is acceptable or if savepoint isolation is worth adding.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/ledger/store.go` around lines 271 - 278, The current
defensive branch mutates the caller's bun.Tx by calling issueSetLocal directly
on store.db (when it implements bun.Tx), leaking SET LOCAL state; instead, wrap
the scope in a SAVEPOINT to isolate settings: on detecting store.db as bun.Tx,
execute a SAVEPOINT (unique name), call issueSetLocal using that same tx, invoke
a.store.transactionsBase().Paginate(ctx, q) while the savepoint is active, then
ROLLBACK TO SAVEPOINT (to undo the session-local settings) and RELEASE the
savepoint; update the branch handling store.db.(bun.Tx) around issueSetLocal and
transactionsBase().Paginate to use this savepoint pattern so outer transaction
settings are not mutated.

🧹 Nitpick comments (1)

internal/storage/ledger/store.go (1)
621-638: ⚡ Quick win

DefaultTransactionListConfig() is not applied automatically — zero-valued config disables fallback.

The struct field comment on line 50-53 says "Default true", but defaultOptions (line 640) doesn't include WithTransactionListConfig(DefaultTransactionListConfig()). If a Store is created without explicitly passing WithTransactionListConfig, the zero-valued TransactionListConfig has EnableAdaptiveFallback: false.

This is fine for the CLI path (which always supplies the config via FX wiring), but direct callers of ledgerstore.New() must remember to pass the option if they want the documented default behavior. Consider either:

Updating the doc comment to clarify "default when wired through FX" vs "zero-value when created directly", or

Adding WithTransactionListConfig(DefaultTransactionListConfig()) to defaultOptions so the behavior matches the documentation.
Option 2: Apply default config automatically
 var defaultOptions = []Option{
 	WithMeter(noopmetrics.Meter{}),
 	WithTracer(nooptracer.Tracer{}),
+	WithTransactionListConfig(DefaultTransactionListConfig()),
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/ledger/store.go` around lines 621 - 638,
DefaultTransactionListConfig() is never applied automatically so a zero-valued
TransactionListConfig disables adaptive fallback; update the store defaults by
adding WithTransactionListConfig(DefaultTransactionListConfig()) into the
defaultOptions slice so new Store instances get the documented defaults (adjust
symbols: DefaultTransactionListConfig, WithTransactionListConfig,
defaultOptions, TransactionListConfig, Store).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@internal/storage/ledger/store.go`:
- Around line 271-278: The current defensive branch mutates the caller's bun.Tx
by calling issueSetLocal directly on store.db (when it implements bun.Tx),
leaking SET LOCAL state; instead, wrap the scope in a SAVEPOINT to isolate
settings: on detecting store.db as bun.Tx, execute a SAVEPOINT (unique name),
call issueSetLocal using that same tx, invoke
a.store.transactionsBase().Paginate(ctx, q) while the savepoint is active, then
ROLLBACK TO SAVEPOINT (to undo the session-local settings) and RELEASE the
savepoint; update the branch handling store.db.(bun.Tx) around issueSetLocal and
transactionsBase().Paginate to use this savepoint pattern so outer transaction
settings are not mutated.

---

Nitpick comments:
In `@internal/storage/ledger/store.go`:
- Around line 621-638: DefaultTransactionListConfig() is never applied
automatically so a zero-valued TransactionListConfig disables adaptive fallback;
update the store defaults by adding
WithTransactionListConfig(DefaultTransactionListConfig()) into the
defaultOptions slice so new Store instances get the documented defaults (adjust
symbols: DefaultTransactionListConfig, WithTransactionListConfig,
defaultOptions, TransactionListConfig, Store).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0313601-844c-4765-9ac0-8f2fc7402a49

📥 Commits

Reviewing files that changed from the base of the PR and between 1012b5e and 577f762.

📒 Files selected for processing (5)

cmd/serve.go
internal/storage/driver/module.go
internal/storage/ledger/store.go
internal/storage/ledger/transactions_planner_hints_test.go
internal/storage/module.go

🚧 Files skipped from review as they are similar to previous changes (2)

internal/storage/driver/module.go
internal/storage/module.go

…imeout Background ---------- SELECT … ORDER BY id DESC LIMIT N with JSONB @> predicates leads Postgres to an Index Scan Backward on the id B-tree. For dense/recent wallets that plan is fast. For sparse wallets (few matching rows scattered across a large id range) the scan walks most of the table before accumulating N results — observed ~50 s at prod-us-east-1-deriv (ledger v2.4.9). Forcing GIN bitmap scan globally would hurt dense wallets, so a deployment-wide planner override is not viable. Adaptive mitigation (this PR) ------------------------------ This is a stopgap, not the real fix. The real fix is a composite or denormalised index that serves both the wallet filter and id ordering without a full sort step. The deliberate cost is up to <firstAttemptTimeout> of wasted work per uncached sparse-wallet request. Strategy (exactly one retry, no loop): 1. Execute the transactions-list SELECT inside an explicit read-only transaction with SET LOCAL statement_timeout = <firstAttemptTimeout> (default 5 s). Dense wallets finish well inside the budget — no fallback, no overhead beyond the transaction wrapper. 2. If Postgres cancels with SQLSTATE 57014 AND the request context is still alive (i.e. our timeout fired, not the client disconnecting), roll back and retry once with SET LOCAL enable_indexscan = off plus SET LOCAL statement_timeout = <retryTimeout> (default 40 s). Disabling index scans forces the planner onto the GIN bitmap path for the JSONB @> predicates. 3. Any other error, or a client-context cancellation, is returned unchanged. Exactly one retry. Never a loop. Transaction hygiene: SET LOCAL is strictly scoped to each explicit BEGIN/COMMIT block, so no planner setting or timeout leaks to subsequent queries on the same pooled connection. Changes ------- - internal/storage/ledger/store.go · TransactionListConfig struct (EnableAdaptiveFallback, FirstAttemptTimeoutMs, RetryTimeoutMs) + DefaultTransactionListConfig() + WithTransactionListConfig() · transactionsAdaptivePaginator: probe-then-retry Paginate; GetOne/Count delegate straight to the base repository (no adaptive logic needed) · paginateInTx / issueSetLocal helpers for the explicit-tx + SET LOCAL dance · OTel metrics: store.tx_list_fallback_total (counter), store.tx_list_first_attempt_duration (histogram), store.tx_list_retry_duration (histogram, outcome attribute) · Structured log on every fallback event - internal/storage/driver/module.go — ModuleConfig carries TransactionListConfig - internal/storage/module.go — forwards TransactionListConfig to driver - cmd/serve.go · --tx-list-adaptive-fallback (default true) · --tx-list-first-attempt-timeout-ms (default 5000) · --tx-list-retry-timeout-ms (default 40000) - internal/storage/ledger/transactions_planner_hints_test.go (//go:build it) · FastPathUnchanged — generous timeout, dense wallet, result == baseline · FallbackTriggered — 1 ms probe, retry succeeds, rows correct + ordered DESC · FallbackRowsMatch — fallback result == baseline result row-for-row · NoLeakage — pinned conn: enable_indexscan + statement_timeout restored · RetryAlsoTimesOut — both timeouts 1 ms → error propagated, no loop · DisabledFallback — EnableAdaptiveFallback=false → plain path, no overhead · CursorIntegrity — next-page cursor from fallback path is decodable and correct · GetOneAndCountUnaffected — adaptive store GetOne/Count == base store

Three gaps addressed: 1. ClientCancelNoRetry (new test): proves that a cancelled request context returns context.Canceled immediately and never triggers the retry. This is the critical client-disconnect safety property — if the ctx.Err() guard in Paginate were wrong, a dead-client request would fire a wasted retry. 2. RetryAlsoTimesOut: bumped data set from 3 to 200 rows so the 1 ms statement_timeout reliably fires SQLSTATE 57014 even on fast CI runners. With only 3 rows the probe could complete in < 1 ms and require.Error would fail spuriously. 3. FallbackTriggeredByTimeout / FallbackRowsMatchBaseline: same 200-row fix — both tests used 1 ms probes with tiny data sets that might not timeout, causing the assertions to pass without exercising the fallback path at all.

…d edge-case tests Code fix -------- paginateInTx previously issued SET LOCAL directly onto an outer bun.Tx when store.db was already a transaction (BeginTX path). If the probe timed out (57014), Postgres put the outer transaction into an error state. The retry then attempted to issue another SET LOCAL on the aborted transaction, which failed with InFailedSqlTransaction (25P02), leaving the caller's transaction permanently unusable. This path is unreachable today (list operations are never called within an explicit BeginTX), but it was a silent trap for future callers. Fix: when store.db is a bun.Tx, skip the adaptive machinery entirely and delegate straight to the base paginator. New tests --------- - EmptyResultSet: zero wallet rows → empty cursor, no error, HasMore false. Also confirms the fast path doesn't blow up on a zero-row response. - AlreadyInTxSkipsAdaptive: opens an outer transaction via BeginTX, then calls Transactions().Paginate() with a 1 ms probe timeout. Verifies the call succeeds, returns the right rows, and that statement_timeout on the outer transaction is still "0" (the adaptive SET LOCAL never ran).

The test was flaky on GitHub Actions because the GIN bitmap scan (used on the retry path) completed in < 1ms even with 200 rows, so statement_timeout never fired and require.Error always saw nil. Add a testHookBeforePaginateSelect field to Store and a setter SetTestHookBeforePaginateSelect that tests can use to inject arbitrary SQL inside paginateInTx, sharing the same transaction and therefore the same SET LOCAL statement_timeout scope. The test now installs SELECT pg_sleep(0.005), which reliably exceeds a 1ms timeout regardless of how fast the real SELECT would have been. Also fix gofmt alignment on three struct fields in Store (extra spaces → gofmt canonical alignment) and a double-space in the test (1, // → 1, //).

codecov · 2026-06-11T15:09:48Z

Codecov Report

❌ Patch coverage is 89.83051% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.84%. Comparing base (140ed9b) to head (c68e786).

Files with missing lines	Patch %	Lines
internal/storage/ledger/store.go	88.00%	6 Missing and 6 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1408      +/-   ##
==========================================
+ Coverage   80.55%   80.84%   +0.29%     
==========================================
  Files         206      206              
  Lines       11293    11372      +79     
==========================================
+ Hits         9097     9194      +97     
+ Misses       1646     1619      -27     
- Partials      550      559       +9

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The FallbackTriggeredByTimeout / FallbackRowsMatchBaseline / NoLeakage tests relied on dataset size (200 rows) to make the 1 ms probe fire SQLSTATE 57014. On a fast CI runner those queries complete in < 1 ms and the fallback never triggers, leaving the retry-success branch in Paginate (the else { Infof } block) and several issueSetLocal branches permanently uncovered. Add three targeted tests that use the pg_sleep hook introduced in the previous commit so the fallback fires deterministically regardless of query speed: * FallbackRetrySucceedsViaPgSleep — hook makes the probe fire 57014 and is a no-op on the retry, covering the retry-success path and its logging. * CtxCancelAfterProbe — hook cancels the outer ctx then fires 57014, covering the ctx.Err() != nil branch in Paginate (no retry when client disconnects). * ZeroTimeoutNoSetLocal — FirstAttemptTimeoutMs = 0 skips SET LOCAL entirely, covering the if timeoutMs > 0 { } false branch in issueSetLocal; also exercises DefaultTransactionListConfig(). Also add a no-op hook to FastPathUnchanged to cover the "hook set but returns nil" fall-through path in paginateInTx.

coderabbitai

🧹 Nitpick comments (1)

internal/storage/ledger/transactions_planner_hints_test.go (1)

553-561: ⚡ Quick win

Strengthen no-retry verification in the cancel-after-probe test.

Line 559 currently verifies only that an error is returned. A retry regression could still pass this assertion. Track hook calls and assert exactly one attempt.

Suggested test hardening

 func TestTransactionListAdaptive_CtxCancelAfterProbe(t *testing.T) {
@@
-	adaptive.SetTestHookBeforePaginateSelect(func(_ context.Context, tx bun.Tx) error {
+	callCount := 0
+	adaptive.SetTestHookBeforePaginateSelect(func(_ context.Context, tx bun.Tx) error {
+		callCount++
 		cancel()
 		_, err := tx.ExecContext(context.Background(), "SELECT pg_sleep(0.005)")
 		return err // SQLSTATE 57014; ctx is now cancelled
 	})
@@
 	_, err := adaptive.Transactions().Paginate(ctx, walletQuery(15))
 	require.Error(t, err, "must return an error when context is cancelled after probe timeout")
+	require.Equal(t, 1, callCount, "must not retry after context cancellation")
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/ledger/transactions_planner_hints_test.go` around lines 553
- 561, The test should verify the paginate hook is invoked exactly once to
ensure no retries: add a counter (e.g. attempts := 0) captured by the closure
passed to adaptive.SetTestHookBeforePaginateSelect and increment it each time
the hook runs, leaving the existing cancel() and tx.ExecContext sleep behavior
unchanged; after calling adaptive.Transactions().Paginate(ctx, walletQuery(15))
assert that an error was returned and that the counter equals 1 (no retries),
referencing the existing hook setup function
adaptive.SetTestHookBeforePaginateSelect, the call
adaptive.Transactions().Paginate, and walletQuery(15) to locate where to insert
the counter and final assertion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/storage/ledger/transactions_planner_hints_test.go`:
- Around line 553-561: The test should verify the paginate hook is invoked
exactly once to ensure no retries: add a counter (e.g. attempts := 0) captured
by the closure passed to adaptive.SetTestHookBeforePaginateSelect and increment
it each time the hook runs, leaving the existing cancel() and tx.ExecContext
sleep behavior unchanged; after calling adaptive.Transactions().Paginate(ctx,
walletQuery(15)) assert that an error was returned and that the counter equals 1
(no retries), referencing the existing hook setup function
adaptive.SetTestHookBeforePaginateSelect, the call
adaptive.Transactions().Paginate, and walletQuery(15) to locate where to insert
the counter and final assertion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2a7d7c3f-8ea1-4db1-97d8-0932f9cf3b7e

📥 Commits

Reviewing files that changed from the base of the PR and between 7cbc3b5 and 9f1a0c2.

⛔ Files ignored due to path filters (1)

go.mod is excluded by !**/*.mod

📒 Files selected for processing (2)

internal/storage/ledger/store.go
internal/storage/ledger/transactions_planner_hints_test.go

…obe test

NumaryBot · 2026-06-12T08:25:20Z

🛑 Changes requested — multi-model review

The adaptive fallback implementation is well-structured overall — the in-transaction bypass is a sound safety measure, the fx/config/metrics plumbing is clean, and the test coverage is a good foundation. However, several correctness and robustness issues require attention before merging.

The most critical concerns are: (1) the test hook field is read and written on a shared Store without synchronization, creating a real data race; (2) SQLSTATE 57014 is shared between statement_timeout, pg_cancel_backend, and other server-side cancellations, so the current classification can trigger spurious 40-second retries on non-timeout cancellations; (3) there is no integration test verifying that a real statement_timeout error propagates unwrapped through the bun/paginate stack to actually trigger the fallback; and (4) existing callers that construct ModuleConfig without the new TransactionListConfig field silently receive the zero value, disabling the fallback outside the cmd/serve path despite the documented defaults.

Minor issues include: the SET LOCAL statement_timeout = 0 omission meaning inherited session timeouts are not overridden as documented; no validation of negative or implausibly large timeout values in the config; metric instruments that are unconditionally dereferenced and could panic in non-standard construction paths; the in-transaction bypass being silent and unobservable; and probe timeouts being logged at error level for what is an expected, handled condition.

🟠 [major] testHookBeforePaginateSelect mutates shared Store without synchronization

internal/storage/ledger/store.go:85 — reported by claude

SetTestHookBeforePaginateSelect writes store.testHookBeforePaginateSelect while paginateInTx reads it concurrently across goroutines serving HTTP requests. A Store is shared across goroutines, so even though the hook is intended for tests only, the nil-check and read on every Paginate call creates a data race. Exposing a test-only mutation method on the production Store type is also a maintenance and security smell.

Suggestion: Gate the hook behind a build tag, or inject it via an unexported field set only at construction time (e.g. an Option used solely in tests), so it cannot be mutated on a live shared Store. At minimum document that it must only be set before the store serves any traffic.

🟠 [major] SQLSTATE 57014 covers client/admin cancellation as well as statement_timeout

internal/storage/ledger/store.go:74 — reported by claude

isStatementTimeout treats any pgerrcode.QueryCanceled (57014) as a probe timeout. Postgres returns 57014 for statement_timeout, pg_cancel_backend, and lock_timeout-adjacent cancellations alike. The ctx.Err()==nil guard only filters client disconnects detected on the Go side; a server-side cancel (e.g. an operator running pg_cancel_backend) would be misclassified as a probe timeout and trigger a spurious retry with enable_indexscan=off, potentially running for up to RetryTimeoutMs (40s) doing unwanted work.

Suggestion: Distinguish statement_timeout from generic cancellation by also inspecting pgErr.Message (e.g. checking for 'canceling statement due to statement timeout') in addition to the code. At minimum document this residual ambiguity in a comment on isStatementTimeout.

🟠 [major] Error classification depends on raw *pgconn.PgError surviving the bun/paginate stack

internal/storage/ledger/store.go:80 — reported by claude

isStatementTimeout relies on errors.As reaching the underlying *pgconn.PgError. If any layer in the paginator stack (bun, the common paginate package, etc.) wraps or replaces the driver error with a non-wrapping type or calls ResolveError internally, errors.As will fail and the fallback will silently never trigger — the exact pathological case this PR targets would still time out. There is no test asserting that a real statement_timeout from the SELECT is classified correctly through the full stack.

Suggestion: Add an integration test that drives a genuine statement_timeout on the actual transactions SELECT (without the hook) and asserts the fallback fires, to lock in that the error type propagates unwrapped through the paginator.

🟠 [major] Zero-value TransactionListConfig silently disables the adaptive fallback for existing callers

internal/storage/module.go:35 — reported by gpt

TransactionListConfig is forwarded unconditionally. Existing callers that construct storage.ModuleConfig without the new field will pass the zero value through to ledgerstore.WithTransactionListConfig, leaving EnableAdaptiveFallback=false and both timeouts at 0. This contradicts the PR's documented default of fallback enabled with 5s/40s timeouts and means the mitigation may be silently disabled outside the cmd serve path where Cobra provides defaults.

Suggestion: Normalize defaults at the module/store boundary. Initialize from ledgerstore.DefaultTransactionListConfig() and only override explicitly supplied values, or make ModuleConfig.TransactionListConfig a pointer so nil means defaults while an explicit config can disable the fallback.

🟡 [minor] No validation that probe+retry budgets are non-negative or fit within client timeout

cmd/serve.go:255 — reported by claude

The flags accept arbitrary int64 values. Negative timeouts would be passed straight into SET LOCAL statement_timeout = , which Postgres rejects at query time, surfacing as an opaque error. The design also requires RetryTimeoutMs ≤ client_timeout − FirstAttemptTimeoutMs, but nothing enforces or warns about this invariant.

Suggestion: Add validation in cfg.Validate() (or wherever serve config is validated) to reject negative timeouts and optionally warn when RetryTimeoutMs is implausibly large relative to known client timeouts.

🟡 [minor] timeoutMs=0 does not actually disable the statement timeout when the session already has one set

internal/storage/ledger/store.go:381 — reported by gpt

paginateInTx documents timeoutMs: 0 = no timeout, and issueSetLocal skips SET LOCAL statement_timeout when the value is 0. However, if the role, database, DSN, or session already has a nonzero statement_timeout, the query will still be bounded by that inherited timeout and may unexpectedly trigger the fallback.

Suggestion: If 0 is intended to disable the server-side timeout for that attempt, explicitly execute SET LOCAL statement_timeout = 0 when timeoutMs is 0. Otherwise update the comment and configuration documentation to clarify that 0 means 'do not override the existing session timeout'.

🟡 [minor] Fallback metric instruments may be nil if Store is constructed outside New()

internal/storage/ledger/store.go:175 — reported by claude

txListFallbackCounter, txListFirstAttemptDurationMs, and txListRetryDurationMs are only initialized in New(). They are dereferenced unconditionally in the fallback branch. While current construction paths go through New(), any future struct-literal construction or refactor would cause a nil-pointer panic in the fallback branch.

Suggestion: Defensively nil-check the metric instruments before calling Add/Record, or add a comment and construction invariant guaranteeing they are always non-nil when EnableAdaptiveFallback is true.

🟡 [minor] In-transaction bypass of adaptive fallback is silent and unobservable

internal/storage/ledger/store.go:230 — reported by claude

paginateInTx returns the plain base paginator whenever store.db is a bun.Tx, which is correct for safety, but means any transactions-list query inside an outer transaction silently loses the sparse-wallet protection and can still hit the long timeout. There is no metric or log indicating the mitigation was skipped, making it impossible for operators to tell whether a slow list ran with or without the fallback.

Suggestion: Emit a debug log or increment a counter when the adaptive path is bypassed due to an outer transaction, so the skipped-mitigation case is observable in production.

⚪ [nit] FirstAttemptTimeoutMs=0 silently makes the fallback unreachable

internal/storage/ledger/store.go:135 — reported by claude

When FirstAttemptTimeoutMs is 0, issueSetLocal skips SET LOCAL statement_timeout, so the probe runs with no server-side timeout and can only be cancelled by the Go context. In that configuration the fallback can effectively never fire on the probe (no 57014 can be raised), making the adaptive machinery a pure read-only-transaction wrapper. This behaviour is non-obvious to operators.

Suggestion: Document that FirstAttemptTimeoutMs=0 disables the probe-timeout trigger entirely, rendering the fallback unreachable under normal conditions.

⚪ [nit] Probe timeout logged at error level for expected, handled condition

internal/storage/ledger/store.go:115 — reported by claude

Every fallback event logs at Errorf level ('transactions list probe timed out'). For deployments with sparse wallets this is expected/benign behaviour, not an error, and will create noisy error-level log volume that may trip error-rate alerts.

Suggestion: Log the probe timeout at Warn or Info level since it is a handled, expected condition. Reserve Errorf for the retry-failure case.

_{Reviewed in parallel by claude (anthropic/claude-opus-4-8) and gpt (openai/gpt-5.5), then consolidated. This comment is updated on each push.}

…a queries Adds INDEXED_METADATA_KEYS feature flag (comma-separated key names). When a metadata key is flagged, the query builder emits metadata ->> 'key' = 'value' instead of metadata @> '{"key": "value"}' allowing Postgres to use a partial functional BTree index rather than falling back to a full backward scan on the id index. Migrations 53 and 54 create partial functional indexes scoped to ledger = 'deriv' for source_wallet_id and destination_wallet_id. Simulation at 1 M rows (0.005% selectivity, matching Deriv production scale): plain @> path: 497 ms indexed ->> path: 1 ms speedup: 452× To activate for Deriv after migrations have run: PATCH /ledgers/deriv {"features": {"INDEXED_METADATA_KEYS": "source_wallet_id,destination_wallet_id"}}

NumaryBot

🛑 Changes requested — automated review

The adaptive retry path is broadly structured correctly, but the newly added functional-index path has planner-visibility issues that can make the intended indexes unusable in realistic deployments. These are performance bugs in the mitigation/index work and should be addressed.

NumaryBot · 2026-06-18T23:58:04Z

@@ -0,0 +1,3 @@
+create index {{ if not .Transactional }}concurrently{{end}} if not exists transactions_metadata_source_wallet_id_deriv
+on "{{.Schema}}".transactions ((metadata->>'source_wallet_id'))
+where ledger = 'deriv';


🟠 [major] Keep ledger predicate usable for partial indexes

When the bucket contains only the deriv ledger, newScopedSelect() deliberately omits the ledger = ? filter, so a transactions-list query on that bucket will not imply this partial-index predicate. PostgreSQL therefore cannot use these new indexes in the single-ledger-bucket case, which is a likely deployment shape for the targeted Deriv ledger and makes the migration ineffective for the sparse-wallet queries it is meant to help.

NumaryBot · 2026-06-18T23:58:04Z

-		return "metadata @> ?", []any{map[string]any{
-			match[0][1]: value,
-		}}, nil
+		if slices.Contains(h.store.ledger.GetIndexedMetadataKeys(), key) {


🟠 [major] Inline the indexed metadata key in the expression

When INDEXED_METADATA_KEYS is enabled, this emits metadata ->> $1 = $2 with the key as a bind parameter rather than the indexed expression metadata ->> 'source_wallet_id' = $1. That parameterized expression is not reliably matched to the new functional index on (metadata->>'source_wallet_id'), especially once prepared/generic plans are involved, so the optimization can silently miss the index for the sparse-wallet path.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

internal/storage/ledger/transactions_metadata_index_test.go (1)

36-40: ⚡ Quick win

Add a regression test for comma+space feature values.

Please add a case like "source_wallet_id, destination_wallet_id" to ensure indexed-key detection remains robust for human-edited config strings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/ledger/transactions_metadata_index_test.go` around lines 36
- 40, Add a new test case in the transactions_metadata_index_test.go file that
uses the withIndexedMetadataKeys helper function with a comma+space separated
string value like "source_wallet_id, destination_wallet_id" to verify that the
indexed-key detection logic correctly handles human-edited configuration strings
with spaces following commas, not just comma-separated values without spaces.
This regression test should validate that the feature value parsing remains
robust for both formats.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/ledger.go`:
- Around line 41-50: The GetIndexedMetadataKeys method does not normalize the
tokens parsed from the comma-separated feature value, preserving whitespace
around delimiters. When splitting "source_wallet_id, destination_wallet_id" by
comma, the result includes " destination_wallet_id" with leading space, which
causes downstream slices.Contains checks to fail. Modify the return statement to
iterate through each token from strings.Split, trim whitespace from each token
using strings.TrimSpace, skip any empty entries after trimming, and return only
the cleaned, non-empty tokens.

In `@internal/storage/ledger/resource_transactions.go`:
- Around line 151-154: The code is checking if a key exists in indexed metadata
keys using slices.Contains with h.store.ledger.GetIndexedMetadataKeys(), but
indexed metadata keys may contain space padding (like "source_wallet_id,
destination_wallet_id"), causing the exact match to fail and silently skipping
the indexed branch. Normalize both the key parameter and the indexed metadata
keys by trimming whitespace before performing the membership check in the
slices.Contains call to ensure space-padded keys are properly matched.

In `@internal/storage/ledger/transactions_deriv_sim_test.go`:
- Around line 229-233: The totalRows variable obtained from getEnvInt for
DERIV_SIM_ROWS is used as a divisor in the selectivity calculation without
validation, which will cause a panic if set to zero and could produce misleading
results if set to an invalid value. Add validation after retrieving totalRows
and walletRows using getEnvInt to ensure totalRows is greater than zero,
walletRows is positive, and walletRows does not exceed totalRows. If validation
fails, either use sensible defaults or fail the test early with a clear error
message before attempting the division in the t.Logf call.
- Around line 193-210: The EXPLAIN query in the test hardcodes the filter value
directly into the SQL string using fmt.Sprintf with %s in the else branch of the
filter condition, but the actual runtime paginator uses parameterized queries
with placeholders. To make the test query shape-equivalent to the real paginator
query, modify the predicate construction in the else block to use a
parameterized placeholder (?) instead of embedding the value directly with
string formatting, so the EXPLAIN plan accurately reflects what the runtime
query path will execute.

In `@pkg/features/features.go`:
- Around line 26-31: The FeatureIndexedMetadataKeys flag accepts arbitrary
metadata key lists without validation, but downstream code in ResolveFilter
rewrites these keys using text extraction (metadata ->> ? = ?) which only works
correctly for string-valued metadata. This causes incorrect behavior for
numeric, boolean, or object metadata values. Add a dedicated validator or parser
function that processes the INDEXED_METADATA_KEYS configuration value
(referenced at lines 56 and 65-66) to enforce that only string-valued indexed
metadata keys are allowed, either by validating against a whitelist of known
string-indexed keys like wallet IDs or by implementing a contract validator that
ensures type safety for the configured keys.

---

Nitpick comments:
In `@internal/storage/ledger/transactions_metadata_index_test.go`:
- Around line 36-40: Add a new test case in the
transactions_metadata_index_test.go file that uses the withIndexedMetadataKeys
helper function with a comma+space separated string value like
"source_wallet_id, destination_wallet_id" to verify that the indexed-key
detection logic correctly handles human-edited configuration strings with spaces
following commas, not just comma-separated values without spaces. This
regression test should validate that the feature value parsing remains robust
for both formats.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4764d899-30cb-4f91-b5c0-d40c0bd32f72

📥 Commits

Reviewing files that changed from the base of the PR and between c68e786 and 3efcdc3.

⛔ Files ignored due to path filters (2)

internal/storage/bucket/migrations/53-add-metadata-source-wallet-id-index-deriv/notes.yaml is excluded by !**/*.yaml
internal/storage/bucket/migrations/54-add-metadata-destination-wallet-id-index-deriv/notes.yaml is excluded by !**/*.yaml

📒 Files selected for processing (7)

internal/ledger.go
internal/storage/bucket/migrations/53-add-metadata-source-wallet-id-index-deriv/up.sql
internal/storage/bucket/migrations/54-add-metadata-destination-wallet-id-index-deriv/up.sql
internal/storage/ledger/resource_transactions.go
internal/storage/ledger/transactions_deriv_sim_test.go
internal/storage/ledger/transactions_metadata_index_test.go
pkg/features/features.go

coderabbitai · 2026-06-19T00:02:00Z

+// GetIndexedMetadataKeys returns the list of metadata keys for which the query
+// builder will emit a functional-index-compatible predicate (metadata ->> 'key' = 'value')
+// instead of the default JSONB containment form. The list is stored as a comma-separated
+// string in Features[FeatureIndexedMetadataKeys].
+func (l Ledger) GetIndexedMetadataKeys() []string {
+	val := l.Features[features.FeatureIndexedMetadataKeys]
+	if val == "" {
+		return nil
+	}
+	return strings.Split(val, ",")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize tokens from the comma-separated feature value.

Line 50 preserves whitespace and empty tokens, so a common config like source_wallet_id, destination_wallet_id returns " destination_wallet_id" and the downstream exact slices.Contains(..., key) check will not enable the indexed predicate for that key. Trim and skip empty entries before returning.

Proposed parsing fix

func (l Ledger) GetIndexedMetadataKeys() []string { val := l.Features[features.FeatureIndexedMetadataKeys] if val == "" { return nil } - return strings.Split(val, ",") + + keys := make([]string, 0, strings.Count(val, ",")+1) + for _, key := range strings.Split(val, ",") { + key = strings.TrimSpace(key) + if key != "" { + keys = append(keys, key) + } + } + return keys }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// GetIndexedMetadataKeys returns the list of metadata keys for which the query

// builder will emit a functional-index-compatible predicate (metadata ->> 'key' = 'value')

// instead of the default JSONB containment form. The list is stored as a comma-separated

// string in Features[FeatureIndexedMetadataKeys].

func (l Ledger) GetIndexedMetadataKeys() []string {

val := l.Features[features.FeatureIndexedMetadataKeys]

if val == "" {

return nil

}

return strings.Split(val, ",")

// GetIndexedMetadataKeys returns the list of metadata keys for which the query

// builder will emit a functional-index-compatible predicate (metadata ->> 'key' = 'value')

// instead of the default JSONB containment form. The list is stored as a comma-separated

// string in Features[FeatureIndexedMetadataKeys].

func (l Ledger) GetIndexedMetadataKeys() []string {

val := l.Features[features.FeatureIndexedMetadataKeys]

if val == "" {

return nil

}

keys := make([]string, 0, strings.Count(val, ",")+1)

for _, key := range strings.Split(val, ",") {

key = strings.TrimSpace(key)

if key != "" {

keys = append(keys, key)

}

}

return keys

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/ledger.go` around lines 41 - 50, The GetIndexedMetadataKeys method does not normalize the tokens parsed from the comma-separated feature value, preserving whitespace around delimiters. When splitting "source_wallet_id, destination_wallet_id" by comma, the result includes " destination_wallet_id" with leading space, which causes downstream slices.Contains checks to fail. Modify the return statement to iterate through each token from strings.Split, trim whitespace from each token using strings.TrimSpace, skip any empty entries after trimming, and return only the cleaned, non-empty tokens.

coderabbitai · 2026-06-19T00:02:00Z

+		if slices.Contains(h.store.ledger.GetIndexedMetadataKeys(), key) {
+			return "metadata ->> ? = ?", []any{key, value}, nil
+		}
+		return "metadata @> ?", []any{map[string]any{key: value}}, nil


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize indexed metadata keys before membership checks.

Space-padded feature values (for example "source_wallet_id, destination_wallet_id") won’t match key exactly, so the indexed branch is silently skipped.

Suggested fix

// internal/ledger.go func (l Ledger) GetIndexedMetadataKeys() []string { val := l.Features[features.FeatureIndexedMetadataKeys] if val == "" { return nil } - return strings.Split(val, ",") + raw := strings.Split(val, ",") + keys := make([]string, 0, len(raw)) + for _, k := range raw { + k = strings.TrimSpace(k) + if k != "" { + keys = append(keys, k) + } + } + return keys }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if slices.Contains(h.store.ledger.GetIndexedMetadataKeys(), key) {

return "metadata ->> ? = ?", []any{key, value}, nil

}

return "metadata @> ?", []any{map[string]any{key: value}}, nil

func (l Ledger) GetIndexedMetadataKeys() []string {

val := l.Features[features.FeatureIndexedMetadataKeys]

if val == "" {

return nil

}

raw := strings.Split(val, ",")

keys := make([]string, 0, len(raw))

for _, k := range raw {

k = strings.TrimSpace(k)

if k != "" {

keys = append(keys, k)

}

}

return keys

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/ledger/resource_transactions.go` around lines 151 - 154, The code is checking if a key exists in indexed metadata keys using slices.Contains with h.store.ledger.GetIndexedMetadataKeys(), but indexed metadata keys may contain space padding (like "source_wallet_id, destination_wallet_id"), causing the exact match to fail and silently skipping the indexed branch. Normalize both the key parameter and the indexed metadata keys by trimming whitespace before performing the membership check in the slices.Contains call to ensure space-padded keys are properly matched.

coderabbitai · 2026-06-19T00:02:00Z

+	// Replicate what resource_transactions.ResolveFilter + BuildDataset produce.
+	// ledgerName and value are test-controlled alphanumeric strings, safe to embed.
+	var predicate string
+	if filter == "@>" {
+		// %q adds Go double-quotes; JSON string values need double-quotes, so this is correct.
+		predicate = fmt.Sprintf(`metadata @> '{"source_wallet_id": %q}'`, value)
+	} else {
+		// SQL string literals use single quotes.
+		predicate = fmt.Sprintf(`metadata ->> 'source_wallet_id' = '%s'`, value)
+	}
+
+	sql := fmt.Sprintf(`
+		EXPLAIN (FORMAT TEXT)
+		SELECT id FROM %q.transactions
+		WHERE ledger = '%s'
+		AND %s
+		ORDER BY id DESC LIMIT 16
+	`, schema, ledgerName, predicate)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

plan_after is not shape-equivalent to the real paginator query.

This EXPLAIN path hardcodes metadata ->> 'source_wallet_id' = ..., but runtime filtering uses metadata ->> ? = ?. That can make the logged plan unrepresentative of the actual query path.

Suggested fix

func explainAnalyze(t *testing.T, store *ledgerstore.Store, filter string, value string) string { @@ - var predicate string - if filter == "@>" { - predicate = fmt.Sprintf(`metadata @> '{"source_wallet_id": %q}'`, value) - } else { - predicate = fmt.Sprintf(`metadata ->> 'source_wallet_id' = '%s'`, value) - } - - sql := fmt.Sprintf(` - EXPLAIN (FORMAT TEXT) - SELECT id FROM %q.transactions - WHERE ledger = '%s' - AND %s - ORDER BY id DESC LIMIT 16 - `, schema, ledgerName, predicate) - - rows, err := store.GetDB().QueryContext(ctx, sql) + var ( + sql string + args []any + ) + if filter == "@>" { + sql = fmt.Sprintf(` + EXPLAIN (FORMAT TEXT) + SELECT id FROM %q.transactions + WHERE ledger = $1 + AND metadata @> $2::jsonb + ORDER BY id DESC LIMIT 16 + `, schema) + args = []any{ledgerName, fmt.Sprintf(`{"source_wallet_id":%q}`, value)} + } else { + sql = fmt.Sprintf(` + EXPLAIN (FORMAT TEXT) + SELECT id FROM %q.transactions + WHERE ledger = $1 + AND metadata ->> $2 = $3 + ORDER BY id DESC LIMIT 16 + `, schema) + args = []any{ledgerName, "source_wallet_id", value} + } + + rows, err := store.GetDB().QueryContext(ctx, sql, args...) require.NoError(t, err)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Replicate what resource_transactions.ResolveFilter + BuildDataset produce.

// ledgerName and value are test-controlled alphanumeric strings, safe to embed.

var predicate string

if filter == "@>" {

// %q adds Go double-quotes; JSON string values need double-quotes, so this is correct.

predicate = fmt.Sprintf(`metadata @> '{"source_wallet_id": %q}'`, value)

} else {

// SQL string literals use single quotes.

predicate = fmt.Sprintf(`metadata ->> 'source_wallet_id' = '%s'`, value)

}

sql := fmt.Sprintf(`

EXPLAIN (FORMAT TEXT)

SELECT id FROM %q.transactions

WHERE ledger = '%s'

AND %s

ORDER BY id DESC LIMIT 16

`, schema, ledgerName, predicate)

// Replicate what resource_transactions.ResolveFilter + BuildDataset produce.

// ledgerName and value are test-controlled alphanumeric strings, safe to embed.

var (

sql string

args []any

)

if filter == "@>" {

sql = fmt.Sprintf(`

EXPLAIN (FORMAT TEXT)

SELECT id FROM %q.transactions

WHERE ledger = $1

AND metadata @> $2::jsonb

ORDER BY id DESC LIMIT 16

`, schema)

args = []any{ledgerName, fmt.Sprintf(`{"source_wallet_id":%q}`, value)}

} else {

sql = fmt.Sprintf(`

EXPLAIN (FORMAT TEXT)

SELECT id FROM %q.transactions

WHERE ledger = $1

AND metadata ->> $2 = $3

ORDER BY id DESC LIMIT 16

`, schema)

args = []any{ledgerName, "source_wallet_id", value}

}

rows, err := store.GetDB().QueryContext(ctx, sql, args...)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/ledger/transactions_deriv_sim_test.go` around lines 193 - 210, The EXPLAIN query in the test hardcodes the filter value directly into the SQL string using fmt.Sprintf with %s in the else branch of the filter condition, but the actual runtime paginator uses parameterized queries with placeholders. To make the test query shape-equivalent to the real paginator query, modify the predicate construction in the else block to use a parameterized placeholder (?) instead of embedding the value directly with string formatting, so the EXPLAIN plan accurately reflects what the runtime query path will execute.

coderabbitai · 2026-06-19T00:02:00Z

+	totalRows := getEnvInt("DERIV_SIM_ROWS", 1_000_000)
+	walletRows := getEnvInt("DERIV_SIM_WALLET", 50)
+
+	t.Logf("simulation: %d total rows, %d wallet rows (%.4f%% selectivity)",
+		totalRows, walletRows, float64(walletRows)/float64(totalRows)*100)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate DERIV_SIM_ROWS / DERIV_SIM_WALLET bounds before use.

totalRows is used as a divisor; DERIV_SIM_ROWS=0 will panic, and invalid ranges can silently produce misleading simulations.

Suggested guardrails

totalRows := getEnvInt("DERIV_SIM_ROWS", 1_000_000) walletRows := getEnvInt("DERIV_SIM_WALLET", 50) +require.Greater(t, totalRows, 0, "DERIV_SIM_ROWS must be > 0") +require.GreaterOrEqual(t, walletRows, 0, "DERIV_SIM_WALLET must be >= 0") +require.LessOrEqual(t, walletRows, totalRows, "DERIV_SIM_WALLET must be <= DERIV_SIM_ROWS")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

totalRows := getEnvInt("DERIV_SIM_ROWS", 1_000_000)

walletRows := getEnvInt("DERIV_SIM_WALLET", 50)

t.Logf("simulation: %d total rows, %d wallet rows (%.4f%% selectivity)",

totalRows, walletRows, float64(walletRows)/float64(totalRows)*100)

totalRows := getEnvInt("DERIV_SIM_ROWS", 1_000_000)

walletRows := getEnvInt("DERIV_SIM_WALLET", 50)

require.Greater(t, totalRows, 0, "DERIV_SIM_ROWS must be > 0")

require.GreaterOrEqual(t, walletRows, 0, "DERIV_SIM_WALLET must be >= 0")

require.LessOrEqual(t, walletRows, totalRows, "DERIV_SIM_WALLET must be <= DERIV_SIM_ROWS")

t.Logf("simulation: %d total rows, %d wallet rows (%.4f%% selectivity)",

totalRows, walletRows, float64(walletRows)/float64(totalRows)*100)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/ledger/transactions_deriv_sim_test.go` around lines 229 - 233, The totalRows variable obtained from getEnvInt for DERIV_SIM_ROWS is used as a divisor in the selectivity calculation without validation, which will cause a panic if set to zero and could produce misleading results if set to an invalid value. Add validation after retrieving totalRows and walletRows using getEnvInt to ensure totalRows is greater than zero, walletRows is positive, and walletRows does not exceed totalRows. If validation fails, either use sensible defaults or fail the test early with a clear error message before attempting the division in the t.Logf call.

coderabbitai · 2026-06-19T00:02:00Z

+	// FeatureIndexedMetadataKeys is a comma-separated list of metadata keys for which the query builder
+	// emits a functional-index-compatible predicate (metadata ->> 'key' = 'value') instead of the default
+	// JSONB containment form (metadata @> '{"key":"value"}'). A matching partial functional index must
+	// exist on the ledger's transactions table for the rewrite to actually speed up the query.
+	// Value: comma-separated key names, e.g. "source_wallet_id,destination_wallet_id". Empty = disabled.
+	FeatureIndexedMetadataKeys = "INDEXED_METADATA_KEYS"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain this open-ended flag to string-valued indexed metadata keys.

The nil entry on Line 56 plus Line 65-Line 66 accepts arbitrary key lists. Downstream ResolveFilter rewrites every configured key to metadata ->> ? = ?, which is text extraction and is not equivalent to JSONB containment for numeric/bool/object metadata values; misconfigured keys can return different rows or fail instead of falling back. Add a dedicated validator/parser for INDEXED_METADATA_KEYS that enforces the intended string-valued indexed-key contract, or restrict it to the currently indexed wallet-id keys.

Also applies to: 56-56, 65-66

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/features/features.go` around lines 26 - 31, The FeatureIndexedMetadataKeys flag accepts arbitrary metadata key lists without validation, but downstream code in ResolveFilter rewrites these keys using text extraction (metadata ->> ? = ?) which only works correctly for string-valued metadata. This causes incorrect behavior for numeric, boolean, or object metadata values. Add a dedicated validator or parser function that processes the INDEXED_METADATA_KEYS configuration value (referenced at lines 56 and 65-66) to enforce that only string-valued indexed metadata keys are allowed, either by validating against a whitelist of known string-indexed keys like wallet IDs or by implementing a contract validator that ensures type safety for the configured keys.

ariel-formance requested a review from a team as a code owner June 11, 2026 11:45

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread internal/storage/ledger/store.go Outdated

Comment thread internal/storage/ledger/store.go Outdated

ariel-formance force-pushed the fix/v2.4/tx-list-planner-hints branch from 1012b5e to 577f762 Compare June 11, 2026 12:41

ariel-formance changed the title ~~fix(storage): add tx-list planner hints to mitigate sparse-wallet timeout~~ fix(storage): adaptive fallback for sparse-wallet transactions-list timeout Jun 11, 2026

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

ariel-formance changed the base branch from release/v2.4 to main June 11, 2026 13:16

ariel-formance force-pushed the fix/v2.4/tx-list-planner-hints branch from 577f762 to e27296c Compare June 11, 2026 13:33

ariel-formance added 4 commits June 11, 2026 14:42

chore: go mod tidy — promote pgerrcode to direct dependency

09a7630

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

test(storage): assert hook is called exactly once in CtxCancelAfterPr…

c68e786

…obe test

ariel-formance changed the title ~~fix(storage): adaptive fallback for sparse-wallet transactions-list timeout~~ perf(storage): sparse-wallet query fix — functional indexes + adaptive fallback Jun 18, 2026

NumaryBot reviewed Jun 18, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Conversation

ariel-formance commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Part 1 — Adaptive fallback (interim mitigation)

Part 2 — Functional-index routing (proper fix)

What changed

Simulation — 1M rows, 0.005% selectivity (Deriv production scale)

Activating for Deriv

Files changed (Part 2)

What this is

Background

Approach: probe-then-retry

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

NumaryBot commented Jun 12, 2026

🛑 Changes requested — multi-model review

🟠 [major] testHookBeforePaginateSelect mutates shared Store without synchronization

🟠 [major] SQLSTATE 57014 covers client/admin cancellation as well as statement_timeout

🟠 [major] Error classification depends on raw *pgconn.PgError surviving the bun/paginate stack

🟠 [major] Zero-value TransactionListConfig silently disables the adaptive fallback for existing callers

🟡 [minor] No validation that probe+retry budgets are non-negative or fit within client timeout

🟡 [minor] timeoutMs=0 does not actually disable the statement timeout when the session already has one set

🟡 [minor] Fallback metric instruments may be nil if Store is constructed outside New()

🟡 [minor] In-transaction bypass of adaptive fallback is silent and unobservable

⚪ [nit] FirstAttemptTimeoutMs=0 silently makes the fallback unreachable

⚪ [nit] Probe timeout logged at error level for expected, handled condition

Uh oh!

NumaryBot left a comment

Choose a reason for hiding this comment

🛑 Changes requested — automated review

Uh oh!

NumaryBot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

NumaryBot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

ariel-formance commented Jun 11, 2026 •

edited

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

codecov Bot commented Jun 11, 2026 •

edited

Loading