feat(db): partial AND index optimization via residual predicate#1562
feat(db): partial AND index optimization via residual predicate#1562v-anton wants to merge 5 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds optional residual predicates to index optimization results, partitions AND branches into indexed vs residual parts, tightens OR to require fully-optimizable branches, rechecks residual predicates at emission time, and adds tests/benchmarks plus changesets. ChangesIndex Optimization Soundness
Sequence Diagram(s)sequenceDiagram
participant Query as Query (where)
participant Optimizer as optimizeAndExpression
participant Store as Collection Store
participant Evaluator as residualPredicate evaluator
participant Emitter as currentStateAsChanges
Query->>Optimizer: produce matchingKeys + optional residualPredicate
Optimizer-->>Query: matchingKeys + residualPredicate
Query->>Store: fetch values for matchingKeys
Store-->>Query: candidate rows
Query->>Evaluator: apply residualPredicate to candidates
Evaluator-->>Query: validated candidates
Query->>Emitter: emit insert for validated candidates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8a856f8 to
292bacc
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/db/tests/collection-indexes.test.ts (1)
1095-1131: ⚡ Quick winAssert strict-OR fallback behavior explicitly, not only result correctness.
These tests prove row correctness, but they don’t prove the optimizer rejected partial OR optimization. Please wrap strict-OR cases with
withIndexTracking(...)and assert a full-scan path (shouldUseIndex: false,shouldUseFullScan: true) to lock in the intended contract.Suggested tightening pattern
- it(`should reject partial optimization inside OR (strict OR)`, () => { - const result = collection.currentStateAsChanges({ - where: or( - eq(new PropRef([`status`]), `active`), - eq(new PropRef([`name`]), `Alice`), - ), - })! + it(`should reject partial optimization inside OR (strict OR)`, () => { + withIndexTracking(collection, (tracker) => { + const result = collection.currentStateAsChanges({ + where: or( + eq(new PropRef([`status`]), `active`), + eq(new PropRef([`name`]), `Alice`), + ), + })! - expect(result).toHaveLength(3) - const names = result.map((r) => r.value.name).sort() - expect(names).toEqual([`Alice`, `Charlie`, `Eve`]) + expect(result).toHaveLength(3) + const names = result.map((r) => r.value.name).sort() + expect(names).toEqual([`Alice`, `Charlie`, `Eve`]) + expectIndexUsage(tracker.stats, { + shouldUseIndex: false, + shouldUseFullScan: true, + indexCallCount: 0, + fullScanCallCount: 1, + }) + }) })As per coding guidelines, "**/*.test.{ts,tsx,js}: Always add unit tests that reproduce a bug before fixing it to ensure the bug is fixed and prevent regression".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/tests/collection-indexes.test.ts` around lines 1095 - 1131, Wrap the two strict-OR tests that currently call collection.currentStateAsChanges(...) inside withIndexTracking(...) and assert the optimizer chose a full-scan path by checking the tracked plan fields (e.g., shouldUseIndex: false and shouldUseFullScan: true) so the tests verify the optimizer rejected partial OR optimization; locate the calls using symbols collection.currentStateAsChanges, or, and, eq, PropRef and replace/augment the test bodies to call withIndexTracking(() => collection.currentStateAsChanges(...)) and assert the tracking result contains shouldUseIndex:false and shouldUseFullScan:true in addition to the existing row assertions.packages/db/tests/index-optimization.bench.ts (1)
86-86: ⚡ Quick winRemove non-null assertion and handle undefined explicitly.
currentStateAsChanges()can returnundefinedwhenoptimizedOnly: trueand optimization fails. While this benchmark doesn't set that option, the!assertion masks potential issues and reduces defensive coding.🛡️ Proposed fix for explicit undefined handling
- const result = col.currentStateAsChanges({ where: whereExpr })! + const result = col.currentStateAsChanges({ where: whereExpr }) + if (!result) { + throw new Error( + `Bench setup soundness check failed for ${label}: ` + + `currentStateAsChanges returned undefined`, + ) + } if (result.length !== expectedMatches) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/tests/index-optimization.bench.ts` at line 86, Remove the non-null assertion on the result of col.currentStateAsChanges({ where: whereExpr }) and explicitly handle the undefined case: call currentStateAsChanges without the trailing "!", assign to a const (e.g., result), then check if result === undefined and handle it appropriately for the benchmark (throw an error, fail the test, or skip the benchmark with a clear message). This change should reference the col.currentStateAsChanges call and the result variable so the code defensively handles the optimizedOnly failure path instead of masking it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/db/src/utils/index-optimization.ts`:
- Around line 433-465: The compound-range optimization path currently returns
only matchingKeys when optimizeCompoundRangeQuery(...) succeeds, allowing
non-range conjuncts to be bypassed; modify the success branch of
optimizeCompoundRangeQuery in optimizeQueryRecursive (and related callers) to
also compute and return a residualPredicate consisting of any
non-range/unevaluated child expressions (folding in partial residuals from
optimized children), e.g., build a residual from remaining branches (single
branch as-is or new Func('and', ...)) and include it in the returned object
alongside matchingKeys so mixed range+non-range ANDs are re-evaluated.
- Around line 522-523: canOptimizeOrExpression currently uses
canOptimizeExpression(arg) for each child which incorrectly returns true for
partial-AND children; change canOptimizeOrExpression so it also rejects any
child that can produce a residual. Concretely, keep the existing
canOptimizeExpression(arg) check but add a guard that if arg.kind === 'and' (or
use canOptimizeAndExpression) you require all AND branches to be optimizable
(not just one); i.e., reject the OR if any child is a partial AND or otherwise
can produce residuals so canOptimizeOrExpression and optimizeOrExpression remain
consistent.
---
Nitpick comments:
In `@packages/db/tests/collection-indexes.test.ts`:
- Around line 1095-1131: Wrap the two strict-OR tests that currently call
collection.currentStateAsChanges(...) inside withIndexTracking(...) and assert
the optimizer chose a full-scan path by checking the tracked plan fields (e.g.,
shouldUseIndex: false and shouldUseFullScan: true) so the tests verify the
optimizer rejected partial OR optimization; locate the calls using symbols
collection.currentStateAsChanges, or, and, eq, PropRef and replace/augment the
test bodies to call withIndexTracking(() =>
collection.currentStateAsChanges(...)) and assert the tracking result contains
shouldUseIndex:false and shouldUseFullScan:true in addition to the existing row
assertions.
In `@packages/db/tests/index-optimization.bench.ts`:
- Line 86: Remove the non-null assertion on the result of
col.currentStateAsChanges({ where: whereExpr }) and explicitly handle the
undefined case: call currentStateAsChanges without the trailing "!", assign to a
const (e.g., result), then check if result === undefined and handle it
appropriately for the benchmark (throw an error, fail the test, or skip the
benchmark with a clear message). This change should reference the
col.currentStateAsChanges call and the result variable so the code defensively
handles the optimizedOnly failure path instead of masking it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f3b4d00-4b84-4442-bc61-b36191bfe2d1
📒 Files selected for processing (7)
.changeset/fix-or-partial-union.md.changeset/partial-and-residual-predicate.mdpackages/db/src/collection/change-events.tspackages/db/src/utils/index-optimization.tspackages/db/tests/collection-indexes.test.tspackages/db/tests/index-optimization.bench.tspackages/db/tests/query/or-partial-optimization.test.ts
292bacc to
3b47dbd
Compare
|
Addressed CodeRabbit review in 3b47dbd:
Also restructured the PR description to match the repo template. Suite: 2379 passed (5 skipped). Bench reruns confirm ~125× partial-index speedup over full scan. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/db/tests/collection-indexes.test.ts (1)
1078-1093: ⚡ Quick winAssert the optimization path, not only the final rows.
This test currently passes even if execution falls back to full scan. Add index-tracking assertions so it specifically guards the OR-hoist + residual optimization behavior.
Proposed test hardening
it(`should hoist nested-OR result into AND with unindexed residual`, () => { - // or(status='active', status='pending') is fully indexed → 4 candidates. - // name='Alice' is the AND residual, narrows to Alice only. - const result = collection.currentStateAsChanges({ - where: and( - or( - eq(new PropRef([`status`]), `active`), - eq(new PropRef([`status`]), `pending`), - ), - eq(new PropRef([`name`]), `Alice`), - ), - })! - - expect(result).toHaveLength(1) - expect(result[0]?.value.name).toBe(`Alice`) + // or(status='active', status='pending') is fully indexed → 4 candidates. + // name='Alice' is the AND residual, narrows to Alice only. + withIndexTracking(collection, (tracker) => { + const result = collection.currentStateAsChanges({ + where: and( + or( + eq(new PropRef([`status`]), `active`), + eq(new PropRef([`status`]), `pending`), + ), + eq(new PropRef([`name`]), `Alice`), + ), + })! + + expect(result).toHaveLength(1) + expect(result[0]?.value.name).toBe(`Alice`) + + expectIndexUsage(tracker.stats, { + shouldUseIndex: true, + shouldUseFullScan: false, + indexCallCount: 2, + fullScanCallCount: 0, + }) + }) })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/tests/collection-indexes.test.ts` around lines 1078 - 1093, The test verifies final rows but not that the OR-hoist + residual index optimization was used; update the test (the case named "should hoist nested-OR result into AND with unindexed residual") to assert the query plan/plan metadata in addition to results: call the same predicate through whatever explain API exists (e.g., collection.explainQuery(...) or inspect result.queryPlan/result._meta) when invoking collection.currentStateAsChanges with and/or/eq/PropRef and assert the plan shows the OR hoist (i.e., the OR on PropRef([`status`]) produced 4 indexed candidates) and that name (`PropRef(['name'])`) is treated as the unindexed residual, failing the test if the plan falls back to a full scan. Ensure these checks reference collection.currentStateAsChanges and the same predicate construction so the test protects the optimization path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/db/tests/collection-indexes.test.ts`:
- Around line 1175-1179: Update the incorrect test comment that states the age
range matches 3 rows—change it to reflect 4 rows (Alice, Bob, Charlie, Diana) in
the block around collection.currentStateAsChanges; this is the comment
referencing optimizeCompoundRangeQuery and optimizeAndExpression in the test, so
simply edit that comment to say "age range matches 4 rows" (or list the four
names) to match the actual test data.
---
Nitpick comments:
In `@packages/db/tests/collection-indexes.test.ts`:
- Around line 1078-1093: The test verifies final rows but not that the OR-hoist
+ residual index optimization was used; update the test (the case named "should
hoist nested-OR result into AND with unindexed residual") to assert the query
plan/plan metadata in addition to results: call the same predicate through
whatever explain API exists (e.g., collection.explainQuery(...) or inspect
result.queryPlan/result._meta) when invoking collection.currentStateAsChanges
with and/or/eq/PropRef and assert the plan shows the OR hoist (i.e., the OR on
PropRef([`status`]) produced 4 indexed candidates) and that name
(`PropRef(['name'])`) is treated as the unindexed residual, failing the test if
the plan falls back to a full scan. Ensure these checks reference
collection.currentStateAsChanges and the same predicate construction so the test
protects the optimization path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c7e6bf10-e586-4449-b2ca-5a93385b724c
📒 Files selected for processing (5)
.changeset/partial-and-residual-predicate.mdpackages/db/src/collection/change-events.tspackages/db/src/utils/index-optimization.tspackages/db/tests/collection-indexes.test.tspackages/db/tests/index-optimization.bench.ts
✅ Files skipped from review due to trivial changes (1)
- .changeset/partial-and-residual-predicate.md
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/db/src/collection/change-events.ts
- packages/db/tests/index-optimization.bench.ts
Fix soundness bug where `optimizeAndExpression` returned the intersection of index-optimizable branches and the caller treated those keys as exact matches, silently dropping the constraint from unindexed branches. Add an optional `residualPredicate` field to `OptimizationResult`. When set, `currentStateAsChanges` re-evaluates the predicate against each candidate and drops rows that fail. AND now collects non-optimizable branches (and any residuals returned by partially-optimized children) into a single residual, preserving the index-narrowed candidate set while restoring correctness. OR rejects partial optimization (including children with residuals) — an unindexed OR branch can match any row, so the partial result would be unsound. Benchmark on a 10k-row collection (and(eq(age,25), eq(name,X))): - full scan (no indexes): 145 ops/sec - partial index (age only, residual): 17,360 ops/sec (~120× faster) - full index (age + name): 277,956 ops/sec (~16× over partial)
3b47dbd to
ca2128b
Compare
🎯 Changes
Fixes a soundness bug in
optimizeAndExpressionwhere partial AND optimization returned the intersection of only the index-optimizable branches and the caller treated those keys as exact matches, silently dropping the constraint from unindexed branches. The bug was masked today by the existing partial-AND test only because the test data had a single row peragevalue.The fix adds candidate-set semantics: indexed branches narrow the candidate set; unindexed branches are re-evaluated as a residual predicate against that set. Standard query-optimizer pattern, and for the in-memory case it preserves the index speedup almost entirely.
API change — additive, non-breaking:
When
residualPredicateis set, the caller MUST evaluate it against eachmatchingKeysentry and drop rows that fail. Whenundefined,matchingKeysis exact.Optimizer:
matchingKeys= intersection of optimizable branches' key sets. Residual =and(...childResiduals, ...nonOptimizableBranches)(or the single element directly, orundefined).optimizeCompoundRangeQueryfuses range ops on one field inside a larger AND, the args it consumed are tracked, and the remaining AND args participate in the residual. Previously the compound-range path short-circuited and bypassed residual composition entirely (e.g.and(gte(age, 18), lte(age, 65), eq(name, 'Alice'))silently dropped the name constraint).canFullyOptimizeExpressionhelper expresses the strict-OR contract (an AND child must have all branches optimizable, not just one);canOptimizeOrExpressionandoptimizeOrExpressionboth use it. The runtime path now short-circuits up front via this predicate, so no wasted index lookups occur when the OR will ultimately reject — strict-OR tests assertshouldUseIndex: false, fullScanCallCount: 1.Caller (
currentStateAsChanges) builds a filter fromresidualPredicate(if any) and drops candidates that fail it.Why this shape for an in-memory DB: dominant cost is per-row JS predicate evaluation, not I/O. Index lookups against in-memory
Map/BTreeare cheap enough that taking the full intersection up front is fine — no cost-based "most-selective driver" needed. The saving comes from skipping residual evaluation on theN − |matchingKeys|rows the index already eliminated. For OR with a non-indexed branch, you'd still have to scan the complement to find matches, so partial OR is intentionally out of scope.Benchmark (
tests/index-optimization.bench.ts, 10k rows,and(eq(age, 25), eq(name, 'user-2500'))):Tests (added to
tests/collection-indexes.test.ts): hardened soundness for partial AND, multi-arity AND, compound-range with non-range residual, nested OR-inside-AND with residual, strict-OR rejecting both leaf-unindexed and AND-with-residual children (both assert no index calls + 1 full scan), and a regression guard that fully indexed AND produces no residual. All 2379 tests pass (5 skipped); 0 lint errors; types clean.✅ Checklist
pnpm test.🚀 Release Impact
Summary by CodeRabbit
Bug Fixes
Tests