
[Kernel] Add data skipping support for IN predicate #6865

Open
nnguyen168 wants to merge 1 commit into delta-io:master from nnguyen168:feature/kernel-in-data-skipping

@nnguyen168

Summary

  • Implement IN predicate data skipping in Delta Kernel to improve query performance when filtering with IN clauses
  • For column IN (v1, v2, ..., vn), check whether the file's min/max range overlaps the range spanned by the non-NULL IN-list values
  • Skip files where file_min > max(values) OR file_max < min(values)
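A minimal Python sketch of the per-file decision in the bullets above, written against plain values rather than Kernel's Java API. The function name `may_contain_match` and the file statistics are hypothetical. It also shows that the check is conservative: a file whose range overlaps the IN-list range but holds none of the listed values is still read.

```python
def may_contain_match(file_min, file_max, in_values):
    """Return True if a file with stats [file_min, file_max] might contain a row
    matching `col IN (in_values)`; False means the file can be skipped."""
    # NULL (None) literals can never make IN evaluate to TRUE, so ignore them.
    values = [v for v in in_values if v is not None]
    if not values:
        # No non-NULL values: IN can never be TRUE, so the file is skippable.
        return False
    lo, hi = min(values), max(values)
    # Keep the file iff [file_min, file_max] overlaps [lo, hi].
    return file_min <= hi and file_max >= lo


# Hypothetical per-file min/max statistics for an integer column.
files = {
    "part-0": (1, 4),    # overlaps [2, 9] and may contain 2 -> kept
    "part-1": (5, 7),    # inside [2, 9] but cannot hold 2 or 9 -> still kept
    "part-2": (20, 30),  # entirely above 9 -> skipped
}

in_list = [9, 2, None]
kept = [name for name, (lo, hi) in files.items() if may_contain_match(lo, hi, in_list)]
print(kept)  # ['part-0', 'part-1']
```

Collapsing the list to one [min, max] interval keeps the check to two comparisons per file, at the cost of occasionally reading a file like part-1 that a per-value check could have skipped.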

Implementation Details

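  • Filter out NULL values from the IN-list (a NULL element can never make the IN predicate evaluate to TRUE)
  • Find the min and max of the remaining literal values (min_val, max_val)
  • Build the data skipping predicate: file_min <= max_val AND file_max >= min_val
  • Return ALWAYS_FALSE if all values are NULL or the list is empty (the predicate can never be TRUE, so every file can be skipped)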
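The rewrite from an IN predicate to a predicate over per-file statistics can be sketched in Python as follows. The expression classes (`Column`, `Literal`, `In`, `Cmp`, `And`) and the `eligible_columns` parameter are hypothetical stand-ins, not Delta Kernel's Java classes; the `minValues.<col>` / `maxValues.<col>` names follow the Delta log's stats fields but are illustrative here. The test plan mentions non-eligible columns, non-literal values, and a non-column first child; this sketch assumes each of those simply yields no skipping predicate (`None`), so no file is skipped on its account.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified expression types for illustration only.
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object  # None stands for a SQL NULL literal


@dataclass(frozen=True)
class In:
    child: object
    values: tuple


@dataclass(frozen=True)
class Cmp:
    op: str  # "<=" or ">="
    left: Column
    right: Literal


@dataclass(frozen=True)
class And:
    left: object
    right: object


ALWAYS_FALSE = "ALWAYS_FALSE"


def in_to_skipping_predicate(pred: In, eligible_columns: set) -> Optional[object]:
    """Rewrite `col IN (v1, ..., vn)` into a predicate over per-file min/max stats.

    Returns None (assumed to mean "no skipping, read the file") when the
    first child is not a column, the column has no usable stats, or any
    IN-list element is not a literal.
    """
    if not isinstance(pred.child, Column):
        return None
    if pred.child.name not in eligible_columns:
        return None
    if not all(isinstance(v, Literal) for v in pred.values):
        return None
    # NULL literals can never make IN evaluate to TRUE, so drop them.
    non_null = [v.value for v in pred.values if v.value is not None]
    if not non_null:
        return ALWAYS_FALSE  # empty or all-NULL list: every file can be skipped
    min_val, max_val = min(non_null), max(non_null)
    col = pred.child.name
    # Keep a file iff minValues.col <= max_val AND maxValues.col >= min_val.
    return And(
        Cmp("<=", Column(f"minValues.{col}"), Literal(max_val)),
        Cmp(">=", Column(f"maxValues.{col}"), Literal(min_val)),
    )


pred = In(Column("a"), (Literal(5), Literal(None), Literal(2), Literal(9)))
print(in_to_skipping_predicate(pred, {"a"}))
```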

Follows the same approach used in delta-spark's DataFiltersBuilder.constructLiteralInListDataFilters.

Test plan

  • Added unit tests for IN data skipping in DataSkippingUtilsSuite
  • Tests cover: basic integer values, a single value, strings, a non-eligible column, non-literal values, and a non-column first child
  • All existing tests pass (7 total)

Addresses #6864

Implement IN predicate data skipping in Delta Kernel to improve query
performance when filtering with IN clauses. For a predicate
`column IN (v1, v2, ..., vn)`, we check whether the file's min/max range
overlaps the range spanned by the IN-list values.

Data skipping logic:
- Filter out NULL values from the IN-list
- Find min_val = min(values) and max_val = max(values)
- Skip file if: file_min > max_val OR file_max < min_val
- Keep file if: file_min <= max_val AND file_max >= min_val
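Dropping NULLs from the IN-list is safe because of SQL's three-valued logic: `x IN (v1, ..., vn)` is TRUE if x equals some vi, otherwise NULL if x or any vi is NULL, otherwise FALSE, and a filter keeps a row only when its predicate is TRUE. A small Python model (None standing for NULL; `sql_in` and `selected` are hypothetical helpers) checks that removing NULL elements never changes which rows are selected.

```python
def sql_in(x, values):
    """SQL three-valued `x IN (values)`: True, False, or None (NULL).

    Assumes a non-empty IN-list, as SQL syntax requires.
    """
    if x is None:
        return None  # NULL IN (...) is NULL
    if any(v is not None and v == x for v in values):
        return True
    # No match: the result is NULL if the list contained a NULL, else FALSE.
    return None if any(v is None for v in values) else False


def selected(x, values):
    # A WHERE/filter clause keeps a row only when the predicate is exactly TRUE.
    return sql_in(x, values) is True


in_list = [2, None, 9]
without_nulls = [v for v in in_list if v is not None]
for x in [None, 1, 2, 5, 9, 10]:
    assert selected(x, in_list) == selected(x, without_nulls)
print("NULLs in the IN-list never change which rows are selected")
```

The result value itself does differ (NULL versus FALSE for a non-matching row), which would matter under negation, but for a plain IN filter only TRUE keeps a row.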

This follows the same approach used in delta-spark's
DataFiltersBuilder.constructLiteralInListDataFilters.

Addresses delta-io#6864