fix(datafusion): coerce filter literals for dictionary-encoded columns#7003
fix(datafusion): coerce filter literals for dictionary-encoded columns#7003valkum wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7df6035e4c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if value.is_null() { | ||
| return Some(value.clone()); |
There was a problem hiding this comment.
Preserve dictionary type for typed null literals
When the filter literal is already a typed null, such as an API-built Expr::Literal(ScalarValue::Utf8(None), _) or an IN value that has been typed as Utf8, this early is_null() return leaves the literal as Utf8(None) instead of producing a Dictionary(Int16, Utf8) scalar for a dictionary column. resolve_value then installs that mismatched literal next to the dictionary column, so the same dictionary filter path fixed for non-null strings can still fail planning/evaluation for typed-null predicates; this guard should only preserve the untyped ScalarValue::Null case or coerce the inner null and wrap it in the dictionary type.
Useful? React with 👍 / 👎.
SQL string filters on a dictionary-encoded column (e.g. `Dictionary(Int16, Utf8)`) failed to plan with "could not convert to literal of type 'Dictionary(...)'" because `safe_coerce_scalar` had no arm for a dictionary target and no `ScalarValue::Dictionary` input arm. This also silently lost scalar-index pushdown for `=`/`IN` predicates on dictionary columns. Add two generic guard clauses to `safe_coerce_scalar`: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets. Enabling pushdown exposed a `todo!()` in `OrderableScalarValue::cmp` for `Dictionary` vs `Dictionary` (scalar indexes store dictionary columns as `Dictionary` scalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Rebased on top of the recent BinaryView PR |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Currently working on the feedback from codex. |
Closes #7002
SQL string filters on a dictionary-encoded column (e.g.
Dictionary(Int16, Utf8)) failed to plan with "could not convert to literal of type 'Dictionary(...)'" becausesafe_coerce_scalarhad no arm for a dictionary target and noScalarValue::Dictionaryinput arm. This also silently lost scalar-index pushdown for=/INpredicates on dictionary columns.Add two generic guard clauses to
safe_coerce_scalar: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets.Enabling pushdown exposed a
todo!()inOrderableScalarValue::cmpforDictionaryvsDictionary(scalar indexes store dictionary columns asDictionaryscalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values.This was created by Claude Code using Opus 4.8.