
fix: schema-aware Arrow struct accessor for partial nested projections #531

Open
wombatu-kun wants to merge 1 commit into
lance-format:mainfrom
wombatu-kun:fix/499-nested-struct-projection-schema-aware-accessor


@wombatu-kun
Contributor

Fixes #499 — partial projection of nested struct children (e.g. SELECT s.b, s.c FROM t over struct<a, b, c, d>) crashed the Lance vectorized reader with UnsupportedOperationException from ArrowVectorAccessor.getLong.

  • Lance's native scan does not push down nested struct projection, so the Arrow StructVector always carries all on-disk children in physical order. LanceStructAccessor bound children by physical Arrow ordinal, while Spark's generated projection (and external consumers such as Hudi's LanceRecordIterator) index by the pruned schema's ordinal, so a pruned ordinal could resolve to a child of a different type.
  • Adds schema-aware constructors LanceArrowColumnVector(ValueVector, DataType) and LanceStructAccessor(StructVector, StructType) that bind Arrow children to a Spark StructType by name, recursing into nested structs. LanceFragmentColumnarBatchScanner.loadNextBatch threads the input-partition schema through so lance-spark's own scan also uses the schema-aware path.
  • Keeps ReadSchemaNestedStructWidening from #442 (fix: widen pruned nested struct schemas to preserve Arrow child ordinals) as defense-in-depth for the standard scan path.
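
For reviewers, the ordinal mismatch described above can be illustrated with a minimal, dependency-free sketch. It is not the code in this PR: `NameBoundStructSketch` and `bindByName` are hypothetical names, plain string lists stand in for the Arrow StructVector children and the Spark StructType fields, and recursion into nested structs is omitted. It only shows the idea of resolving each pruned-schema ordinal to a physical child by name.

```java
import java.util.List;

// Hypothetical illustration only; not the LanceStructAccessor implementation.
final class NameBoundStructSketch {

    /**
     * For each field of the pruned schema, look up the ordinal of the
     * same-named child in the physical (on-disk) layout.
     */
    static int[] bindByName(List<String> physicalChildren, List<String> prunedFields) {
        int[] physicalOrdinals = new int[prunedFields.size()];
        for (int i = 0; i < prunedFields.size(); i++) {
            String name = prunedFields.get(i);
            int ordinal = physicalChildren.indexOf(name);
            if (ordinal < 0) {
                throw new IllegalArgumentException("No physical child named " + name);
            }
            physicalOrdinals[i] = ordinal;
        }
        return physicalOrdinals;
    }

    public static void main(String[] args) {
        // struct<a, b, c, d> on disk; the query selects s.b and s.c.
        List<String> physical = List.of("a", "b", "c", "d");
        List<String> pruned = List.of("b", "c");

        int[] mapping = bindByName(physical, pruned);
        for (int i = 0; i < pruned.size(); i++) {
            // Binding by physical ordinal would hand back physical.get(i) instead.
            System.out.println("pruned ordinal " + i + " (" + pruned.get(i) + ")"
                + " -> physical ordinal " + mapping[i]
                + "; ordinal binding would read '" + physical.get(i) + "'");
        }
    }
}
```

With ordinal binding, pruned ordinal 0 (`b`) would read physical child `a`; if `a` and `b` have different types, a typed getter such as `getLong` lands on the wrong accessor, which matches the failure mode reported in #499.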

@github-actions github-actions Bot added the bug Something isn't working label May 14, 2026
@wombatu-kun wombatu-kun force-pushed the fix/499-nested-struct-projection-schema-aware-accessor branch from 4321029 to cac7b86 Compare May 22, 2026 05:41
@wombatu-kun wombatu-kun force-pushed the fix/499-nested-struct-projection-schema-aware-accessor branch from cac7b86 to 45e6e0f Compare May 25, 2026 03:05

Development

Successfully merging this pull request may close these issues.

Vectorized reader fails on partial nested-struct projection — UnsupportedOperationException in getLong

1 participant