fix: advance _row_last_updated_at_version for UPDATE COLUMNS FROM#528
Open
jerryjch wants to merge 1 commit into
Open
fix: advance _row_last_updated_at_version for UPDATE COLUMNS FROM#528jerryjch wants to merge 1 commit into
jerryjch wants to merge 1 commit into
Conversation
…stable-row-id tables
Contributor
Author
|
cc @hamersaw |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Lance dependency — required to compile
Update.Builder.updatedFragmentOffsets(...)/Update.updatedFragmentOffsets()and JNIFromJava + IntoJava so the driver commit passes matched offsets into Rust.
Bump
lance.versioninpom.xmlto a release that includeslance#6748 before building or merging this PR; otherwise build fails on the new API calls.
Summary
UpdateColumnsWriter.processFragment: after eachfragment.updateColumns()call, readsresult.getUpdatedRowOffsets()and accumulates aMap<Long, long[]>of fragment id →matched physical row offsets.
TaskCommit: carries the per-fragment offset map alongside the existingupdatedFragmentsandfieldsModifiedfields.UpdateColumnsBackfillBatchWrite.commit(): merges offset maps from all task commits andpasses them to
Update.builder().updatedFragmentOffsets(...). Lance'sbuild_manifestthen calls the partial
_row_last_updated_at_versionrefresh only for the matched rows,leaving unmatched rows and untouched fragments unchanged.
BaseUpdateColumnsBackfillTest: flipstestUpdateColumnsPreservesCreatedAtAndAdvancesLastUpdatedWithStableRowIdsfrom the"known gap" pin (assertEquals, no change) to the correct assertion
(
assertTrue(after > before)); updates Javadoc accordingly.Background
UPDATE COLUMNS FROMrewrites column data in place via Lance'sOperation::UpdatewithRewriteColumnsmode. Lance'sbuild_manifestcan partially refresh_row_last_updated_at_versionfor only the matched rows — but only when theupdated_fragment_offsetsmap is non-empty on the commit. PreviouslyUpdateColumnsBackfillBatchWritenever populated this map, so the partial refresh neveractivated and
_row_last_updated_at_versionstayed stale after every UPDATE COLUMNScommit, breaking CDF consumers.
The matched row offsets are already computed inside Lance during
fragment.updateColumns()and surfaced via
FragmentUpdateResult.getUpdatedRowOffsets()(lance#6650). The missingpiece was wiring those offsets from the executor task result through
TaskCommitto thedriver commit, and then setting them on the
Updateoperation — which this PR does.Test plan
BaseUpdateColumnsBackfillTest#testUpdateColumnsPreservesCreatedAtAndAdvancesLastUpdatedWithStableRowIds— creates a stable-row-id table, runs UPDATE COLUMNS over all rows, and asserts
_row_last_updated_at_versionstrictly increases for each row while_row_created_at_versionis unchanged.