
fix: advance _row_last_updated_at_version for UPDATE COLUMNS FROM #528

Open
jerryjch wants to merge 1 commit into lance-format:main from jerryjch:update_columns_bump_row_last_updated



jerryjch (Contributor) commented May 13, 2026

Lance dependency: required to compile

Bump lance.version in pom.xml to a release that includes
lance#6748 before building or merging this PR; otherwise the build fails because the new API calls are not available in older releases.

Summary

  • Fixes #418 (UPDATE COLUMNS FROM: bump _row_last_updated_at_version for matched rows).
  • UpdateColumnsWriter.processFragment: after each fragment.updateColumns() call, reads
    result.getUpdatedRowOffsets() and accumulates a Map<Long, long[]> of fragment id →
    matched physical row offsets.
  • TaskCommit: carries the per-fragment offset map alongside the existing
    updatedFragments and fieldsModified fields.
  • UpdateColumnsBackfillBatchWrite.commit(): merges the offset maps from all task commits and
    passes the result to Update.builder().updatedFragmentOffsets(...). Lance's build_manifest
    then refreshes _row_last_updated_at_version only for the matched rows,
    leaving unmatched rows and untouched fragments unchanged.
  • BaseUpdateColumnsBackfillTest: changes
    testUpdateColumnsPreservesCreatedAtAndAdvancesLastUpdatedWithStableRowIds from pinning the
    known gap (assertEquals, i.e. the version does not change) to asserting the fixed behavior
    (assertTrue(after > before)), and updates its Javadoc accordingly.
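The executor-side half of the change (accumulating offsets per fragment and carrying them on the task commit) can be sketched as follows. This is a minimal sketch, assuming each fragment is processed at most once per task and that the offsets arrive as a long[]. OffsetAccumulator and TaskCommitSketch are hypothetical names for illustration, not the connector's actual UpdateColumnsWriter or TaskCommit classes, and no Lance API is called.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for TaskCommit: carries the per-fragment matched
 * row offsets from an executor task back to the driver.
 */
final class TaskCommitSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // fragment id -> matched physical row offsets within that fragment
  private final Map<Long, long[]> updatedRowOffsets;

  TaskCommitSketch(Map<Long, long[]> updatedRowOffsets) {
    this.updatedRowOffsets = Collections.unmodifiableMap(new HashMap<>(updatedRowOffsets));
  }

  Map<Long, long[]> getUpdatedRowOffsets() {
    return updatedRowOffsets;
  }
}

/** Hypothetical per-task accumulator filled while fragments are processed. */
final class OffsetAccumulator {
  private final Map<Long, long[]> offsetsByFragment = new HashMap<>();

  /**
   * Records the matched offsets reported for one fragment.
   * Assumes each fragment is recorded at most once per task.
   */
  void record(long fragmentId, long[] matchedOffsets) {
    if (matchedOffsets == null || matchedOffsets.length == 0) {
      return; // nothing matched in this fragment, so nothing to refresh
    }
    // Defensive copy so later mutation by the caller cannot alter the commit.
    offsetsByFragment.put(fragmentId, Arrays.copyOf(matchedOffsets, matchedOffsets.length));
  }

  TaskCommitSketch toTaskCommit() {
    return new TaskCommitSketch(offsetsByFragment);
  }
}
```

Skipping fragments with no matched rows keeps the commit payload small; whether the PR does the same is not stated, so treat it as a choice of this sketch.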

Background

UPDATE COLUMNS FROM rewrites column data in place via Lance's Operation::Update with
RewriteColumns mode. Lance's build_manifest can partially refresh
_row_last_updated_at_version for only the matched rows, but only when the
updated_fragment_offsets map on the commit is non-empty. Previously
UpdateColumnsBackfillBatchWrite never populated this map, so the partial refresh never
ran and _row_last_updated_at_version stayed stale after every UPDATE COLUMNS
commit, which breaks CDF consumers that rely on that column to detect changed rows.
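The effect of the offsets map on the refresh can be modeled with plain arrays. This is a toy model for illustration only, not Lance's build_manifest: it assumes each fragment's _row_last_updated_at_version values are held in a hypothetical long[] indexed by physical row offset, and PartialRefreshModel is a hypothetical name.

```java
import java.util.Map;

/**
 * Toy model of the partial _row_last_updated_at_version refresh.
 * Hypothetical; this is not Lance's build_manifest implementation.
 */
final class PartialRefreshModel {

  /**
   * Advances the last-updated version of matched rows only.
   *
   * @param lastUpdated fragment id -> per-row last-updated versions, indexed by
   *                    physical row offset (mutated in place)
   * @param offsets     fragment id -> matched physical row offsets
   * @param newVersion  dataset version produced by the commit
   */
  static void refresh(Map<Long, long[]> lastUpdated, Map<Long, long[]> offsets, long newVersion) {
    // With an empty offsets map (the pre-fix situation) this loop never runs,
    // so every row keeps its stale version.
    for (Map.Entry<Long, long[]> entry : offsets.entrySet()) {
      long[] versions = lastUpdated.get(entry.getKey());
      if (versions == null) {
        continue; // fragment not tracked in this model; nothing to refresh
      }
      for (long offset : entry.getValue()) {
        // Only matched rows advance; unmatched rows and untouched fragments keep their values.
        versions[Math.toIntExact(offset)] = newVersion;
      }
    }
  }
}
```

In this model, calling refresh with an empty map leaves every version unchanged, which is the stale state the PR fixes by populating the map.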

The matched row offsets are already computed inside Lance during fragment.updateColumns()
and exposed via FragmentUpdateResult.getUpdatedRowOffsets() (lance#6650). The missing
piece was wiring those offsets from the executor task result through TaskCommit to the
driver commit and setting them on the Update operation, which is what this PR does.
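On the driver, the per-task maps have to be combined into a single map before the commit. A minimal sketch, assuming each task reports a Map<Long, long[]> keyed by fragment id; OffsetMerge is a hypothetical helper. Each fragment should normally be handled by a single task, so the sorted union applied on a key collision is a defensive choice of this sketch rather than something the PR describes. The merged map corresponds to what the PR passes to Update.builder().updatedFragmentOffsets(...); that builder call is not reproduced here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.LongStream;

/** Hypothetical driver-side merge of per-task offset maps. */
final class OffsetMerge {

  /** Combines the fragment id -> offsets maps reported by all tasks. */
  static Map<Long, long[]> merge(List<Map<Long, long[]>> perTask) {
    Map<Long, long[]> merged = new HashMap<>();
    for (Map<Long, long[]> taskOffsets : perTask) {
      for (Map.Entry<Long, long[]> entry : taskOffsets.entrySet()) {
        // Copy the array so the merged map does not alias task-owned data;
        // if two tasks report the same fragment, take the union of their offsets.
        merged.merge(entry.getKey(), entry.getValue().clone(), OffsetMerge::union);
      }
    }
    return merged;
  }

  /** Sorted, de-duplicated union of two offset arrays. */
  private static long[] union(long[] a, long[] b) {
    return LongStream.concat(Arrays.stream(a), Arrays.stream(b)).sorted().distinct().toArray();
  }
}
```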

Test plan

  • BaseUpdateColumnsBackfillTest#testUpdateColumnsPreservesCreatedAtAndAdvancesLastUpdatedWithStableRowIds:
    creates a table with stable row IDs, runs UPDATE COLUMNS over all rows, and asserts
    that _row_last_updated_at_version strictly increases for each row while
    _row_created_at_version is unchanged.
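The per-row checks in this test plan can be expressed as a small helper. This sketch assumes the before and after version columns have already been read and aligned by stable row id into parallel arrays; VersionChecks is a hypothetical name, and it throws plain AssertionError instead of using JUnit so that it runs without test dependencies.

```java
/**
 * Hypothetical helper mirroring the per-row checks in the test plan.
 * Arrays are assumed to be aligned by stable row id (index i is the same row).
 */
final class VersionChecks {

  static void assertCreatedKeptAndUpdatedAdvanced(
      long[] createdBefore, long[] createdAfter, long[] updatedBefore, long[] updatedAfter) {
    int rows = createdBefore.length;
    if (createdAfter.length != rows || updatedBefore.length != rows || updatedAfter.length != rows) {
      throw new AssertionError("row count changed across the UPDATE COLUMNS commit");
    }
    for (int i = 0; i < rows; i++) {
      // _row_created_at_version must be unchanged for every row.
      if (createdAfter[i] != createdBefore[i]) {
        throw new AssertionError("row " + i + ": _row_created_at_version changed");
      }
      // _row_last_updated_at_version must strictly increase (the test updates all rows).
      if (updatedAfter[i] <= updatedBefore[i]) {
        throw new AssertionError("row " + i + ": _row_last_updated_at_version did not advance");
      }
    }
  }
}
```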

github-actions (bot) added the bug (Something isn't working) label on May 13, 2026
jerryjch (Contributor, Author) commented:

cc @hamersaw


Labels

bug (Something isn't working)


Development

Successfully merging this pull request may close these issues.

UPDATE COLUMNS FROM: bump _row_last_updated_at_version for matched rows
