docs(configs): START_COMMIT is exclusive, not inclusive #18955
Draft
yihua wants to merge 2 commits into
Updates the latest, 1.1.x, and 1.2.x configuration pages to reflect that Spark's incremental query treats the START_COMMIT option as exclusive (completion_time > START_COMMIT), matching the V1 relation's start-exclusive findInstantsInRange and the V2 relation's RangeType.OPEN_CLOSED.
Also updates docs for hoodie.datasource.read.incr.table.version and hoodie.datasource.read.streaming.table.version (they override the detected source table version and thus the time-semantics).
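To illustrate the exclusive start boundary, here is a minimal Python sketch of an open-closed range filter over completion times. It is not Hudi's implementation: the Commit class, the helper names, and the sample timestamps are hypothetical and only model the (start, end] semantics described in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a completed instant on a timeline."""
    instant: str
    completion_time: str  # fixed-width timestamp string, compared lexicographically


def in_open_closed_range(completion_time: str, start: str, end: str) -> bool:
    """Return True when completion_time falls in (start, end]: start excluded, end included."""
    return start < completion_time <= end


def incremental_commits(commits, start_commit: str, end_commit: str):
    """Keep only commits whose completion time is strictly after start_commit and at or before end_commit."""
    return [
        c for c in commits
        if in_open_closed_range(c.completion_time, start_commit, end_commit)
    ]


# Illustrative timeline with made-up completion times.
timeline = [
    Commit("c1", "20240101000000000"),
    Commit("c2", "20240102000000000"),
    Commit("c3", "20240103000000000"),
]

# START_COMMIT equals c1's completion time, so c1 itself is not returned.
selected = incremental_commits(timeline, "20240101000000000", "20240103000000000")
print([c.instant for c in selected])  # ['c2', 'c3']
```

Under the old documented wording (>=), a reader would expect c1 to be included as well. With the exclusive start, passing the last consumed completion time as START_COMMIT does not re-read that commit.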
Describe the issue this Pull Request addresses
The published configuration pages on the Hudi docs site say "New data written with completion_time >= START_COMMIT are fetched out" for hoodie.datasource.read.begin.instanttime. This contradicts the actual runtime behavior, which treats START_COMMIT as exclusive:
- The V1 relation uses findInstantsInRange(start, end), which is (start, end].
- The V2 relation uses RangeType.OPEN_CLOSED after the apache/hudi PR that made the start commit exclusive.
A companion PR in apache/hudi (master branch) updates the underlying DataSourceOptions.scala config description.
Summary and Changelog
Updates the latest, 1.1.x, and 1.2.x configuration pages to reflect that START_COMMIT is exclusive: > instead of >=, and "strictly after" instead of "on or after". Six files touched (configurations.md and basic_configurations.md for each of website/docs, website/versioned_docs/version-1.1.1, and website/versioned_docs/version-1.2.0).
Impact
Documentation only. No code change.
Risk Level
none
Documentation Update
This PR is the documentation update.
Contributor's checklist