From 8cc619e7b9305c5b3e4f4c094a12d5221b2da0af Mon Sep 17 00:00:00 2001 From: Y Ethan Guo Date: Tue, 9 Jun 2026 20:06:51 -0700 Subject: [PATCH 1/2] docs(configs): START_COMMIT is exclusive, not inclusive Updates latest, 1.1.x, and 1.2.x configuration pages to reflect that Spark's incremental query treats the START_COMMIT option as exclusive (completion_time > START_COMMIT), matching the V1 relation's start- exclusive findInstantsInRange and the V2 relation's RangeType.OPEN_CLOSED. --- website/docs/basic_configurations.md | 2 +- website/docs/configurations.md | 2 +- website/versioned_docs/version-1.1.1/basic_configurations.md | 2 +- website/versioned_docs/version-1.1.1/configurations.md | 2 +- website/versioned_docs/version-1.2.0/basic_configurations.md | 2 +- website/versioned_docs/version-1.2.0/configurations.md | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/website/docs/basic_configurations.md b/website/docs/basic_configurations.md index 4352039ebc074..a98f253435f6d 100644 --- a/website/docs/basic_configurations.md +++ b/website/docs/basic_configurations.md @@ -93,7 +93,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. 
Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | diff --git a/website/docs/configurations.md b/website/docs/configurations.md index 9d0caf4000856..3b6f5fa5f5b27 100644 --- a/website/docs/configurations.md +++ b/website/docs/configurations.md @@ -123,7 +123,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | diff --git a/website/versioned_docs/version-1.1.1/basic_configurations.md b/website/versioned_docs/version-1.1.1/basic_configurations.md index 88b08ed148222..d0671c712033b 100644 --- a/website/versioned_docs/version-1.1.1/basic_configurations.md +++ b/website/versioned_docs/version-1.1.1/basic_configurations.md @@ -92,7 +92,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | diff --git a/website/versioned_docs/version-1.1.1/configurations.md b/website/versioned_docs/version-1.1.1/configurations.md index 101c5e1aa547d..ba374289ba419 100644 --- a/website/versioned_docs/version-1.1.1/configurations.md +++ b/website/versioned_docs/version-1.1.1/configurations.md @@ -120,7 +120,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | diff --git a/website/versioned_docs/version-1.2.0/basic_configurations.md b/website/versioned_docs/version-1.2.0/basic_configurations.md index 4352039ebc074..a98f253435f6d 100644 --- a/website/versioned_docs/version-1.2.0/basic_configurations.md +++ b/website/versioned_docs/version-1.2.0/basic_configurations.md @@ -93,7 +93,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | diff --git a/website/versioned_docs/version-1.2.0/configurations.md b/website/versioned_docs/version-1.2.0/configurations.md index 9d0caf4000856..3b6f5fa5f5b27 100644 --- a/website/versioned_docs/version-1.2.0/configurations.md +++ b/website/versioned_docs/version-1.2.0/configurations.md @@ -123,7 +123,7 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | | [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | From b2a33e4562e6b7cc3e975bf53e11ffcbfbf249f6 Mon Sep 17 00:00:00 2001 From: Y Ethan Guo Date: Tue, 9 Jun 2026 21:07:54 -0700 Subject: [PATCH 2/2] Clarify START_COMMIT/END_COMMIT semantics by source table version Also updates docs for hoodie.datasource.read.incr.table.version and hoodie.datasource.read.streaming.table.version (they override the detected source table version and thus the time-semantics). --- website/docs/basic_configurations.md | 8 ++++---- website/docs/configurations.md | 8 ++++---- .../versioned_docs/version-1.1.1/basic_configurations.md | 8 ++++---- website/versioned_docs/version-1.1.1/configurations.md | 8 ++++---- .../versioned_docs/version-1.2.0/basic_configurations.md | 8 ++++---- website/versioned_docs/version-1.2.0/configurations.md | 8 ++++---- 6 files changed, 24 insertions(+), 24 deletions(-) diff --git a/website/docs/basic_configurations.md b/website/docs/basic_configurations.md index a98f253435f6d..1134c09c862c4 100644 --- a/website/docs/basic_configurations.md +++ b/website/docs/basic_configurations.md @@ -93,10 +93,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. 
Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version, which is the source table's actual version unless overridden via `hoodie.datasource.read.incr.table.version` (incremental reads) or `hoodie.datasource.read.streaming.table.version` (streaming reads): version 8 or later treats this value as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. Only data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) that bounds incrementally fetched data. The same time semantics as START_COMMIT apply: version 8 or later treats this value as a completion time, while earlier versions (e.g., version 6) treat it as a requested time; the effective table version can be overridden via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`. When not specified, the latest committed instant on the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | --- diff --git a/website/docs/configurations.md b/website/docs/configurations.md index 3b6f5fa5f5b27..a40d3412b1ffb 100644 --- a/website/docs/configurations.md +++ b/website/docs/configurations.md @@ -123,10 +123,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version, which is the source table's actual version unless overridden via `hoodie.datasource.read.incr.table.version` (incremental reads) or `hoodie.datasource.read.streaming.table.version` (streaming reads): version 8 or later treats this value as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. Only data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) that bounds incrementally fetched data. The same time semantics as START_COMMIT apply: version 8 or later treats this value as a completion time, while earlier versions (e.g., version 6) treat it as a requested time; the effective table version can be overridden via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`. When not specified, the latest committed instant on the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | diff --git a/website/versioned_docs/version-1.1.1/basic_configurations.md b/website/versioned_docs/version-1.1.1/basic_configurations.md index d0671c712033b..0e429db40c24d 100644 --- a/website/versioned_docs/version-1.1.1/basic_configurations.md +++ b/website/versioned_docs/version-1.1.1/basic_configurations.md @@ -92,10 +92,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) to begin incrementally pulling data from. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (the effective version is overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE` | --- diff --git a/website/versioned_docs/version-1.1.1/configurations.md b/website/versioned_docs/version-1.1.1/configurations.md index ba374289ba419..7ad2891ed83bf 100644 --- a/website/versioned_docs/version-1.1.1/configurations.md +++ b/website/versioned_docs/version-1.1.1/configurations.md @@ -120,10 +120,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) to begin incrementally pulling data from. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (the effective version is overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE` | [**Advanced Configs**](#Read-Options-advanced-configs) diff --git a/website/versioned_docs/version-1.2.0/basic_configurations.md b/website/versioned_docs/version-1.2.0/basic_configurations.md index a98f253435f6d..1134c09c862c4 100644 --- a/website/versioned_docs/version-1.2.0/basic_configurations.md +++ b/website/versioned_docs/version-1.2.0/basic_configurations.md @@ -93,10 +93,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) to begin incrementally pulling data from. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (the effective version is overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | --- diff --git a/website/versioned_docs/version-1.2.0/configurations.md b/website/versioned_docs/version-1.2.0/configurations.md index 3b6f5fa5f5b27..a40d3412b1ffb 100644 --- a/website/versioned_docs/version-1.2.0/configurations.md +++ b/website/versioned_docs/version-1.2.0/configurations.md @@ -123,10 +123,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from (exclusive). The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time > START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) to begin incrementally pulling data from. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ fetches all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time; earlier versions (e.g., version 6) treat it as a requested time (the effective version is overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` |