diff --git a/website/docs/basic_configurations.md b/website/docs/basic_configurations.md index 4352039ebc074..1134c09c862c4 100644 --- a/website/docs/basic_configurations.md +++ b/website/docs/basic_configurations.md @@ -93,10 +93,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) up to which incrementally fetched data is limited. It follows the same time semantics as START_COMMIT: version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | --- diff --git a/website/docs/configurations.md b/website/docs/configurations.md index 9d0caf4000856..a40d3412b1ffb 100644 --- a/website/docs/configurations.md +++ b/website/docs/configurations.md @@ -123,10 +123,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) up to which incrementally fetched data is limited. It follows the same time semantics as START_COMMIT: version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | diff --git a/website/versioned_docs/version-1.1.1/basic_configurations.md b/website/versioned_docs/version-1.1.1/basic_configurations.md index 88b08ed148222..0e429db40c24d 100644 --- a/website/versioned_docs/version-1.1.1/basic_configurations.md +++ b/website/versioned_docs/version-1.1.1/basic_configurations.md @@ -92,10 +92,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time, earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point in time type queries make more sense with both begin and end specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE` | --- diff --git a/website/versioned_docs/version-1.1.1/configurations.md b/website/versioned_docs/version-1.1.1/configurations.md index 101c5e1aa547d..7ad2891ed83bf 100644 --- a/website/versioned_docs/version-1.1.1/configurations.md +++ b/website/versioned_docs/version-1.1.1/configurations.md @@ -120,10 +120,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: START_COMMIT` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) to limit incrementally fetched data to. Same time-semantics rules as START_COMMIT: version 8 or later treats this as a completion time, earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point in time type queries make more sense with both begin and end specified. Accepted formats: `yyyyMMddHHmmss[SSS]`, `yyyy-MM-dd`, `yyyy-MM-dd HH:mm:ss[.SSS]`, `yyyy-MM-ddTHH:mm:ss[.SSS]`, epoch seconds (10-digit), epoch millis (13-digit), or `earliest`. Invalid values throw an error immediately.
`Config Param: END_COMMIT` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE` | [**Advanced Configs**](#Read-Options-advanced-configs) diff --git a/website/versioned_docs/version-1.2.0/basic_configurations.md b/website/versioned_docs/version-1.2.0/basic_configurations.md index 4352039ebc074..1134c09c862c4 100644 --- a/website/versioned_docs/version-1.2.0/basic_configurations.md +++ b/website/versioned_docs/version-1.2.0/basic_configurations.md @@ -93,10 +93,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) up to which incrementally fetched data is limited. It follows the same time semantics as START_COMMIT: version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` | --- diff --git a/website/versioned_docs/version-1.2.0/configurations.md b/website/versioned_docs/version-1.2.0/configurations.md index 9d0caf4000856..a40d3412b1ffb 100644 --- a/website/versioned_docs/version-1.2.0/configurations.md +++ b/website/versioned_docs/version-1.2.0/configurations.md @@ -123,10 +123,10 @@ Options useful for reading tables via `read.format.option(...)` | Config Name | Default | Description | | -------------------------------------------------------------------------------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to start incrementally pulling data from. The completion time here need not necessarily correspond to an instant on the timeline. New data written with completion_time >= START_COMMIT are fetched out. For e.g: ‘20170901080000’ will get all new data written on or after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. Represents the completion time to limit incrementally fetched data to. When not specified latest commit completion time from timeline is assumed by default. When specified, new data written with completion_time <= END_COMMIT are fetched out. Point in time type queries make more sense with begin and end completion times specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | -| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | The table version assumed for incremental read
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | -| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | The table version assumed for streaming read
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.begin.instanttime](#hoodiedatasourcereadbegininstanttime) | (N/A) | Required when `hoodie.datasource.query.type` is set to `incremental`. The start point (exclusive) from which to begin incrementally pulling data. The semantics depend on the effective table version (overridable via `hoodie.datasource.read.incr.table.version` for incremental reads or `hoodie.datasource.read.streaming.table.version` for streaming reads; otherwise the source table's actual version): version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (instant time). The value need not correspond to an instant on the timeline. New data written strictly after START_COMMIT is fetched. For example, ‘20170901080000’ will fetch all new data written strictly after Sep 1, 2017 08:00AM.
`Config Param: START_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.end.instanttime](#hoodiedatasourcereadendinstanttime) | (N/A) | Used when `hoodie.datasource.query.type` is set to `incremental`. The end point (inclusive) up to which incrementally fetched data is limited. It follows the same time semantics as START_COMMIT: version 8 or later treats this as a completion time, while earlier versions (e.g., version 6) treat it as a requested time (overridable via `hoodie.datasource.read.incr.table.version` or `hoodie.datasource.read.streaming.table.version`). When not specified, the latest committed instant from the timeline is used. Point-in-time queries make more sense with both begin and end specified.
`Config Param: END_COMMIT`
`Since Version: 0.9.0` | +| [hoodie.datasource.read.incr.table.version](#hoodiedatasourcereadincrtableversion) | (N/A) | Overrides the table version assumed for incremental reads. Version 8+ selects the V2 incremental relation (completion-time based START_COMMIT/END_COMMIT); earlier versions select the V1 relation (requested-time based). If unset, the source table's actual version is used.
`Config Param: INCREMENTAL_READ_TABLE_VERSION`
`Since Version: 1.0.0` | +| [hoodie.datasource.read.streaming.table.version](#hoodiedatasourcereadstreamingtableversion) | (N/A) | Overrides the table version assumed for streaming reads. Version 8+ selects HoodieStreamSourceV2 (completion-time based START_COMMIT/END_COMMIT); earlier versions select HoodieStreamSourceV1 (requested-time based). If unset, the source table's actual version is used.
`Config Param: STREAMING_READ_TABLE_VERSION`
`Since Version: 1.0.0` | | [hoodie.datasource.write.precombine.field](#hoodiedatasourcewriteprecombinefield) | (N/A) | Comma separated list of fields used in preCombining before actual write. When two records have the same key value, we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..). For multiple fields if first key comparison is same, second key comparison is made and so on. This config is used for combining records within the same batch and also for merging using event time merge mode
`Config Param: READ_PRE_COMBINE_FIELD` | | [hoodie.datasource.query.type](#hoodiedatasourcequerytype) | snapshot | Whether data needs to be read, in `incremental` mode (new data since an instantTime) (or) `read_optimized` mode (obtain latest view, based on base files) (or) `snapshot` mode (obtain latest view, by merging base and (if any) log files)
`Config Param: QUERY_TYPE`
`Since Version: 0.9.0` |