Skip to content

fix(schema): permit opt-in timestamp-precision evolution#19028

Closed
yihua wants to merge 1 commit into
apache:masterfrom
yihua:ENG-43554-timestamp-precision-evolution
Closed

fix(schema): permit opt-in timestamp-precision evolution#19028
yihua wants to merge 1 commit into
apache:masterfrom
yihua:ENG-43554-timestamp-precision-evolution

Conversation

@yihua

@yihua yihua commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Change Logs

Background. Earlier Hudi versions mishandled long-backed timestamp logical types in AvroInternalSchemaConverter:

  1. timestamp-millis and timestamp-micros both collapsed into a single internal TimestampType and were always re-emitted as timestampMicros() on serialize. A source schema declaring timestamp-millis got persisted in the table with the wrong timestamp-micros logical type, while the underlying long values written to parquet remained epoch-millis. Pure logical-type drift.
  2. local-timestamp-millis and local-timestamp-micros had no branch at all. They fell through to the bare LongType, and the logical type was dropped from the table schema entirely. Logical-type loss.

In both cases the parquet values are correct, only the logical type on the field is wrong. Current converters recognize all four logical types as distinct, so the writer schema now declares the correct logical type. On every subsequent write the reconcile path compares writer schema against the persisted table schema, finds the logical-type mismatch, and rejects it.

Why writes get blocked. When hoodie.write.set.null.for.missing.columns=true, HoodieSchemaUtils.deduceWriterSchema calls AvroSchemaEvolutionUtils.reconcileSchema, which goes through TableChanges.ColumnUpdateChange.updateColumnType and SchemaChangeUtils.isTypeUpdateAllow. That switch had no case for TIMESTAMP or TIMESTAMP_MILLIS, so any precision/logical-type change fell into default: return false and threw SchemaCompatibilityException. Streams that need null-fill on missing columns cannot write to any table previously persisted with long-backed timestamp columns mishandled this way.

The non-reconcile path (set.null=false) is unaffected: it skips reconcileSchema and lets AvroSchemaCompatibility.checkReaderWriterCompatibility validate, which is logical-type-blind (both timestamps are long underneath) and accepts the correction. So the reconcile path was strictly stricter than the non-reconcile path for the same scenario; only the set.null=true path was broken.

Fix. New write config hoodie.write.schema.allow.timestamp.precision.evolution (default false) that, when true, lets SchemaChangeUtils.isTypeUpdateAllow permit:

  • timestamp-millis ↔ timestamp-micros (logical-type drift case, both directions)
  • local-timestamp-millis ↔ local-timestamp-micros (precision swap among the recognized variants)
  • long → local-timestamp-millis / long → local-timestamp-micros (logical-type loss case, attach the missing logical type)

Default false preserves the existing strict rejection. Opt-in is per-write by setting the config to true. The flag is threaded into AvroSchemaEvolutionUtils.reconcileSchema and TableChanges.ColumnUpdateChange, and read from the write properties by HoodieSchemaUtils.scala, BaseHoodieWriteClient, HoodieMergeHelper, and FileGroupReaderBasedMergeHandle.

Why not unconditional. Default writes should not silently change column logical types; the gate keeps the opt-in explicit and per-write. Older readers that don't recognize the new logical-type values degrade gracefully: AvroInternalSchemaConverter falls through to the bare primitive for unrecognized logical types, so the corrected logical type is simply ignored on the reader side.

Complements the read-side repair from #14161, which handles parquet files carrying the wrong logical type transparently until the table schema is corrected. This PR closes the write-side gap so the table schema itself can be brought into agreement with the writer schema once and for all.

Impact

  • New advanced write config, opt-in. Default preserves the prior strict behavior; no existing caller sees a change.
  • One new overload on AvroSchemaEvolutionUtils.reconcileSchema, one new factory on TableChanges.ColumnUpdateChange.get. Pre-existing overloads kept as delegates.
  • Plumbing through HoodieSchemaUtils, BaseHoodieWriteClient, HoodieMergeHelper, FileGroupReaderBasedMergeHandle.

Risk level (write none, low medium or high below)

low

Opt-in gate with default-false means existing writers are unchanged. Positive variants exercise the gated repair path on the v6/v8/CURRENT logical-repair fixtures from #14161. Negative variant asserts SchemaCompatibilityException when the reconcile path is on with the gate closed, locking in the default behavior.

Documentation Update

New config documented inline on HoodieCommonConfig.ALLOW_TIMESTAMP_PRECISION_EVOLUTION with sinceVersion("1.3.0").

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Adds a write config `hoodie.write.schema.allow.timestamp.precision.evolution`
(default false) that, when true, lets the internal-schema reconcile path
relabel a column between timestamp-millis and timestamp-micros (and between
the local-timestamp variants), and attach a missing local-timestamp logical
type on top of a bare long. Default false preserves the existing strict
rejection so no caller sees a behavior change.

The non-reconcile write path was already lenient via Avro reader/writer
compatibility (both logical types share the same Avro long primitive). The
internal-schema reconcile path, triggered when
`hoodie.write.set.null.for.missing.columns=true`, instead rejected the
relabel through `SchemaChangeUtils.isTypeUpdateAllow`. This closes the
parity gap and enables forward-fixing tables that earlier versions
persisted with a timestamp-micros label but timestamp-millis values, or
that dropped the local-timestamp logical type entirely and stored the
column as bare long.

Threaded through SchemaChangeUtils -> TableChanges.ColumnUpdateChange ->
AvroSchemaEvolutionUtils.reconcileSchema, with HoodieSchemaUtils,
BaseHoodieWriteClient, HoodieMergeHelper, and FileGroupReaderBasedMergeHandle
reading the config from the write properties.

Tests:
- TestAvroSchemaEvolutionUtils.testReconcileSchemaTimestampPrecisionEvolution
  covers default-strict reject and opt-in permit for all three shapes
  (timestamp precision swap, local-timestamp precision swap, long ->
  local-timestamp logical-type attach).
- testCOWLogicalRepair / testMORLogicalRepair parameterize on both
  setNullForMissingColumns and allowTimestampPrecisionEvolution; positive
  variants exercise the gated repair path on v6/v8/CURRENT fixtures;
  a negative variant asserts SchemaCompatibilityException when the
  reconcile path is on with the gate closed.
@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Jun 17, 2026

@hudi-agent hudi-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! The PR adds an opt-in write config to permit timestamp-precision evolution through the internal-schema reconcile path, with reasonable test coverage on both positive variants and a negative variant for the closed gate. One small inconsistency worth taking a look at in the inline comments around the API delegate promise. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small readability nits below.

* @return whether to allow the column type to be updated.
*/
public static boolean isTypeUpdateAllow(Type src, Type dst) {
/**

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 The PR description notes "Pre-existing overloads kept as delegates," which is followed for reconcileSchema and ColumnUpdateChange.get, but here the 2-arg isTypeUpdateAllow(Type, Type) is replaced rather than kept as a delegate. Could you preserve the original 2-arg signature as a thin delegate (return isTypeUpdateAllow(src, dst, false);)? This class is a public utility in internal.schema.utils and could plausibly be called by downstream tooling.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

.defaultValue(false)
.markAdvanced()
.sinceVersion("1.3.0")
.withDocumentation("Controls whether schema evolution may change a column between timestamp-millis and "

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Should the docs make it explicit that this only relabels the logical type — it does not rewrite stored values? Since old base files keep their original encoding, enabling the gate when actual values don't match the new precision can lead to misinterpretation on read. A one-liner like "Stored values are not rewritten; only enable when the actual values already match the target precision" would steer users away from misuse.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

val setNullForMissingColumns = opts.getOrElse(HoodieCommonConfig.SET_NULL_FOR_MISSING_COLUMNS.key(),
HoodieCommonConfig.SET_NULL_FOR_MISSING_COLUMNS.defaultValue()).toBoolean
val mergedInternalSchema = AvroSchemaEvolutionUtils.reconcileSchema(canonicalizedSourceSchema.toAvroSchema(), internalSchema, setNullForMissingColumns)
val allowTimestampPrecisionEvolution = opts.getOrElse(HoodieCommonConfig.ALLOW_TIMESTAMP_PRECISION_EVOLUTION.key,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: allowTimestampPrecisionEvolution is already declared from opts at the outer scope a few dozen lines above — could you just reference that val here instead of re-reading the same key with the same default?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@hudi-bot

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua closed this Jun 17, 2026
@yihua yihua deleted the ENG-43554-timestamp-precision-evolution branch June 17, 2026 05:38
@yihua

yihua commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by #19029 (same diff, renamed branch to drop the internal ticket prefix). Apologies for the noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants