Docs: Refresh schema evolution info#2965

Open: aeluce wants to merge 1 commit into master from emily/evolution-docs-refresh

@aeluce aeluce commented May 22, 2026

Description:

Several recent updates have streamlined schema evolution, so there are fewer opportunities for schemas to break. This PR refreshes the docs to account for the following:

  • Position automatic discovery and inference as the default to make it clearer that Estuary handles many common evolution scenarios automatically.
  • Changing the collection key is no longer a truly incompatible update for collections. Group-by keys push handling down to materializations.
  • We no longer re-version collections (automatically suffixing the name with "_v2"); if schema changes are truly incompatible, dataflow resets allow collections to be refreshed. This is called out in particular for changes to collection logical partitions.

Documentation links affected:

Mainly updates the schema evolution guide and concepts pages.

Notes for reviewers:

Thanks for reviewing! Suggestions/comments welcome.

@aeluce aeluce requested a review from jwhartley May 22, 2026 17:30
@aeluce aeluce added the docs Documentation work required label May 22, 2026

@jwhartley jwhartley left a comment

Had a couple of questions here @aeluce

When any of these parts change, any capture or materialization writing to or reading from the collection must be updated to approve of the change, otherwise, the Data Flow will fail with an error.

You can use Estuary's **schema evolutions** feature to quickly and simultaneously update other parts of a Data Flow so you're able to re-start it without error when you introduce a collection change.
* When a collection **key** changes, you can separately manage materialization [group-by keys](/guides/customize-materialization-fields/#group-by-keys) during field selection.

This sentence feels a bit random. I'd still mention that collection key changes trigger an automatic backfill under the default settings (`onIncompatibleSchemaChange: backfill`) unless you have specifically set a materialization group-by key.


If you materialized that collection into a relational database table, the table would look something like `my_table (id integer primary key, foo timestamptz)`.

Now, say you edit the collection spec to remove `format: date-time` from `foo`. You'd expect the materialized database table to then look like `(id integer primary key, foo text)`. But since the column type of `foo` has changed, this will fail. An easy solution in this case would be to change the name of the table that the collection is materialized into. Evolutions do this by appending a suffix to the original table name. In this case, you'd end up with `my_table_v2 (id integer primary key, foo text)`.

Aren't there still cases where we can't automatically migrate type changes and so fall back to an automatic backfill, e.g. when a column type narrows from string to integer?

Wondering why you've removed this section.

- "Automatically keep schemas up to date" enables `autoDiscover`
- "Automatically add new collections" corresponds to `addNewBindings`
- "Breaking changes re-versions collections" corresponds to `evolveIncompatibleCollections`
- "Changing primary keys re-versions collections" corresponds to `evolveIncompatibleCollections`

I believe we now just do a collection reset rather than re-versioning (v2, v3, etc.).


:::info
There are a variety of reasons why these properties may change, and also different mechanisms for detecting changes in source data. In general, it doesn't matter why the collection spec has changed, only _what_ has changed. However, [AutoDiscovers](../concepts/captures.md#automatically-update-captures) are able to handle some of these scenarios automatically. Where applicable, AutoDiscover behavior will be called out under each section.
Automatic schema evolution will only ever widen data types.

Would it be useful to mention that this is to ensure historical data is still valid under the evolved schema?

Edit: I see this is mentioned below:
"This works because the type is broadened, so existing values will still validate against the new schema."
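
For reference, a minimal Python sketch of the widening rule quoted above. It assumes a field's schema can be reduced to its list of JSON Schema `type` names; `is_widening` is a hypothetical helper for illustration, not part of Estuary.

```python
from typing import Iterable, Set


def _expand(types: Iterable[str]) -> Set[str]:
    """Normalize a type list; in JSON Schema every integer is also a number."""
    expanded = set(types)
    if "number" in expanded:
        expanded.add("integer")
    return expanded


def is_widening(old_types: Iterable[str], new_types: Iterable[str]) -> bool:
    """Return True when every previously allowed type is still allowed.

    If this holds, values that validated against the old type list
    also validate against the new one.
    """
    return _expand(old_types) <= _expand(new_types)


# Broadening string to string-or-integer keeps historical values valid.
print(is_widening(["string"], ["string", "integer"]))
# Narrowing string to integer does not, so old string values would fail.
print(is_widening(["string"], ["integer"]))
```

Under this toy model, only changes where `is_widening` returns `True` would be safe to apply without revisiting historical data; the string-to-integer case raised earlier in this thread returns `False`.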

mode: Normal
backfill: 1
target: acmeCo/inventory/anvils_v2
Note that materializations can specify separate [group-by keys](/guides/customize-materialization-fields/#group-by-keys) from the collection key structure.

If there's a source key change coming, you can:

  1. preventatively set the materialization group by to avoid propagating key changes
  2. set the group by after the fact to migrate back to the previous/original key
  3. do nothing and let us change the key with a dataflow reset
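
The three options above can be sketched as a toy Python model. It assumes, per the comments in this thread, that a materialization uses its group-by key when one is set and otherwise follows the collection key; `materialized_key` and `key_change_propagates` are hypothetical names, not Estuary APIs.

```python
from typing import List, Optional, Sequence


def materialized_key(
    collection_key: Sequence[str],
    group_by: Optional[Sequence[str]] = None,
) -> List[str]:
    """Key the materialization groups by: an explicit group-by wins."""
    return list(group_by) if group_by else list(collection_key)


def key_change_propagates(
    old_key: Sequence[str],
    new_key: Sequence[str],
    group_by: Optional[Sequence[str]] = None,
) -> bool:
    """True when a collection key change alters the materialized key."""
    return materialized_key(old_key, group_by) != materialized_key(new_key, group_by)


old_key, new_key = ["id"], ["id", "region"]

# Option 1: group-by set ahead of time, so the change does not propagate.
print(key_change_propagates(old_key, new_key, group_by=["id"]))
# Option 2: group-by set after the fact restores the original key.
print(materialized_key(new_key, group_by=["id"]))
# Option 3: no group-by, so the new key propagates downstream.
print(key_change_propagates(old_key, new_key))
```

In this model, setting a group-by key is what stops a key change from propagating, which is why the wording questioned in the next comment reads as backwards.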

Given the options above, this seems backwards: "Ensure the materialization group-by key is updated if you wish to propagate key changes downstream."
