Docs: Refresh schema evolution info #2965
🚀 Preview deployed to https://docs.estuary.dev/pr-preview/pr-2965/ 📄 Changed pages: |
| When any of these parts change, any capture or materialization writing to or reading from the collection must be updated to approve the change; otherwise, the Data Flow will fail with an error. | ||
| You can use Estuary's **schema evolutions** feature to quickly and simultaneously update other parts of a Data Flow so you're able to restart it without error when you introduce a collection change. | ||
| * When a collection **key** changes, you can separately manage materialization [group-by keys](/guides/customize-materialization-fields/#group-by-keys) during field selection. |
This sentence feels a bit random. I'd still mention that collection key changes will trigger an automatic backfill with default settings (`onIncompatibleSchemaChange: backfill`) unless you have specifically set a materialization group-by key.
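For context, the default mentioned in the comment above might look like the following in a materialization spec. This is only a sketch: the task, collection, and table names are placeholders, the endpoint configuration is omitted, and placing `onIncompatibleSchemaChange` at the task level is an assumption.

```yaml
# Hypothetical materialization; all names are placeholders, not taken from this PR.
materializations:
  acmeCo/example/materialize-postgres:
    # Assumed default per the comment above: an incompatible change,
    # such as a collection key change, triggers a backfill.
    onIncompatibleSchemaChange: backfill
    bindings:
      - source: acmeCo/example/my_collection
        resource:
          table: my_table
```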
| If you materialized that collection into a relational database table, the table would look something like `my_table (id integer primary key, foo timestamptz)`. | ||
| Now, say you edit the collection spec to remove `format: date-time` from `foo`. You'd expect the materialized database table to then look like `(id integer primary key, foo text)`. But since the column type of `foo` has changed, this will fail. An easy solution in this case would be to change the name of the table that the collection is materialized into. Evolutions do this by appending a suffix to the original table name. In this case, you'd end up with `my_table_v2 (id integer primary key, foo text)`. |
Aren't there still cases where we can't automatically migrate type changes, and thus will do an automatic backfill? e.g. a column type narrows from string to integer.
Wondering why you've removed this section
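To make the `my_table` example in the diff concrete, here is a sketch of the collection spec before and after the edit it describes. The collection name is a placeholder, and the `/id` key is inferred from the `id integer primary key` column rather than stated in the diff.

```yaml
# Before: `foo` is a string with a date-time format (materialized as timestamptz).
collections:
  acmeCo/example/my_collection:   # placeholder name
    schema:
      type: object
      required: [id]
      properties:
        id: { type: integer }
        foo: { type: string, format: date-time }
    key: [/id]
---
# After: `format: date-time` is removed, so `foo` would materialize as text.
collections:
  acmeCo/example/my_collection:
    schema:
      type: object
      required: [id]
      properties:
        id: { type: integer }
        foo: { type: string }
    key: [/id]
```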
| - "Automatically keep schemas up to date" enables `autoDiscover` | ||
| - "Automatically add new collections" corresponds to `addNewBindings` | ||
| - "Breaking changes re-versions collections" corresponds to `evolveIncompatibleCollections` | ||
| - "Changing primary keys re-versions collections" corresponds to `evolveIncompatibleCollections` |
I believe we now just reset the collection rather than re-versioning it (v2, v3, etc.).
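For reference, the properties named in the list above might appear in a capture spec roughly as follows. The capture name is a placeholder, and nesting both options under an `autoDiscover` block is an assumption based on the property names in the diff.

```yaml
captures:
  acmeCo/example/source-postgres:   # placeholder capture name
    # Assumption: the presence of this block is what the UI option
    # "Automatically keep schemas up to date" toggles.
    autoDiscover:
      addNewBindings: true                  # "Automatically add new collections"
      evolveIncompatibleCollections: true   # "Breaking changes" / "Changing primary keys" options
```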
| :::info | ||
| There are a variety of reasons why these properties may change, and also different mechanisms for detecting changes in source data. In general, it doesn't matter why the collection spec has changed, only _what_ has changed. However, [AutoDiscovers](../concepts/captures.md#automatically-update-captures) are able to handle some of these scenarios automatically. Where applicable, AutoDiscover behavior will be called out under each section. | ||
| Automatic schema evolution will only ever widen data types. |
Would it be useful to mention that this ensures historical data is still valid under the evolved schema?
Edit: I see this is mentioned below:
"This works because the type is broadened, so existing values will still validate against the new schema."
| mode: Normal | ||
| backfill: 1 | ||
| target: acmeCo/inventory/anvils_v2 | ||
| Note that materializations can specify separate [group-by keys](/guides/customize-materialization-fields/#group-by-keys) from the collection key structure. |
If there's a source key change coming, you can:
- preemptively set the materialization group-by key to avoid propagating key changes
- set the group-by key after the fact to migrate back to the previous/original key
- do nothing and let us change the key with a Data Flow reset
Given the above, this seems backwards: "Ensure the materialization group-by key is updated if you wish to propagate key changes downstream."
Description:
Various updates have streamlined schema evolution, so there are fewer opportunities for schemas to break. This PR includes docs refreshes to account for the following:
Documentation links affected:
Mainly updates schema evolution guide and concepts pages
Notes for reviewers:
Thanks for reviewing! Suggestions/comments welcome.