fix: set maturity on parquet splits#6399
Conversation
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…ics-maturity-timestamp
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4cb24ace41
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if split.is_mature(now) { | ||
| continue; |
There was a problem hiding this comment.
Recompute maturity for legacy parquet splits
When a cluster restarts after upgrading, already-published parquet splits serialized before this new maturity field are deserialized with the field's default (Mature), so this split.is_mature(now) check discards them even if the merge policy check immediately above classifies their size/merge count as still immature. This hits the exact upgrade path where existing splits were created before the uploader started populating maturity, leaving those small legacy splits out of compaction; use the policy-derived maturation period for missing/defaulted metadata or otherwise distinguish legacy records instead of treating them as mature here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
When a newer metastore receives ListMetricsSplitsRequest/ListSketchSplitsRequest from a still-old indexing or query node, the query_json produced before this change will not contain mature; deserialize_query() calls serde_utils::from_json_str directly, so serde will reject those requests as missing this newly required field instead of treating it as unbounded. Add #[serde(default)] to keep parquet split listing compatible during rolling upgrades and with any stored/test JSON that predates this field.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Description
We were not setting maturity value on splits in the metastore, so they were all being set to
0. Sets maturity value on the ingest/merge path on the uploader, similar to what logs does.Need this for compaction, so we can fetch immature splits, and correctly designate when splits are no longer candidates for compaction.
How was this PR tested?
Describe how you tested this PR.