Backport of #104881 to 25.8: Use explicit flag for secondary on cluster queries #1875
ianton-ru wants to merge 1 commit into
…ry-on-cluster Use explicit flag for secondary on cluster queries
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2fa783dc77
```cpp
query_context->makeQueryContext();
query_context->setCurrentQueryId(""); // generate random query_id
query_context->setQueryKind(ClientInfo::QueryKind::SECONDARY_QUERY);
query_context->setDDLOrOnClusterInternal(true);
```
Preserve worker mode for cluster table functions
When a DDL worker executes CREATE ... ON CLUSTER ... AS SELECT ... FROM fileCluster/urlCluster/s3Cluster, this context is no longer marked as SECONDARY_QUERY. However, the *Cluster table functions still check query_kind == SECONDARY_QUERY to decide whether to read only worker-local input (for example, TableFunctionFileCluster.cpp:23 and TableFunctionURLCluster.cpp:13). As a result, each DDL worker may fan out to the whole cluster again instead of reading only its assigned local input. This multiplies remote reads and can duplicate the data that the CTAS inserts on every replica. Either keep the secondary query kind for execution semantics, or update those table-function checks to also recognize the new internal DDL flag.
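A minimal C++ sketch of the concern, assuming a heavily simplified model: MiniContext, the readsLocalInputOld/readsLocalInputPatched functions, and the field names are hypothetical stand-ins, not ClickHouse's actual Context or table-function code. It only models the check the review describes and the second fix it proposes (also honoring the internal DDL flag).

```cpp
#include <cassert> // used by the assert-based usage checks

// Hypothetical stand-ins; these are NOT ClickHouse's real types.
enum class QueryKind { NO_QUERY, INITIAL_QUERY, SECONDARY_QUERY };

struct MiniContext
{
    QueryKind query_kind = QueryKind::NO_QUERY;
    // Models the effect of setDDLOrOnClusterInternal(true).
    bool ddl_or_on_cluster_internal = false;
};

// Check as described in the review: only a SECONDARY_QUERY context
// makes a *Cluster table function read worker-local input.
bool readsLocalInputOld(const MiniContext & ctx)
{
    return ctx.query_kind == QueryKind::SECONDARY_QUERY;
}

// One fix the review suggests: also treat the internal DDL / ON CLUSTER
// flag as "worker mode", so DDL workers do not fan out again.
bool readsLocalInputPatched(const MiniContext & ctx)
{
    return ctx.query_kind == QueryKind::SECONDARY_QUERY
        || ctx.ddl_or_on_cluster_internal;
}
```

With ddl_or_on_cluster_internal set and query_kind left at its default, readsLocalInputOld returns false (the fan-out case the review warns about), while readsLocalInputPatched returns true. The review's other option, keeping SECONDARY_QUERY on the DDL context, would satisfy the old check without touching the table functions.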
Backport of ClickHouse#104881 by @tavplubix
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use an explicit flag in Context for secondary DDL/ON CLUSTER queries instead of SECONDARY_QUERY.
CI/CD Options
Exclude tests:
Regression jobs to run: