
feat(manifests): BREAKING CHANGE: Move CRD to Helm chart template directory#3655

Open
andreyvelich wants to merge 5 commits into kubeflow:master from andreyvelich:crd-charts

@andreyvelich

Member

Fixes: #3650

As we discussed, let's move the Trainer CRDs under the templates directory so that helm upgrade deploys newer CRD versions when needed.

/assign @astefanutti @tenzen-y @robert-bell @akshaychitneni @VassilisVassiliadis @Sridhar1030

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Copilot AI review requested due to automatic review settings June 26, 2026 22:10
@google-oss-prow


@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell, VassilisVassiliadis.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@google-oss-prow


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andreyvelich. For more information see the Kubernetes Code Review Process.


Copilot AI left a comment

Contributor


Pull request overview

This PR addresses Helm’s CRD upgrade limitation by moving the Trainer CRDs from Helm’s special crds/ install-only mechanism into regular chart templates so that helm upgrade will reconcile CRD updates (linked to #3650).

Changes:

  • Update the manifests Makefile target to copy generated CRDs into charts/kubeflow-trainer/templates/crd/ instead of charts/kubeflow-trainer/crds/.
  • Add Helm unit tests that validate the CRDs render from the new template paths.
  • Add the CRD YAMLs under charts/kubeflow-trainer/templates/crd/ (TrainJob, TrainingRuntime, ClusterTrainingRuntime).

Reviewed changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • Makefile: Copies generated CRDs into the Helm templates directory (enabling upgrades).
  • charts/kubeflow-trainer/tests/crd/crd_test.yaml: Adds helm-unittest coverage that CRDs render from template paths.
  • charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_trainjobs.yaml: Ships TrainJob CRD as a Helm template (upgradeable).
  • charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_trainingruntimes.yaml: Ships TrainingRuntime CRD as a Helm template (upgradeable).
  • charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_clustertrainingruntimes.yaml: Ships ClusterTrainingRuntime CRD as a Helm template (upgradeable).

Comment on lines +31 to +36
asserts:
  - containsDocument:
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: trainjobs.trainer.kubeflow.org

Comment on lines +39 to +44
asserts:
  - containsDocument:
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: trainingruntimes.trainer.kubeflow.org

Comment on lines +47 to +51
asserts:
  - containsDocument:
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: clustertrainingruntimes.trainer.kubeflow.org
Comment thread Makefile
Comment on lines 168 to +172
paths="./pkg/apis/trainer/v1alpha1/...;./pkg/controller/...;./pkg/runtime/...;./pkg/webhooks/...;./pkg/util/cert/..." \
output:crd:artifacts:config=manifests/base/crds \
output:rbac:artifacts:config=manifests/base/rbac \
output:webhook:artifacts:config=manifests/base/webhook
- cp -f manifests/base/crds/trainer.kubeflow.org_*.yaml $(TRAINER_CHART_DIR)/crds/
+ cp -f manifests/base/crds/trainer.kubeflow.org_*.yaml $(TRAINER_CHART_DIR)/templates/crd/

@robert-bell left a comment

Contributor


Thanks @andreyvelich. Overall looks fine.

Couple of thoughts -

  1. I do think we should allow users to disable CRD installation via a value like crds.enabled, otherwise we're removing functionality. I know some folks manage CRDs outside the Helm chart using --skip-crds. Wdyt?
  2. We need to announce this as a breaking change. Users who've set --skip-crds will need to switch to crds.enabled: false.
  3. Are there any migration steps users need to take when upgrading? Will previously installed CRDs be automatically adopted and owned by the Helm release?
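A minimal sketch of item 1, assuming a `crds.enabled` value (the name proposed above) that defaults to `true` and wraps each CRD template in a conditional. The CRD body is elided. Because the Makefile currently copies the generated files verbatim into `templates/crd/`, the wrapper lines would have to be added during that copy step (not shown). Note that `--skip-crds` only affects a chart's `crds/` directory, so once the CRDs live under `templates/` a value like this is the only way to opt out.

```yaml
# --- values.yaml (assumed key name) ---
crds:
  # Set to false when the CRDs are managed outside this chart,
  # which is what --skip-crds achieved while they lived in crds/.
  enabled: true

# --- templates/crd/trainer.kubeflow.org_trainjobs.yaml ---
{{- if .Values.crds.enabled }}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: trainjobs.trainer.kubeflow.org
spec:
  # ... generated CRD spec (group, names, scope, versions) unchanged ...
{{- end }}
```

The same wrapper would apply to the TrainingRuntime and ClusterTrainingRuntime CRD templates.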

@andreyvelich

Member Author

> I do think we should allow users to disable CRD installation via a value like crds.enabled, otherwise we're removing functionality. I know some folks manage CRDs outside the Helm chart using --skip-crds. Wdyt?

Good point, let me add this flag.
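A possible helm-unittest case for such a flag, following the `containsDocument` style of `charts/kubeflow-trainer/tests/crd/crd_test.yaml`. It assumes the CRD templates are gated on a `crds.enabled` value that defaults to `true`; the suite name and the value name are assumptions, not what the PR necessarily ships.

```yaml
# Hypothetical test suite for the CRD toggle (assumed value: crds.enabled).
suite: CRD toggle
templates:
  - templates/crd/trainer.kubeflow.org_trainjobs.yaml
tests:
  - it: renders the TrainJob CRD by default
    asserts:
      - containsDocument:
          apiVersion: apiextensions.k8s.io/v1
          kind: CustomResourceDefinition
          name: trainjobs.trainer.kubeflow.org
  - it: skips the TrainJob CRD when crds.enabled is false
    set:
      crds.enabled: false
    asserts:
      # Nothing should be rendered from this template when disabled.
      - hasDocuments:
          count: 0
```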

> We need to announce this as a breaking change. Users who've set --skip-crds will need to switch to crds.enabled: false.

Yes, I agree. Let me update the PR description.

> Are there any migration steps users need to take when upgrading? Will previously installed CRDs be automatically adopted and owned by the Helm release?

@tenzen-y @Sridhar1030 @astefanutti @akshaychitneni @VassilisVassiliadis @aniket2405 Any thoughts on this?

Maybe we could switch the flag to crd.enabled: false in Trainer v2.3, then switch it back to crd.enabled: true in Trainer v2.4 to give users time to migrate?

@andreyvelich andreyvelich changed the title feat(manifests): Move CRD to Helm chart template directory feat(manifests): BREAKING_CHANGE – Move CRD to Helm chart template directory Jun 29, 2026
@andreyvelich andreyvelich changed the title feat(manifests): BREAKING_CHANGE – Move CRD to Helm chart template directory feat(manifests): BREAKING CHANGE: Move CRD to Helm chart template directory Jun 29, 2026
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@google-oss-prow Bot added size/L and removed size/M labels Jun 29, 2026
@robert-bell

Contributor

> Maybe we could switch the flag to crd.enabled: false in Trainer v2.3, then switch it back to crd.enabled: true in Trainer v2.4 to give users time to migrate?

I think we should default to true immediately, as Helm already installs the CRDs by default.

Re the migration: we should just be able to test the upgrade manually. I'm hoping Helm will add the necessary Helm labels/annotations and adopt the CRDs, but if not, users will need to label/annotate them manually. It should be an easy test. Apologies, I can't check myself right now.
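For reference, Helm 3 (since 3.2) can adopt an existing object into a release when the object already carries Helm's ownership label and annotations; CRDs installed from a chart's `crds/` directory do not get this metadata. Below is a sketch of what each Trainer CRD would need if Helm does not adopt them on its own. The release name and namespace are hypothetical and must match the actual release.

```yaml
# Ownership metadata Helm checks before adopting an existing object.
# Release name and namespace are placeholders (assumptions).
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: trainjobs.trainer.kubeflow.org
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: kubeflow-trainer
    meta.helm.sh/release-namespace: kubeflow-system
```

On a live cluster the same metadata could be added with `kubectl label` and `kubectl annotate` on each of the three CRDs before running `helm upgrade`.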

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@google-oss-prow Bot added size/M and removed size/L labels Jun 29, 2026
@andreyvelich

Member Author

We are seeing some problems with Runtime installation, like I mentioned here: #3650 (comment)
@robert-bell What do you think about separating the builtin Runtimes into a separate chart, like we do with Kustomize overlays?

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@google-oss-prow Bot added size/L and removed size/M labels Jun 29, 2026
@andreyvelich

andreyvelich commented Jun 29, 2026

Member Author

If we still prefer a single Helm chart for now, we can work around this by running helm upgrade after helm install completes: cc0f08b

What do we think about this approach?

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@robert-bell

Contributor

I'm afraid I also dislike the two-step install. My main motivation is keeping operator installation as easy as possible, so I think we should push for a single-command install.

Given the builtin runtimes are defaults, they should always be valid, right? Could we disable the validating webhook just for these resources? Something like a label on the CTRs (ClusterTrainingRuntimes) and a MatchCondition on the webhook? We can add an integration/e2e test to ensure they actually pass webhook validation. Wdyt?
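A sketch of what such a MatchCondition could look like, assuming the opt-out label proposed further down this thread (`trainer.kubeflow.org/webhook-validation: disabled`). `matchConditions` are CEL expressions evaluated by the API server before calling the webhook; the webhook is only called when all of them evaluate to true. The configuration and webhook names here are hypothetical, and the other webhook fields are elided.

```yaml
# Fragment of a ValidatingWebhookConfiguration (names are hypothetical).
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validator.trainer.kubeflow.org
webhooks:
  - name: validator.clustertrainingruntime.trainer.kubeflow.org
    # clientConfig, rules, sideEffects, admissionReviewVersions unchanged ...
    matchConditions:
      # Call the webhook unless the object carries the opt-out label.
      # Assumes the rules only cover CREATE/UPDATE, where `object` is set.
      - name: skip-builtin-runtimes
        expression: >-
          !has(object.metadata.labels) ||
          !('trainer.kubeflow.org/webhook-validation' in object.metadata.labels) ||
          object.metadata.labels['trainer.kubeflow.org/webhook-validation'] != 'disabled'
```

`matchConditions` reached GA in Kubernetes 1.30; a plain `objectSelector` with a `NotIn` match expression on the same label would have a similar effect without CEL. Either way, anyone allowed to create or update runtimes could also set the label, so the approach relies on users not adding it.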

@andreyvelich

Member Author

> Something like a label on the CTRs and a MatchCondition on the webhook?

Hmm, MatchCondition could be a good idea. However, how are we going to identify our builtin runtimes?
We don't want to disable validation for all Runtimes, since users might modify/create them, and we want Webhook to validate them.

@robert-bell

Contributor

I'm thinking our builtin runtimes can have an additional label trainer.kubeflow.org/webhook-validation: disabled which causes them to be skipped when they're created.

Users wouldn't include this label on their CTRs, so their runtimes would always be validated.
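Under this proposal, a builtin runtime shipped with the chart might carry the label as sketched below. The runtime name is only an example and the spec is elided; the `trainer.kubeflow.org/v1alpha1` API version matches the `pkg/apis/trainer/v1alpha1` package referenced in the Makefile.

```yaml
# Hypothetical builtin runtime carrying the proposed opt-out label.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: torch-distributed   # example name only
  labels:
    # Proposed label: lets the webhook's MatchCondition skip this object.
    trainer.kubeflow.org/webhook-validation: disabled
spec:
  # ... existing runtime spec unchanged ...
```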

@andreyvelich

Member Author

> I'm thinking our builtin runtimes can have an additional label trainer.kubeflow.org/webhook-validation: disabled which causes them to be skipped when they're created.

@astefanutti @tenzen-y @VassilisVassiliadis @Sridhar1030 @akshaychitneni @kaisoz Any thoughts on this?
Shall we introduce such a label to our builtin Runtimes?

@VassilisVassiliadis

Contributor

I like retaining the ability for single-step installation of Kubeflow. Having a test or two will help us build confidence in the approach so I agree with it.

@Sridhar1030

Member

+1, the label + MatchCondition approach works. Agreed on adding e2e tests to validate the builtins still pass webhook rules.


Development

Successfully merging this pull request may close these issues.

Split the Helm chart between CRDs, Control Plane, and Runtimes

8 participants