feat(manifests): BREAKING CHANGE: Move CRD to Helm chart template directory#3655
feat(manifests): BREAKING CHANGE: Move CRD to Helm chart template directory#3655andreyvelich wants to merge 5 commits into
Conversation
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell, VassilisVassiliadis. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR addresses Helm’s CRD upgrade limitation by moving the Trainer CRDs from Helm’s special crds/ install-only mechanism into regular chart templates so that helm upgrade will reconcile CRD updates (linked to #3650).
Changes:
- Update the
manifestsMakefile target to copy generated CRDs intocharts/kubeflow-trainer/templates/crd/instead ofcharts/kubeflow-trainer/crds/. - Add Helm unit tests that validate the CRDs render from the new template paths.
- Add the CRD YAMLs under
charts/kubeflow-trainer/templates/crd/(TrainJob, TrainingRuntime, ClusterTrainingRuntime).
Reviewed changes
Copilot reviewed 2 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| Makefile | Copies generated CRDs into the Helm templates directory (enabling upgrades). |
| charts/kubeflow-trainer/tests/crd/crd_test.yaml | Adds helm-unittest coverage that CRDs render from template paths. |
| charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_trainjobs.yaml | Ships TrainJob CRD as a Helm template (upgradeable). |
| charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_trainingruntimes.yaml | Ships TrainingRuntime CRD as a Helm template (upgradeable). |
| charts/kubeflow-trainer/templates/crd/trainer.kubeflow.org_clustertrainingruntimes.yaml | Ships ClusterTrainingRuntime CRD as a Helm template (upgradeable). |
| asserts: | ||
| - containsDocument: | ||
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| name: trainjobs.trainer.kubeflow.org | ||
|
|
| asserts: | ||
| - containsDocument: | ||
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| name: trainingruntimes.trainer.kubeflow.org | ||
|
|
| asserts: | ||
| - containsDocument: | ||
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| name: clustertrainingruntimes.trainer.kubeflow.org |
| paths="./pkg/apis/trainer/v1alpha1/...;./pkg/controller/...;./pkg/runtime/...;./pkg/webhooks/...;./pkg/util/cert/..." \ | ||
| output:crd:artifacts:config=manifests/base/crds \ | ||
| output:rbac:artifacts:config=manifests/base/rbac \ | ||
| output:webhook:artifacts:config=manifests/base/webhook | ||
| cp -f manifests/base/crds/trainer.kubeflow.org_*.yaml $(TRAINER_CHART_DIR)/crds/ | ||
| cp -f manifests/base/crds/trainer.kubeflow.org_*.yaml $(TRAINER_CHART_DIR)/templates/crd/ |
robert-bell
left a comment
There was a problem hiding this comment.
Thanks @andreyvelich. Overall looks fine.
Couple of thoughts -
- I do think we should allow users to disable crd installation via a value like
crds.enabledotherwise we're removing functionality. I know some folk do manage crds outside helm chart using--skip-crds. Wdyt? - We need to announce this as a breaking change. User's who've set
--skip-crdswill need to update tocrds.enabled: false. - Are there any migration steps users need to take when upgrading? Will the helm release be able to previously installed CRDs remain automatically owned by the chart still?
Good point, let me add this flag.
Yes, I agree. Let me update the PR description.
@tenzen-y @Sridhar1030 @astefanutti @akshaychitneni @VassilisVassiliadis @aniket2405 Any thoughts on this? Maybe we could switch the flag to |
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
I think we should default to true immediately as helm is already defaulting to installing the crds. Re the migration- we should just be able to test the upgrade manually. I'm hoping helm will just add the necessary helm labels/annotations and adopt them, but if not users will just need to label/annotate the crds manually. It should be an easy test. Apols I can't check myself right now. |
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
We are seeing some problems with Runtime installation, like I mentioned here: #3650 (comment) |
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
If we still want to prefer single Helm charts for now, we can work around this by running What do we think about this approach? |
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
I dislike the two step install also I'm afraid. My main motivation is keeping operator installation as easy as possible. I think we should push for single command install. Given the runtimes are default they should always be valid? Could we disable the validating webhook just for these resources? Something like a label on the ctrs and a MatchCondition on the webhook? We can add an integration/e2e test to ensure they do actually pass the webhook validation. Wdyt? |
Hmm, MatchCondition could be a good idea. However, how are we going to define our builtin runtimes? |
|
I'm thinking our builtin runtimes can have an additional label Users wouldn't include this label on their CTRs so they would always be validated. |
@astefanutti @tenzen-y @VassilisVassiliadis @Sridhar1030 @akshaychitneni @kaisoz Any thoughts on this? |
|
I like retaining the ability for single-step installation of Kubeflow. Having a test or two will help us build confidence in the approach so I agree with it. |
|
+1, the label + MatchCondition approach works. Agreed on adding e2e tests to validate the builtins still pass webhook rules. |
Fixes: #3650
As we discussed, let's move the Trainer CRDs under template directory, so
helm upgradewill deploy newer version if needed./assign @astefanutti @tenzen-y @robert-bell @akshaychitneni @VassilisVassiliadis @Sridhar1030