website: add Kustomize installation docs for Spark Operator#4392
alimaredia wants to merge 1 commit into
Add Kustomize as a first-class installation path alongside Helm in the Spark Operator getting-started guide. This includes install, configuration, upgrade, uninstall, and RBAC setup instructions for Kustomize users.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Review is pending a Spark Operator point release with the updated Kustomize manifests |
tariq-hasan left a comment:
I have added some comments. Many of them are unrelated to the Kustomize change and could go in a separate PR as part of an overall docs refresh, but I thought I would flag them either way.
| onFailureRetryInterval: 10 | ||
| onSubmissionFailureRetries: 5 | ||
| onSubmissionFailureRetryInterval: 20 | ||
| type: Scala |
I am wondering whether we should update this example so that the versions, service account, image, and the shape of the restart policy are accurate.
For example, a Kustomize install would give `serviceAccount: spark-operator-spark`. This would also align with the Helm install if the Helm release is named `spark-operator`.
| ``` | ||
| Then the chart will set up a service account for your Spark jobs to use in that namespace. | ||
| **Warning:** `kubectl delete -k config/default` will also remove the CRDs, which deletes all SparkApplication, ScheduledSparkApplication, and SparkConnect resources cluster-wide. |
I notice that the uninstall guide says CRDs are not removed automatically and need to be removed manually. I suppose the upstream doc is the one that needs to be fixed.
| To upgrade the operator using Kustomize manifests pull the latest manifests (or the desired release tag) and re-apply: | ||
| ``` |
This is more of a nit, but perhaps we should use ```shell to add a language hint to the code block.
| kubectl -n spark-operator get pods | ||
| ``` | ||
| Note that `spark-pi.yaml` configures the driver pod to use the `spark` service account to communicate with the Kubernetes API server. You might need to replace it with the appropriate service account before submitting the job. If you installed the operator using the Helm chart and overrode `spark.jobNamespaces`, the service account name ends with `-spark` and starts with the Helm release name. For example, if you would like to run your Spark jobs to run in a namespace called `test-ns`, first make sure it already exists, and then install the chart with the command: |
Although the documentation is stale, I am wondering whether the gist of it is still valid, in the sense that the example's `serviceAccount` must match whatever service account the user's install creates (`<release>-spark` for Helm, `spark-operator-spark` for Kustomize).
The caveat lies with Helm in that any release with a release name that is not spark-operator will make the spark-pi.yaml example fail.
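To make the naming rule in this comment concrete, here is a minimal sketch of a helper that prints the service account name the `spark-pi.yaml` example would need to reference. The function name `spark_sa_name` is hypothetical, and the default release name `spark-operator` is an assumption taken from this thread, not something read from the chart or the manifests.

```shell
# Hypothetical helper (not part of the operator or its docs): print the
# service account name the spark-pi example should reference.
# Usage: spark_sa_name helm <release-name>
#        spark_sa_name kustomize
spark_sa_name() {
  method="$1"
  # Assumption: fall back to a release named "spark-operator".
  release="${2:-spark-operator}"
  case "$method" in
    helm)      printf '%s-spark\n' "$release" ;;
    kustomize) printf 'spark-operator-spark\n' ;;
    *)         echo "unknown install method: $method" >&2; return 1 ;;
  esac
}

spark_sa_name helm my-release   # prints my-release-spark
spark_sa_name kustomize         # prints spark-operator-spark
```

With a Helm release named anything other than `spark-operator`, the two install paths produce different names, which is the case where the unmodified example would fail.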
| By default, the operator will install the [CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/) for the custom resources it manages. This can be disabled by setting the flag `-install-crds=false`, in which case the CustomResourceDefinitions can be installed manually using `kubectl apply -f manifest/spark-operator-crds.yaml`. | ||
| The mutating admission webhook is an **optional** component and can be enabled or disabled using the `-enable-webhook` flag, which defaults to `false`. |
The `install-crds` and `enable-webhook` flags are no longer supported on the operator.
| By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `-namespace=<namespace>` |
Suggested change:
- By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `-namespace=<namespace>`
+ By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `--namespace=<namespace>`
| -enable-metrics=true | ||
| -metrics-port=10254 | ||
| -metrics-endpoint=/metrics | ||
| -metrics-prefix=myServiceName | ||
| -metrics-label=label1Key | ||
| -metrics-label=label2Key |
The `metrics-port` flag seems to have been removed as part of kubeflow/spark-operator#2072.
Suggested change:
+ --enable-metrics=true
+ --metrics-endpoint=/metrics
+ --metrics-prefix=myServiceName
+ --metrics-label=label1Key
+ --metrics-label=label2Key
| See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation. | ||
| Installing the chart will create a namespace `spark-operator` if it doesn't exist, and helm will set up RBAC for the operator to run in the namespace. It will also set up RBAC in the `default` namespace for driver pods of your Spark applications to be able to manipulate executor pods. In addition, the chart will create a Deployment in the namespace `spark-operator`. The chart by default does not enable [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) for Spark pod customization. When enabled, a webhook service and a secret storing the x509 certificate called `spark-webhook-certs` are created for that purpose. To install the operator with the mutating admission webhook on a Kubernetes cluster, install the chart with the flag `webhook.enable=true`: | ||
I believe the secret name is `spark-operator-webhook-certs` as opposed to `spark-webhook-certs`.
| The operator is typically deployed and run using the Helm chart. However, users can still run it outside a Kubernetes cluster and make it talk to the Kubernetes API server of a cluster by specifying path to `kubeconfig`, which can be done using the `-kubeconfig` flag. | ||
| The operator is typically deployed and run using the Helm chart or Kustomize manifests. However, users can still run it outside a Kubernetes cluster and make it talk to the Kubernetes API server of a cluster by specifying path to `kubeconfig`, which can be done using the `-kubeconfig` flag. | ||
| The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `-controller-threads` which has a default value of 10. |
I believe this is a remnant from the controller-runtime upgrade: https://github.com/kubeflow/spark-operator/pull/2072/changes.
Suggested change:
- The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `-controller-threads` which has a default value of 10.
+ The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `--controller-threads` which has a default value of 10.
| The operator enables cache resynchronization so periodically the informers used by the operator will re-list existing objects it manages and re-trigger resource events. The resynchronization interval in seconds can be configured using the flag `-resync-interval`, with a default value of 30 seconds. |
The resync-interval flag was removed as part of kubeflow/spark-operator#2072.
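Since several of these comments concern flags that are reported as removed or unsupported, a docs refresh could start with a quick scan for them. The sketch below is one possible way to do that. The function name `find_stale_flags` is hypothetical, and the flag list is taken only from the comments in this review (`install-crds`, `enable-webhook`, `metrics-port`, `resync-interval`), so it should be checked against the operator's current flags before being relied on.

```shell
# Hypothetical helper: print the numbered lines of a docs file that still
# mention flags this review reports as removed or unsupported.
# The flag list is an assumption drawn from the review comments only.
# Usage: find_stale_flags path/to/doc.md
find_stale_flags() {
  # -e is needed because the pattern itself begins with a dash.
  grep -nE -e '-(install-crds|enable-webhook|metrics-port|resync-interval)' "$1"
}
```

The function exits non-zero when nothing matches, so it can also serve as a simple check in a script.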