website: add Kustomize installation docs for Spark Operator#4392

Draft
alimaredia wants to merge 1 commit into kubeflow:master from alimaredia:spark-operator-installation-via-kustomize

@alimaredia

Description of Changes

Add Kustomize as a first-class installation path alongside Helm in the Spark Operator getting-started guide. This includes install, configuration, upgrade, uninstall, and RBAC setup instructions for Kustomize users.

Checklist

Add Kustomize as a first-class installation path alongside Helm in the
Spark Operator getting-started guide. This includes install, configuration,
upgrade, uninstall, and RBAC setup instructions for Kustomize users.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jacobsalway for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow

Hi @alimaredia. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

github-actions Bot commented Jun 9, 2026

🚫 This command cannot be processed. Only organization members or owners can use the commands.

@alimaredia

Author

Review is pending a Spark Operator point release that includes the updated Kustomize manifests.

@tariq-hasan tariq-hasan left a comment

Member

I have added some comments. Many of these could go in a separate PR as part of an overall docs refresh, since they are unrelated to the Kustomize change, but I wanted to flag them either way.

onFailureRetryInterval: 10
onSubmissionFailureRetries: 5
onSubmissionFailureRetryInterval: 20
type: Scala

Member

I am wondering if we want to update this example to reflect the versions, service account, image and the shape of the restart policy accurately.

For example, a Kustomize install would give serviceAccount: spark-operator-spark. This would then align with the Helm install if the Helm release is named spark-operator.

```

Then the chart will set up a service account for your Spark jobs to use in that namespace.
**Warning:** `kubectl delete -k config/default` will also remove the CRDs, which deletes all SparkApplication, ScheduledSparkApplication, and SparkConnect resources cluster-wide.

Member

I'm noticing the uninstall guide says that CRDs are not removed automatically and that they need to be manually removed. I suppose the upstream doc is the one that needs to be fixed.


To upgrade the operator using Kustomize manifests pull the latest manifests (or the desired release tag) and re-apply:

```

Member

This is more of a nit, but perhaps we should use ```shell to add a language hint to the code block.

kubectl -n spark-operator get pods
```

Note that `spark-pi.yaml` configures the driver pod to use the `spark` service account to communicate with the Kubernetes API server. You might need to replace it with the appropriate service account before submitting the job. If you installed the operator using the Helm chart and overrode `spark.jobNamespaces`, the service account name ends with `-spark` and starts with the Helm release name. For example, if you would like to run your Spark jobs to run in a namespace called `test-ns`, first make sure it already exists, and then install the chart with the command:

Member

Although the documentation is stale, I'm wondering if the gist of it is still valid, in the sense that the example's serviceAccount must match whatever service account the user's install creates (`<release>-spark` for Helm, `spark-operator-spark` for Kustomize).

The caveat with Helm is that any release name other than spark-operator will make the spark-pi.yaml example fail.
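
To make the naming rule concrete, here is a minimal sketch of a helper that prints the service account name the example would need for each install path. It assumes only the naming described in this thread (`<release>-spark` for Helm, `spark-operator-spark` for Kustomize). The function name `expected_spark_sa` is hypothetical and is not part of the operator, the chart, or the manifests.

```shell
#!/bin/sh
# Hypothetical helper (not part of spark-operator): print the service account
# name that spark-pi.yaml would need to reference, per the naming discussed
# in this review thread.
expected_spark_sa() {
  method="${1:-}"
  release="${2:-}"
  case "$method" in
    helm)
      # Helm: the service account is named after the release, with a -spark suffix.
      if [ -z "$release" ]; then
        echo "helm requires a release name" >&2
        return 1
      fi
      printf '%s-spark\n' "$release"
      ;;
    kustomize)
      # Kustomize: fixed name, as stated in the review comment above.
      printf '%s\n' "spark-operator-spark"
      ;;
    *)
      echo "unknown install method: $method" >&2
      return 1
      ;;
  esac
}

expected_spark_sa helm my-release   # prints my-release-spark
expected_spark_sa kustomize         # prints spark-operator-spark
```

Under these assumptions, the two install paths only agree when the Helm release is named `spark-operator`, which is the case the example would need to document.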

Comment on lines 288 to 290
By default, the operator will install the [CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/) for the custom resources it manages. This can be disabled by setting the flag `-install-crds=false`, in which case the CustomResourceDefinitions can be installed manually using `kubectl apply -f manifest/spark-operator-crds.yaml`.

The mutating admission webhook is an **optional** component and can be enabled or disabled using the `-enable-webhook` flag, which defaults to `false`.

Member

The `install-crds` and `enable-webhook` flags are no longer supported by the operator.


The mutating admission webhook is an **optional** component and can be enabled or disabled using the `-enable-webhook` flag, which defaults to `false`.

By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `-namespace=<namespace>`

Member

Suggested change
By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `-namespace=<namespace>`
By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag `--namespace=<namespace>`

Comment on lines 355 to 360
-enable-metrics=true
-metrics-port=10254
-metrics-endpoint=/metrics
-metrics-prefix=myServiceName
-metrics-label=label1Key
-metrics-label=label2Key

Member

The metrics-port seems to have been removed as part of kubeflow/spark-operator#2072.

Suggested change
--enable-metrics=true
--metrics-endpoint=/metrics
--metrics-prefix=myServiceName
--metrics-label=label1Key
--metrics-label=label2Key

See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation.

Installing the chart will create a namespace `spark-operator` if it doesn't exist, and helm will set up RBAC for the operator to run in the namespace. It will also set up RBAC in the `default` namespace for driver pods of your Spark applications to be able to manipulate executor pods. In addition, the chart will create a Deployment in the namespace `spark-operator`. The chart by default does not enable [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) for Spark pod customization. When enabled, a webhook service and a secret storing the x509 certificate called `spark-webhook-certs` are created for that purpose. To install the operator with the mutating admission webhook on a Kubernetes cluster, install the chart with the flag `webhook.enable=true`:

Member

I believe the secret name is spark-operator-webhook-certs as opposed to spark-webhook-certs.

The operator is typically deployed and run using the Helm chart. However, users can still run it outside a Kubernetes cluster and make it talk to the Kubernetes API server of a cluster by specifying path to `kubeconfig`, which can be done using the `-kubeconfig` flag.
The operator is typically deployed and run using the Helm chart or Kustomize manifests. However, users can still run it outside a Kubernetes cluster and make it talk to the Kubernetes API server of a cluster by specifying path to `kubeconfig`, which can be done using the `-kubeconfig` flag.

The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `-controller-threads` which has a default value of 10.

Member

I believe this is a remnant from the controller-runtime upgrade: https://github.com/kubeflow/spark-operator/pull/2072/changes.

Suggested change
The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `-controller-threads` which has a default value of 10.
The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `--controller-threads` which has a default value of 10.


The operator uses multiple workers in the `SparkApplication` controller. The number of worker threads are controlled using command-line flag `-controller-threads` which has a default value of 10.

The operator enables cache resynchronization so periodically the informers used by the operator will re-list existing objects it manages and re-trigger resource events. The resynchronization interval in seconds can be configured using the flag `-resync-interval`, with a default value of 30 seconds.

Member

The resync-interval flag was removed as part of kubeflow/spark-operator#2072.

@Arhell Arhell left a comment

Member

/ok-to-test

3 participants