
STOR-2962: Add SELinuxMount GA Upgrade Readiness for 5.0#2010

Merged: openshift-merge-bot[bot] merged 5 commits into openshift:master from jsafrane:selinux-block-upgrade on Jun 10, 2026

@jsafrane (Contributor) commented May 14, 2026

This enhancement prepares OpenShift 5.0 for the SELinuxMount feature going GA in Kubernetes 1.37 / OpenShift 5.1.

SELinuxMount introduces a breaking change, and we'll need to mark a 5.0 cluster un-upgradeable until the cluster admin fixes their workloads or opts out of SELinuxMount. This enhancement proposes how to detect such workloads and how to pass that information from the component that knows about it (a <carry> patch in kube-controller-manager) to the component that marks the cluster un-upgradeable (cluster-storage-operator).

See the metric `cluster:selinux_warning_controller_selinux_volume_conflict:count` in telemetry for the number of affected clusters. It's a very low number (not commenting publicly ;-)). Most clusters will upgrade just fine.

There are some open questions about the actual API used to pass the info. Just circulating the idea about a <carry> patch first before we dive into implementation details.

Proof of concept of the <carry> patch, using a ConfigMap in the openshift-config namespace as "the API object": openshift/kubernetes#2671 (the actual API object is up for discussion).
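A minimal sketch of what the KCM side could write, assuming the boolean-flag variant discussed later in this thread. The namespace `openshift-config` comes from the proof of concept; the name `selinux-conflicts` is only one of the ideas raised in review, and the data key `conflictsFound` is purely hypothetical. The function just builds the manifest as a dict; applying it to the API server is out of scope.

```python
import json

# Assumptions: "openshift-config" is from the proof of concept, "selinux-conflicts"
# is only one of the proposed names, and CONFLICT_KEY is a hypothetical data key.
NAMESPACE = "openshift-config"
NAME = "selinux-conflicts"
CONFLICT_KEY = "conflictsFound"


def build_conflict_configmap(conflict_count: int) -> dict:
    """Build the ConfigMap manifest that KCM could apply (sketch only)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"namespace": NAMESPACE, "name": NAME},
        # ConfigMap data values must be strings, so the flag is serialized as text.
        "data": {CONFLICT_KEY: "true" if conflict_count > 0 else "false"},
    }


if __name__ == "__main__":
    print(json.dumps(build_conflict_configmap(3), indent=2))
```

Writing a flag rather than the raw count keeps the object stable, so KCM only needs to update it when the cluster flips between "has conflicts" and "has none".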

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2026
@openshift-ci-robot commented May 14, 2026
@jsafrane: This pull request references STOR-2962 which is a valid jira issue.


@openshift-ci openshift-ci Bot requested review from 2uasimojo and enxebre May 14, 2026 12:50
Which object serves as *the actual API object* is still open. Ideas:

* A ConfigMap in a shared namespace, such as
`openshift-config/selinux-conflicts`. Does KCM have permissions to do so?


@JoelSpeed (Contributor) left a comment

No major concerns from my perspective here

Would be interested, though, if you could add an example of how the end user is supposed to observe the warnings. What will the Upgradeable=False condition look like, and how will users know which Pods need attention?

Will there be a KCS that explains to them what actions they need to take linked from the condition?

Contributor commented:

Who is the consumer of this object? Is it for end users or is this considered to be internal communication between openshift components?

Contributor Author replied:

The config map is consumed only by the cluster-storage-operator.

As the number of affected Pods can be large (we have a cluster with 6000 of them), users need to use metrics to list the namespaces and Pods. The upgradeable condition will try to point users to the metric. There will also be an alert with a longer, human-friendly description and the name of the metric to check (and maybe a link to the console with the metric, if I figure out how to make it).

The question is whether the upgradeable condition should say something generic, "there are Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD", or be specific about the number of Pods, "there are 512 Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD". If we want the actual number, we need to choose how often KCM will update it. Frequent updates load the cluster unnecessarily; infrequent updates may show the user a stale number.

I'd start with just a boolean flag instead of the actual number.
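Assuming the boolean flag, the consumer side in cluster-storage-operator could map the ConfigMap data to an Upgradeable condition roughly as below. This is a sketch: the key `conflictsFound`, the reason `SELinuxMountConflicts`, and the message wording are hypothetical, and the condition is a plain dict rather than the operator's real status type.

```python
from typing import Dict, Optional

CONFLICT_KEY = "conflictsFound"  # hypothetical key, must match what KCM writes
METRIC = "selinux_warning_controller_selinux_volume_conflict"


def upgradeable_condition(configmap_data: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Translate the ConfigMap data into an Upgradeable condition (sketch)."""
    data = configmap_data or {}
    # A missing ConfigMap or key is treated as "no conflicts reported".
    if data.get(CONFLICT_KEY, "false").strip().lower() != "true":
        return {"type": "Upgradeable", "status": "True", "reason": "AsExpected", "message": ""}
    return {
        "type": "Upgradeable",
        "status": "False",
        "reason": "SELinuxMountConflicts",  # hypothetical reason string
        "message": (
            "Some Pods share a volume with conflicting SELinux settings and may "
            "fail to start after the upgrade. See the metric "
            f"{METRIC} for the list of affected Pods."
        ),
    }
```

Treating a missing object as "no conflicts" avoids blocking upgrades before KCM has written anything; reporting an unknown state instead would be an equally defensible choice.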

Contributor replied:

> The config map is consumed only by the cluster-storage-operator.

Given this is completely internal, I think a ConfigMap makes sense as a temporary way to coordinate between the two components.

> The upgradeable condition will try to point the users to the metric. There will be an alert with a longer human friendly description and name of the metric to check (and maybe a link to the console with the metric, if I find how to make it).

In a cluster without metrics, will there be an alternative way for users to identify the pods? Is there some CLI command we could recommend via a KCS?

> should it be specific about the nr of Pods

Perhaps you could update the message based on a range? E.g. "there are approximately 500".
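One way to implement the range idea, sketched under the assumption that KCM keeps an exact count internally: round the count down to its leading digit and rewrite the API object only when the rounded value changes. That bounds the number of writes while keeping the message roughly accurate. The function names and message wording are hypothetical.

```python
def count_bucket(n: int) -> int:
    """Round n down to its leading digit: 512 -> 500, 6123 -> 6000, 7 -> 7."""
    if n < 10:
        return max(n, 0)
    magnitude = 10 ** (len(str(n)) - 1)
    return (n // magnitude) * magnitude


def conflict_message(n: int) -> str:
    """Build a hypothetical condition message from the rounded count."""
    bucket = count_bucket(n)
    if bucket == 0:
        return "No Pods with conflicting SELinux volume settings were found."
    noun = "Pod" if bucket == 1 else "Pods"
    return f"At least {bucket} {noun} may fail to start after the upgrade."


def needs_update(old_count: int, new_count: int) -> bool:
    """Rewrite the API object only when the rounded value changes."""
    return count_bucket(old_count) != count_bucket(new_count)
```

With this rounding, a count drifting between 512 and 587 never triggers a write, while crossing 600 does.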

Contributor Author replied:

> In a cluster without metrics, will there be an alternative way for users to identify the pods? Is there some CLI command we could recommend via a KCS?

I added a note to the KEP: indeed, the KCS needs to have steps for finding the Pods in a cluster without Prometheus. Scraping KCM metrics with curl + grep could be enough; however, we need to document how to get a token and how to find all KCMs.

Contributor Author replied:

> Perhaps you could update the message based on a range? E.g. there are approximately 500

I left it as an implementation detail :-).

jsafrane added 2 commits May 25, 2026 13:11
- KCM-O is not a viable approach; it does not run in HyperShift.
- Add a KCS with details, so we can link it from the alert(s).
- The API object is indeed a ConfigMap.
- Add a note on how often it will be updated and what its content is.
- Add more details about the proposed KCS; in particular, it must have
  instructions on how to get the affected Pods in a cluster without
  Prometheus.

#### Single-node Deployments or MicroShift

No special considerations are needed for SNO.

Member commented:

There may be some special things we need to do for single-node deployments. I do not think the StoragePerformantPolicy admission hook was enabled in those environments.

Contributor Author replied:

If this is about openshift/cluster-storage-operator#664 (comment), that applies to MicroShift, which is intentionally omitted in a paragraph below.

Comment thread enhancements/storage/selinuxmount-ga-block-upgrade.md
@gnufied (Member) commented Jun 8, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2026
Comment on lines +294 to +299
The KCS must mention how to get list of the Pods. Listing metric
`selinux_warning_controller_selinux_volume_conflict`
would be enough for most users, but for those who don't run
Prometheus we need to document how to scrape KCM metrics
directly, i.e. getting a token + `curl` + `grep` against
all KCMs.
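For clusters without Prometheus, the `grep` step could be replaced by a small parser over the text that `curl` returns from each KCM metrics endpoint. This sketch assumes the metric carries per-Pod label pairs named `<prefix>_namespace` and `<prefix>_name` (for example `pod1_namespace` / `pod1_name`); verify the label names against real KCM output before relying on it. It does not cover getting a token or finding the KCM instances, and it skips series whose value is 0.

```python
import re
from typing import Dict, Iterable, Set, Tuple

METRIC = "selinux_warning_controller_selinux_volume_conflict"
# Matches key="value" pairs in the Prometheus text format, allowing escaped quotes.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Tuple[Dict[str, str], float]:
    """Split one sample line into its labels and its value."""
    start, end = line.index("{"), line.rindex("}")
    labels = dict(LABEL_RE.findall(line[start + 1:end]))
    value = float(line[end + 1:].split()[0])  # an optional timestamp may follow
    return labels, value


def affected_pods(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect (namespace, name) pairs referenced by non-zero conflict samples."""
    pods: Set[Tuple[str, str]] = set()
    for line in lines:
        if not line.startswith(METRIC + "{"):
            continue  # skip comments and unrelated metrics
        labels, value = parse_sample(line)
        if value == 0:
            continue
        for key, namespace in labels.items():
            # Assumed label layout: <prefix>_namespace paired with <prefix>_name.
            if key.endswith("_namespace"):
                name = labels.get(key[: -len("_namespace")] + "_name")
                if name is not None:
                    pods.add((namespace, name))
    return pods
```

Passing an open file with the saved `curl` output (from all KCM instances concatenated) to `affected_pods` returns the set of `(namespace, name)` pairs to investigate.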

Contributor commented:

Is there a way they could use oc to get pods and look at something within them that would show this? What exactly is KCM looking at to determine which pods are susceptible?

Contributor Author replied:

It's not possible to simply list all affected Pods. KCM cross-checks all Pods that use the same volume. All of them must have the same SELinux label (either in the Pod's spec.securityContext or in every spec.containers[*].securityContext), the same spec.securityContext.seLinuxChangePolicy, and the same privileged level, and KCM also compares this with the CSIDriver seLinuxMount field. It's ... convoluted.

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling
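The cross-check described above can be sketched as follows. This is a deliberately simplified model, not the controller's code: each Pod is reduced to one volume ID, an effective SELinux label, its `seLinuxChangePolicy`, and a privileged flag, and only volumes whose CSI driver has `seLinuxMount` enabled are checked. Any property with more than one distinct value among a volume's users is reported, which may flag more than the real controller does; the KEP linked above is authoritative. The `PodInfo` type and the driver names in it are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class PodInfo:
    """Hypothetical, flattened view of one Pod's use of one volume."""
    namespace: str
    name: str
    volume: str                   # unique volume ID, e.g. a PV name
    csi_driver: str
    selinux_label: Optional[str]  # see effective_label()
    change_policy: str            # spec.securityContext.seLinuxChangePolicy
    privileged: bool


def effective_label(pod_label: Optional[str],
                    container_labels: List[Optional[str]]) -> Optional[str]:
    """Return the label shared by all containers, or None if there is none.

    A container-level label overrides the pod-level one.
    """
    labels = {c if c is not None else pod_label for c in container_labels}
    return labels.pop() if len(labels) == 1 else None


PROPERTIES = ("selinux_label", "change_policy", "privileged")


def find_conflicts(pods: Iterable[PodInfo],
                   selinux_mount_drivers: Set[str]) -> Dict[str, Set[str]]:
    """Map volume ID -> names of properties that differ between its Pods."""
    by_volume: Dict[str, List[PodInfo]] = defaultdict(list)
    for pod in pods:
        # Only drivers whose CSIDriver object sets seLinuxMount: true are checked.
        if pod.csi_driver in selinux_mount_drivers:
            by_volume[pod.volume].append(pod)
    conflicts: Dict[str, Set[str]] = {}
    for volume, users in by_volume.items():
        differing = {p for p in PROPERTIES if len({getattr(u, p) for u in users}) > 1}
        if differing:
            conflicts[volume] = differing
    return conflicts
```

Because every Pod on a volume has to be compared with every other, the answer depends on the whole set of Pods rather than on any single Pod, which is why a simple per-Pod `oc` query cannot reproduce it.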

@ingvagabund (Member) commented:
/approve

@openshift-ci (bot) commented Jun 10, 2026
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund


@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2026
@openshift-ci (bot) commented Jun 10, 2026

@jsafrane: all tests passed!


@openshift-merge-bot openshift-merge-bot Bot merged commit f507baa into openshift:master Jun 10, 2026
2 checks passed
openshift-merge-bot Bot pushed a commit to openshift/api that referenced this pull request Jun 26, 2026

6 participants