
OCPCLOUD-3539: add topology-aware operator for CRD Compatibility Checker#600

Open
simkam wants to merge 1 commit into openshift:main from simkam:crdchecker-controller

Conversation

@simkam

@simkam simkam commented Jun 17, 2026

Contributor

https://redhat.atlassian.net/browse/OCPCLOUD-3539

Implement OCPCLOUD-3539 by converting the CRD Compatibility Checker from a static, CVO-deployed Deployment to a Manager/Operand pattern. Making the operand topology-aware fixes single-node OpenShift (SNO) conformance.

The new crd-compatibility-checker-operator:

  • Watches Infrastructure.status.controlPlaneTopology
  • Configures operand with 1 replica (no PDB) for SingleReplica topology
  • Configures operand with 2 replicas + PDB for HighlyAvailable topologies
  • Corrects drift when operand is manually modified
  • Uses separate ServiceAccounts (operator vs operand) for least privilege

Changes:

  • New controller in pkg/controllers/crdcompatibilityoperator/
  • New binary cmd/crd-compatibility-checker-operator/
  • Integration tests covering topology transitions and drift correction
  • Operator manifests with RBAC for managing Deployments and PDBs
  • Updated build pipeline (Makefile, Dockerfile.rhel)
  • Renumbered CRD checker manifests to eliminate gaps

The operand Deployment is now managed dynamically by the operator instead of statically by CVO, allowing proper HA configuration without breaking SNO.
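
The topology rule described above can be sketched as a small pure function. This is a minimal illustration, not the controller's actual code: `TopologyMode`, `OperandConfig`, and `desiredOperandConfig` are hypothetical names declared locally so the example compiles without `github.com/openshift/api`, and treating every non-SingleReplica topology like HighlyAvailable is an assumption taken from the "HighlyAvailable / other" branch of the review walkthrough.

```go
package main

import "fmt"

// TopologyMode mirrors the string values of
// Infrastructure.status.controlPlaneTopology. It is declared locally
// (hypothetical type) so the sketch has no external dependencies.
type TopologyMode string

const (
	SingleReplica   TopologyMode = "SingleReplica"
	HighlyAvailable TopologyMode = "HighlyAvailable"
)

// OperandConfig is a hypothetical summary of what the operator reconciles.
type OperandConfig struct {
	Replicas int32 // desired operand Deployment replicas
	WantPDB  bool  // whether a PodDisruptionBudget should exist
}

// desiredOperandConfig returns 1 replica and no PDB for SingleReplica,
// and 2 replicas plus a PDB for every other topology (assumption).
func desiredOperandConfig(t TopologyMode) OperandConfig {
	if t == SingleReplica {
		return OperandConfig{Replicas: 1, WantPDB: false}
	}
	return OperandConfig{Replicas: 2, WantPDB: true}
}

func main() {
	for _, t := range []TopologyMode{SingleReplica, HighlyAvailable} {
		cfg := desiredOperandConfig(t)
		fmt.Printf("%s: replicas=%d pdb=%t\n", t, cfg.Replicas, cfg.WantPDB)
	}
}
```

Keeping the decision in a pure function like this makes the topology table easy to unit test separately from the reconcile loop.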

Summary by CodeRabbit

Release Notes

  • New Features

    • Added the CRD Compatibility Checker Operator for topology-aware management of the operand workload, including replica handling and high-availability behavior (PDB present/absent).
  • Documentation

    • Documented the operator’s watched resources and topology-driven reconciliation/drift correction behavior.
  • Chores

    • Updated operator packaging to ship the operator binary.
    • Added operator ServiceAccount, RBAC, metrics Service, and deployment/secret wiring.
    • Expanded egress NetworkPolicy selection for the operator.
  • Tests

    • Added integration tests covering topology changes and drift correction.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 17, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 17, 2026


@simkam: This pull request references OCPCLOUD-3539 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


@coderabbitai

coderabbitai Bot commented Jun 17, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR introduces a new crd-compatibility-checker-operator binary and CRDCompatibilityOperatorController that watches Infrastructure.status.controlPlaneTopology to manage the CRD Compatibility Checker operand Deployment (replicas 1 or 2) and PodDisruptionBudget (create/delete) based on cluster topology. Accompanying changes add Kubernetes RBAC manifests, a metrics Service, an updated Deployment manifest, build wiring, integration tests, and documentation.

Changes

CRD Compatibility Checker Operator

| Layer / File(s) | Summary |
| --- | --- |
| Shared namespace constant and existing entrypoint refactor (pkg/controllers/common_consts.go, cmd/crd-compatibility-checker/main.go) | Exports DefaultCRDCompatibilityCheckerNamespace and updates the existing checker entrypoint to use it instead of a local constant. |
| CRDCompatibilityOperatorController implementation (pkg/controllers/crdcompatibilityoperator/controller.go) | Adds the full controller: type, constructor, SetupWithManager watches for Infrastructure/Deployment/PodDisruptionBudget, Reconcile topology logic, reconcileDeployment (complete operand Deployment spec), and reconcilePDB (create-or-patch or delete based on topology). |
| Operator entrypoint and bootstrap (cmd/crd-compatibility-checker-operator/main.go) | Implements initScheme, main (namespace-limited cache, operand image env resolution, manager init, controller wiring, manager start), setupController, and getContainerImage for pod identity lookup. |
| RBAC ServiceAccount, Roles, and Bindings (manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml, manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml, manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml) | Defines ServiceAccount and token Secret, cluster-scoped ClusterRole with get/list/watch on infrastructures and apiservers, namespaced Role with patch/delete/create permissions for Deployments, PodDisruptionBudgets, Leases, Events, and Pods, and role bindings to connect ServiceAccount to both roles. |
| Operator Deployment, metrics Service, and egress NetworkPolicy (manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml, manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml, manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml) | Creates headless metrics Service on port 8443, Deployment with operator binary, POD_NAME/POD_NAMESPACE env vars, master node selection/toleration, priority class, and operator metrics TLS secret volume; extends egress NetworkPolicy to permit compatibility-requirements-operator pod. |
| Controller integration tests (pkg/controllers/crdcompatibilityoperator/suite_test.go, pkg/controllers/crdcompatibilityoperator/controller_test.go) | Adds envtest suite with topology transitions (HighlyAvailable ↔ SingleReplica), operand Deployment and PodDisruptionBudget drift-correction assertions, and table-driven tests for DualReplica, HighlyAvailableArbiter, and External topologies. |
| Build, Docker image, and documentation (Makefile, Dockerfile.rhel, README.md, cmd/crd-compatibility-checker-operator/README.md, docs/controllers/crd-compatibility-checker-operator.md) | Adds bin/crd-compatibility-checker-operator to build target prerequisites, copies operator binary into RHEL image, updates README controllers section, and adds detailed controller documentation describing topology detection, replica scaling, PDB lifecycle, drift correction, and related components. |

Sequence Diagram(s)

sequenceDiagram
    participant Manager as controller-runtime Manager
    participant Controller as CRDCompatibilityOperatorController
    participant APIServer as Kubernetes API Server
    participant Infrastructure as Infrastructure CR
    participant Deployment as Operand Deployment
    participant PDB as PodDisruptionBudget

    Manager->>Controller: Reconcile(req)
    Controller->>APIServer: Get Infrastructure
    APIServer-->>Controller: ControlPlaneTopology status
    alt SingleReplica
        Controller->>APIServer: Patch Deployment replicas=1
        APIServer-->>Deployment: updated
        Controller->>APIServer: Delete PDB (ignore NotFound)
    else HighlyAvailable / other
        Controller->>APIServer: Patch Deployment replicas=2
        APIServer-->>Deployment: updated
        Controller->>APIServer: CreateOrPatch PDB minAvailable=1
        APIServer-->>PDB: created/updated
    end
    Controller-->>Manager: Result{}
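
The flow in the diagram can be approximated with an in-memory stand-in for the API server. This is only a sketch under stated assumptions: `fakeCluster`, `errNotFound`, and `reconcile` are hypothetical names, and the real controller talks to the API server through controller-runtime rather than mutating a struct. It shows the two branches, the "ignore NotFound" handling on PDB deletion, and how re-running the reconcile corrects drift.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error (hypothetical).
var errNotFound = errors.New("not found")

// fakeCluster is a hypothetical in-memory stand-in for API server state.
type fakeCluster struct {
	topology string // value of controlPlaneTopology
	replicas int32  // operand Deployment replicas
	hasPDB   bool   // whether the PodDisruptionBudget exists
}

// deletePDB removes the PDB, or returns errNotFound when there is none.
func (c *fakeCluster) deletePDB() error {
	if !c.hasPDB {
		return errNotFound
	}
	c.hasPDB = false
	return nil
}

// reconcile drives the fake cluster to the state the diagram describes.
func reconcile(c *fakeCluster) error {
	if c.topology == "SingleReplica" {
		c.replicas = 1
		// Deleting an already absent PDB is not an error.
		if err := c.deletePDB(); err != nil && !errors.Is(err, errNotFound) {
			return fmt.Errorf("deleting PDB: %w", err)
		}
		return nil
	}
	c.replicas = 2
	c.hasPDB = true // stands in for CreateOrPatch with minAvailable=1
	return nil
}

func main() {
	// Start from a drifted state: someone scaled the operand to 5 replicas.
	c := &fakeCluster{topology: "HighlyAvailable", replicas: 5}
	if err := reconcile(c); err != nil {
		panic(err)
	}
	fmt.Println(c.replicas, c.hasPDB)
}
```

Because each reconcile writes the full desired state, calling it again after a manual change is what restores the expected replicas and PDB.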

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 4

❌ Failed checks (4 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Test suite has critical assertion message gaps in suite_test.go (lines 62-64, 72, 91 lack failure messages) and 5 Eventually calls in controller_test.go lack explicit timeout parameters, violating... | Add failure messages to all 5 assertions in suite_test.go; add timeout/interval parameters to all Eventually calls (particularly at lines 141, 181, 226, 96, 103 in controller_test.go BeforeEach blocks). |
| Microshift Test Compatibility | ⚠️ Warning | New Ginkgo integration tests use config.openshift.io Infrastructure API (not available on MicroShift) without protection tags or skip mechanisms. | Add [apigroup:config.openshift.io] tag to test names, or add [Skipped:MicroShift] label, or guard with exutil.IsMicroShiftCluster() check. |
| Topology-Aware Scheduling Compatibility | ⚠️ Warning | The operand Deployment created by the controller targets control-plane nodes via nodeSelector, which breaks on HyperShift (External topology) where no control-plane nodes exist. | Remove or conditionally apply the nodeSelector targeting node-role.kubernetes.io/master based on topology; on External topology, allow pods to schedule on worker nodes instead. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding a topology-aware operator for the CRD Compatibility Checker. It clearly summarizes the primary objective and is specific enough for scanning PR history. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All Ginkgo test titles in controller_test.go and suite_test.go are stable and deterministic. No dynamic information (pod names, timestamps, UUIDs, node names, namespace suffixes, IPs) appears in an... |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The Ginkgo tests added use envtest (local API server, no actual nodes) and verify only resource declarative state (replicas, PDB existence, labels) without requiring actual multi-node pod schedulin... |
| Ote Binary Stdout Contract | ✅ Passed | Operator main.go properly redirects klog to stderr via commoncmdoptions.InitOperatorConfig; test suite uses GinkgoWriter (framework-level test writer, not stdout); no direct stdout writes in proces... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Tests use isolated envtest environment with no external registry pulls or hardcoded IPv4 addresses; no IP parsing assumptions; affinity based on label selectors only. |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found in any code added by this PR. |
| Container-Privileges | ✅ Passed | No privilege escalation vectors found. Operator uses restricted-v3 SCC, containers have no privileged mode, host access, or dangerous capabilities enabled. |
| No-Sensitive-Data-In-Logs | ✅ Passed | All logging statements in the new code use generic, structured messages with no exposure of passwords, tokens, API keys, PII, session IDs, or credentials. Topology configuration logged is non-sensi... |



@openshift-ci openshift-ci Bot requested review from damdo and mdbooth June 17, 2026 11:16
@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nrb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml (2)

65-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set CPU/memory limits for the operator container.

Only requests are defined. Missing limits can lead to uncontrolled resource growth under pressure.

As per coding guidelines, "Resource limits (cpu, memory) on every container."

Suggested limits patch
         resources:
           requests:
             cpu: 10m
             memory: 50Mi
+          limits:
+            cpu: 100m
+            memory: 200Mi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml`
around lines 65 - 68, The operator container in the deployment manifest has
resource requests defined (cpu: 10m and memory: 50Mi) but is missing resource
limits. Add a limits section under the resources field alongside the existing
requests section, specifying appropriate maximum CPU and memory values to
prevent uncontrolled resource consumption. This aligns with the coding
guidelines requiring resource limits on every container.

Source: Coding guidelines


26-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit pod/container hardening securityContext.

The pod spec does not explicitly enforce non-root execution, no privilege escalation, read-only root FS, or dropped Linux capabilities for the container.

As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false" and "Drop ALL capabilities, add only what is required."

Suggested hardening patch
 spec:
   template:
     spec:
+      securityContext:
+        runAsNonRoot: true
       serviceAccountName: compatibility-requirements-operator
       containers:
       - name: compatibility-requirements-operator
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          capabilities:
+            drop:
+            - ALL
         image: registry.ci.openshift.org/openshift:cluster-capi-operator
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml`
around lines 26 - 83, The deployment manifest for the
compatibility-requirements-operator container lacks security hardening
configurations. Add a securityContext to the pod level (under spec) and another
at the container level (under the compatibility-requirements-operator container
definition) to enforce non-root execution and filesystem protections. Configure
runAsNonRoot set to true, readOnlyRootFilesystem set to true,
allowPrivilegeEscalation set to false, and drop all Linux capabilities (drop:
ALL), adding back only the specific capabilities required for the operator to
function if any are necessary after testing.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml`:
- Around line 42-53: The namespaced Role grants unnecessary delete permissions
on deployments and leases resources that the controller does not actually use
(it only creates, updates, and patches deployments, and explicitly deletes PDBs,
not deployments or leases). Remove the delete verb from both the deployments
resource rules (in the apiGroups: apps section) and the leases resource rules
(also in the same Role) to follow the least privilege principle and reduce the
blast radius if the service account token is compromised.

In `@pkg/controllers/crdcompatibilityoperator/controller_test.go`:
- Around line 52-57: The envtest instance created in the BeforeAll hook (the
testEnv variable initialized with &envtest.Environment{}) is never stopped in
the AfterAll hook, causing control-plane processes to leak. Add a cleanup call
in the AfterAll block to properly stop the testEnv instance after the context
cancellation, ensuring all resources are released and preventing destabilization
of subsequent test specs.
- Around line 104-107: The Eventually call wrapping the cl.Get(testCtx,
deploymentKey, deployment) operation relies on Gomega's implicit default timeout
and polling interval, which can lead to non-deterministic CI behavior. Set both
explicitly, either by chaining Gomega's WithTimeout and WithPolling onto the
Eventually call or by passing the timeout and interval as extra arguments, so
the wait for this cluster-facing API interaction is deterministic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d381ea06-fa5e-4f51-b52e-021db66bd5b6

📥 Commits

Reviewing files that changed from the base of the PR and between 05c113e and 0fa1315.

📒 Files selected for processing (22)
  • Dockerfile.rhel
  • Makefile
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • cmd/crd-compatibility-checker-operator/main.go
  • cmd/crd-compatibility-checker/main.go
  • docs/controllers/crd-compatibility-checker-operator.md
  • manifests/0000_20_crd-compatibility-checker_01_service_account.yaml
  • manifests/0000_20_crd-compatibility-checker_02_rbac_roles.yaml
  • manifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yaml
  • manifests/0000_20_crd-compatibility-checker_04_metrics-service.yaml
  • manifests/0000_20_crd-compatibility-checker_05_webhook-service.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • manifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml
  • pkg/controllers/common_consts.go
  • pkg/controllers/crdcompatibilityoperator/controller.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go
  • pkg/controllers/crdcompatibilityoperator/suite_test.go

Comment thread manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml Outdated
Comment thread pkg/controllers/crdcompatibilityoperator/controller_test.go Outdated
Comment thread pkg/controllers/crdcompatibilityoperator/controller_test.go Outdated
@simkam simkam force-pushed the crdchecker-controller branch 2 times, most recently from 50d7d00 to 3d46c93 Compare June 17, 2026 12:18
@simkam

simkam commented Jun 17, 2026

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Around line 50-52: The releaseVersionEnv and defaultReleaseVersion constants
are defined but the code at lines 193-197 always injects the hardcoded
defaultReleaseVersion of "0.0.1-snapshot" instead of reading from the
RELEASE_VERSION environment variable at runtime. Fix this by updating the code
that injects the operand version metadata to first attempt to read the actual
release version from the RELEASE_VERSION environment variable, and only fall
back to the defaultReleaseVersion constant if that environment variable is not
set or is empty. This ensures the correct runtime version is used in diagnostics
and version reporting.
- Around line 263-273: The NodeSelector and Tolerations in the pod spec are
unconditionally hard-pinning pods to master nodes, which makes them
unschedulable on clusters without schedulable master nodes. Refactor this by
making the master node placement conditional based on cluster topology. Check
the topology configuration and only add the NodeSelector with
"node-role.kubernetes.io/master" and the corresponding Toleration for
corev1.TaintEffectNoSchedule when the topology supports it, rather than applying
these constraints unconditionally to all cluster configurations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f5693e19-a097-41a5-b91e-cbb5655a8ad2

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa1315 and 3d46c93.

✅ Files skipped from review due to trivial changes (3)
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • docs/controllers/crd-compatibility-checker-operator.md
🚧 Files skipped from review as they are similar to previous changes (12)
  • Dockerfile.rhel
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • cmd/crd-compatibility-checker/main.go
  • pkg/controllers/common_consts.go
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • Makefile
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
  • cmd/crd-compatibility-checker-operator/main.go
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml
  • pkg/controllers/crdcompatibilityoperator/controller_test.go

Comment on lines +50 to +52
releaseVersionEnv = "RELEASE_VERSION"
defaultReleaseVersion = "0.0.1-snapshot"
)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

RELEASE_VERSION is hard-coded, so operand version metadata is incorrect

Line 193 through Line 197 always injects "0.0.1-snapshot" and ignores runtime release version. This can skew diagnostics and version reporting.

Proposed fix
 import (
 	"context"
 	"fmt"
+	"os"
@@
 						Env: []corev1.EnvVar{
 							{
 								Name:  releaseVersionEnv,
-								Value: defaultReleaseVersion,
+								Value: func() string {
+									if v := os.Getenv(releaseVersionEnv); v != "" {
+										return v
+									}
+									return defaultReleaseVersion
+								}(),
 							},
 						},

Also applies to: 193-197


Comment on lines +263 to +273
				NodeSelector: map[string]string{
					"node-role.kubernetes.io/master": "",
				},
				PriorityClassName: "system-cluster-critical",
				RestartPolicy:     corev1.RestartPolicyAlways,
				Tolerations: []corev1.Toleration{
					{
						Key:      "node-role.kubernetes.io/master",
						Operator: corev1.TolerationOpExists,
						Effect:   corev1.TaintEffectNoSchedule,
					},


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Control-plane-only scheduling is topology-unaware and can strand the operand

Line 263 through Line 273 hard-pin pods to master nodes for every topology. On clusters where control-plane nodes are not schedulable/present, this makes the checker permanently unschedulable.

Suggested direction
-				NodeSelector: map[string]string{
-					"node-role.kubernetes.io/master": "",
-				},
 				PriorityClassName: "system-cluster-critical",
 				RestartPolicy:     corev1.RestartPolicyAlways,
-				Tolerations: []corev1.Toleration{
-					{
-						Key:      "node-role.kubernetes.io/master",
-						Operator: corev1.TolerationOpExists,
-						Effect:   corev1.TaintEffectNoSchedule,
-					},
-				},

If control-plane placement is required for specific modes, make it conditional on topology instead of unconditional.

As per coding guidelines, "**/*controller*.go: ... Flag topology-unaware replica counts, control-plane node selectors, arbiter node scheduling, and PDB settings not appropriate for all topologies."
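
One way to make the placement conditional is to compute the scheduling fields from the topology. The sketch below is an assumption-laden illustration, not the reviewer's or the author's implementation: `placement`, `toleration`, and `placementFor` are hypothetical local types and functions standing in for the corev1 fields, and treating only the External topology as "no control-plane nodes" follows the pre-merge check about HyperShift rather than anything confirmed in the controller.

```go
package main

import "fmt"

const masterRoleLabel = "node-role.kubernetes.io/master"

// toleration is a trimmed, hypothetical stand-in for corev1.Toleration.
type toleration struct {
	Key, Operator, Effect string
}

// placement holds the scheduling fields that would be copied into the
// operand pod spec (hypothetical type).
type placement struct {
	NodeSelector map[string]string
	Tolerations  []toleration
}

// placementFor pins pods to control-plane nodes unless the topology is
// External, where this sketch assumes no control-plane nodes exist.
func placementFor(topology string) placement {
	if topology == "External" {
		return placement{} // no selector: pods may land on worker nodes
	}
	return placement{
		NodeSelector: map[string]string{masterRoleLabel: ""},
		Tolerations: []toleration{
			{Key: masterRoleLabel, Operator: "Exists", Effect: "NoSchedule"},
		},
	}
}

func main() {
	for _, t := range []string{"HighlyAvailable", "SingleReplica", "External"} {
		p := placementFor(t)
		fmt.Printf("%s: selector=%v tolerations=%d\n", t, p.NodeSelector, len(p.Tolerations))
	}
}
```

Whether other topologies also need different placement is a design question for the PR; the point here is only that the selector and toleration are derived from the topology instead of being fixed.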


Source: Coding guidelines

@simkam

simkam commented Jun 17, 2026

Contributor Author

/pipeline auto

@openshift-merge-bot

Contributor

Pipeline controller notification

The pipeline-auto label has been added to this PR. Second-stage tests will be triggered automatically when all first-stage tests pass.

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@simkam

simkam commented Jun 17, 2026

Contributor Author

/pipeline required

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-merge-bot

Contributor

Second-stage tests were triggered manually. The pipeline can be controlled only manually until HEAD changes. Use command to trigger the second stage.

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@simkam simkam force-pushed the crdchecker-controller branch from 3d46c93 to 1ac3753 Compare June 17, 2026 19:31

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)

65-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add resource limits for CPU and memory.

The container defines resource requests but no limits. While OpenShift may have LimitRanges that apply defaults, explicit limits ensure predictable resource consumption and are required by coding guidelines.

Suggested fix
         resources:
           requests:
             cpu: 10m
             memory: 50Mi
+          limits:
+            cpu: 100m
+            memory: 128Mi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`
around lines 65 - 68, The resources section in the deployment is missing limits
for CPU and memory, only defining requests. Add a limits subsection alongside
the existing requests section that specifies appropriate CPU and memory limits.
These limits should be set to reasonable values that prevent the container from
consuming excessive resources while still allowing it to function properly. Both
cpu and memory limits should be defined to ensure predictable resource
consumption and compliance with organizational guidelines.

Source: Coding guidelines

🧹 Nitpick comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)

26-73: 💤 Low value

Consider adding explicit securityContext for defense-in-depth.

Static analysis flags missing securityContext. While the openshift.io/required-scc: restricted-v3 annotation ensures OpenShift enforces security constraints at admission, explicit settings provide documentation value and satisfy security scanners that lack OpenShift-awareness.

Optional securityContext addition
       containers:
       - name: compatibility-requirements-operator
         image: registry.ci.openshift.org/openshift:cluster-capi-operator
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL
+          runAsNonRoot: true
+          seccompProfile:
+            type: RuntimeDefault
         command:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`
around lines 26 - 73, The container specification for the
compatibility-requirements-operator is missing an explicit securityContext
block. Add a securityContext section to the container spec (after the image
field and before or after the command/args section) that includes restricted
security settings such as runAsNonRoot: true, allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, and appropriate capabilities drop/add rules that
align with the restricted-v3 SCC referenced in the pod annotations. This
provides defense-in-depth security documentation and helps security scanners
that are not OpenShift-aware validate the deployment configuration.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Line 154: The string literal "k8s-app" is repeated multiple times throughout
the file at lines 154, 161, 178, and 335, which violates the goconst linter
rule. Create a package-level or appropriate-scope constant to represent this
label key (for example, something like `const kubernetesAppLabel = "k8s-app"`)
and replace all four occurrences of the string literal "k8s-app" with references
to this new constant. This will ensure label consistency across the codebase and
satisfy the linter.

---

Outside diff comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 65-68: The resources section in the deployment is missing limits
for CPU and memory, only defining requests. Add a limits subsection alongside
the existing requests section that specifies appropriate CPU and memory limits.
These limits should be set to reasonable values that prevent the container from
consuming excessive resources while still allowing it to function properly. Both
cpu and memory limits should be defined to ensure predictable resource
consumption and compliance with organizational guidelines.

---

Nitpick comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 26-73: The container specification for the
compatibility-requirements-operator is missing an explicit securityContext
block. Add a securityContext section to the container spec (after the image
field and before or after the command/args section) that includes restricted
security settings such as runAsNonRoot: true, allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, and appropriate capabilities drop/add rules that
align with the restricted-v3 SCC referenced in the pod annotations. This
provides defense-in-depth security documentation and helps security scanners
that are not OpenShift-aware validate the deployment configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0cde779d-a322-4a2d-bd85-f8b2d2f9f30e

📥 Commits

Reviewing files that changed from the base of the PR and between 3d46c93 and 1ac3753.

📒 Files selected for processing (23)
  • Dockerfile.rhel
  • Makefile
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • cmd/crd-compatibility-checker-operator/main.go
  • cmd/crd-compatibility-checker/main.go
  • docs/controllers/crd-compatibility-checker-operator.md
  • manifests/0000_20_crd-compatibility-checker_01_service_account.yaml
  • manifests/0000_20_crd-compatibility-checker_02_rbac_roles.yaml
  • manifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yaml
  • manifests/0000_20_crd-compatibility-checker_04_metrics-service.yaml
  • manifests/0000_20_crd-compatibility-checker_05_webhook-service.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • manifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml
  • pkg/controllers/common_consts.go
  • pkg/controllers/crdcompatibilityoperator/controller.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (5)
  • docs/controllers/crd-compatibility-checker-operator.md
  • cmd/crd-compatibility-checker-operator/README.md
  • README.md
  • pkg/controllers/common_consts.go
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • cmd/crd-compatibility-checker/main.go
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
  • Makefile
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • Dockerfile.rhel
  • cmd/crd-compatibility-checker-operator/main.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go

deployment.Spec.Replicas = &replicas
deployment.Spec.Selector = &metav1.LabelSelector{
MatchLabels: map[string]string{
"k8s-app": operandLabel,

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Extract repeated k8s-app key into a constant.

k8s-app is repeated multiple times and is currently flagged by goconst; pull it into a shared constant to keep labels consistent and satisfy lint.

Suggested diff
 const (
 	controllerName            = "CRDCompatibilityOperatorController"
+	k8sAppLabelKey            = "k8s-app"
 	operandDeploymentName     = "compatibility-requirements-controllers"
@@
 		deployment.Spec.Selector = &metav1.LabelSelector{
 			MatchLabels: map[string]string{
-				"k8s-app": operandLabel,
+				k8sAppLabelKey: operandLabel,
 			},
 		}
@@
 				Labels: map[string]string{
-					"k8s-app": operandLabel,
+					k8sAppLabelKey: operandLabel,
 				},
@@
 										MatchLabels: map[string]string{
-											"k8s-app": operandLabel,
+											k8sAppLabelKey: operandLabel,
 										},
@@
 		pdb.Spec.Selector = &metav1.LabelSelector{
 			MatchLabels: map[string]string{
-				"k8s-app": operandLabel,
+				k8sAppLabelKey: operandLabel,
 			},
 		}

Also applies to: 161-161, 178-178, 335-335

🧰 Tools
🪛 golangci-lint (2.12.2)

[error] 154-154: string k8s-app has 4 occurrences, make it a constant

(goconst)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/crdcompatibilityoperator/controller.go` at line 154, The
string literal "k8s-app" is repeated multiple times throughout the file at lines
154, 161, 178, and 335, which violates the goconst linter rule. Create a
package-level or appropriate-scope constant to represent this label key (for
example, something like `const kubernetesAppLabel = "k8s-app"`) and replace all
four occurrences of the string literal "k8s-app" with references to this new
constant. This will ensure label consistency across the codebase and satisfy the
linter.

Source: Linters/SAST tools

@simkam simkam force-pushed the crdchecker-controller branch from 1ac3753 to 6b367dd Compare June 17, 2026 19:46
@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@simkam

simkam commented Jun 18, 2026

Contributor Author

/hold WIP
/pipeline manual

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2026
@simkam simkam force-pushed the crdchecker-controller branch from 6b367dd to 04b5244 Compare June 18, 2026 11:27

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 27-39: Add explicit securityContext hardening to the
compatibility-requirements-operator deployment manifest. At the pod level (under
the spec.template.spec section with serviceAccountName), add a securityContext
with runAsNonRoot set to true. At the container level (under the containers
section for the compatibility-requirements-operator container), add a
securityContext with allowPrivilegeEscalation set to false,
readOnlyRootFilesystem set to true (this field exists only on the container
securityContext, not the pod one), and capabilities with a drop list containing
ALL, followed by an add list with only the specific capabilities required for
the operator to function (if any are needed).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c9ddb148-60b9-4aea-83c5-17103e5d7559

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac3753 and 04b5244.

📒 Files selected for processing (23)
  • Dockerfile.rhel
  • Makefile
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • cmd/crd-compatibility-checker-operator/main.go
  • cmd/crd-compatibility-checker/main.go
  • docs/controllers/crd-compatibility-checker-operator.md
  • manifests/0000_20_crd-compatibility-checker_01_service_account.yaml
  • manifests/0000_20_crd-compatibility-checker_02_rbac_roles.yaml
  • manifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yaml
  • manifests/0000_20_crd-compatibility-checker_04_metrics-service.yaml
  • manifests/0000_20_crd-compatibility-checker_05_webhook-service.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • manifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml
  • pkg/controllers/common_consts.go
  • pkg/controllers/crdcompatibilityoperator/controller.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (3)
  • docs/controllers/crd-compatibility-checker-operator.md
  • cmd/crd-compatibility-checker-operator/README.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (12)
  • pkg/controllers/common_consts.go
  • Dockerfile.rhel
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
  • Makefile
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • cmd/crd-compatibility-checker-operator/main.go
  • cmd/crd-compatibility-checker/main.go
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go

Comment on lines +27 to +39
serviceAccountName: compatibility-requirements-operator
containers:
- name: compatibility-requirements-controllers
- name: compatibility-requirements-operator
image: registry.ci.openshift.org/openshift:cluster-capi-operator
command:
- ./crd-compatibility-checker
- ./crd-compatibility-checker-operator
args:
- --diagnostics-address=:8443
env:
- name: RELEASE_VERSION
value: "0.0.1-snapshot"
- name: COMPATIBILITY_REQUIREMENTS_CONTROLLERS_IMAGE
value: registry.ci.openshift.org/openshift:cluster-capi-operator

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit pod/container securityContext hardening.

Line 27 onward defines the operator pod without explicit hardening controls (runAsNonRoot, allowPrivilegeEscalation: false, capability drop, readOnlyRootFilesystem). This currently fails the security checks surfaced for this manifest.

As per coding guidelines, "If this is a Kubernetes/OpenShift manifest ... securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false ... Drop ALL capabilities, add only what is required."

Proposed patch
 spec:
   selector:
@@
   template:
@@
     spec:
+      securityContext:
+        runAsNonRoot: true
       serviceAccountName: compatibility-requirements-operator
       containers:
       - name: compatibility-requirements-operator
         image: registry.ci.openshift.org/openshift:cluster-capi-operator
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          capabilities:
+            drop:
+            - ALL
         command:
         - ./crd-compatibility-checker-operator
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`
around lines 27 - 39, Add explicit securityContext hardening to the
compatibility-requirements-operator deployment manifest. At the pod level (under
the spec.template.spec section with serviceAccountName), add a securityContext
with runAsNonRoot set to true. At the container level (under the containers
section for the compatibility-requirements-operator container), add a
securityContext with allowPrivilegeEscalation set to false,
readOnlyRootFilesystem set to true (this field exists only on the container
securityContext, not the pod one), and capabilities with a drop list containing
ALL, followed by an add list with only the specific capabilities required for
the operator to function (if any are needed).

Sources: Coding guidelines, Linters/SAST tools

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2026
https://redhat.atlassian.net/browse/OCPCLOUD-3539

Implement OCPCLOUD-3539 by converting the CRD Compatibility Checker from
a static CVO-deployed Deployment to a Manager/Operand pattern. This fixes
SNO conformance by making the operand topology-aware.

The new crd-compatibility-checker-operator:
- Watches Infrastructure.status.controlPlaneTopology
- Configures operand with 1 replica (no PDB) for SingleReplica topology
- Configures operand with 2 replicas + PDB for HighlyAvailable topologies
- Corrects drift when operand is manually modified
- Uses separate ServiceAccounts (operator vs operand) for least privilege

Changes:
- New controller in pkg/controllers/crdcompatibilityoperator/
- New binary cmd/crd-compatibility-checker-operator/
- Integration tests covering topology transitions and drift correction
- Operator manifests with RBAC for managing Deployments and PDBs
- Updated build pipeline (Makefile, Dockerfile.rhel)
- Renumbered CRD checker manifests to eliminate gaps

The operand Deployment is now managed dynamically by the operator instead
of statically by CVO, allowing proper HA configuration without breaking SNO.
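
The topology rules in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the controller's real code: `operandConfig`, `desiredConfig`, and `needsCorrection` are hypothetical names, and treating every topology other than `SingleReplica` as highly available is an assumption.

```go
package main

import "fmt"

// operandConfig is a hypothetical summary of what the operator wants the
// operand to look like; the real controller manages a Deployment and a PDB.
type operandConfig struct {
	Replicas int32
	WantPDB  bool
}

// desiredConfig maps Infrastructure.status.controlPlaneTopology to the operand
// settings listed in the commit message: 1 replica and no PDB for
// SingleReplica, 2 replicas plus a PDB otherwise (assumption: every other
// topology is handled as highly available).
func desiredConfig(controlPlaneTopology string) operandConfig {
	if controlPlaneTopology == "SingleReplica" {
		return operandConfig{Replicas: 1, WantPDB: false}
	}
	return operandConfig{Replicas: 2, WantPDB: true}
}

// needsCorrection reports drift: the observed state differs from the desired
// state, for example after someone scaled the Deployment by hand.
func needsCorrection(observed, desired operandConfig) bool {
	return observed != desired
}

func main() {
	desired := desiredConfig("HighlyAvailable")
	observed := operandConfig{Replicas: 1, WantPDB: true} // manually scaled down
	fmt.Println("desired:", desired, "drifted:", needsCorrection(observed, desired))
}
```

Computing the desired state as a plain value and comparing it with the observed one keeps topology transitions and drift correction on the same code path.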
@simkam simkam force-pushed the crdchecker-controller branch from 04b5244 to fe45557 Compare June 22, 2026 11:56
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2026

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)

71-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add CPU/memory limits to the operator container.

Line 71 defines requests only; without limits this pod can consume unbounded resources under pressure.

As per coding guidelines, "Resource limits (cpu, memory) on every container".

Proposed patch
         resources:
           requests:
             cpu: 10m
             memory: 50Mi
+          limits:
+            cpu: 100m
+            memory: 200Mi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`
around lines 71 - 74, The operator container in the
crd-compatibility-checker-operator deployment defines resource requests (cpu:
10m, memory: 50Mi) but lacks resource limits, allowing unbounded consumption.
Add a limits section under the resources field at the same indentation level as
requests, specifying appropriate CPU and memory limits for the container to
prevent it from consuming excessive resources under pressure.

Source: Coding guidelines

🧹 Nitpick comments (1)
pkg/controllers/crdcompatibilityoperator/controller.go (1)

243-248: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider adding resource limits.

Only Requests are defined; Limits are absent. Without limits, a misbehaving container could consume unbounded CPU/memory, potentially affecting node stability.

Suggested fix
 						Resources: corev1.ResourceRequirements{
 							Requests: corev1.ResourceList{
 								corev1.ResourceCPU:    resource.MustParse("10m"),
 								corev1.ResourceMemory: resource.MustParse("50Mi"),
 							},
+							Limits: corev1.ResourceList{
+								corev1.ResourceCPU:    resource.MustParse("100m"),
+								corev1.ResourceMemory: resource.MustParse("128Mi"),
+							},
 						},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/crdcompatibilityoperator/controller.go` around lines 243 -
248, The ResourceRequirements struct in the controller definition only specifies
resource requests without defining resource limits. Add a Limits field to the
ResourceRequirements struct alongside the existing Requests field, setting
appropriate CPU and memory limits (typically the same as or slightly higher than
the request values) to prevent containers from consuming unbounded resources and
destabilizing the node.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 71-74: The operator container in the
crd-compatibility-checker-operator deployment defines resource requests (cpu:
10m, memory: 50Mi) but lacks resource limits, allowing unbounded consumption.
Add a limits section under the resources field at the same indentation level as
requests, specifying appropriate CPU and memory limits for the container to
prevent it from consuming excessive resources under pressure.

---

Nitpick comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Around line 243-248: The ResourceRequirements struct in the controller
definition only specifies resource requests without defining resource limits.
Add a Limits field to the ResourceRequirements struct alongside the existing
Requests field, setting appropriate CPU and memory limits (typically the same as
or slightly higher than the request values) to prevent containers from consuming
unbounded resources and destabilizing the node.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 33aa2b0e-b085-4fad-bd36-6777688b9de7

📥 Commits

Reviewing files that changed from the base of the PR and between 04b5244 and fe45557.

📒 Files selected for processing (23)
  • Dockerfile.rhel
  • Makefile
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • cmd/crd-compatibility-checker-operator/main.go
  • cmd/crd-compatibility-checker/main.go
  • docs/controllers/crd-compatibility-checker-operator.md
  • manifests/0000_20_crd-compatibility-checker_01_service_account.yaml
  • manifests/0000_20_crd-compatibility-checker_02_rbac_roles.yaml
  • manifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yaml
  • manifests/0000_20_crd-compatibility-checker_04_metrics-service.yaml
  • manifests/0000_20_crd-compatibility-checker_05_webhook-service.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • manifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
  • manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml
  • pkg/controllers/common_consts.go
  • pkg/controllers/crdcompatibilityoperator/controller.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (5)
  • pkg/controllers/common_consts.go
  • README.md
  • cmd/crd-compatibility-checker-operator/README.md
  • docs/controllers/crd-compatibility-checker-operator.md
  • manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
🚧 Files skipped from review as they are similar to previous changes (9)
  • Dockerfile.rhel
  • manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
  • cmd/crd-compatibility-checker/main.go
  • manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
  • Makefile
  • manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
  • manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
  • pkg/controllers/crdcompatibilityoperator/suite_test.go
  • pkg/controllers/crdcompatibilityoperator/controller_test.go

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@simkam

simkam commented Jun 22, 2026

Contributor Author

/hold cancel

/test e2e-gcp-ovn-techpreview
/test e2e-aws-ovn-techpreview

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 22, 2026
@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Contributor

@simkam: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack-ovn-techpreview | 04b5244 | link | false | /test e2e-openstack-ovn-techpreview |
| ci/prow/e2e-aws-capi-disconnected-techpreview | fe45557 | link | false | /test e2e-aws-capi-disconnected-techpreview |
| ci/prow/e2e-gcp-ovn-techpreview | fe45557 | link | true | /test e2e-gcp-ovn-techpreview |


@simkam

simkam commented Jul 1, 2026

Contributor Author

/test e2e-aws-capi-disconnected-techpreview
/test e2e-gcp-ovn-techpreview


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. pipeline-auto
