OCPCLOUD-3539: add topology-aware operator for CRD Compatibility Checker#600
OCPCLOUD-3539: add topology-aware operator for CRD Compatibility Checker#600simkam wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@simkam: This pull request references OCPCLOUD-3539 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR introduces a new ChangesCRD Compatibility Checker Operator
Sequence Diagram(s)sequenceDiagram
participant Manager as controller-runtime Manager
participant Controller as CRDCompatibilityOperatorController
participant APIServer as Kubernetes API Server
participant Infrastructure as Infrastructure CR
participant Deployment as Operand Deployment
participant PDB as PodDisruptionBudget
Manager->>Controller: Reconcile(req)
Controller->>APIServer: Get Infrastructure
APIServer-->>Controller: ControlPlaneTopology status
alt SingleReplica
Controller->>APIServer: Patch Deployment replicas=1
APIServer-->>Deployment: updated
Controller->>APIServer: Delete PDB (ignore NotFound)
else HighlyAvailable / other
Controller->>APIServer: Patch Deployment replicas=2
APIServer-->>Deployment: updated
Controller->>APIServer: CreateOrPatch PDB minAvailable=1
APIServer-->>PDB: created/updated
end
Controller-->>Manager: Result{}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 4❌ Failed checks (4 warnings)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml (2)
65-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSet CPU/memory limits for the operator container.
Only
requestsare defined. Missing limits can lead to uncontrolled resource growth under pressure.As per coding guidelines, "Resource limits (cpu, memory) on every container."
Suggested limits patch
resources: requests: cpu: 10m memory: 50Mi + limits: + cpu: 100m + memory: 200Mi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml` around lines 65 - 68, The operator container in the deployment manifest has resource requests defined (cpu: 10m and memory: 50Mi) but is missing resource limits. Add a limits section under the resources field alongside the existing requests section, specifying appropriate maximum CPU and memory values to prevent uncontrolled resource consumption. This aligns with the coding guidelines requiring resource limits on every container.Source: Coding guidelines
26-83:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd explicit pod/container hardening securityContext.
The pod spec does not explicitly enforce non-root execution, no privilege escalation, read-only root FS, or dropped Linux capabilities for the container.
As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false" and "Drop ALL capabilities, add only what is required."
Suggested hardening patch
spec: template: spec: + securityContext: + runAsNonRoot: true serviceAccountName: compatibility-requirements-operator containers: - name: compatibility-requirements-operator + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: + - ALL image: registry.ci.openshift.org/openshift:cluster-capi-operator🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml` around lines 26 - 83, The deployment manifest for the compatibility-requirements-operator container lacks security hardening configurations. Add a securityContext to the pod level (under spec) and another at the container level (under the compatibility-requirements-operator container definition) to enforce non-root execution and filesystem protections. Configure runAsNonRoot set to true, readOnlyRootFilesystem set to true, allowPrivilegeEscalation set to false, and drop all Linux capabilities (drop: ALL), adding back only the specific capabilities required for the operator to function if any are necessary after testing.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml`:
- Around line 42-53: The namespaced Role grants unnecessary delete permissions
on deployments and leases resources that the controller does not actually use
(it only creates, updates, and patches deployments, and explicitly deletes PDBs,
not deployments or leases). Remove the delete verb from both the deployments
resource rules (in the apiGroups: apps section) and the leases resource rules
(also in the same Role) to follow the least privilege principle and reduce the
blast radius if the service account token is compromised.
In `@pkg/controllers/crdcompatibilityoperator/controller_test.go`:
- Around line 52-57: The envtest instance created in the BeforeAll hook (the
testEnv variable initialized with &envtest.Environment{}) is never stopped in
the AfterAll hook, causing control-plane processes to leak. Add a cleanup call
in the AfterAll block to properly stop the testEnv instance after the context
cancellation, ensuring all resources are released and preventing destabilization
of subsequent test specs.
- Around line 104-107: The Eventually call wrapping the cl.Get(testCtx,
deploymentKey, deployment) operation is using implicit default timeout and
interval values, which can lead to non-deterministic CI behavior. Add explicit
timeout and polling interval parameters to the Eventually call by chaining the
appropriate timeout and interval specifications (such as WithTimeout and
WithPollingInterval or equivalent gomega matchers) to ensure deterministic
behavior for this cluster-facing API interaction.
---
Outside diff comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml`:
- Around line 65-68: The operator container in the deployment manifest has
resource requests defined (cpu: 10m and memory: 50Mi) but is missing resource
limits. Add a limits section under the resources field alongside the existing
requests section, specifying appropriate maximum CPU and memory values to
prevent uncontrolled resource consumption. This aligns with the coding
guidelines requiring resource limits on every container.
- Around line 26-83: The deployment manifest for the
compatibility-requirements-operator container lacks security hardening
configurations. Add a securityContext to the pod level (under spec) and another
at the container level (under the compatibility-requirements-operator container
definition) to enforce non-root execution and filesystem protections. Configure
runAsNonRoot set to true, readOnlyRootFilesystem set to true,
allowPrivilegeEscalation set to false, and drop all Linux capabilities (drop:
ALL), adding back only the specific capabilities required for the operator to
function if any are necessary after testing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d381ea06-fa5e-4f51-b52e-021db66bd5b6
📒 Files selected for processing (22)
Dockerfile.rhelMakefileREADME.mdcmd/crd-compatibility-checker-operator/README.mdcmd/crd-compatibility-checker-operator/main.gocmd/crd-compatibility-checker/main.godocs/controllers/crd-compatibility-checker-operator.mdmanifests/0000_20_crd-compatibility-checker_01_service_account.yamlmanifests/0000_20_crd-compatibility-checker_02_rbac_roles.yamlmanifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yamlmanifests/0000_20_crd-compatibility-checker_04_metrics-service.yamlmanifests/0000_20_crd-compatibility-checker_05_webhook-service.yamlmanifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yamlmanifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yamlmanifests/0000_21_crd-compatibility-checker-operator_00_service-account.yamlmanifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yamlmanifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yamlmanifests/0000_21_crd-compatibility-checker-operator_03_deployment.yamlpkg/controllers/common_consts.gopkg/controllers/crdcompatibilityoperator/controller.gopkg/controllers/crdcompatibilityoperator/controller_test.gopkg/controllers/crdcompatibilityoperator/suite_test.go
50d7d00 to
3d46c93
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Around line 50-52: The releaseVersionEnv and defaultReleaseVersion constants
are defined but the code at lines 193-197 always injects the hardcoded
defaultReleaseVersion of "0.0.1-snapshot" instead of reading from the
RELEASE_VERSION environment variable at runtime. Fix this by updating the code
that injects the operand version metadata to first attempt to read the actual
release version from the RELEASE_VERSION environment variable, and only fall
back to the defaultReleaseVersion constant if that environment variable is not
set or is empty. This ensures the correct runtime version is used in diagnostics
and version reporting.
- Around line 263-273: The NodeSelector and Tolerations in the pod spec are
unconditionally hard-pinning pods to master nodes, which makes them
unschedulable on clusters without schedulable master nodes. Refactor this by
making the master node placement conditional based on cluster topology. Check
the topology configuration and only add the NodeSelector with
"node-role.kubernetes.io/master" and the corresponding Toleration for
corev1.TaintEffectNoSchedule when the topology supports it, rather than applying
these constraints unconditionally to all cluster configurations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f5693e19-a097-41a5-b91e-cbb5655a8ad2
📒 Files selected for processing (22)
Dockerfile.rhelMakefileREADME.mdcmd/crd-compatibility-checker-operator/README.mdcmd/crd-compatibility-checker-operator/main.gocmd/crd-compatibility-checker/main.godocs/controllers/crd-compatibility-checker-operator.mdmanifests/0000_20_crd-compatibility-checker_01_service_account.yamlmanifests/0000_20_crd-compatibility-checker_02_rbac_roles.yamlmanifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yamlmanifests/0000_20_crd-compatibility-checker_04_metrics-service.yamlmanifests/0000_20_crd-compatibility-checker_05_webhook-service.yamlmanifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yamlmanifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yamlmanifests/0000_21_crd-compatibility-checker-operator_00_service-account.yamlmanifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yamlmanifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yamlmanifests/0000_21_crd-compatibility-checker-operator_03_deployment.yamlpkg/controllers/common_consts.gopkg/controllers/crdcompatibilityoperator/controller.gopkg/controllers/crdcompatibilityoperator/controller_test.gopkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (3)
- README.md
- cmd/crd-compatibility-checker-operator/README.md
- docs/controllers/crd-compatibility-checker-operator.md
🚧 Files skipped from review as they are similar to previous changes (12)
- Dockerfile.rhel
- manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
- manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
- cmd/crd-compatibility-checker/main.go
- pkg/controllers/common_consts.go
- manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
- Makefile
- pkg/controllers/crdcompatibilityoperator/suite_test.go
- cmd/crd-compatibility-checker-operator/main.go
- manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
- manifests/0000_21_crd-compatibility-checker-operator_03_deployment.yaml
- pkg/controllers/crdcompatibilityoperator/controller_test.go
| releaseVersionEnv = "RELEASE_VERSION" | ||
| defaultReleaseVersion = "0.0.1-snapshot" | ||
| ) |
There was a problem hiding this comment.
RELEASE_VERSION is hard-coded, so operand version metadata is incorrect
Line 193 through Line 197 always injects "0.0.1-snapshot" and ignores runtime release version. This can skew diagnostics and version reporting.
Proposed fix
import (
"context"
"fmt"
+ "os"
@@
Env: []corev1.EnvVar{
{
Name: releaseVersionEnv,
- Value: defaultReleaseVersion,
+ Value: func() string {
+ if v := os.Getenv(releaseVersionEnv); v != "" {
+ return v
+ }
+ return defaultReleaseVersion
+ }(),
},
},Also applies to: 193-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controllers/crdcompatibilityoperator/controller.go` around lines 50 - 52,
The releaseVersionEnv and defaultReleaseVersion constants are defined but the
code at lines 193-197 always injects the hardcoded defaultReleaseVersion of
"0.0.1-snapshot" instead of reading from the RELEASE_VERSION environment
variable at runtime. Fix this by updating the code that injects the operand
version metadata to first attempt to read the actual release version from the
RELEASE_VERSION environment variable, and only fall back to the
defaultReleaseVersion constant if that environment variable is not set or is
empty. This ensures the correct runtime version is used in diagnostics and
version reporting.
| NodeSelector: map[string]string{ | ||
| "node-role.kubernetes.io/master": "", | ||
| }, | ||
| PriorityClassName: "system-cluster-critical", | ||
| RestartPolicy: corev1.RestartPolicyAlways, | ||
| Tolerations: []corev1.Toleration{ | ||
| { | ||
| Key: "node-role.kubernetes.io/master", | ||
| Operator: corev1.TolerationOpExists, | ||
| Effect: corev1.TaintEffectNoSchedule, | ||
| }, |
There was a problem hiding this comment.
Control-plane-only scheduling is topology-unaware and can strand the operand
Line 263 through Line 273 hard-pin pods to master nodes for every topology. On clusters where control-plane nodes are not schedulable/present, this makes the checker permanently unschedulable.
Suggested direction
- NodeSelector: map[string]string{
- "node-role.kubernetes.io/master": "",
- },
PriorityClassName: "system-cluster-critical",
RestartPolicy: corev1.RestartPolicyAlways,
- Tolerations: []corev1.Toleration{
- {
- Key: "node-role.kubernetes.io/master",
- Operator: corev1.TolerationOpExists,
- Effect: corev1.TaintEffectNoSchedule,
- },
- },If control-plane placement is required for specific modes, make it conditional on topology instead of unconditional.
As per coding guidelines, "**/*controller*.go: ... Flag topology-unaware replica counts, control-plane node selectors, arbiter node scheduling, and PDB settings not appropriate for all topologies."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controllers/crdcompatibilityoperator/controller.go` around lines 263 -
273, The NodeSelector and Tolerations in the pod spec are unconditionally
hard-pinning pods to master nodes, which makes them unschedulable on clusters
without schedulable master nodes. Refactor this by making the master node
placement conditional based on cluster topology. Check the topology
configuration and only add the NodeSelector with
"node-role.kubernetes.io/master" and the corresponding Toleration for
corev1.TaintEffectNoSchedule when the topology supports it, rather than applying
these constraints unconditionally to all cluster configurations.
Source: Coding guidelines
|
/pipeline auto |
|
Pipeline controller notification The |
|
Scheduling tests matching the |
|
/pipeline required |
|
Scheduling tests matching the |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
Scheduling tests matching the |
3d46c93 to
1ac3753
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)
65-68:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd resource limits for CPU and memory.
The container defines resource requests but no limits. While OpenShift may have LimitRanges that apply defaults, explicit limits ensure predictable resource consumption and are required by coding guidelines.
Suggested fix
resources: requests: cpu: 10m memory: 50Mi + limits: + cpu: 100m + memory: 128Mi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml` around lines 65 - 68, The resources section in the deployment is missing limits for CPU and memory, only defining requests. Add a limits subsection alongside the existing requests section that specifies appropriate CPU and memory limits. These limits should be set to reasonable values that prevent the container from consuming excessive resources while still allowing it to function properly. Both cpu and memory limits should be defined to ensure predictable resource consumption and compliance with organizational guidelines.Source: Coding guidelines
🧹 Nitpick comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)
26-73: 💤 Low valueConsider adding explicit securityContext for defense-in-depth.
Static analysis flags missing
securityContext. While theopenshift.io/required-scc: restricted-v3annotation ensures OpenShift enforces security constraints at admission, explicit settings provide documentation value and satisfy security scanners that lack OpenShift-awareness.Optional securityContext addition
containers: - name: compatibility-requirements-operator image: registry.ci.openshift.org/openshift:cluster-capi-operator + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + runAsNonRoot: true + seccompProfile: + type: RuntimeDefault command:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml` around lines 26 - 73, The container specification for the compatibility-requirements-operator is missing an explicit securityContext block. Add a securityContext section to the container spec (after the image field and before or after the command/args section) that includes restricted security settings such as runAsNonRoot: true, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, and appropriate capabilities drop/add rules that align with the restricted-v3 SCC referenced in the pod annotations. This provides defense-in-depth security documentation and helps security scanners that are not OpenShift-aware validate the deployment configuration.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Line 154: The string literal "k8s-app" is repeated multiple times throughout
the file at lines 154, 161, 178, and 335, which violates the goconst linter
rule. Create a package-level or appropriate-scope constant to represent this
label key (for example, something like `const kubernetesAppLabel = "k8s-app"`)
and replace all four occurrences of the string literal "k8s-app" with references
to this new constant. This will ensure label consistency across the codebase and
satisfy the linter.
---
Outside diff comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 65-68: The resources section in the deployment is missing limits
for CPU and memory, only defining requests. Add a limits subsection alongside
the existing requests section that specifies appropriate CPU and memory limits.
These limits should be set to reasonable values that prevent the container from
consuming excessive resources while still allowing it to function properly. Both
cpu and memory limits should be defined to ensure predictable resource
consumption and compliance with organizational guidelines.
---
Nitpick comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 26-73: The container specification for the
compatibility-requirements-operator is missing an explicit securityContext
block. Add a securityContext section to the container spec (after the image
field and before or after the command/args section) that includes restricted
security settings such as runAsNonRoot: true, allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, and appropriate capabilities drop/add rules that
align with the restricted-v3 SCC referenced in the pod annotations. This
provides defense-in-depth security documentation and helps security scanners
that are not OpenShift-aware validate the deployment configuration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0cde779d-a322-4a2d-bd85-f8b2d2f9f30e
📒 Files selected for processing (23)
Dockerfile.rhelMakefileREADME.mdcmd/crd-compatibility-checker-operator/README.mdcmd/crd-compatibility-checker-operator/main.gocmd/crd-compatibility-checker/main.godocs/controllers/crd-compatibility-checker-operator.mdmanifests/0000_20_crd-compatibility-checker_01_service_account.yamlmanifests/0000_20_crd-compatibility-checker_02_rbac_roles.yamlmanifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yamlmanifests/0000_20_crd-compatibility-checker_04_metrics-service.yamlmanifests/0000_20_crd-compatibility-checker_05_webhook-service.yamlmanifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yamlmanifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yamlmanifests/0000_21_crd-compatibility-checker-operator_00_service-account.yamlmanifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yamlmanifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yamlmanifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yamlmanifests/0000_21_crd-compatibility-checker-operator_04_deployment.yamlpkg/controllers/common_consts.gopkg/controllers/crdcompatibilityoperator/controller.gopkg/controllers/crdcompatibilityoperator/controller_test.gopkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (5)
- docs/controllers/crd-compatibility-checker-operator.md
- cmd/crd-compatibility-checker-operator/README.md
- README.md
- pkg/controllers/common_consts.go
- manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
- manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
- cmd/crd-compatibility-checker/main.go
- pkg/controllers/crdcompatibilityoperator/suite_test.go
- Makefile
- manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
- Dockerfile.rhel
- cmd/crd-compatibility-checker-operator/main.go
- pkg/controllers/crdcompatibilityoperator/controller_test.go
| deployment.Spec.Replicas = &replicas | ||
| deployment.Spec.Selector = &metav1.LabelSelector{ | ||
| MatchLabels: map[string]string{ | ||
| "k8s-app": operandLabel, |
There was a problem hiding this comment.
Extract repeated k8s-app key into a constant.
k8s-app is repeated multiple times and is currently flagged by goconst; pull it into a shared constant to keep labels consistent and satisfy lint.
Suggested diff
const (
controllerName = "CRDCompatibilityOperatorController"
+ k8sAppLabelKey = "k8s-app"
operandDeploymentName = "compatibility-requirements-controllers"
@@
deployment.Spec.Selector = &metav1.LabelSelector{
MatchLabels: map[string]string{
- "k8s-app": operandLabel,
+ k8sAppLabelKey: operandLabel,
},
}
@@
Labels: map[string]string{
- "k8s-app": operandLabel,
+ k8sAppLabelKey: operandLabel,
},
@@
MatchLabels: map[string]string{
- "k8s-app": operandLabel,
+ k8sAppLabelKey: operandLabel,
},
@@
pdb.Spec.Selector = &metav1.LabelSelector{
MatchLabels: map[string]string{
- "k8s-app": operandLabel,
+ k8sAppLabelKey: operandLabel,
},
}Also applies to: 161-161, 178-178, 335-335
🧰 Tools
🪛 golangci-lint (2.12.2)
[error] 154-154: string k8s-app has 4 occurrences, make it a constant
(goconst)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controllers/crdcompatibilityoperator/controller.go` at line 154, The
string literal "k8s-app" is repeated multiple times throughout the file at lines
154, 161, 178, and 335, which violates the goconst linter rule. Create a
package-level or appropriate-scope constant to represent this label key (for
example, something like `const kubernetesAppLabel = "k8s-app"`) and replace all
four occurrences of the string literal "k8s-app" with references to this new
constant. This will ensure label consistency across the codebase and satisfy the
linter.
Source: Linters/SAST tools
1ac3753 to
6b367dd
Compare
|
Scheduling tests matching the |
|
/hold WIP |
6b367dd to
04b5244
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 27-39: Add explicit securityContext hardening to the
compatibility-requirements-operator deployment manifest. At the pod level (under
the spec.template.spec section with serviceAccountName), add a securityContext
with runAsNonRoot set to true and readOnlyRootFilesystem set to true.
Additionally, at the container level (under the containers section for the
compatibility-requirements-operator container), add a securityContext with
allowPrivilegeEscalation set to false, readOnlyRootFilesystem set to true, and
capabilities with a drop list containing ALL, followed by an add list with only
the specific capabilities required for the operator to function (if any are
needed).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c9ddb148-60b9-4aea-83c5-17103e5d7559
📒 Files selected for processing (23)
Dockerfile.rhelMakefileREADME.mdcmd/crd-compatibility-checker-operator/README.mdcmd/crd-compatibility-checker-operator/main.gocmd/crd-compatibility-checker/main.godocs/controllers/crd-compatibility-checker-operator.mdmanifests/0000_20_crd-compatibility-checker_01_service_account.yamlmanifests/0000_20_crd-compatibility-checker_02_rbac_roles.yamlmanifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yamlmanifests/0000_20_crd-compatibility-checker_04_metrics-service.yamlmanifests/0000_20_crd-compatibility-checker_05_webhook-service.yamlmanifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yamlmanifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yamlmanifests/0000_21_crd-compatibility-checker-operator_00_service-account.yamlmanifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yamlmanifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yamlmanifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yamlmanifests/0000_21_crd-compatibility-checker-operator_04_deployment.yamlpkg/controllers/common_consts.gopkg/controllers/crdcompatibilityoperator/controller.gopkg/controllers/crdcompatibilityoperator/controller_test.gopkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (3)
- docs/controllers/crd-compatibility-checker-operator.md
- cmd/crd-compatibility-checker-operator/README.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (12)
- pkg/controllers/common_consts.go
- Dockerfile.rhel
- manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
- manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
- Makefile
- manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
- manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
- cmd/crd-compatibility-checker-operator/main.go
- cmd/crd-compatibility-checker/main.go
- manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
- pkg/controllers/crdcompatibilityoperator/suite_test.go
- pkg/controllers/crdcompatibilityoperator/controller_test.go
| serviceAccountName: compatibility-requirements-operator | ||
| containers: | ||
| - name: compatibility-requirements-controllers | ||
| - name: compatibility-requirements-operator | ||
| image: registry.ci.openshift.org/openshift:cluster-capi-operator | ||
| command: | ||
| - ./crd-compatibility-checker | ||
| - ./crd-compatibility-checker-operator | ||
| args: | ||
| - --diagnostics-address=:8443 | ||
| env: | ||
| - name: RELEASE_VERSION | ||
| value: "0.0.1-snapshot" | ||
| - name: COMPATIBILITY_REQUIREMENTS_CONTROLLERS_IMAGE | ||
| value: registry.ci.openshift.org/openshift:cluster-capi-operator |
There was a problem hiding this comment.
Add explicit pod/container securityContext hardening.
Line 27 onward defines the operator pod without explicit hardening controls (runAsNonRoot, allowPrivilegeEscalation: false, capability drop, readOnlyRootFilesystem). This currently fails the security checks surfaced for this manifest.
As per coding guidelines, "If this is a Kubernetes/OpenShift manifest ... securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false ... Drop ALL capabilities, add only what is required."
Proposed patch
spec:
selector:
@@
template:
@@
spec:
+ securityContext:
+ runAsNonRoot: true
serviceAccountName: compatibility-requirements-operator
containers:
- name: compatibility-requirements-operator
image: registry.ci.openshift.org/openshift:cluster-capi-operator
+ securityContext:
+ allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
+ capabilities:
+ drop:
+ - ALL
command:
- ./crd-compatibility-checker-operator🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`
around lines 27 - 39, Add explicit securityContext hardening to the
compatibility-requirements-operator deployment manifest. At the pod level (under
the spec.template.spec section with serviceAccountName), add a securityContext
with runAsNonRoot set to true and readOnlyRootFilesystem set to true.
Additionally, at the container level (under the containers section for the
compatibility-requirements-operator container), add a securityContext with
allowPrivilegeEscalation set to false, readOnlyRootFilesystem set to true, and
capabilities with a drop list containing ALL, followed by an add list with only
the specific capabilities required for the operator to function (if any are
needed).
Sources: Coding guidelines, Linters/SAST tools
|
Scheduling tests matching the |
https://redhat.atlassian.net/browse/OCPCLOUD-3539 Implement OCPCLOUD-3539 by converting the CRD Compatibility Checker from a static CVO-deployed Deployment to a Manager/Operand pattern. This fixes SNO conformance by making the operand topology-aware. The new crd-compatibility-checker-operator: - Watches Infrastructure.status.controlPlaneTopology - Configures operand with 1 replica (no PDB) for SingleReplica topology - Configures operand with 2 replicas + PDB for HighlyAvailable topologies - Corrects drift when operand is manually modified - Uses separate ServiceAccounts (operator vs operand) for least privilege Changes: - New controller in pkg/controllers/crdcompatibilityoperator/ - New binary cmd/crd-compatibility-checker-operator/ - Integration tests covering topology transitions and drift correction - Operator manifests with RBAC for managing Deployments and PDBs - Updated build pipeline (Makefile, Dockerfile.rhel) - Renumbered CRD checker manifests to eliminate gaps The operand Deployment is now managed dynamically by the operator instead of statically by CVO, allowing proper HA configuration without breaking SNO.
04b5244 to
fe45557
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml (1)
71-74:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd CPU/memory limits to the operator container.
Line 71 defines requests only; without limits this pod can consume unbounded resources under pressure.
As per coding guidelines, "Resource limits (cpu, memory) on every container".
Proposed patch
resources: requests: cpu: 10m memory: 50Mi + limits: + cpu: 100m + memory: 200Mi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml` around lines 71 - 74, The operator container in the crd-compatibility-checker-operator deployment defines resource requests (cpu: 10m, memory: 50Mi) but lacks resource limits, allowing unbounded consumption. Add a limits section under the resources field at the same indentation level as requests, specifying appropriate CPU and memory limits for the container to prevent it from consuming excessive resources under pressure.Source: Coding guidelines
🧹 Nitpick comments (1)
pkg/controllers/crdcompatibilityoperator/controller.go (1)
243-248: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winConsider adding resource limits.
Only
Requestsare defined;Limitsare absent. Without limits, a misbehaving container could consume unbounded CPU/memory, potentially affecting node stability.Suggested fix
Resources: corev1.ResourceRequirements{ Requests: corev1.ResourceList{ corev1.ResourceCPU: resource.MustParse("10m"), corev1.ResourceMemory: resource.MustParse("50Mi"), }, + Limits: corev1.ResourceList{ + corev1.ResourceCPU: resource.MustParse("100m"), + corev1.ResourceMemory: resource.MustParse("128Mi"), + }, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/crdcompatibilityoperator/controller.go` around lines 243 - 248, The ResourceRequirements struct in the controller definition only specifies resource requests without defining resource limits. Add a Limits field to the ResourceRequirements struct alongside the existing Requests field, setting appropriate CPU and memory limits (typically the same as or slightly higher than the request values) to prevent containers from consuming unbounded resources and destabilizing the node.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@manifests/0000_21_crd-compatibility-checker-operator_04_deployment.yaml`:
- Around line 71-74: The operator container in the
crd-compatibility-checker-operator deployment defines resource requests (cpu:
10m, memory: 50Mi) but lacks resource limits, allowing unbounded consumption.
Add a limits section under the resources field at the same indentation level as
requests, specifying appropriate CPU and memory limits for the container to
prevent it from consuming excessive resources under pressure.
---
Nitpick comments:
In `@pkg/controllers/crdcompatibilityoperator/controller.go`:
- Around line 243-248: The ResourceRequirements struct in the controller
definition only specifies resource requests without defining resource limits.
Add a Limits field to the ResourceRequirements struct alongside the existing
Requests field, setting appropriate CPU and memory limits (typically the same as
or slightly higher than the request values) to prevent containers from consuming
unbounded resources and destabilizing the node.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 33aa2b0e-b085-4fad-bd36-6777688b9de7
📒 Files selected for processing (23)
Dockerfile.rhelMakefileREADME.mdcmd/crd-compatibility-checker-operator/README.mdcmd/crd-compatibility-checker-operator/main.gocmd/crd-compatibility-checker/main.godocs/controllers/crd-compatibility-checker-operator.mdmanifests/0000_20_crd-compatibility-checker_01_service_account.yamlmanifests/0000_20_crd-compatibility-checker_02_rbac_roles.yamlmanifests/0000_20_crd-compatibility-checker_03_rbac_bindings.yamlmanifests/0000_20_crd-compatibility-checker_04_metrics-service.yamlmanifests/0000_20_crd-compatibility-checker_05_webhook-service.yamlmanifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yamlmanifests/0000_20_crd-compatibility-checker_07_allow-ingress-to-webhook.yamlmanifests/0000_21_crd-compatibility-checker-operator_00_service-account.yamlmanifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yamlmanifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yamlmanifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yamlmanifests/0000_21_crd-compatibility-checker-operator_04_deployment.yamlpkg/controllers/common_consts.gopkg/controllers/crdcompatibilityoperator/controller.gopkg/controllers/crdcompatibilityoperator/controller_test.gopkg/controllers/crdcompatibilityoperator/suite_test.go
✅ Files skipped from review due to trivial changes (5)
- pkg/controllers/common_consts.go
- README.md
- cmd/crd-compatibility-checker-operator/README.md
- docs/controllers/crd-compatibility-checker-operator.md
- manifests/0000_21_crd-compatibility-checker-operator_00_service-account.yaml
🚧 Files skipped from review as they are similar to previous changes (9)
- Dockerfile.rhel
- manifests/0000_21_crd-compatibility-checker-operator_03_metrics-service.yaml
- cmd/crd-compatibility-checker/main.go
- manifests/0000_21_crd-compatibility-checker-operator_02_rbac-bindings.yaml
- Makefile
- manifests/0000_21_crd-compatibility-checker-operator_01_rbac-roles.yaml
- manifests/0000_20_crd-compatibility-checker_06_allow-egress-operators.yaml
- pkg/controllers/crdcompatibilityoperator/suite_test.go
- pkg/controllers/crdcompatibilityoperator/controller_test.go
|
Scheduling tests matching the |
|
/hold cancel /test e2e-gcp-ovn-techpreview |
|
@simkam: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-aws-capi-disconnected-techpreview |
https://redhat.atlassian.net/browse/OCPCLOUD-3539
Implement OCPCLOUD-3539 by converting the CRD Compatibility Checker from a static CVO-deployed Deployment to a Manager/Operand pattern. This fixes SNO conformance by making the operand topology-aware.
The new crd-compatibility-checker-operator:
Changes:
The operand Deployment is now managed dynamically by the operator instead of statically by CVO, allowing proper HA configuration without breaking SNO.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores
Tests