Drop operator privileges by shipping well-known component ServiceAccounts statically #1012

@WanzenBug

Summary

The operator currently runs under a "super" ServiceAccount that holds the union of every permission used by every component it deploys, plus full management of rbac.authorization.k8s.io resources. This issue proposes a refactoring in which the component ServiceAccounts and their RBAC are deployed statically alongside the operator (Helm / Kustomize / OLM), so that the operator's own role can be pruned down to only what its reconcilers actually call.

Background: why the operator role is so broad

This is a consequence of Kubernetes RBAC privilege-escalation prevention, not an oversight. The operator server-side-applies the full RBAC stack (ServiceAccount + Role + ClusterRole + bindings) for each component at runtime, from manifests embedded in pkg/resources/cluster/<component>/.

To create a Role/ClusterRole with permission set P, the API server requires the caller to already hold every permission in P (or hold the escalate verb on roles/clusterroles); to create a binding, the caller must hold every permission in the referenced role (or hold the bind verb on that role). So config/rbac/role.yaml is forced to be the union of all component permissions plus create/update/delete on roles/clusterroles/rolebindings/clusterrolebindings.
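The escalation rule can be illustrated with a small model. This is only a sketch of the idea, not the API server's implementation: the Rule type and the canCreateRole helper are hypothetical, and real policy rules also carry resourceNames and non-resource URLs, which are ignored here.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for an RBAC policy rule:
// one API group, one resource, and a list of verbs.
type Rule struct {
	Group    string
	Resource string
	Verbs    []string
}

// covers reports whether the held rules grant verb on group/resource.
// A "*" in a held rule matches any value in that position.
func covers(held []Rule, group, resource, verb string) bool {
	match := func(pattern, value string) bool {
		return pattern == "*" || pattern == value
	}
	for _, r := range held {
		if !match(r.Group, group) || !match(r.Resource, resource) {
			continue
		}
		for _, v := range r.Verbs {
			if match(v, verb) {
				return true
			}
		}
	}
	return false
}

// canCreateRole models the escalation check for a namespaced Role:
// the caller must hold "escalate" on roles, or every requested permission.
func canCreateRole(held, requested []Rule) bool {
	if covers(held, "rbac.authorization.k8s.io", "roles", "escalate") {
		return true
	}
	for _, r := range requested {
		for _, v := range r.Verbs {
			if !covers(held, r.Group, r.Resource, v) {
				return false
			}
		}
	}
	return true
}

func main() {
	// Illustrative rule sets; not copied from the real manifests.
	operator := []Rule{{Group: "", Resource: "pods", Verbs: []string{"get", "list"}}}
	haController := []Rule{{Group: "", Resource: "pods/eviction", Verbs: []string{"create"}}}

	// Without pods/eviction the operator may not create the component role.
	fmt.Println(canCreateRole(operator, haController))

	// Once the operator role includes the component's permissions, it may.
	operator = append(operator, haController...)
	fmt.Println(canCreateRole(operator, haController))
}
```

In this model the only ways to pass the check are to hold the escalate verb or to hold every permission being granted, which is why the operator role grows with each component it deploys.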

That's why the operator holds permissions it never uses itself — only to be allowed to grant them:

  • rbac.authorization.k8s.io roles/clusterroles/rolebindings/clusterrolebindings (all verbs)
  • internal.linstor.linbit.com/* (the LINSTOR controller's database)
  • pods/eviction (ha-controller)
  • snapshot + groupsnapshot APIs, volumeattachments, csinodes, csistoragecapacities (CSI sidecars)
  • endpointslices (nfs-server)
  • securitycontextconstraints use (component pods)

There is already a precedent for the target pattern: the gencert job ships its own fixed SA + narrow RBAC statically (config/gencert/rbac.yaml), and all component SA names are already fixed/well-known.

Components and their ServiceAccounts

Component             SA name                       Source dir
linstor-controller    linstor-controller            pkg/resources/cluster/controller/
csi-controller        linstor-csi-controller        pkg/resources/cluster/csi-controller/
csi-node              linstor-csi-node              pkg/resources/cluster/csi-node/
ha-controller         ha-controller                 pkg/resources/cluster/ha-controller/rbac.yaml
affinity-controller   linstor-affinity-controller   pkg/resources/cluster/affinity-controller/
nfs-server            linstor-csi-nfs-server        pkg/resources/cluster/nfs-server/
satellite             satellite                     pkg/resources/cluster/satellite-common/

Goal

  • Operator no longer holds any rbac.authorization.k8s.io permissions and no longer creates/owns component RBAC.
  • Component SAs/Roles/bindings are created by the installer with fixed, minimal, auditable permissions.
  • Operator role is reduced to the kinds its reconcilers directly apply or read.

Proposed approach

BEFORE: installer -> [operator SA = superset] -> operator creates all component SA+RBAC at runtime
AFTER:  installer -> operator SA (minimal) + all component SAs/Roles/bindings (fixed, static)
        operator only references SAs by name; never touches rbac.authorization.k8s.io

Key enabling decision: which optional components (NFS / HA / affinity / external-controller) get deployed is driven by the LinstorCluster CR at runtime, which the installer cannot know in advance. Resolve this by shipping all component SAs/Roles/bindings statically and unconditionally. An unused ServiceAccount + Role grants nothing (nothing runs under it), so pre-creating them all is inert and safe. This is what makes static RBAC compatible with CR-driven component selection.

Tasks

1. Lift component RBAC into install artifacts (keep 3 surfaces in sync)

  • Helm: add templates under charts/piraeus/templates/ for each component's SA/Role/ClusterRole/bindings (namespace .Release.Namespace, standard chart labels), optionally gated by .Values.<component>.rbac.create (default true).
  • Kustomize: add a config/component-rbac/ base and wire it into config/default/kustomization.yaml.
  • OLM: add each component SA + rules to bundle/manifests/piraeus-operator.clusterserviceversion.yaml under spec.install.spec.permissions / clusterPermissions (mirroring the existing gencert split).

2. Stop the operator creating/owning component RBAC

  • Remove the RBAC YAMLs from each component kustomization.yaml resources: list and from the //go:embed sets in pkg/resources/cluster/resources.go (and satellite equivalent).
  • Remove rbacv1.Role/ClusterRole/RoleBinding/ClusterRoleBinding and corev1.ServiceAccount from the prune lists in internal/controller/linstorcluster_controller.go (utils.PruneResources, ~line 256) and the satellite controller — otherwise the operator deletes the now-static objects.
  • Confirm deployments/daemonsets still reference fixed SA names (serviceAccountName: linstor-controller, etc.); no change expected since the SAs now come from the installer in the same namespace.
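The prune-list change in the second bullet amounts to filtering out the kinds that move to the installer. The sketch below uses plain kind names and a hypothetical withoutInstallerOwned helper. It does not reproduce the actual utils.PruneResources signature, and the "before" list is illustrative rather than copied from the controller.

```go
package main

import "fmt"

// installerOwnedKinds lists the kinds that the installer would own after
// this change, so the operator must never prune them.
var installerOwnedKinds = map[string]bool{
	"ServiceAccount":     true,
	"Role":               true,
	"ClusterRole":        true,
	"RoleBinding":        true,
	"ClusterRoleBinding": true,
}

// withoutInstallerOwned returns the prune candidates minus the kinds that
// are now shipped statically. Hypothetical helper for illustration only.
func withoutInstallerOwned(kinds []string) []string {
	kept := make([]string, 0, len(kinds))
	for _, k := range kinds {
		if !installerOwnedKinds[k] {
			kept = append(kept, k)
		}
	}
	return kept
}

func main() {
	// Assumed prune list, for illustration only.
	before := []string{
		"ServiceAccount", "Role", "RoleBinding", "ClusterRole",
		"ClusterRoleBinding", "Deployment", "DaemonSet", "ConfigMap", "Service",
	}
	fmt.Println(withoutInstallerOwned(before))
}
```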

3. Drop the operator's privileges

  • Prune the //+kubebuilder:rbac markers in internal/controller/linstorcluster_controller.go (the rbac.authorization.k8s.io marker is the headline removal).
    • Keep: piraeus.io CRs/status/finalizers; apps deployments+daemonsets; core configmaps/services/secrets + nodes/pods read; apiextensions CRDs (audit); cert-manager.io certificates; cluster.x-k8s.io machines; storage.k8s.io csidrivers only; events; leader-election leases.
    • Drop: all rbac.authorization.k8s.io; serviceaccounts create/delete; internal.linstor.linbit.com/*; pods/eviction; snapshot + groupsnapshot APIs; volumeattachments/csinodes/csistoragecapacities/storageclasses/volumeattributesclasses; endpointslices; SCC use; PV/PVC write.
  • Run make manifests, then tools/copy-rbac-config-to-chart.sh, to regenerate config/rbac/role.yaml and charts/piraeus/templates/rbac.yaml.
  • Update the CSV clusterPermissions to match the regenerated operator role.

4. Migration / ownership handover

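  • Make sure the pre-existing component RBAC objects carry Helm adoption metadata (annotations meta.helm.sh/release-name and meta.helm.sh/release-namespace, label app.kubernetes.io/managed-by: Helm) before the upgrade. Helm checks this metadata on the live objects, not in the templates, when deciding whether helm upgrade may adopt an existing object by name instead of erroring.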
  • Provide a one-time step (Helm pre-upgrade hook Job, or operator startup logic) to strip ownerReferences from the pre-existing component RBAC objects, decoupling their lifecycle from the LinstorCluster CR (prevents GC from deleting them if the CR is later removed). Fresh installs need none of this.
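The ownerReferences step can be sketched on a decoded object. This is a minimal illustration that works on generic JSON maps rather than on a Kubernetes client; stripOwnerReferences is a hypothetical helper, and the owner apiVersion, name and uid in the sample are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripOwnerReferences removes metadata.ownerReferences from a decoded
// Kubernetes object and reports whether anything was removed.
func stripOwnerReferences(obj map[string]any) bool {
	meta, ok := obj["metadata"].(map[string]any)
	if !ok {
		return false
	}
	if _, found := meta["ownerReferences"]; !found {
		return false
	}
	delete(meta, "ownerReferences")
	return true
}

func main() {
	// Sample object; the owner fields are placeholders, not real values.
	raw := []byte(`{
	  "apiVersion": "v1",
	  "kind": "ServiceAccount",
	  "metadata": {
	    "name": "linstor-controller",
	    "ownerReferences": [
	      {"apiVersion": "piraeus.io/v1", "kind": "LinstorCluster",
	       "name": "example", "uid": "placeholder-uid"}
	    ]
	  }
	}`)

	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		panic(err)
	}

	changed := stripOwnerReferences(obj)
	out, err := json.Marshal(obj)
	if err != nil {
		panic(err)
	}
	fmt.Println(changed, string(out))
}
```

Against a live cluster, the same effect would more likely be achieved with a merge patch that sets metadata.ownerReferences to null. Whichever component runs the step needs patch permission on the affected kinds.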

5. Docs & validation

  • Update docs/ (RBAC/security + upgrade notes): document the well-known SAs and the reduced operator role.
  • Add a CI guard asserting the operator role contains no rbac.authorization.k8s.io rules, so a stray future +kubebuilder:rbac marker can't silently re-bloat it.
  • e2e: verify a clean install and an upgrade-from-previous-version both reconcile a healthy cluster.
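The CI guard could be as small as a text scan over the generated role. The sketch below assumes the block-style YAML that make manifests normally emits, with one "- group" list item per line. findForbiddenGroup is a hypothetical helper, and flow-style lists such as apiGroups: ["..."] would need a real YAML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const forbiddenGroup = "rbac.authorization.k8s.io"

// findForbiddenGroup returns the 1-based numbers of lines where the
// forbidden API group appears as a YAML list item. The manifest's own
// "apiVersion: rbac.authorization.k8s.io/v1" header is not a list item
// and therefore does not match.
func findForbiddenGroup(roleYAML string) []int {
	var hits []int
	scanner := bufio.NewScanner(strings.NewReader(roleYAML))
	for n := 1; scanner.Scan(); n++ {
		trimmed := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(trimmed, "- ") {
			continue
		}
		item := strings.Trim(strings.TrimPrefix(trimmed, "- "), `"'`)
		if item == forbiddenGroup {
			hits = append(hits, n)
		}
	}
	return hits
}

func main() {
	// Inline sample; a CI job would read config/rbac/role.yaml instead.
	const sample = `apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
rules:
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - roles
  verbs:
  - create
`
	if hits := findForbiddenGroup(sample); len(hits) > 0 {
		fmt.Println("operator role must not contain", forbiddenGroup, "rules; found on lines", hits)
	}
}
```

A CI job would exit non-zero when the returned slice is non-empty, so a stray marker fails the build instead of silently widening the role.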

Open decisions

  1. Backward compat: adoption migration (recommended) vs. documented uninstall/reinstall on upgrade.
  2. Component toggles: ship all component SAs unconditionally (recommended) vs. gate each behind Helm values.
  3. Rollout order: all three install surfaces at once vs. Helm first, then Kustomize/OLM.

Acceptance criteria

  • config/rbac/role.yaml contains no rbac.authorization.k8s.io rules and none of the "grant-only" permissions listed above.
  • Component SAs/Roles/bindings are created by Helm/Kustomize/OLM, not by the operator at runtime.
  • Fresh install and upgrade both produce a working cluster with no orphaned or duplicated RBAC objects.

Affected files (reference)

  • internal/controller/linstorcluster_controller.go (rbac markers, prune list)
  • internal/controller/linstorsatellite_controller.go (prune list)
  • pkg/resources/cluster/*/ (component RBAC + kustomization.yaml), pkg/resources/cluster/resources.go (embed sets)
  • config/rbac/role.yaml, config/default/kustomization.yaml, new config/component-rbac/
  • charts/piraeus/templates/rbac.yaml + new component RBAC templates, tools/copy-rbac-config-to-chart.sh
  • bundle/manifests/piraeus-operator.clusterserviceversion.yaml
