Mismatch between LinstorSatellite tolerations and actual Pods #957

Description

@WanzenBug

When a node is evacuated, a user might run something like `kubectl taint node <node> piraeus.io/remove-satellite:NoSchedule` and then delete the LinstorSatellite resource. The taint ensures that the Satellite is not recreated after evacuation.

However, there is a chance that critical Pods (linstor-csi-node, linstor-satellite, etc.) get removed first. This blocks evacuation: the DaemonSets get recreated, but because of the taint above, their Pods cannot be scheduled. We end up with a satellite that should be evacuated but cannot be, because its Pod was removed and is never recreated since it cannot be scheduled.

We should consider updating the Pod tolerations so that, in this situation, critical Pods can still be recreated.
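
To illustrate why the Pods get stuck, here is a minimal Python sketch of the Kubernetes taint/toleration matching rule. It is not operator code. The `current` toleration list is a hypothetical stand-in for whatever the critical Pods carry today, and `proposed` shows one possible shape of the extra toleration. Only the taint key and effect come from the command above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an unset operator as Equal
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key matches every taint key (only valid with Exists).
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(taints, tolerations) -> bool:
    """A Pod can be scheduled only if every hard taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Taint from the kubectl command in this issue.
REMOVE_TAINT = Taint(key="piraeus.io/remove-satellite", effect="NoSchedule")

# Hypothetical tolerations on a critical Pod today (an assumption for illustration).
current = [
    Toleration(key="node.kubernetes.io/not-ready", operator="Exists", effect="NoExecute"),
    Toleration(key="node.kubernetes.io/unschedulable", operator="Exists", effect="NoSchedule"),
]

# One possible fix: also tolerate the removal taint.
proposed = current + [
    Toleration(key="piraeus.io/remove-satellite", operator="Exists", effect="NoSchedule"),
]

print(schedulable([REMOVE_TAINT], current))   # recreated Pod stays Pending
print(schedulable([REMOVE_TAINT], proposed))  # recreated Pod can be scheduled
```

With the extra toleration, a Pod that was deleted before evacuation finished could be scheduled onto the tainted node again. Whether the toleration belongs on every critical Pod or only on the satellite is left open here.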

Labels

bug: Something isn't working
