# Scale-down during failed rolling replacement can skip decommission (#1151)

## Environment

- Kubernetes: local kind cluster
- CockroachDB operator image: `cockroachdb/cockroach-operator:v2.18.3`
- Operator args: default released `v2.18.3` args
- CockroachDB image: `cockroachdb/cockroach:v25.2.12`
- Test topology: 4-node insecure `CrdbCluster`, then shrink `4 -> 3`


## What Happened

If the user scales the `CrdbCluster` down (`spec.nodes` 4 → 3) while the operator is in the middle of a StatefulSet rolling replacement of the highest-ordinal pod, the operator skips node decommission and shrinks the StatefulSet directly. As a result, the operator deletes a CockroachDB node without first decommissioning it.

Specifically, the cluster started with four joined CockroachDB members, `pod-0` through `pod-3`. The operator then began a rolling restart of the last pod, `pod-3`, and that replacement got stuck. While the StatefulSet was in that intermediate state, the user changed `spec.nodes` from `4` to `3`. Instead of blocking the scale-down or decommissioning `pod-3`, the operator updated the StatefulSet to `spec.replicas=3`, which deleted the last pod directly.

As a result, Kubernetes and CockroachDB hold different views of the same node: Kubernetes has deleted `pod-3`, while CockroachDB still lists it as a member and considers it crashed.

I have attached a reproduction script here: https://gist.github.com/oyjhl/cd672bf67a70219514355a7c8bd3c607

## Expected Behavior

The operator must not lower the StatefulSet's replica count until CockroachDB reports the corresponding node as fully decommissioned.

The operator should either:

- decommission the CockroachDB member `pod-3` before shrinking;
- block scale-down while the replacement is stuck;
- or recover the replacement first, rediscover the node identity, and then run
  the normal decommission flow.
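The ordering above can be sketched as a small decision function. Everything in it is hypothetical (`nextScaleDownAction`, `nodeStatus`, and the action names are illustrative, not the operator's code), and `Decommissioned` stands in for whatever CockroachDB status check the operator would actually use. This sketch takes the "block" option while the replacement is incomplete; the first option would return `actionDecommission` there instead.

```go
package main

import "fmt"

// scaleDownAction is a hypothetical enum of what the operator could do when
// spec.nodes is lower than the StatefulSet's replica count.
type scaleDownAction string

const (
	actionNone         scaleDownAction = "none"         // nothing to remove
	actionWaitRollout  scaleDownAction = "wait-rollout" // block: replacement still in progress or stuck
	actionDecommission scaleDownAction = "decommission" // decommission the highest-ordinal node first
	actionShrink       scaleDownAction = "shrink"       // safe to lower spec.replicas
)

// nodeStatus is a hypothetical summary of what CockroachDB reports for the
// node backing the highest-ordinal pod.
type nodeStatus struct {
	Decommissioned bool // CockroachDB reports the node as fully decommissioned
}

// nextScaleDownAction never allows a shrink until CockroachDB reports the
// node as decommissioned, and blocks while the rolling replacement is unfinished.
func nextScaleDownAction(specReplicas, desiredNodes int32, rolloutDone bool, last nodeStatus) scaleDownAction {
	if specReplicas <= desiredNodes {
		return actionNone
	}
	if last.Decommissioned {
		return actionShrink
	}
	if !rolloutDone {
		return actionWaitRollout
	}
	return actionDecommission
}

func main() {
	// The stuck replacement from this report: 4 replicas, 3 desired.
	fmt.Println(nextScaleDownAction(4, 3, false, nodeStatus{}))
	// After the replacement recovers, decommission runs first.
	fmt.Println(nextScaleDownAction(4, 3, true, nodeStatus{}))
	// Only then may the StatefulSet shrink.
	fmt.Println(nextScaleDownAction(4, 3, true, nodeStatus{Decommissioned: true}))
}
```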

## Where The Source Code Goes Wrong

Decommission can be skipped while Deploy is still allowed to shrink the StatefulSet.

In the failing run, the operator behaves like this:

```text
pod-3 had joined CockroachDB
pod-3 replacement was stuck
user requested 4 -> 3 scale-down
operator skipped Decommission
operator ran Deploy
Deploy wrote StatefulSet spec.replicas=3
CockroachDB still reported pod-3's node as active with nonzero replicas
```

The decommission gate depends on StatefulSet rollout status:

```text
ss.Spec.Replicas             desired Kubernetes pod count stored in the StatefulSet
ss.Status.Replicas           pods currently observed by the StatefulSet controller
ss.Status.CurrentReplicas    pods created from the StatefulSet's currentRevision (during a rolling update, the revision being replaced)
cluster.Spec().Nodes         desired CockroachDB node count from the CrdbCluster
```

`CurrentReplicas` is not "how many CockroachDB nodes are safe to remove". It is only Kubernetes rollout status: how many observed pods already match the current StatefulSet revision. During a failed rolling replacement, the StatefulSet can still have four observed pods, but only three current-revision pods:

```text
ss.Spec.Replicas             = 4
ss.Status.Replicas           = 4
ss.Status.CurrentReplicas    = 3
cluster.Spec().Nodes         = 3
```

In that state, [the first predicate](https://github.com/cockroachdb/cockroach-operator/blob/3a76617b0704cf5becb4203237a209e0345a4187/pkg/actor/director.go#L279-L280) is false (`3 != 4`), and the second is false as well (`3 > 3` does not hold), so Decommission does not run:

```go
ss.Status.CurrentReplicas == ss.Status.Replicas
ss.Status.CurrentReplicas > cluster.Spec().Nodes
```
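Plugging the stuck-state values into the quoted predicates shows why the gate stays closed. This is a hypothetical sketch: `rolloutState` and `currentGate` are illustrative names, and joining the two predicates with `&&` is an assumption (both are false here, so `||` would give the same answer).

```go
package main

import "fmt"

// rolloutState is a hypothetical flattening of the fields the gate reads.
// In the real objects, ss.Spec.Replicas is a *int32 and Nodes comes from the
// CrdbCluster spec; plain int32 values are used here for brevity.
type rolloutState struct {
	SpecReplicas    int32 // *ss.Spec.Replicas (not read by this gate)
	StatusReplicas  int32 // ss.Status.Replicas
	CurrentReplicas int32 // ss.Status.CurrentReplicas
	Nodes           int32 // cluster.Spec().Nodes
}

// currentGate evaluates the two quoted predicates and, as an assumption,
// requires both to hold before Decommission runs.
func currentGate(s rolloutState) (rolloutDone, shrinkWanted, decommission bool) {
	rolloutDone = s.CurrentReplicas == s.StatusReplicas
	shrinkWanted = s.CurrentReplicas > s.Nodes
	return rolloutDone, shrinkWanted, rolloutDone && shrinkWanted
}

func main() {
	// Values from the stuck replacement described above.
	stuck := rolloutState{SpecReplicas: 4, StatusReplicas: 4, CurrentReplicas: 3, Nodes: 3}
	first, second, run := currentGate(stuck)
	fmt.Printf("first=%v second=%v decommission=%v\n", first, second, run)
	// Output: first=false second=false decommission=false
}
```

For comparison, a normal 4 → 3 scale-down with the rollout complete (`CurrentReplicas = 4`) makes both predicates true, so the gate only misbehaves when rollout status lags behind.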

Decommission is skipped, but the Deploy path still sees that the desired StatefulSet should now have only three replicas and writes that smaller replica count. The decommission gate should instead compare the StatefulSet's desired replica count with the CR:

```go
*ss.Spec.Replicas > cluster.Spec().Nodes
```
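A minimal sketch comparing the existing gate with the proposed predicate on three states: a normal 4 → 3 scale-down, the stuck replacement, and the state after the StatefulSet has shrunk. `rolloutState`, `currentGate`, and `proposedGate` are hypothetical names, and joining the two existing predicates with `&&` is an assumption.

```go
package main

import "fmt"

// rolloutState is a hypothetical flattening of the StatefulSet and CR fields
// listed earlier (ss.Spec.Replicas is a *int32 in the real object).
type rolloutState struct {
	SpecReplicas    int32 // *ss.Spec.Replicas
	StatusReplicas  int32 // ss.Status.Replicas
	CurrentReplicas int32 // ss.Status.CurrentReplicas
	Nodes           int32 // cluster.Spec().Nodes
}

// currentGate: the two quoted predicates, joined with && (assumed).
func currentGate(s rolloutState) bool {
	return s.CurrentReplicas == s.StatusReplicas && s.CurrentReplicas > s.Nodes
}

// proposedGate: decommission whenever the StatefulSet still asks for more
// pods than the CR wants, independent of rollout progress.
func proposedGate(s rolloutState) bool {
	return s.SpecReplicas > s.Nodes
}

func main() {
	cases := []struct {
		name string
		s    rolloutState
	}{
		{"normal 4 -> 3", rolloutState{SpecReplicas: 4, StatusReplicas: 4, CurrentReplicas: 4, Nodes: 3}},
		{"stuck replacement 4 -> 3", rolloutState{SpecReplicas: 4, StatusReplicas: 4, CurrentReplicas: 3, Nodes: 3}},
		{"after shrink to 3", rolloutState{SpecReplicas: 3, StatusReplicas: 3, CurrentReplicas: 3, Nodes: 3}},
	}
	for _, c := range cases {
		fmt.Printf("%-26s current=%v proposed=%v\n", c.name, currentGate(c.s), proposedGate(c.s))
	}
}
```

Because `proposedGate` does not read `CurrentReplicas`, a stuck replacement cannot hide a pending scale-down from it, and it stops firing once `spec.replicas` matches `spec.nodes`.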

