WIP: update EtcdBackendQuota feature#2031

Open
atiratree wants to merge 1 commit into openshift:master from atiratree:EtcdBackendQuota

@atiratree

Member

No description provided.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 3, 2026
@openshift-ci openshift-ci Bot requested review from benluddy and jaypoulz June 3, 2026 13:25
@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dustymabe for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@atiratree atiratree force-pushed the EtcdBackendQuota branch 2 times, most recently from d5e7f93 to 7c17dce on June 3, 2026 14:22
Comment thread enhancements/etcd/etcd-size-tuning.md Outdated
Decreasing the quota has the following benefits:
- The DB size has a big impact on the memory usage. Decreasing the quota will limit the memory usage.
This can prevent further cluster performance degradation
- During a cluster revert to a former state (downgrade, backup/restore) it might be beneficial to keep the old value. This also affects alerts / expectations and other metrics.

Contributor

Downgrades should not be a factor, as the CVO does not allow y-stream downgrades (since 4.14 or so); only z-stream downgrades to the version immediately preceding an upgrade are allowed.

Backup and restore should also not be a factor, since restoring effectively moves the cluster back in time. Because this is a wholly destructive process, it does not matter what the current quota or current size is: both are reverted to their previous values, and all objects in etcd are replaced with the previous state.

Member Author

It might also include partial backup/restore done by other tools. Nevertheless, I removed it as it is not the most important flow. I am only highlighting the memory impact now.

Comment thread enhancements/etcd/etcd-size-tuning.md Outdated
* Allow configuration of the backend limit via human readable units: 16GiB.
* Add an API to allow admins to change the value.
* The backend limit can only be increased and not decreased.
* The backend cannot be decreased below the current DB size.
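
The last bullet in this hunk (no decrease below the current DB size), combined with the human-readable unit from the first bullet, could be checked along the following lines. This is only a minimal sketch under stated assumptions, not the operator's actual validation: `parseGiB` and `validateQuota` are hypothetical names, only whole-number `GiB` values are handled, and the current DB size is assumed to be available in bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const gib = int64(1) << 30

// parseGiB converts a value such as "16GiB" into bytes.
// Hypothetical helper: only whole numbers with a GiB suffix are accepted.
func parseGiB(s string) (int64, error) {
	if !strings.HasSuffix(s, "GiB") {
		return 0, fmt.Errorf("unsupported quantity %q: expected a GiB suffix", s)
	}
	n, err := strconv.ParseInt(strings.TrimSuffix(s, "GiB"), 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid quantity %q", s)
	}
	return n * gib, nil
}

// validateQuota rejects a requested quota that is below the current DB size.
// A quota equal to the current size is accepted in this sketch.
func validateQuota(requested string, currentDBSizeBytes int64) error {
	quota, err := parseGiB(requested)
	if err != nil {
		return err
	}
	if quota < currentDBSizeBytes {
		return fmt.Errorf("quota %s is below the current DB size of %d bytes", requested, currentDBSizeBytes)
	}
	return nil
}

func main() {
	currentDBSize := 12 * gib // assumed current size on disk
	for _, q := range []string{"16GiB", "8GiB", "16GB"} {
		if err := validateQuota(q, currentDBSize); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", q)
	}
}
```

With an assumed 12GiB database, `16GiB` passes, `8GiB` is rejected as too small, and `16GB` is rejected because this sketch does not parse that unit.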

Contributor

In the general context of allowing decreases, I'm wondering if that should be a separate, future enhancement, and whether we should just focus on getting this feature to GA as-is.

As with the lower max limitation, relaxing constraints is allowed in z-streams, and I would argue that allowing decreases would be such a relaxation.

I think, in general, the customers asking for this feature are doing so because they're running up against the current quota and wish to increase it permanently. Given the memory consumption implications of increasing the quota (and of using the headroom gained by increasing it), ensuring the nodes are sized appropriately prior to the increase is essential, and I think that many, if not all, customers will view this as a one-time option. Because of those implications and requirements, the node will be sized larger anyway, regardless of what the quota is. That means that if they provision nodes that can handle 16GiB size-on-disk (and set the quota to 16GiB) but are only using 12GiB on disk, they would then need to re-provision the node to be smaller: just changing the quota doesn't directly affect how much memory or storage etcd takes; it's just a limit.

That is all to say, I don't know if allowing decreases is a feature that will be used, and in my opinion, it would be better to wait for a customer to ask for it, so we know there's a use case, before putting in the work to implement it for this release cycle.

Member Author

I still think that downsizing your cluster to reduce cost is a relevant part of the story, but it can be done later. I agree that the feature can be pushed as is, so as not to postpone the graduation. I moved the quota decrease story to the Alternatives.

Comment thread enhancements/etcd/etcd-size-tuning.md Outdated
@atiratree atiratree force-pushed the EtcdBackendQuota branch 2 times, most recently from a37a0c5 to 3f5c998 on June 11, 2026 10:00
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

@atiratree: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


In general increasing the quota is done in response to increased demand on the cluster.
Decreasing or reverting the quota is unlikely to happen often, but it has the following benefit:
- The DB size has a big impact on the memory usage. Decreasing the quota will limit the memory usage.

Contributor

"Decreasing the quota will limit the memory usage"

That's slightly misleading. Decreasing the quota would allow a customer to downsize a node (to reduce cost), which I think we should highlight as the main motivation of this enhancement. Just changing the quota only affects the etcd instances if they're at or above the quota, in which case they can't write. If they're below the larger quota, their memory usage is no different than it would be with a smaller quota.
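
The distinction drawn in this comment, that the quota is a write limit rather than something that sets memory usage, can be illustrated with a deliberately simplified model. This is a sketch only: `acceptsWrites` is a hypothetical function, the sizes are made-up examples, and real etcd behaviour around the quota (alarms, compaction, defragmentation) is not modelled.

```go
package main

import "fmt"

const gib = int64(1) << 30

// acceptsWrites is a simplified, hypothetical model: a member keeps
// accepting writes only while its backend size is below the quota.
func acceptsWrites(dbSizeBytes, quotaBytes int64) bool {
	return dbSizeBytes < quotaBytes
}

func main() {
	dbSize := 12 * gib // assumed size on disk

	// Below the quota, a larger or smaller quota makes no difference:
	// the DB size (and what depends on it) is the same in both cases.
	for _, quota := range []int64{16 * gib, 32 * gib, 8 * gib} {
		fmt.Printf("quota=%dGiB dbSize=%dGiB acceptsWrites=%t\n",
			quota/gib, dbSize/gib, acceptsWrites(dbSize, quota))
	}
}
```

In this model, the 16GiB and 32GiB quotas behave identically for a 12GiB database; only the 8GiB quota changes anything, and what it changes is whether writes are accepted.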
