WIP: update EtcdBackendQuota feature #2031
| Decreasing the quota has the following benefits:
| - The DB size has a big impact on the memory usage. Decreasing the quota will limit the memory usage.
| This can prevent further cluster performance degradation
| - During a cluster revert to a former state (downgrade, backup/restore) it might be beneficial to keep the old value. This also affects alerts / expectations and other metrics.
Downgrades should not be a factor: the CVO does not allow y-stream downgrades (since 4.14 or so), only a z-stream rollback to the one version previous to an upgrade.
Backup and restore should also not be a factor, since restoring effectively moves the cluster back in time. Because this is a wholly destructive process, the current quota and current size do not matter: the quota reverts to its previous value, and all objects in etcd are replaced with the previous state.
It might also include partial backup/restore done by other tools. Nevertheless, I removed it as it is not the most important flow. I am only highlighting the memory impact now.
| * Allow configuration of the backend limit via human readable units: 16GiB.
| * Add an API to allow admins to change the value.
| * The backend limit can only be increased and not decreased.
| * The backend cannot be decreased below the current DB size.
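The constraints in the quoted lines can be sketched as a small validation routine. This is only an illustration, not the operator's actual code: `parseGiB`, `validateQuotaChange`, and the `allowDecrease` flag are hypothetical names, and the sketch assumes the two rules are alternatives (increase-only today, or decreases permitted down to the current DB size).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const gib = int64(1) << 30

var (
	errDecrease    = errors.New("quota can only be increased")
	errBelowDBSize = errors.New("quota cannot be set below the current DB size")
)

// parseGiB converts a string such as "16GiB" into bytes.
// Only the GiB suffix is handled in this sketch.
func parseGiB(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if !strings.HasSuffix(s, "GiB") {
		return 0, fmt.Errorf("unsupported unit in %q, expected GiB", s)
	}
	n, err := strconv.ParseInt(strings.TrimSuffix(s, "GiB"), 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid quantity %q", s)
	}
	return n * gib, nil
}

// validateQuotaChange applies the rules from the quoted lines.
// allowDecrease is a hypothetical switch between the two variants.
func validateQuotaChange(current, requested, dbSize int64, allowDecrease bool) error {
	if requested >= current {
		return nil // increases (and no-ops) are always accepted
	}
	if !allowDecrease {
		return errDecrease
	}
	if requested < dbSize {
		return errBelowDBSize
	}
	return nil
}

func main() {
	cur, _ := parseGiB("8GiB")
	req, _ := parseGiB("16GiB")
	// Increasing from 8GiB to 16GiB is accepted.
	fmt.Println(validateQuotaChange(cur, req, 6*gib, false))
	// Decreasing to 8GiB while 12GiB is in use is rejected.
	fmt.Println(validateQuotaChange(req, cur, 12*gib, true))
}
```

Keeping the increase-only rule and the DB-size floor in one function makes it cheap to relax the first rule later without touching callers.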
On the general question of allowing decreases: I'm wondering whether that should be a separate, future enhancement, so that we can focus on getting this feature to GA as-is.
As with the lower max limitation, relaxing a constraint is allowable in a z-stream, and I would argue that allowing decreases is such a relaxation.
I think, in general, the customers asking for this feature are doing so because they're running up against the current quota and wish to increase it permanently. Given the memory consumption implications of increasing the quota (and of using the headroom gained by increasing it), ensuring the nodes are sized appropriately before the increase is essential, and I think that many, if not all, customers will view this as a one-time option. Because of those implications and requirements, the node will be sized larger anyway, regardless of what the quota is. That is, if they provision nodes that can handle 16GiB size-on-disk (and set the quota to 16GiB) but are only using 12GiB on disk, they would then need to re-provision the node to be smaller: just changing the quota doesn't directly affect how much memory or storage etcd takes; it's just a limit.
That is all to say, I don't know if allowing decreases is a feature that will be used, and in my opinion it would be better to wait for a customer to ask for it, so we know there's a use case, before putting in the work to implement it for this release cycle.
I still think downsizing your cluster in order to reduce cost is a relevant part of the story, but it can be done later. I agree that the feature can be pushed as is, so as not to postpone the graduation. I moved the Quota decrease story to the Alternatives.
| In general increasing the quota is done in response to increased demand on the cluster.
| Decreasing or reverting the quota is unlikely to happen often, but it has the following benefit:
| - The DB size has a big impact on the memory usage. Decreasing the quota will limit the memory usage.
Decreasing the quota will limit the memory usage
That's slightly misleading. Decreasing the quota would allow a customer to downsize a node (to reduce cost), which I think we should highlight as the main motivation of this enhancement. Changing the quota by itself only affects etcd instances that are at or above the quota, because then they can't write; if they're below the larger quota, their memory usage is no different than it would be with a smaller quota.
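The point that the quota is only a limit can be illustrated with a toy model. This is a sketch under loud assumptions, not etcd's implementation: the `store` type and its `put` and `footprint` methods are hypothetical, and `footprint` simply assumes that resource use tracks the DB size and nothing else.

```go
package main

import (
	"errors"
	"fmt"
)

const gib = int64(1) << 30

var errNoSpace = errors.New("database space exceeded")

// store is a toy stand-in for an etcd member's backend, not etcd's real code.
type store struct {
	quota int64 // configured backend quota in bytes
	size  int64 // current DB size in bytes
}

// put rejects a write that would push the DB size past the quota.
func (s *store) put(n int64) error {
	if s.size+n > s.quota {
		return errNoSpace
	}
	s.size += n
	return nil
}

// footprint is a deliberately crude assumption: resource use depends on
// the DB size only, never on the configured quota.
func (s *store) footprint() int64 { return s.size }

func main() {
	big := &store{quota: 16 * gib, size: 12 * gib}
	small := &store{quota: 13 * gib, size: 12 * gib}

	// Same data under different quotas gives the same footprint in this model.
	fmt.Println(big.footprint() == small.footprint())

	// The quota only matters once a write would exceed it.
	fmt.Println(big.put(2*gib), small.put(2*gib))
}
```

In this model, lowering the quota changes nothing until a write hits the limit, which is why a smaller quota alone does not free memory; only re-provisioning the node does.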