CNTRLPLANE-3367: Add KMS key rotation section#2036
Conversation
This covers the KEK rotation mechanism through external KMS via the plugin architecture. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
This covers an annotation-based approach to detect and migrate on external KEK changes in the KMS plugin architecture. Replaces a previous PR in openshift#2000. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
|
@tjungblu: This pull request references CNTRLPLANE-3367 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
ardaguclu
left a comment
There was a problem hiding this comment.
This is very well-written PR, thank you. Dropped a comment. Other than that looks good to me.
| |---------------------|--------|-------|---------| | ||
| | `encryption.apiserver.operator.openshift.io/target-kek-id` | KMS rotation controller | `kekId` | Target kekId to migrate toward (after 5m delay) | | ||
| | `encryption.apiserver.operator.openshift.io/migrated-kek-id` | migrationController | `kekId` | Last fully migrated kekId | | ||
| | `encryption.apiserver.operator.openshift.io/kek-converged-at` | KMS rotation controller | RFC3339 | When candidate `kekId` first achieved cluster convergence | |
There was a problem hiding this comment.
Just a comment about the implementation: Since these 2 annotations are only used by rotation controller, maybe we don't need to carry them in KeyState.
|
/lgtm |
|
/hold for others to review |
| 4. For **≥ 5m** converged on `kek-new`: if health diverges, clears `kek-converged-id` and `kek-converged-at`; if stable, sets **`encryption.apiserver.operator.openshift.io/target-kek-id = kek-new`** and clears convergence pair. | ||
| 5. **migrationController** sees `needsMigration`. **Migrator** runs with `encryption.apiserver.operator.openshift.io/write-key = {keyName}-kek-new`. | ||
| 6. **State machine** and **keyController** hold while `needsMigration`. | ||
| 7. **migrationController** completes all GRs → **`encryption.apiserver.operator.openshift.io/migrated-kek-id = kek-new`**. |
There was a problem hiding this comment.
What happens in the following scenario ?
- target-kek-id = kek-A, migration runs, SVMs annotated with write-key = {keyName}-kek-A
- All SVMs complete migration controller is about to write migrated-kek-id
- Rotation controller sets target-kek-id = kek-B right at this moment
does migrationController read target-kek-id to determine the value for migrated-kek-id, or does it use the kekId from the SVM's write-key annotation?
If it re-reads then kek-B migration might be skipped which is a bug.
There was a problem hiding this comment.
Getting the latest encryption key secret will certainly be a bug here, which we also can't avoid, e.g. when the operator restarts while the migration is running.
What do you think about ensuring there is no change to target-kek-id while migrated-kek-id != target-kek-id?
|
|
||
| | Full annotation key | Writer | Value | Meaning | | ||
| |---------------------|--------|-------|---------| | ||
| | `encryption.apiserver.operator.openshift.io/target-kek-id` | KMS rotation controller | `kekId` | Target kekId to migrate toward (after 5m delay on rotation); equals `migrated-kek-id` in steady state | |
There was a problem hiding this comment.
didn't we want to delegate this to the key controller? I think that would solve the race condition we discussed, wouldn't it?
There was a problem hiding this comment.
pong :) one change at a time, let's get the other things sorted out. I'll dedicate an alternative for this.
| Rotation progress is tracked on the write-key secret via annotations. | ||
|
|
||
| **Responsibilities:** | ||
| - **KMSPreflightController** writes the first `encryption.apiserver.operator.openshift.io/target-kek-id` on the write-key secret, using the `kekId` observed from the plugin `Status` check during [pre-flight](#pre-flight-checker-tech-preview-v2). |
There was a problem hiding this comment.
note that the "write-key" will be created after the preflight phase. so the preflight cannot store this information on the "write-key" secret.
There was a problem hiding this comment.
When new key is created, it will not be immediately the write key.
It will start as backup key;
- current write key (1)
- new key as read key (2)
After that new roll out, it will be promoted as write key
- new key as new write key (2)
- previous write key is now read key (1)
In that case, key controller can write the kekID retrieved from preflight checker into the new key.
There was a problem hiding this comment.
This is out of scope of this PR: This can be the integration logic between preflight checker and key controller. Key controller won't create the new key, until it sees a kekID for the calculated hash of the given KMS plugin.
There was a problem hiding this comment.
changed this interaction to the key controller
|
@tjungblu: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This covers an annotation-based approach to detect and migrate on external KEK changes in the KMS plugin architecture. This design differs from openshift#2036 by centralizing everything in the existing key controller instead of creating a new rotation controller.
|
/lgtm |
|
/lgtm cancel |
|
/close we can circle back here if needed, otherwise this is superseded by #2041 |
|
@tjungblu: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This covers an annotation-based approach to detect and migrate on external KEK changes in the KMS plugin architecture.
Replaces a previous PR in #2000