CNTRLPLANE-3367: Add KMS key rotation section #2036

Closed
tjungblu wants to merge 5 commits into openshift:master from tjungblu:kms_rotation_annotations
@tjungblu

Contributor

This covers an annotation-based approach to detecting external KEK changes in the KMS plugin architecture and migrating in response to them.
Replaces the earlier PR #2000.

tjungblu added 2 commits June 10, 2026 15:23
This covers the KEK rotation mechanism through external KMS via the plugin architecture.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
This covers an annotation-based approach to detect and migrate on external KEK changes in the KMS plugin architecture.

Replaces a previous PR in openshift#2000.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Jun 10, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 10, 2026


@tjungblu: This pull request references CNTRLPLANE-3367 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

@openshift-ci openshift-ci Bot requested review from jupierce and stleerh June 10, 2026 14:50
@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jan--f for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ardaguclu ardaguclu left a comment

Member

This is a very well-written PR, thank you. I left one comment; other than that, it looks good to me.

| Full annotation key | Writer | Value | Meaning |
|---------------------|--------|-------|---------|
| `encryption.apiserver.operator.openshift.io/target-kek-id` | KMS rotation controller | `kekId` | Target kekId to migrate toward (after 5m delay) |
| `encryption.apiserver.operator.openshift.io/migrated-kek-id` | migrationController | `kekId` | Last fully migrated kekId |
| `encryption.apiserver.operator.openshift.io/kek-converged-at` | KMS rotation controller | RFC3339 | When candidate `kekId` first achieved cluster convergence |

Member

Just a comment about the implementation: since these 2 annotations are only used by the rotation controller, maybe we don't need to carry them in KeyState.

Comment thread enhancements/kube-apiserver/kms-encryption-foundations.md Outdated
@ardaguclu

Member

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Jun 11, 2026
@tjungblu

Contributor Author

/hold

for others to review

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jun 11, 2026
4. For **≥ 5m** converged on `kek-new`: if health diverges, clears `kek-converged-id` and `kek-converged-at`; if stable, sets **`encryption.apiserver.operator.openshift.io/target-kek-id = kek-new`** and clears convergence pair.
5. **migrationController** sees `needsMigration`. **Migrator** runs with `encryption.apiserver.operator.openshift.io/write-key = {keyName}-kek-new`.
6. **State machine** and **keyController** hold while `needsMigration`.
7. **migrationController** completes all GRs → **`encryption.apiserver.operator.openshift.io/migrated-kek-id = kek-new`**.
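
As a rough illustration of step 4 above, here is a minimal Python sketch of the convergence window. It assumes that `kek-converged-id` shares the same `encryption.apiserver.operator.openshift.io/` prefix as the other annotations, and that the controller observes one `kekId` per API server; `reconcile_rotation` and its parameters are hypothetical names, not part of the enhancement.

```python
from datetime import datetime, timedelta, timezone

PREFIX = "encryption.apiserver.operator.openshift.io/"
TARGET = PREFIX + "target-kek-id"
CONVERGED_ID = PREFIX + "kek-converged-id"   # assumed full key
CONVERGED_AT = PREFIX + "kek-converged-at"
STABILITY_WINDOW = timedelta(minutes=5)


def reconcile_rotation(annotations, observed_kek_ids, now):
    """Return the annotations after one reconcile pass (hypothetical helper).

    observed_kek_ids holds the kekId reported by each API server's plugin.
    """
    result = dict(annotations)
    converged = bool(observed_kek_ids) and len(set(observed_kek_ids)) == 1
    candidate = observed_kek_ids[0] if converged else None

    if not converged or candidate != result.get(CONVERGED_ID):
        # Health diverged or the candidate changed: clear the convergence pair.
        result.pop(CONVERGED_ID, None)
        result.pop(CONVERGED_AT, None)
        # A newly converged candidate that differs from the target restarts the clock.
        if converged and candidate != result.get(TARGET):
            result[CONVERGED_ID] = candidate
            result[CONVERGED_AT] = now.isoformat()
        return result

    converged_since = datetime.fromisoformat(result[CONVERGED_AT])
    if now - converged_since >= STABILITY_WINDOW:
        # Stable for the full window: promote the candidate and clear the pair.
        result[TARGET] = candidate
        result.pop(CONVERGED_ID, None)
        result.pop(CONVERGED_AT, None)
    return result
```

Calling this with a diverging observation inside the window drops both convergence annotations, while a stable observation at or after five minutes moves `target-kek-id` to the candidate.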

Contributor

What happens in the following scenario?

  1. target-kek-id = kek-A, migration runs, SVMs annotated with write-key = {keyName}-kek-A
  2. All SVMs complete; the migration controller is about to write migrated-kek-id
  3. Rotation controller sets target-kek-id = kek-B right at this moment

Does migrationController read target-kek-id to determine the value for migrated-kek-id, or does it use the kekId from the SVM's write-key annotation?

If it re-reads target-kek-id, the kek-B migration might be skipped, which would be a bug.

Contributor Author

Re-reading the latest encryption key secret would certainly be a bug here, and it is one we can't always avoid, e.g. when the operator restarts while the migration is running.

What do you think about ensuring there is no change to target-kek-id while migrated-kek-id != target-kek-id?
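
A minimal sketch of that guard, assuming both annotations live on the same secret and are read together; `migration_in_flight` and `propose_target` are hypothetical names used only for illustration:

```python
PREFIX = "encryption.apiserver.operator.openshift.io/"
TARGET = PREFIX + "target-kek-id"
MIGRATED = PREFIX + "migrated-kek-id"


def migration_in_flight(annotations):
    """A migration is pending while migrated-kek-id has not caught up with target-kek-id."""
    return annotations.get(TARGET) != annotations.get(MIGRATED)


def propose_target(annotations, new_kek_id):
    """Return (annotations, accepted); refuse to retarget during a pending migration."""
    if migration_in_flight(annotations):
        # Hold: changing the target now could cause the pending migration
        # to be recorded against the wrong kekId.
        return dict(annotations), False
    updated = dict(annotations)
    updated[TARGET] = new_kek_id
    return updated, True
```

With this rule, the scenario above cannot move `target-kek-id` to kek-B until `migrated-kek-id = kek-A` has been written, so whichever value migrationController reads at completion is still kek-A.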


| Full annotation key | Writer | Value | Meaning |
|---------------------|--------|-------|---------|
| `encryption.apiserver.operator.openshift.io/target-kek-id` | KMS rotation controller | `kekId` | Target kekId to migrate toward (after 5m delay on rotation); equals `migrated-kek-id` in steady state |

Contributor

Didn't we want to delegate this to the key controller? I think that would solve the race condition we discussed, wouldn't it?

Contributor

ping

Contributor Author

pong :) one change at a time, let's get the other things sorted out. I'll dedicate an alternative for this.

Comment thread enhancements/kube-apiserver/kms-encryption-foundations.md
openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) on Jun 15, 2026
Rotation progress is tracked on the write-key secret via annotations.

**Responsibilities:**
- **KMSPreflightController** writes the first `encryption.apiserver.operator.openshift.io/target-kek-id` on the write-key secret, using the `kekId` observed from the plugin `Status` check during [pre-flight](#pre-flight-checker-tech-preview-v2).

Contributor

Note that the "write-key" secret is created after the preflight phase, so the preflight controller cannot store this information on it.

Member

When a new key is created, it does not immediately become the write key.

It starts as a backup (read) key:

  • current write key (1)
  • new key as read key (2)

After the next rollout, it is promoted to write key:

  • new key as new write key (2)
  • previous write key is now read key (1)

In that case, the key controller can write the kekID retrieved from the preflight checker into the new key.
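
The two-phase promotion described above can be sketched as follows. This is only an illustrative model: `KeyRing`, `add_backup_key`, and `promote` are hypothetical names, and storing the kekID in a dictionary stands in for writing it onto the new key's secret.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeyRing:
    """Toy model of the write key plus its read (backup) keys."""
    write_key: str
    read_keys: List[str] = field(default_factory=list)
    kek_ids: Dict[str, str] = field(default_factory=dict)

    def add_backup_key(self, name: str, kek_id: str) -> None:
        # Phase 1: the new key is only a read key; the key controller
        # records the kekID from the preflight check on it at creation time.
        self.read_keys.insert(0, name)
        self.kek_ids[name] = kek_id

    def promote(self, name: str) -> None:
        # Phase 2: after the rollout, the new key becomes the write key
        # and the previous write key is demoted to a read key.
        if name not in self.read_keys:
            raise ValueError(f"{name} is not a known read key")
        self.read_keys.remove(name)
        self.read_keys.insert(0, self.write_key)
        self.write_key = name
```

Because the kekID is attached when the key is created rather than when it is promoted, nothing in this model needs the write-key secret to exist during preflight.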

Member

This is out of scope for this PR, but it can be the integration logic between the preflight checker and the key controller: the key controller won't create the new key until it sees a kekID for the calculated hash of the given KMS plugin.

Contributor Author

Moved this interaction to the key controller.

@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Contributor

@tjungblu: all tests passed!

tjungblu added a commit to tjungblu/enhancements that referenced this pull request Jun 15, 2026
This covers an annotation-based approach to detect and migrate on external KEK changes in the KMS plugin architecture.

This design differs from openshift#2036 by centralizing everything in the existing key controller instead of creating a new rotation controller.
@ardaguclu

Member

/lgtm
we definitely need this design in GA

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Jun 16, 2026
@ardaguclu

Member

/lgtm cancel

openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) on Jun 17, 2026
@tjungblu

Contributor Author

/close

we can circle back here if needed, otherwise this is superseded by #2041

@openshift-ci openshift-ci Bot closed this Jun 18, 2026
@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Contributor

@tjungblu: Closed this PR.

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants