Add installer/bootstrap-konnectivity-tunnel.md #1941
| ### Workflow Description | ||
|
|
||
| This feature is transparent to the cluster administrator. | ||
| It requires no user action and does not change the `openshift-install` CLI interface. |
There was a problem hiding this comment.
Unfortunately this part will not be true in cases where users bring their own networks:
- baremetal/on-prem platforms
- azure byo vnet
Because the port management will need to be updated to open port 8091. That's not an unreasonable ask, but it is definitely not transparent.
There was a problem hiding this comment.
And I see you included this in open questions 👍
There was a problem hiding this comment.
Can we close this "open question" now that the installer PR is ready to merge?
| 1. How should the bootstrap script determine the bootstrap node IP? | ||
| This IP is required for the server certificate SAN and the agent `--proxy-server-host` argument. | ||
| 1. Clarify the impacts, if any, on SNO, MicroShift, and OKE. | ||
| For SNO, the bootstrap node is typically separate from the single cluster node, so the architecture still applies. |
There was a problem hiding this comment.
This is only true in the cloud. On-prem SNO uses bootstrap-in-place, and the bootstrap and control plane never interact.
There was a problem hiding this comment.
I'm testing e2e-metal-single-node-live-iso in the implementation to see whether this adversely affects bootstrap-in-place. My understanding is that it will just route back to localhost.
My initial instinct was not to deploy this for bootstrap-in-place, but since it only uses 40m CPU and 50Mi memory, I think the reduced complexity of always deploying it is better than saving these resources.
There was a problem hiding this comment.
Zane clarified that in single-node bootstrap-in-place the bootstrap and cluster control plane do not coexist, so we cannot deploy Konnectivity in that situation. We will modify bootkube to make the Konnectivity deployment conditional.
| MicroShift does not use the bootstrap process and is likely not affected. | ||
| OKE uses the same installer and is likely affected in the same way as OCP. | ||
| These assumptions need confirmation. | ||
| 1. Determine all platforms requiring security group updates for TCP port 8091. |
There was a problem hiding this comment.
Please document this in https://github.com/openshift/enhancements/blob/master/dev-guide/host-port-registry.md
|
LGTM once existing comments are resolved. We can merge this and iterate during implementation as we answer open questions IMO |
|
|
||
| 10. The installer destroys the bootstrap node infrastructure. | ||
| The Konnectivity server is gone with it. | ||
| The production KAS instances have direct access to the pod network and do not need the tunnel. |
There was a problem hiding this comment.
The installer destroys the bootstrap node infrastructure
Quick question: For cloud platforms, this also implies deleting the rule that allows TCP/8091 to bootstrap node, right? Or are we leaving it behind?
There was a problem hiding this comment.
Background: On AWS, the bootstrap node shares the same security group(s) as the control plane nodes. So, adding TCP/8091 will need cleaning up post-bootstrap; otherwise, it will stay behind.
Looking at this change (still in-progress), I don't see the cleanup logic so just wanted to double-check 😅
There was a problem hiding this comment.
Sorry, never mind me, I missed Patrick's openshift/installer#10344 (comment) 😩 (been a long day...)
nit: How about making the wording a bit more explicit? Let's add the following to the Workflow section:
- The Konnectivity server listens on `TCP/8091`.
- The installer should configure platforms to open a security group rule/firewall rule to allow `TCP/8091` (link to section "Cloud-specific Security Group Configuration").
- After bootstrap, the installer destroys bootstrap resources, including the new security/firewall rule.
IMO, allowing `TCP/8091` is an important workflow step, so mentioning it here would be clearer. That being said, it's just a nit 😅
| 4. The script creates cluster resources for the Konnectivity agent: | ||
| a. Resolves the `apiserver-network-proxy` image from the release payload. | ||
| b. Substitutes the image and bootstrap IP into the DaemonSet template and writes the resulting manifest to the manifests directory. | ||
| c. Writes the `openshift-installer-bootstrap` Namespace manifest to the manifests directory. |
There was a problem hiding this comment.
Looking at the commit (still in progress), the namespace is `openshift-bootstrap-konnectivity`, but the EP says `openshift-installer-bootstrap`. I just wanted to double-check which one is correct 👀
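For readers following step 4b of the quoted workflow, the substitution could look roughly like the sketch below. It is illustrative Python, not the bootstrap script itself; the trimmed template, the `render_agent_manifest` helper, and the sample image reference are all assumptions. The namespace is a parameter because this thread has not settled which name is correct.

```python
from string import Template

# Hypothetical, heavily trimmed DaemonSet template. The real manifest would
# carry more fields (certificate mounts, tolerations, and so on).
AGENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: konnectivity-agent
  namespace: ${namespace}
spec:
  selector:
    matchLabels:
      app: konnectivity-agent
  template:
    metadata:
      labels:
        app: konnectivity-agent
    spec:
      containers:
      - name: konnectivity-agent
        image: ${image}
        args:
        - --proxy-server-host=${bootstrap_ip}
        - --agent-identifiers=default-route=true
""")


def render_agent_manifest(image: str, bootstrap_ip: str, namespace: str) -> str:
    """Fill in the three placeholders and return the manifest text."""
    return AGENT_TEMPLATE.substitute(
        image=image, bootstrap_ip=bootstrap_ip, namespace=namespace
    )


if __name__ == "__main__":
    print(render_agent_manifest(
        image="registry.example.com/apiserver-network-proxy@sha256:0000",  # placeholder
        bootstrap_ip="192.0.2.10",  # documentation-range address, not a real node
        namespace="openshift-bootstrap-konnectivity",  # one of the two names discussed
    ))
```

`Template.substitute` raises `KeyError` when a placeholder has no value, so a missing image or IP fails loudly instead of producing a half-rendered manifest.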
|
|
|
||
| ## Test Plan | ||
|
|
||
| **TBD** |
There was a problem hiding this comment.
Do we know how we can E2E this feature yet? I'd like to see one or more tests gated by both gates (OR semantic) so that we can verify this functionality during their promotions
There was a problem hiding this comment.
The bootstrap functionality is exercised by every e2e (except the live-iso bootstrap-in-place job which is permafailing anyway); so lack of regressions in the bootstrap/install test is our main quality signal. As konnectivity is only part of the bootstrapping process no artifacts should be left behind by the time e2e tests run.
IIUC the motivation for this is to allow setting webhook failurePolicy. We've already sanity checked it here:
openshift/cluster-capi-operator#486 (comment)
So perhaps the e2e test would be checking the failurePolicy on the webhook, which would be impossible without Konnectivity?
There was a problem hiding this comment.
Ack, so if we can't have `failurePolicy: Fail` without Konnectivity working, let's get that test in. If Konnectivity fails, cluster bootstrap then fails, and it will quickly become obvious that we regressed.
Makes sense to me
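As a rough illustration of the check discussed in this thread, the sketch below inspects a webhook configuration that has already been fetched and parsed into a dict, and reports any webhook that does not fail closed. The `webhooks_not_failing_closed` helper and the sample webhook names are hypothetical; a real e2e test would live in the project's own test framework.

```python
def webhooks_not_failing_closed(config: dict) -> list:
    """Return names of webhooks whose failurePolicy is not "Fail".

    `config` is a ValidatingWebhookConfiguration or
    MutatingWebhookConfiguration as a parsed dict. In
    admissionregistration.k8s.io/v1 an unset failurePolicy defaults to
    "Fail", so a missing field is treated as "Fail" here.
    """
    return [
        hook.get("name", "<unnamed>")
        for hook in config.get("webhooks", [])
        if hook.get("failurePolicy", "Fail") != "Fail"
    ]


if __name__ == "__main__":
    # Hypothetical sample; the webhook names are made up for illustration.
    sample = {
        "kind": "ValidatingWebhookConfiguration",
        "webhooks": [
            {"name": "strict.example.com", "failurePolicy": "Fail"},
            {"name": "lenient.example.com", "failurePolicy": "Ignore"},
        ],
    }
    print(webhooks_not_failing_closed(sample))
```

An empty result means every webhook in the configuration fails closed, which is the state the test would assert.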
| api-approvers: | ||
| - None | ||
| creation-date: 2026-02-12 | ||
| last-updated: 2026-02-12 |
There was a problem hiding this comment.
Let's update this to the most recent date?
| A proof-of-concept implementation is available at https://github.com/openshift/installer/pull/10280. | ||
| This PR will be closed and is not intended for direct implementation. | ||
| It serves as a working example to inform this enhancement. | ||
| The proposed solution may differ from the PoC in some details. |
There was a problem hiding this comment.
We can refer to openshift/installer#10344, which also has a reference to the PoC PR :D
| a. Determines the bootstrap node IP. | ||
| b. Generates a short-lived (1-day) self-signed CA. |
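How the bootstrap node IP is determined is listed earlier in this thread as an open question. One common technique, shown here as a Python sketch and not as what the bootstrap script does, is to ask the kernel which source address it would pick for an outbound route. The `default_route_ip` and `agent_host_arg` helpers and the probe address are assumptions.

```python
import socket


def default_route_ip(probe=("192.0.2.1", 9)):
    """Return the local IPv4 address the kernel would use to reach `probe`.

    Connecting a UDP socket only selects a route; no packet is sent.
    Raises OSError if there is no route to `probe`.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(probe)
        return sock.getsockname()[0]


def agent_host_arg(ip):
    """Build the agent argument named in the enhancement text."""
    return f"--proxy-server-host={ip}"


if __name__ == "__main__":
    try:
        ip = default_route_ip()
    except OSError as err:
        print("no route to the probe address:", err)
    else:
        print(ip, agent_host_arg(ip))
```

On a multi-homed host this returns the address on the default route, which may not be the address cluster nodes use to reach the bootstrap node, so it is only one candidate answer.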
| #### Standalone Clusters | ||
|
|
||
| This enhancement targets standalone clusters. | ||
| It applies to all Installer-Provisioned Infrastructure (IPI) platforms. |
There was a problem hiding this comment.
What about UPI? We may need to inform customers to allow port 8091 from cluster nodes to bootstrap node.
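If UPI users do need to open the port themselves, a quick reachability probe run from a cluster node could help confirm the firewall change. The Python sketch below is only an illustration; the `can_reach` helper and the placeholder address are assumptions, and a successful TCP connect shows that the port is open, not that the Konnectivity handshake works.

```python
import socket

KONNECTIVITY_PORT = 8091  # port named in the enhancement text


def can_reach(host, port=KONNECTIVITY_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable hosts.
        return False


if __name__ == "__main__":
    import sys

    # Placeholder bootstrap IP from the documentation address range.
    host = sys.argv[1] if len(sys.argv) > 1 else "192.0.2.10"
    print(f"{host}:{KONNECTIVITY_PORT} reachable: {can_reach(host)}")
```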
| | `--server-cert` / `--server-key` | Paths on bootstrap node filesystem | Server certificate signed by the Konnectivity CA | | ||
| | `--cluster-cert` / `--cluster-key` | Same as server cert/key | Certificate used for cluster-facing traffic (reuses the server certificate) | |
There was a problem hiding this comment.
Flags `--cluster-cert` / `--cluster-key` are the ones for the "Server certificate signed by the Konnectivity CA" according to the commit, right?
Also, flags `--server-cert` / `--server-key` should only be used when not using a UDS, so perhaps we don't need to mention them here? 🤔
| * `--agent-identifiers=default-route=true` — tells the server this agent can handle default-route traffic. | ||
| * mTLS certificates are mounted from the `konnectivity-agent-certs` Secret. | ||
|
|
||
| #### Cloud-specific Security Group Configuration |
There was a problem hiding this comment.
| #### Cloud-specific Security Group Configuration | |
| #### Cloud-specific Firewall Configuration |
Let's use "Firewall" here and elsewhere in this section? This sounds more platform-agnostic, WDYT?
| **Key properties:** | ||
|
|
||
| * A **self-signed Konnectivity CA** is generated with 1-day validity. | ||
| This CA exists only on the bootstrap node and is used to sign the server and agent certificates. |
There was a problem hiding this comment.
What about FIPS? I think implementation already generates FIPS-compliant certs, right? Should we mention it elsewhere in this enhancement?

An EP describing the investigation outcomes of a PoC implementing a Konnectivity tunnel between the bootstrap node and early cluster nodes.
Tracker: https://issues.redhat.com/browse/CORS-4334