Add installer/bootstrap-konnectivity-tunnel.md #1941

Open: mdbooth wants to merge 4 commits into openshift:master from openshift-cloud-team:bootstrap-konnectivity


mdbooth (Contributor) commented Feb 12, 2026

An EP describing the investigation outcomes of a PoC implementing a Konnectivity tunnel between the bootstrap node and early cluster nodes.

Tracker: https://issues.redhat.com/browse/CORS-4334

openshift-ci Bot (Contributor) commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mandre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

### Workflow Description

This feature is transparent to the cluster administrator.
It requires no user action and does not change the `openshift-install` CLI interface.

Contributor commented:

Unfortunately this part will not be true in cases where users bring their own networks:

  • baremetal/on-prem platforms
  • Azure BYO VNet

This is because port management will need to be updated to open port 8091. That's not an unreasonable ask, but it is definitely not transparent.

Contributor commented:

And I see you included this in open questions 👍

Contributor commented:

Can we close this "open question" now that the installer PR is ready to merge?

1. How should the bootstrap script determine the bootstrap node IP?
This IP is required for the server certificate SAN and the agent `--proxy-server-host` argument.
1. Clarify the impacts, if any, on SNO, MicroShift, and OKE.
For SNO, the bootstrap node is typically separate from the single cluster node, so the architecture still applies.

Member commented:

This is only true in the cloud. On-prem SNO uses bootstrap-in-place, and the bootstrap and control plane never interact.

Contributor commented:

I'm testing `e2e-metal-single-node-live-iso` in the implementation to see whether this adversely affects bootstrap in place. My understanding is that it will just route back to localhost.

My initial instinct was not to deploy this for bootstrap in place, but since it only uses 40m CPU and 50Mi of memory, I think the reduced complexity of always deploying it outweighs the resource savings.

Contributor commented:

Zane clarified that in single-node bootstrap in place, the bootstrap and cluster control plane do not coexist, so we cannot deploy Konnectivity in that situation. We will modify bootkube to make the Konnectivity deployment conditional.

MicroShift does not use the bootstrap process and is likely not affected.
OKE uses the same installer and is likely affected in the same way as OCP.
These assumptions need confirmation.
1. Determine all platforms requiring security group updates for TCP port 8091.

Member commented:

@JoelSpeed

Contributor commented:

LGTM once existing comments are resolved. IMO, we can merge this and iterate during implementation as we answer the open questions.


10. The installer destroys the bootstrap node infrastructure.
The Konnectivity server is gone with it.
The production KAS instances have direct access to the pod network and do not need the tunnel.

Member commented:

> The installer destroys the bootstrap node infrastructure

Quick question: For cloud platforms, this also implies deleting the rule that allows TCP/8091 to the bootstrap node, right? Or are we leaving it behind?

Member commented:

Background: On AWS, the bootstrap node shares the same security group(s) as the control plane nodes. So, adding TCP/8091 will need cleaning up post-bootstrap; otherwise, it will stay behind.

Looking at this change (still in-progress), I don't see the cleanup logic so just wanted to double-check 😅

Member commented:

Sorry, never mind me, I missed Patrick's openshift/installer#10344 (comment) 😩 (been a long day...)

nit: How about making the wording a bit more explicit? Let's add the following to the Workflow section:

  1. The Konnectivity server listens on TCP/8091.
  2. The installer should configure platforms to open a security group/firewall rule allowing TCP/8091 (link to section Cloud-specific Security Group Configuration).
  3. After bootstrap, the installer destroys bootstrap resources, including the new security group/firewall rule.

IMO, allowing TCP/8091 is an important workflow step, so mentioning it here would make things clearer. That said, it's just a nit 😅

4. The script creates cluster resources for the Konnectivity agent:
a. Resolves the `apiserver-network-proxy` image from the release payload.
b. Substitutes the image and bootstrap IP into the DaemonSet template and writes the resulting manifest to the manifests directory.
c. Writes the `openshift-installer-bootstrap` Namespace manifest to the manifests directory.

Member commented:

Looking at the commit (still in progress), the namespace is `openshift-bootstrap-konnectivity`, but here the EP says `openshift-installer-bootstrap`. I just wanted to double-check which one is correct 👀

openshift-bot commented:

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

openshift-ci Bot added the `lifecycle/stale` label (denotes an issue or PR has remained open with no activity and has become stale) Apr 24, 2026
tthvo (Member) commented Apr 24, 2026

/remove-lifecycle stale

openshift-ci Bot removed the `lifecycle/stale` label Apr 24, 2026
openshift-ci Bot (Contributor) commented Jun 11, 2026

@mdbooth: all tests passed!


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


## Test Plan

**TBD**

Contributor commented:

Do we know how we can E2E-test this feature yet? I'd like to see one or more tests gated by both gates (OR semantics) so that we can verify this functionality during their promotion.

patrickdillon (Contributor) commented Jun 24, 2026

The bootstrap functionality is exercised by every e2e job (except the live-iso bootstrap-in-place job, which is permafailing anyway), so the lack of regressions in the bootstrap/install tests is our main quality signal. As Konnectivity is only part of the bootstrapping process, no artifacts should be left behind by the time e2e tests run.

IIUC the motivation for this is to allow setting webhook failurePolicy. We've already sanity checked it here:
openshift/cluster-capi-operator#486 (comment)

So perhaps the e2e test would check the failurePolicy on the webhook, which would be impossible without Konnectivity?

Contributor commented:

Ack, so if we can't have `failurePolicy: Fail` without Konnectivity working, let's get that test in. If Konnectivity fails, cluster bootstrap then fails, and it'll become obvious fairly quickly that we regressed.

Makes sense to me

api-approvers:
- None
creation-date: 2026-02-12
last-updated: 2026-02-12

Member commented:

Let's update this to the most recent date?

Comment on lines +46 to +49
A proof-of-concept implementation is available at https://github.com/openshift/installer/pull/10280.
This PR will be closed and is not intended for direct implementation.
It serves as a working example to inform this enhancement.
The proposed solution may differ from the PoC in some details.

Member commented:

We can refer to openshift/installer#10344, which also has a reference to the PoC PR :D

Comment on lines +128 to +129
a. Determines the bootstrap node IP.
b. Generates a short-lived (1-day) self-signed CA.

Member commented:

styling: Is there a spacing issue here? The rendered markdown shows these bullet points inline instead of on their own lines. Same for others...

#### Standalone Clusters

This enhancement targets standalone clusters.
It applies to all Installer-Provisioned Infrastructure (IPI) platforms.

Member commented:

What about UPI? We may need to inform customers to allow port 8091 from cluster nodes to the bootstrap node.

Comment on lines +322 to +323
| `--server-cert` / `--server-key` | Paths on bootstrap node filesystem | Server certificate signed by the Konnectivity CA |
| `--cluster-cert` / `--cluster-key` | Same as server cert/key | Certificate used for cluster-facing traffic (reuses the server certificate) |

Member commented:

According to the commit, the `--cluster-cert` / `--cluster-key` flags are for the "Server certificate signed by the Konnectivity CA", right?

Also, the `--server-cert` / `--server-key` flags should only be used when not using a Unix domain socket (UDS), so we don't need to mention them here? 🤔

* `--agent-identifiers=default-route=true` — tells the server this agent can handle default-route traffic.
* mTLS certificates are mounted from the `konnectivity-agent-certs` Secret.

#### Cloud-specific Security Group Configuration

Member commented:

Suggested change: `#### Cloud-specific Security Group Configuration` → `#### Cloud-specific Firewall Configuration`

Let's use "Firewall" here and elsewhere in this section? This sounds more platform-agnostic, WDYT?

Comment on lines +278 to +281
**Key properties:**

* A **self-signed Konnectivity CA** is generated with 1-day validity.
This CA exists only on the bootstrap node and is used to sign the server and agent certificates.

tthvo (Member) commented Jun 25, 2026

What about FIPS? I think the implementation already generates FIPS-compliant certs, right? Should we mention it elsewhere in this enhancement?


Labels: None yet

Projects: None yet

6 participants