CNTRLPLANE-3740: Add hypershift details for additional identity information sources#2050
Conversation
|
@liouk: This pull request references CNTRLPLANE-3740 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@liouk: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
This PR currently contains a set of open questions; my goal is to resolve these before merging and adjust the PR accordingly. Putting a hold until these are resolved. /hold |
| with upstream `v1beta1`, there's no type mismatch on the kubeapiserver generator output. The oauth-apiserver generator produces the oauth-apiserver's own `AuthenticationConfiguration` type, | ||
| which both topologies would need to depend on regardless. | ||
|
|
||
| To make this work we would need to do some minor refactoring/abstractions to the generators before extracting to a lib, such as CA cert & secret resolution and feature gate checks. |
There was a problem hiding this comment.
This was my concern going through the review - feature gates. Because in order for feature gates to work on hypershift, we need to propagate CPO's feature gates into hypershift-operator which will propagate through to CPO itself. That's what I'm doing in this PR .
So, soon enough, we will also have this dependency on CPO for feature gate propagation through hypershift-operator. It's probably not something we can't fix - maybe we can abstract these feature gates away from here for this proposal. Just wanted to point that out. :)
everettraven
left a comment
There was a problem hiding this comment.
Overall, I think this looks pretty good. A handful of comments on the open questions.
| When configuring the `authentication.config.openshift.io/cluster` resource to use the External OIDC feature in Hypershift, | ||
| the goal is the same: configure the oauth-apiserver to use its new external OIDC operation mode and enable it on the kube-apiserver | ||
| as a webhook authenticator. The oauth-server is still unused in this new mode of operation. |
There was a problem hiding this comment.
In hypershift, I think this configuration is done on the HostedCluster API. Let's make sure that is properly reflected here to prevent any confusion related to where things are configured on standalone vs hypershift.
| - Update the deployment predicate of the `openshift-oauth-apiserver` component to always create the oauth-apiserver deployment | ||
| (instead of only when authentication type is `IntegratedOAuth`). | ||
| - Update the deployment adapt function to configure the oauth-apiserver in the new external OIDC operation mode when authentication | ||
| type is `OIDC` in the `authentication.config.openshift.io/cluster` resource. | ||
| - When authentication type is `OIDC` in the `authentication.config.openshift.io/cluster` resource, the manifest adapter for the | ||
| `auth-config.yaml` manifest generates the required AuthenticationConfiguration object and serializes it into the manifest. With | ||
| the new mode, the manifest adapter will need to generate the updated AuthenticationConfiguration API introduced with this EP. | ||
| - Ensure that secrets required by the oauth-apiserver (e.g. `clientCredential.secret`, TLS CA bundles) are available on the management cluster and projected into the oauth-apiserver pod. |
There was a problem hiding this comment.
While this isn't necessarily a definitive yet, we have been considering having a net-new component instead of the oauth-apiserver in the future.
Does this approach account for the fact that we may end up needing to adopt the use of the net-new component in the future?
Maybe we should consider having this be a net-new component from the perspective of the CPO, but it just deploys the oauth-apiserver in the new mode? This might be a bit more work now, but would probably be easier to modify in the future if we do end up with a new component for this in the future.
There was a problem hiding this comment.
I did consider this initially, but decided against this approach until we'll need the new component, mainly because the suggested approach matches the reality of our current architecture naturally, and because for the moment there's really no concrete plan of whether/when we will go there.
In a nutshell, here's what we'll need for a new component:
- the framework won't allow multiple components with the same name
- currently, the
openshift-oauth-apiservercomponent is hardwired in multiple places (service, certs, webhook) - this means that we'll end up needing multiple new components and boilerplate (deployment, service, cert, webhook config, PDB, ...) and at the same time change how dependencies use the oauth-apiserver component
This feels like we'll be doing additional work which won't make sense until we need a completely separate new component; given that there's no concrete plan for that yet, the current suggested approach matches more closely the architecture we have in place in my view.
If you feel strongly about it, I'd be on board to go down that path, although I believe we'll benefit little in the future from that change, while we're increasing the current burden.
There was a problem hiding this comment.
I am currently doing a POC with both options in order to get a more detailed feeling of what either one would look like and what we'd get -- I'll report back with my findings.
| ##### Open Question: Compatibility across multiple Kubernetes API Servers | ||
|
|
||
| At the current state of HyperShift and the CPO v2, it seems like the CPO is baked into the payload, which means that the kube-apiserver version should be on par with CPO | ||
| within a payload. However, there has been discussion of evidence that this is not always true, and that we might still need to maintain compatibility with multiple | ||
| kube-apiserver versions. We need to clarify this before proceeding with decisions on how to manage the API types going forward. |
There was a problem hiding this comment.
I think some of the concern here may stem from the validation we added to prevent invalid KAS configurations where possible since HCP kind of eats KAS rollout failures IIRC.
More specifically: https://github.com/openshift/hypershift/blob/2d2b2d0805d36dcf401fdb5f3d913b9f7984ce42/support/validations/authentication.go#L51-L75
Maybe we ought to try to resolve that TODO in there to help have a more consistent validation pattern with the desired OCP installation version rather than always assuming the lowest possible version.
| Additionally, the oauth-apiserver introduces its own `AuthenticationConfiguration` type (`externaloidc/apis/authentication/v1alpha1`) for its `external-oidc` mode config | ||
| file format, which extends the upstream structure with `externalClaimsSources` and other structural differences (pointer fields, `omitempty` on `JWT`). | ||
|
|
||
| ##### Open Question: Replacing the internal copy with the upstream API |
There was a problem hiding this comment.
If this isn't causing any problems with the existing approach, should we just ignore this for now?
I think what we really need to be concerned about is potentially being OCP-version aware in our generation logic to prevent generating the oauth-apiserver config when we should be generating the KAS config instead.
| To simplify code maintenance and feature parity between the two topologies, we could extract the generators used in CAO to a library and reuse them in HyperShift; | ||
| both consume the same input (`configv1.Authentication`), both would produce `runtime.Object` through the same interface, and if we were to replace the internal type | ||
| with upstream `v1beta1`, there's no type mismatch on the kubeapiserver generator output. The oauth-apiserver generator produces the oauth-apiserver's own `AuthenticationConfiguration` type, | ||
| which both topologies would need to depend on regardless. |
There was a problem hiding this comment.
I'm +1 for re-use of existing logic and extracting to a library where we can.
| which both topologies would need to depend on regardless. | ||
|
|
||
| To make this work we would need to do some minor refactoring/abstractions to the generators before extracting to a lib, such as CA cert & secret resolution and feature gate checks. | ||
| Feature gate checking also needs abstraction, as CAO and HyperShift use different feature gate interfaces and may not share the same gate names for equivalent functionality. |
There was a problem hiding this comment.
It's true that gate names could differ, but in practice they shouldn't. I think there is desire to eventually align HyperShift's feature-gating behavior to align with standalone and actually use the o/api defined gate states.
That being said, we can ensure we only ever use the exact same gate names here.
There was a problem hiding this comment.
I agree overall, and indeed currently there's no such divergence -- it just felt more correct from a design perspective to make no such assumption for specific feature gates that control generation of specific fields, especially given that there's no actual link between HCP's and OCP's feature gate definitions.
I'll transform this to a note more in the style of a requirement rather than something we'll need to bake in 👍
| There is also a difference in how HyperShift and Standalone validate the generated configuration. CAO does inline validation during generation (CEL expression compilation, `email_verified` enforcement | ||
| when `claims.email` is used in username, SA issuer URL overlap check, CA cert reachability via HTTP to OIDC discovery endpoint), while HyperShift defers to | ||
| upstream `ValidateAuthenticationConfiguration` which does not cover these checks. If generators move to a shared lib, these validations should follow so both topologies benefit. |
There was a problem hiding this comment.
Where possible, I'd like to get the CAO aligned with HyperShift here and use the "upstream" validations (both in the KAS configuration and the oauth-apiserver configuration side of things - these are "upstream" of the $things configuring them).
No description provided.