OCPBUGS-86058: warn about Authentication CR key naming requirement for STS provisioning#2911

Open
chdeshpa-hue wants to merge 2 commits into openshift:master from chdeshpa-hue:docs/aws-sts-manifest-key-naming

@chdeshpa-hue chdeshpa-hue commented May 20, 2026

Summary

Clarify the manifest secret key naming requirement for the Authentication CR
when provisioning AWS STS/IRSA clusters via Hive.

When Hive extracts manifestsSecretRef entries into the installer's manifest
directory, each secret key becomes the filename on disk. The kube-apiserver
bootstrap render step reads the Authentication CR from a hardcoded path:

--cluster-auth-file=/assets/manifests/cluster-authentication-02-config.yaml
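The extraction behavior described above can be illustrated with a small Go sketch. It assumes, as this PR states, that each manifestsSecretRef key is written verbatim as a filename. extractManifests and wouldRenderFindAuth are hypothetical helpers, not Hive or installer code. The sketch shows that a correct Authentication CR stored under the wrong key never reaches the fixed path the render step reads.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// requiredAuthFile is the basename of the --cluster-auth-file path above.
const requiredAuthFile = "cluster-authentication-02-config.yaml"

// extractManifests is a hypothetical stand-in for Hive's extraction step:
// each secret data key is written verbatim as a filename under dir.
func extractManifests(dir string, data map[string][]byte) error {
	for key, content := range data {
		if err := os.WriteFile(filepath.Join(dir, key), content, 0o644); err != nil {
			return err
		}
	}
	return nil
}

// wouldRenderFindAuth extracts data into a fresh temp directory and reports
// whether a file exists at the fixed path the render step reads.
func wouldRenderFindAuth(data map[string][]byte) (bool, error) {
	dir, err := os.MkdirTemp("", "manifests")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	if err := extractManifests(dir, data); err != nil {
		return false, err
	}
	_, statErr := os.Stat(filepath.Join(dir, requiredAuthFile))
	return statErr == nil, nil
}

func main() {
	// Correct Authentication CR content, stored under a non-canonical key.
	found, err := wouldRenderFindAuth(map[string][]byte{
		"authentication.yaml": []byte("kind: Authentication\n"),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(found) // false
}
```

Renaming the key to cluster-authentication-02-config.yaml is the only change needed for the check to pass; the file content is irrelevant to whether the path is found.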

If the secret key doesn't match this exact filename, the bootstrap kube-apiserver
silently starts with the default serviceAccountIssuer
(https://kubernetes.default.svc) instead of the custom S3 OIDC issuer. This
causes machine-api-controllers to receive tokens with the wrong issuer, and
AWS STS rejects them with InvalidIdentityToken. Workers never provision and
the install times out.

Other credential manifests (operator Secrets) are not affected — they are
applied by GVK/content, not by filename. Only the Authentication CR has this
filename dependency.
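Because only the Authentication CR is filename-sensitive, a pre-flight check only needs to look for that one manifest stored under the wrong key. The Go sketch below is a hypothetical helper, not part of Hive. It uses a crude line-based heuristic (an apiVersion in config.openshift.io plus kind: Authentication) rather than a real YAML parser, and the machine-api-creds.yaml key is illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requiredAuthKey is the only key name the bootstrap render step will find.
const requiredAuthKey = "cluster-authentication-02-config.yaml"

// looksLikeAuthenticationCR is a crude line-based heuristic, not a YAML
// parser: it matches a config.openshift.io apiVersion plus kind: Authentication.
func looksLikeAuthenticationCR(content string) bool {
	var isConfigGroup, isAuthKind bool
	for _, line := range strings.Split(content, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "apiVersion: config.openshift.io/") {
			isConfigGroup = true
		}
		if trimmed == "kind: Authentication" {
			isAuthKind = true
		}
	}
	return isConfigGroup && isAuthKind
}

// misnamedAuthKeys returns, sorted, every secret key that appears to hold the
// Authentication CR but is not named requiredAuthKey. Other manifests are not
// checked, since they are applied by content rather than by filename.
func misnamedAuthKeys(data map[string]string) []string {
	var bad []string
	for key, content := range data {
		if key != requiredAuthKey && looksLikeAuthenticationCR(content) {
			bad = append(bad, key)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	data := map[string]string{
		"auth.yaml":              "apiVersion: config.openshift.io/v1\nkind: Authentication\nmetadata:\n  name: cluster\n",
		"machine-api-creds.yaml": "apiVersion: v1\nkind: Secret\n",
	}
	fmt.Println(misnamedAuthKeys(data)) // [auth.yaml]
}
```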

Changes

  • Added a WARNING callout in the "Create Hive ClusterDeployment" section
    about the required key name cluster-authentication-02-config.yaml
  • Expanded manifest secret creation guidance with --from-file as the
    recommended approach (preserves canonical filenames automatically)
  • Added a Troubleshooting section for the common InvalidIdentityToken
    failure mode with diagnostic steps and remediation

Why this matters

A customer experienced repeated STS install timeouts because their
manifestsSecretRef used a non-canonical key name for the Authentication CR.
The fix was simply using the correct key name — no code change required.
This documentation change prevents the same misconfiguration for future users.

Technical detail

The hardcoded filename expectation exists in:

  • installer/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template:
    --cluster-auth-file=/assets/manifests/cluster-authentication-02-config.yaml
  • cluster-kube-apiserver-operator/pkg/cmd/render/render.go:
    --cluster-auth-file flag with silent os.IsNotExist handling
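The silent fallback is easiest to see in code. The Go sketch below mimics the pattern this PR attributes to render.go: a missing file detected with os.IsNotExist is treated as "nothing configured" rather than as an error. It is not the operator's actual code. issuerFromAuthFile, renderIssuer, and the line scan for serviceAccountIssuer: are hypothetical simplifications, and the S3 issuer URL is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// defaultIssuer is the fallback value quoted in this PR.
const defaultIssuer = "https://kubernetes.default.svc"

// issuerFromAuthFile returns the serviceAccountIssuer from the file at path.
// A missing file is NOT reported as an error: the caller silently gets the
// default, which is the failure mode described above.
func issuerFromAuthFile(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return defaultIssuer, nil // silent fallback: no error, no log line
	}
	if err != nil {
		return "", err
	}
	const field = "serviceAccountIssuer:"
	for _, line := range strings.Split(string(raw), "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, field) {
			return strings.TrimSpace(strings.TrimPrefix(trimmed, field)), nil
		}
	}
	return defaultIssuer, nil
}

// renderIssuer writes content under key in a temp dir, then reads the fixed
// filename, the way the bootstrap render step is described as doing.
func renderIssuer(key, content string) (string, error) {
	dir, err := os.MkdirTemp("", "assets")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, key), []byte(content), 0o644); err != nil {
		return "", err
	}
	return issuerFromAuthFile(filepath.Join(dir, "cluster-authentication-02-config.yaml"))
}

func main() {
	cr := "spec:\n  serviceAccountIssuer: https://example-bucket.s3.amazonaws.com\n"
	for _, key := range []string{"cluster-authentication-02-config.yaml", "authentication.yaml"} {
		issuer, err := renderIssuer(key, cr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", key, issuer)
	}
}
```

Both calls succeed without error; only the returned issuer differs, which is why the misconfiguration surfaces much later as InvalidIdentityToken rather than at bootstrap.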

Made with Cursor

Summary by CodeRabbit

  • Documentation
    • Clarified AWS STS provisioning steps, including creating the installer-manifests Secret from generated installer output
    • Added a prominent warning about required Authentication manifest presence and exact key naming to ensure token issuer validation
    • Added troubleshooting for install timeouts with invalid identity tokens, plus diagnostic commands and guidance to recreate the manifest Secret

…visioning

When creating the manifestsSecretRef for AWS STS/IRSA clusters, the
Authentication CR must use the exact key name
cluster-authentication-02-config.yaml. The kube-apiserver bootstrap
render command reads this manifest from a hardcoded path. If the key
name differs, the custom serviceAccountIssuer is silently ignored and
the install fails with token issuer mismatch.

Add a warning about this requirement, recommend using --from-file to
preserve canonical filenames, and add troubleshooting guidance for the
common InvalidIdentityToken failure.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

This pull request updates AWS STS provisioning docs: adds an explicit step to create the installer-manifests Secret from ccoctl output, warns that the Authentication CR file must be named cluster-authentication-02-config.yaml, and adds troubleshooting steps for InvalidIdentityToken timeouts.

Changes

AWS STS Provisioning Documentation

  • Secret creation step (docs/aws-sts-provisioning.md): Adds a numbered instruction to create the cluster-manifests Secret from ccoctl _output/manifests/ using --from-file so filenames become secret keys, and wires this Secret into ClusterDeployment/InstallConfig guidance.
  • Troubleshooting and key-name warning (docs/aws-sts-provisioning.md): Adds troubleshooting for InvalidIdentityToken timeouts, warns the Authentication CR must be stored under the exact cluster-authentication-02-config.yaml key (else bootstrap falls back to default serviceAccountIssuer), and provides commands to check serviceAccountIssuer, verify secret data key names, and recreate the manifest Secret with --from-file=_output/manifests/.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A rabbit hops where manifests lie,
Names must match or tokens cry,
Create the secret from the output tree,
Check issuers, keys, and then you’re free! 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly matches the main change: adding a warning about Authentication CR key naming requirement (cluster-authentication-02-config.yaml) for AWS STS provisioning in Hive documentation.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@openshift-ci openshift-ci Bot requested review from dlom and suhanime May 20, 2026 10:42
@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

Hi @chdeshpa-hue. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/aws-sts-provisioning.md (1)

90-92: ⚡ Quick win

Add language specifier to code block.

The code block showing the error message should have a language specifier for proper syntax highlighting and to satisfy markdown linting rules.

📝 Proposed fix
````diff
-```
+```text
 error assuming role: InvalidIdentityToken: Token issuer does not match provider
````

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/aws-sts-provisioning.md around lines 90 - 92, The fenced code block
containing the error message "error assuming role: InvalidIdentityToken: Token
issuer does not match provider" should include a language specifier to satisfy
markdown linting; update the triple-backtick fence that surrounds that string to
use "text" (i.e., ```text) so the block is rendered/highlighted correctly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @docs/aws-sts-provisioning.md:

  • Line 109: The JSONPath snippet using "{range .data}{@.key}{"\n"}{end}" fails
    because kubectl/oc JSONPath cannot range over maps (".data" is a map). Replace
    the "-o jsonpath=..." form of the oc get secret cluster-manifests -n <namespace>
    command with a supported alternative: either a go-template that ranges over the
    .data keys, or JSON output piped to jq to list .data | keys[]. Ensure the docs
    show one of these two alternatives instead of the JSONPath range example.

Nitpick comments:
In @docs/aws-sts-provisioning.md:

  • Around line 90-92: The fenced code block containing the error message "error
    assuming role: InvalidIdentityToken: Token issuer does not match provider"
    should include a language specifier to satisfy markdown linting; update the
    triple-backtick fence that surrounds that string to use "text" (i.e., ```text)
    so the block is rendered/highlighted correctly.

---

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d77a2e57-4739-4dc1-a43f-0c26f2562093

📥 Commits

Reviewing files that changed from the base of the PR and between ab4b2490385a31a5481a0ef69a40a71f88ac2faf and 7b758cdb9b5f577ed1eba4ce2c90d8ab331cede0.

📒 Files selected for processing (1)
  • docs/aws-sts-provisioning.md

Comment thread docs/aws-sts-provisioning.md Outdated
@chdeshpa-hue chdeshpa-hue changed the title docs: warn about Authentication CR key naming requirement for STS provisioning OCPBUGS-86058: warn about Authentication CR key naming requirement for STS provisioning May 20, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 20, 2026
@openshift-ci-robot

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86058, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@chdeshpa-hue
Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 20, 2026
@openshift-ci-robot

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86058, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Member

@2uasimojo 2uasimojo left a comment


This is a nice find and a solid improvement.

Based on the verbosity I'm assuming an agent generated the delta. It's not bad, but let's tighten things up a bit. Specifically, I feel we don't need to go into nearly as much detail in the procedure section. Let's limit that part to emphasizing the --from-file=<dir> recommendation, and fold the remaining explanation into the Troubleshooting item. If we must mention this specific issue in the procedure section, let's keep it brief, along the lines of: "WARNING: Altering original filenames can have <link to troubleshooting section>consequences</link>"

- Condense verbose WARNING block in procedure section to a brief
  one-liner linking to the Troubleshooting section (per 2uasimojo)
- Move detailed technical explanation (hardcoded path, silent fallback,
  affected/unaffected manifests) into the Troubleshooting section
- Replace broken jsonpath command with go-template for listing secret
  data keys — jsonpath range only works on arrays, not maps (confirmed
  by 2uasimojo and CodeRabbit)
- Add text language specifier to error message code block

Co-authored-by: Cursor <cursoragent@cursor.com>
@chdeshpa-hue
Author

@2uasimojo Thanks for the review — good call on tightening the procedure section. Pushed an update:

  • Condensed the WARNING in the procedure section to a 3-line note with a link to Troubleshooting (as you suggested)
  • Moved the detailed explanation (hardcoded path, silent fallback, which manifests are/aren't affected) into the Troubleshooting section where it belongs
  • Fixed the broken jsonpath command — replaced with go-template since range doesn't work on maps (as you confirmed)
  • Added text language specifier to the error message code block (CodeRabbit nitpick)
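For reference, the `-o go-template` output of oc and kubectl is rendered with Go's text/template, which can range over a map, so a command along the lines of `oc get secret cluster-manifests -n <namespace> -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'` lists the Secret's data keys. The Go sketch below evaluates that same template against a hand-built, Secret-shaped map (an assumption standing in for real API output); listDataKeys is a hypothetical helper, and the exact template used in the updated docs may differ.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// keyListTemplate ranges over the .data map and prints one key per line.
const keyListTemplate = `{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}`

// listDataKeys renders keyListTemplate against a Secret-shaped object.
func listDataKeys(secret map[string]interface{}) (string, error) {
	tmpl, err := template.New("keys").Parse(keyListTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, secret); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hand-built stand-in for `oc get secret ... -o json`; values are elided.
	secret := map[string]interface{}{
		"data": map[string]interface{}{
			"cluster-authentication-02-config.yaml": "...",
			"machine-api-creds.yaml":                "...",
		},
	}
	out, err := listDataKeys(secret)
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

text/template visits map keys in sorted order, so the output is stable across runs, which makes it easy to eyeball whether cluster-authentication-02-config.yaml is present.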

@openshift-ci-robot

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86058, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/aws-sts-provisioning.md`:
- Around line 51-53: The fenced code block containing "kubectl create secret
generic cluster-manifests --from-file=_output/manifests/" lacks a language
specifier; update the markdown fenced block to include a language identifier
(e.g., "bash") so the block becomes ```bash ... ``` to satisfy MD040 and enable
syntax highlighting.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a46db80-986b-487f-853d-77ba37d7a765

📥 Commits

Reviewing files that changed from the base of the PR and between 7b758cd and 1a9aeac.

📒 Files selected for processing (1)
  • docs/aws-sts-provisioning.md

Comment on lines 51 to 53
```
kubectl create secret generic cluster-manifests --from-file=_output/manifests/
```
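The reason `--from-file=<dir>` is the recommended approach is that it derives each key from a file's basename, so ccoctl's canonical filenames carry through unchanged. The Go sketch below approximates that key derivation under the assumption that regular files directly inside the directory become keys and subdirectories are skipped. It is an illustration, not kubectl's implementation (which also validates key names); keysFromDir, keysFromFiles, and machine-api-creds.yaml are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// keysFromDir approximates how `--from-file=<dir>` names Secret keys: each
// regular file directly inside dir becomes a key equal to its basename.
// Subdirectories (and other non-regular entries) are skipped.
func keysFromDir(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // entries come back sorted by filename
	if err != nil {
		return nil, err
	}
	var keys []string
	for _, e := range entries {
		if e.Type().IsRegular() {
			keys = append(keys, e.Name())
		}
	}
	return keys, nil
}

// keysFromFiles builds a throwaway directory with the given files and
// subdirectories, then returns the keys keysFromDir would derive.
func keysFromFiles(files, subdirs []string) ([]string, error) {
	dir, err := os.MkdirTemp("", "manifests")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for _, name := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("# placeholder\n"), 0o644); err != nil {
			return nil, err
		}
	}
	for _, name := range subdirs {
		if err := os.Mkdir(filepath.Join(dir, name), 0o755); err != nil {
			return nil, err
		}
	}
	return keysFromDir(dir)
}

func main() {
	keys, err := keysFromFiles(
		[]string{"machine-api-creds.yaml", "cluster-authentication-02-config.yaml"},
		[]string{"nested"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [cluster-authentication-02-config.yaml machine-api-creds.yaml]
}
```

The design point is that no key is typed by hand: as long as the files in _output/manifests/ are not renamed, the Authentication CR keeps the name the render step expects.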


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language specifier to fenced code block.

The code block is missing a language identifier, which prevents proper syntax highlighting.

📝 Proposed fix
````diff
-  ```
+  ```bash
   kubectl create secret generic cluster-manifests --from-file=_output/manifests/
````

As per coding guidelines, the static analysis tool markdownlint-cli2 flagged this as MD040 (fenced-code-language).

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/aws-sts-provisioning.md` around lines 51 - 53, The fenced code block
containing "kubectl create secret generic cluster-manifests
--from-file=_output/manifests/" lacks a language specifier; update the markdown
fenced block to include a language identifier (e.g., "bash") so the block
becomes ```bash ... ``` to satisfy MD040 and enable syntax highlighting.

Member

@2uasimojo 2uasimojo left a comment


/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, chdeshpa-hue

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2026

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

3 participants