Skip to content

ci: re-enable in-process Helm render in validate (revert --skip-helm-render)#2273

Open
devantler wants to merge 5 commits into
mainfrom
claude/ci-restore-render
Open

ci: re-enable in-process Helm render in validate (revert --skip-helm-render)#2273
devantler wants to merge 5 commits into
mainfrom
claude/ci-restore-render

Conversation

@devantler

@devantler devantler commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

🤖 Generated by the Daily AI Assistant

Why

We temporarily reduced manifest validation (#2270) to dodge a third-party tool bug; running with less validation means mistakes could reach the cluster unseen.

What

Restores full validation, staged so it can merge the moment the tool's fix ships.

⚠️ Do not merge yet: stays red until the upstream fix (yannh/kubeconform#363) is released and reaches ksail.

…render)

Removes the --skip-helm-render workaround added in #2270 and restores
full in-process rendering for `ksail workload validate`, so the
actually-applied manifests are validated again.

The non-determinism that forced the workaround was a buffer-aliasing data
race in kubeconform's resource.FromStream (ksail#5362), fixed upstream in
yannh/kubeconform#363 — not the Helm render itself. Merge only after
ksail's kubeconform dependency is bumped to include that fix, or validate
goes flaky again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The CI workflow’s manifest validation job removes --skip-helm-render from validation commands for both default and prod configurations, and updates inline comments to reflect the upstream kubeconform race fix with a TRIPWIRE note.

Changes

CI Manifest Validation Update

Layer / File(s) Summary
Update comments and validation commands
.github/workflows/ci.yaml
Comments describing the previous workaround now note the upstream fix and TRIPWIRE, and both validation invocations no longer pass --skip-helm-render.

Estimated code review effort: 1 (Trivial) | ~3 minutes

Possibly related issues

  • devantler-tech/ksail#5362 — The workflow change removes the Helm-render workaround tied to the validation race described in this issue.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately summarizes the main change: restoring Helm rendering validation and removing the skip flag.
Description check ✅ Passed The description is directly related to the PR and explains the validation rollback and upstream-fix dependency.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/ci-restore-render

Comment @coderabbitai help to get the list of available commands.

@devantler

Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

Why this PR's CI is red (documented so the failure is explained, per the open-PR hygiene sweep):

This PR re-enables the in-process Helm render — which re-exposes the kubeconform#363 FromStream buffer-alias race it was skipped to avoid. Two consecutive 🧪 Validate Manifests runs failed with different randomly-corrupted resources, the race's signature:

  • run 28542-era: ScheduledBackup/umami/umami-db-daily: EOF
  • run 28538578582 (today's re-run): providers/docker/infrastructure/controllers: ... yaml: line 54: mapping values are not allowed in this context

Re-running is a coin flip, not a fix. The root-cause fix is upstream (yannh/kubeconform#363, fix PR submitted); a ksail-side replace to a fork is ruled out (supply-chain policy). This PR stays draft-parked until ksail ships a kubeconform release containing the fix, then goes green and merges (tracked in ksail#5362).

Side-observation from the render log: the in-process render also surfaces the actual-budget chart-schema failure (login.openid oneOf) that #2359 fixes — one more reason to promote #2359.

@devantler

Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

Updated diagnosis (518th-run sweep): this revert is now blocked by two things, not just the kubeconform race —

  1. kubeconform#363 YAML-corruption race (the original gate): upstream fix PR is green and awaiting the kubeconform maintainer; after it lands ksail must bump kubeconform before this can go green reliably.
  2. NEW — the local overlay's actual-budget values are not schema-valid: with the in-process Helm render re-enabled, helm template fails deterministically on providers/docker at the chart's values schema (/login/openid oneOf — empty clientSecret/existingSecret/dicovertUrl [sic, the chart's own typo'd alt branch]). fix(actual-budget): inject OPENID env past chart ingress gate, keep schema-valid login.openid #2359 only made the prod values schema-valid (filler + the live actual-budget-oidc Secret); the local overlay needs the same schema-valid filler before this revert can pass. Main never hits this because it validates with --skip-helm-render.

Next actionable step for this PR: add the #2359-style schema-valid filler to the local actual-budget values (can ride in this branch — it only affects the render-enabled validation path), then wait out (1).

@devantler

Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

Correction to yesterday's re-diagnosis — deeper local repro (6× ksail workload validate on this branch) shows the actual-budget login.openid oneOf message is only a render-skip warning, not a deterministic failure: ksail falls back to validating the HelmRelease as-is and the leg passes whenever the render race doesn't fire (3/6 runs fully green, 101/101).

The sole real blocker is the known kubeconform YAML-corruption race (kubeconform#363), and on this branch it consistently bites the largest rendered documentCustomResourceDefinition/ephemeralreports.reports.kyverno.io (from HelmRelease kyverno/kyverno, providers/docker/infrastructure/controllers, same yaml: line 145 corruption in all 3 failing runs). So the original gate stands unchanged: merge only after ksail ships a kubeconform with #363 fixed. The "add a #2359-style local filler" next-step from yesterday is retracted — no values change is needed for CI here.

(Separately, making the base's schema-satisfaction offline-resolvable — existingSecret instead of the valuesFrom-injected clientSecret — would let CI really render-validate actual-budget instead of warning-skipping it; filed as its own issue since it touches the live prod values path.)

I've re-run the failed check; expect ~coin-flip greens until the upstream fix lands.

@devantler

Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

Status note on the promotion: this revert stays red by design until the root cause ships — the 🧪 Validate Manifests failure is the kubeconform YAML-corruption race itself (yannh/kubeconform#363, my upstream fix, still open as of 2026-07-02). The merge order is: kubeconform releases with #363 → ksail bumps kubeconform → this PR's gate turns green → merge restores in-process Helm render validation. I sweep the upstream PR every run; no action needed here — promoting it early is fine, it will be driven to merge the moment the chain unblocks.

@devantler

Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

@devantler Understood — kicking off a review now, though per the discussion this PR is expected to remain red until the kubeconform #363 fix lands upstream and ksail bumps its dependency.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@devantler

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🫴 Ready

Development

Successfully merging this pull request may close these issues.

1 participant