fix(talos): accept the actions publish-app identity at node image verification #2457
…ification The shared publish-app.yaml workflow moved from reusable-workflows into actions (actions#425); the Talos ImageVerificationConfig catch-all still pinned the old identity, so every newly published app image fails node-level pull verification (ImagePullBackOff). Alternate both identities, same transition as the Flux OCIRepository verify blocks and verify-app-images. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
📝 Walkthrough: Talos image verification config was updated to accept keyless signatures for … Changes: Image verification rule update
Estimated code review effort: 1 (Trivial), ~5 minutes. Suggested labels: documentation, security. Suggested reviewers: devantler. Pre-merge checks: 5 passed.
One-click needed: same circularity as #2456. The merge queue is still wedged: every …

Option 1 (preferred): live patch, then promote this PR. Dry-run verified on …

    for n in 10.0.1.1 10.0.1.2 10.0.1.3 10.0.1.4 10.0.1.6 10.0.1.9 10.0.1.5 10.0.1.8 10.0.1.11; do
      talosctl -e 49.13.53.183 -n "$n" patch mc --mode=no-reboot --patch @talos/cluster/image-verification.yaml
    done

(run from this PR's branch so the patch file already carries the alternation; the file is a complete-document patch, so …)

Option 2: merge this PR past the queue (admin bypass) and dispatch cd.yaml from main; the deploy's cluster-update step applies it. Heavier, and the deploy still needs the health gate relaxed or the same live patch to pass, so Option 1 is cleaner.

Autoscaler nodes: patched live too (all 9), but nodes booted after the patch from the Hetzner snapshot get the old config until a …
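The Option 1 loop above stops reporting once a node fails silently in the middle of the list. A minimal sketch of the same loop that keeps going and lists the failed nodes at the end is shown here. The function name `patch_nodes` and the overridable `TALOSCTL` variable are assumptions made for illustration (the override lets the loop be exercised without a cluster); the endpoint, patch file, and `--mode=no-reboot` come from the comment above, and `--dry-run` is the standard `talosctl patch` preview flag.

```shell
# Assumption: TALOSCTL may be overridden, e.g. with a stub for local testing.
TALOSCTL="${TALOSCTL:-talosctl}"
ENDPOINT="49.13.53.183"
PATCH_FILE="talos/cluster/image-verification.yaml"

# patch_nodes EXTRA_FLAG NODE...
# Patches every node in turn, continues past failures, and reports the
# nodes that failed. EXTRA_FLAG is "--dry-run" for a preview or "" to apply.
patch_nodes() {
  extra_flag="$1"
  shift
  failed=""
  for n in "$@"; do
    # $extra_flag is intentionally unquoted so an empty value adds no argument.
    if ! "$TALOSCTL" -e "$ENDPOINT" -n "$n" patch mc --mode=no-reboot \
        $extra_flag --patch "@$PATCH_FILE"; then
      failed="$failed $n"
    fi
  done
  if [ -n "$failed" ]; then
    printf 'failed nodes:%s\n' "$failed" >&2
    return 1
  fi
  return 0
}
```

A possible usage, again only a sketch: call `patch_nodes --dry-run 10.0.1.1 10.0.1.2` first to preview, then `patch_nodes "" 10.0.1.1 10.0.1.2` to apply, run from this PR's branch so the patch file carries the alternation.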
🎉 This PR is included in version 1.96.8 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Thanks for promoting. Holding the enqueue until the Option-1 live patch runs (the queue circularity above: this PR's own …

I have a watcher armed: as soon as the patch is applied and wedding-app rolls out, I enqueue this PR (bare …
Merged, but the wedge persists: the fix can't reach the nodes on its own. Talos still runs the old …

One-click to unwedge (my attempt was permission-gated): from a checkout of current …

    ksail --config ksail.prod.yaml cluster update

(or dispatch cd.yaml: same composite... though it health-gates too; the local cluster-update is the reliable path). The delta is the one-line subjectRegex alternation: In-Place, no reboot. After it applies, wedding-app pods pull, the health gate clears, and I'll re-enqueue the promoted fleet (#2443, #2437, #2434, #2452, …).
Cluster-update confirmed applied (thanks!). Images pull now, but the wedge has a 4th layer: wedding-app v1.14.1 itself never boots. New pods went ImagePullBackOff → CrashLoopBackOff ( …

Fix = wedding-app#152 (pin 5.5.4, dependabot-ignore 5.5.5, restore the Lighthouse lane as the boot canary; boot-smoke verified locally). Promote #152 → v1.14.2 releases → prod rollout heals → I drain the queue (promoted fleet #2443/#2437/#2436/#2434/#2452/#2433/#2440, renovate #2458, re-queue #2427/#2430). The site itself is still up meanwhile: the previous ReplicaSet keeps serving.
Why
New app releases cannot start on prod: wedding-app's latest rollout is stuck in ImagePullBackOff because the Talos node-level image verification still only trusts the old reusable-workflows signing identity, while images are now signed by the actions workflow after the publish-app move. Every future app release hits the same wall until this lands.
What
Accepts both the actions and reusable-workflows publish-app signing identities in the Talos ImageVerificationConfig — the same transition alternation already in place for the Flux OCIRepository verification and the Kyverno verify-app-images policy (third and final verification layer to get this fix). Narrow to actions-only once every app has published a fresh actions-signed image.
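The alternation described above can be checked offline with a small sketch. The organisation name `example-org` and the exact shape of the identity URLs are hypothetical, since the real signing identities are not shown in this PR; `grep -E` uses POSIX ERE rather than the regex engine Talos uses, which is assumed to treat this simple alternation the same way.

```shell
# Hypothetical identities: "example-org" stands in for the real organisation.
# Transition pattern: accept publish-app.yaml from either repository.
BOTH_RE='^https://github\.com/example-org/(actions|reusable-workflows)/\.github/workflows/publish-app\.yaml@.+$'
# Narrowed pattern, for once every app has published an actions-signed image.
ACTIONS_ONLY_RE='^https://github\.com/example-org/actions/\.github/workflows/publish-app\.yaml@.+$'

# matches REGEX SUBJECT: exits 0 when SUBJECT matches REGEX.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

NEW_ID='https://github.com/example-org/actions/.github/workflows/publish-app.yaml@refs/heads/main'
OLD_ID='https://github.com/example-org/reusable-workflows/.github/workflows/publish-app.yaml@refs/heads/main'

# During the transition both identities pass; after narrowing only the new one does.
for id in "$NEW_ID" "$OLD_ID"; do
  if matches "$BOTH_RE" "$id"; then both=accept; else both=reject; fi
  if matches "$ACTIONS_ONLY_RE" "$id"; then narrow=accept; else narrow=reject; fi
  printf '%s transition=%s narrowed=%s\n' "$id" "$both" "$narrow"
done
```

Running the same check against the identity of the oldest image still deployed is one way to decide when narrowing to actions-only is safe.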
Deploys via the merge queue's ksail cluster update step; no manual action needed after merge.