chore: bump all component images except clusters-service (AROSLSRE-1395) #5926
raelga wants to merge 4 commits into
Advances every component in config/config.yaml to its latest digest **except clusters-service**, which is intentionally pinned at the last-known-good digest `sha256:6a49b32…` (vcs-ref `18b5a25`, 2026-06-19).

## Why

The automated bulk bump Azure#5789 started failing `ci/prow/e2e-parallel` after the clusters-service image moved to `sha256:b17f6fe…` (vcs-ref `ee741db`). Bisecting the bump into per-component PRs isolated the culprit:

- Azure#5910 (hypershift only): e2e-parallel green.
- Azure#5911 (ACM/MCE only): e2e-parallel green (merged).
- Azure#5920 (**clusters-service only**): e2e-parallel **fails reproducibly** (2 consecutive runs) on the exact same specs: `test/e2e/complete_cluster_create_multiversion.go:172` "verify simple web app runs" for the **candidate 4.22 and 5.0** channels, with `route was never reachable: dial tcp 10.0.0.5:443: i/o timeout`. The control plane and node pools provision, but the data-plane ingress on the newest OCP channels never becomes reachable. 4.20/4.21 are unaffected.
- Azure#5912 (everything except CS): only flaky/environmental failures (1 then 16, non-reproducible signature), consistent with the shared-CI ARM-throttling episode also hitting unrelated PRs (e.g. Azure#5915, which edits only alert YAML).

An image-digest bump is a behavioral change (the cluster runs the new build), and the CS-only PR reproduces a version-specific data-plane regression, so we pin CS at the good digest while letting the remaining components advance. Follow-up on the bad CS image is tracked in the Jira below.
Jira: https://redhat.atlassian.net/browse/AROSLSRE-1395

## Components bumped (clusters-service pinned, not bumped)

| Component | Old | New |
| --- | --- | --- |
| acrpull | v0.1.23 | v0.1.24 |
| arobit forwarder | v5.0.4 (06-19) | v5.0.4 (07-03) |
| mdsd | 1.42.0-20260615 | 1.42.0-20260629 |
| kube-events | 20260621.1 | 20260701.1 |
| maestro (provider) | v1.8.2 (06-11) | v1.8.2 (06-26) |
| hypershift | 9aeb1f3 | 488ef0e |
| OADP velero-server | 1.6.1 (06-25) | 1.6.1 (07-02) |
| OADP velero azure-plugin | 1.6.1 (06-25) | 1.6.1 (07-02) |
| OADP velero hypershift-plugin | 1.6.1 (06-25) | 1.6.1 (07-02) |
| kube-state-metrics | v2.19.0 (06-12) | v2.19.0 (06-30) |
| maestro-agent-sidecar (nginx) | azl3.0.20260602 | azl3.0.20260616 |
| image-sync/oc-mirror | 690892d | 5bfc996 |
| **clusters-service** | **6a49b32 (pinned, good)** | **none (b17f6fe held back)** |
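
The bisect reasoning in the "Why" section can be written down as data. The following is a minimal sketch, not tooling from this repository: `BisectRun`, `RUNS`, and `culprits` are hypothetical names, and the `reproducible_failure` flags simply restate the outcomes reported above (the non-reproducible failures on Azure#5912 are treated as environmental, as the description argues).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BisectRun:
    pr: str                      # bisect PR reference as written in the description
    scope: str                   # which components that PR bumped
    reproducible_failure: bool   # True only if e2e-parallel failed with a stable signature


# Outcomes as reported in the PR description.
RUNS = [
    BisectRun("Azure#5910", "hypershift only", False),
    BisectRun("Azure#5911", "ACM/MCE only", False),
    BisectRun("Azure#5920", "clusters-service only", True),
    BisectRun("Azure#5912", "everything except clusters-service", False),
]


def culprits(runs):
    """Return the scopes whose bump failed reproducibly."""
    return [run.scope for run in runs if run.reproducible_failure]


print(culprits(RUNS))
```

Only the clusters-service-only run is flagged, which is the single-variable result the pin relies on.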
Pull request overview
This PR updates the repo’s source-of-truth configuration (config/config.yaml) and the materialized dev rendered configs to advance component image digests to their latest versions, while intentionally keeping clusters-service pinned to a last-known-good digest due to a known E2E regression tracked in AROSLSRE-1395.
Changes:
- Bump multiple component image digests (e.g., acrPull, arobit forwarder/mdsd, kube-events, hypershift, OADP velero + plugins, kube-state-metrics, prometheus operator/prometheus/config-reloader, oc-mirror).
- Update Istio `istioctlVersion` from 1.30.1 → 1.30.2.
- Commit regenerated `config/rendered/dev/*` outputs consistent with the updated defaults.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| config/config.yaml | Updates default component image digests/versions while leaving clusters-service pinned to the known-good digest. |
| config/rendered/dev/pers/westus3.yaml | Materialized dev config reflecting updated component digests/versions. |
| config/rendered/dev/perf/westus3.yaml | Materialized dev config reflecting updated component digests/versions. |
| config/rendered/dev/dev/westus3.yaml | Materialized dev config reflecting updated component digests/versions. |
| config/rendered/dev/cspr/westus3.yaml | Materialized dev config reflecting updated component digests/versions. |
| config/rendered/dev/ci01/centralus.yaml | Materialized dev config reflecting updated component digests/versions. |
| config/rendered/dev/ci00/centralus.yaml | Materialized dev config reflecting updated component digests/versions. |
Add an inline note on the pinned clustersService digest matching the existing shared-ingress HAProxy convention, so an automated bump does not silently reintroduce the known-bad digest without context. Addresses PR review feedback.
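
A guard of the kind this note is meant to support could look like the following minimal sketch. It is not part of this PR: `KNOWN_BAD_PREFIXES` and `find_known_bad` are hypothetical names, and because the full digests are elided in this conversation, the check matches on the abbreviated `sha256:b17f6fe` prefix only.

```python
# Hypothetical deny-list: component name -> digest prefixes known to be bad.
# Only the abbreviated prefix from this PR is used; the full digest is elided.
KNOWN_BAD_PREFIXES = {
    "clusters-service": ("sha256:b17f6fe",),
}


def find_known_bad(digests):
    """Return one message per component whose digest matches a known-bad prefix.

    `digests` maps a component name to the digest a bump proposes for it.
    """
    problems = []
    for component, digest in sorted(digests.items()):
        for bad in KNOWN_BAD_PREFIXES.get(component, ()):
            if digest.startswith(bad):
                problems.append(
                    f"{component}: {digest} matches known-bad prefix {bad}"
                )
    return problems


# Abbreviated digests exactly as they appear in the PR description.
proposed = {"clusters-service": "sha256:b17f6fe"}
pinned = {"clusters-service": "sha256:6a49b32"}

print(find_known_bad(proposed))
print(find_known_bad(pinned))
```

An automated bump could run such a check before opening a PR and stop when the list is non-empty, so the pin does not depend on someone reading the inline comment.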
Confirmed root cause: clusters-service ARO-26913 (api.listening → HostedCluster Topology)
Bisecting the #5789 bulk digest bump isolated clusters-service as the sole culprit (bisect PRs: #5910 hypershift ✅, #5911 ACM/MCE ✅ merged, #5912 rest-minus-CS ✅, #5920 CS-only ❌ reproducible).
Single-variable proof: #5920 changes only the CS digest vs …
Mechanism (CS …
These …
This PR pins clusters-service to the last-known-good digest ( …
Tracking: AROSLSRE-1395
Corroboration — hcpctl snapshot
… 18b5a25
The automated image bump (tag: latest) advances clusters-service to vcs-ref ee741db (ARO-26913), which sets AzurePlatformSpec.Topology=PublicAndPrivate on the HostedCluster; on candidate 4.22/5.0 payloads the guest ingress-operator flips the ingress LB scope back to External, leaving the *.apps route unreachable and failing e2e-parallel.
Pin the image-updater source tag to the last-known-good build 18b5a25 so the bump cannot re-advance it. See AROSLSRE-1395.
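
The symptom described here (the `*.apps` route never accepting a TCP connection) can be checked with a simple probe. The sketch below is an illustration only and is not the repository's e2e test, which is written in Go; `wait_reachable` and its parameters are hypothetical names, and the host and port in the usage line are placeholders.

```python
import socket
import time


def wait_reachable(host, port, attempts=3, timeout=1.0, delay=0.1):
    """Try to open a TCP connection up to `attempts` times.

    Returns True as soon as one connection succeeds, False if every attempt
    is refused or times out (both surface as OSError subclasses).
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Sleep between attempts, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


if __name__ == "__main__":
    # Placeholder target; substitute the route's host and port when probing.
    print(wait_reachable("127.0.0.1", 443, attempts=1, timeout=0.5))
```

A probe like this only shows that the load balancer accepts connections. It says nothing about whether the LB scope is Internal or External, which is the setting the commit message identifies as the cause.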
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: raelga
Added a hard pin in the image-updater (commit
/retest-required