From ea1b490f09ae1ac6b0df650f885766b4da54127f Mon Sep 17 00:00:00 2001 From: Ali Syed Date: Wed, 17 Jun 2026 14:36:50 +0100 Subject: [PATCH 1/3] feat(konflux-release): Add Konflux release workflow plugin for EDO Add a new plugin that codifies the 6-phase Konflux release process for ExternalDNS Operator (EDO). Claude drives each phase end-to-end while a human reviewer stays in the loop to approve PRs and handle auth. Includes: - 3 commands: release, status, verify - EDO release skill with state tracking and cross-session resume - EDO constants (app names, registries, FBC versions, Konflux URLs) - Error handling for IIB timeouts, RADAS failures, auth expiry Assisted-By: Claude Opus 4.6 --- .claude-plugin/marketplace.json | 13 + .../.claude-plugin/plugin.json | 8 + plugins/konflux-release/README.md | 43 ++ plugins/konflux-release/commands/release.md | 76 +++ plugins/konflux-release/commands/status.md | 68 ++ plugins/konflux-release/commands/verify.md | 80 +++ .../skills/edo-release/SKILL.md | 633 ++++++++++++++++++ .../konflux-release/team-docs/constants.md | 86 +++ 8 files changed, 1007 insertions(+) create mode 100644 .claude-plugin/marketplace.json create mode 100644 plugins/konflux-release/.claude-plugin/plugin.json create mode 100644 plugins/konflux-release/README.md create mode 100644 plugins/konflux-release/commands/release.md create mode 100644 plugins/konflux-release/commands/status.md create mode 100644 plugins/konflux-release/commands/verify.md create mode 100644 plugins/konflux-release/skills/edo-release/SKILL.md create mode 100644 plugins/konflux-release/team-docs/constants.md diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json new file mode 100644 index 0000000..bfd511b --- /dev/null +++ b/.claude-plugin/marketplace.json @@ -0,0 +1,13 @@ +{ + "name": "network-edge-tools", + "owner": { + "name": "openshift" + }, + "plugins": [ + { + "name": "konflux-release", + "source": "./plugins/konflux-release", + "description": "Konflux 
release workflow automation for ExternalDNS Operator (EDO)" + } + ] +} diff --git a/plugins/konflux-release/.claude-plugin/plugin.json b/plugins/konflux-release/.claude-plugin/plugin.json new file mode 100644 index 0000000..e6dde99 --- /dev/null +++ b/plugins/konflux-release/.claude-plugin/plugin.json @@ -0,0 +1,8 @@ +{ + "name": "konflux-release", + "description": "Konflux release workflow automation for ExternalDNS Operator (EDO)", + "version": "0.1.0", + "author": { + "name": "github.com/openshift/network-edge-tools" + } +} diff --git a/plugins/konflux-release/README.md b/plugins/konflux-release/README.md new file mode 100644 index 0000000..4a9cd08 --- /dev/null +++ b/plugins/konflux-release/README.md @@ -0,0 +1,43 @@ +# konflux-release + +Konflux release workflow automation for the ExternalDNS Operator (EDO). + +## Overview + +This plugin codifies the 6-phase Konflux release process that the NID team follows for EDO releases. Claude drives the workflow end-to-end — opening PRs, creating Release CRs, polling status, running verification — while a human reviewer stays in the loop to approve PRs and handle auth. + +## Commands + +| Command | Description | +|---------|-------------| +| `/konflux-release:release <version>` | Run the full 6-phase EDO release workflow | +| `/konflux-release:status` | Check current release state and progress | +| `/konflux-release:verify <version>` | Run production verification across all OCP versions | + +## Prerequisites + +- `oc` CLI authenticated to the Konflux cluster +- `gh` CLI authenticated to GitHub +- `kubectl` access to `external-dns-operator-tenant` namespace +- `podman` installed (for verification) +- `jira` CLI (optional, for closing tickets) + +## Process Phases + +1. **Code Readiness** — Verify all PRs merged, VERSION file correct +2. **RPA Verification** — Confirm ReleasePlanAdmission exists in konflux-release-data +3. **Stage Release** — Update FBC catalogs with stage bundle, create stage Release CRs +4. 
**Prod Bundle Release** — Create prod Release CR for the bundle +5. **FBC Prod Release** — Swap stage→prod registry in catalogs, create 11 FBC Release CRs +6. **Verify + Close** — Verify bundle in all prod indexes, close Jira tickets + +## State Tracking + +Release state is persisted to `.work/konflux-release/release-state-{version}.json` so the workflow can be resumed across Claude sessions. + +## Component Ownership + +| Component | Owners | +|-----------|--------| +| Plugin | @Thealisyed | +| EDO release process | @alebedev87, @grzpiotrowski | diff --git a/plugins/konflux-release/commands/release.md b/plugins/konflux-release/commands/release.md new file mode 100644 index 0000000..3aaf2d0 --- /dev/null +++ b/plugins/konflux-release/commands/release.md @@ -0,0 +1,76 @@ +--- +description: Run the full 6-phase EDO Konflux release workflow with Claude driving and a human reviewing +argument-hint: "<version> [--resume]" +--- + +## Name +konflux-release:release + +## Synopsis +``` +/konflux-release:release <version> [--resume] +``` + +## Description + +The `konflux-release:release` command runs the complete EDO (ExternalDNS Operator) Konflux release workflow. Claude drives each phase — creating branches, editing FBC catalogs, opening PRs, generating Release CRs, polling status, and running verification. The human stays in the loop to review PRs, apply kubectl commands, and handle Konflux auth. + +## Implementation + +1. **Parse the version argument**. Must match pattern `X.Y.Z` (e.g., `1.2.2`, `1.3.6`). + +2. **Determine RHEL base**: + - Version starts with `1.2` → `rhel8`, release branch `release-1.2`, bundle app `ext-dns-optr-1-2-rhel-8` + - Version starts with `1.3` or higher → `rhel9`, release branch `main`, bundle app `ext-dns-optr-1-3-rhel-9` + +3. **Check for existing state file** at `.work/konflux-release/release-state-{version}.json`: + - If `--resume` is passed and the file exists, load state and resume from `current_phase` without prompting. 
+ - If the file exists but `--resume` is NOT passed, ask the human: "A release state file exists for v{version} at phase {N}. Resume? (y/n)" + - If no file exists, initialize a new state and prompt the human for: + - **NE story number** (e.g., `NE-2730`) — used in PR titles + - **OCPBUGS ticket** (e.g., `OCPBUGS-78658`) — the CVE/bug driving this release + +4. **Verify prerequisites**: + - `oc whoami` succeeds (Konflux auth) + - `gh auth status` succeeds (GitHub auth) + - Current directory is the EDO repo (check for `bundle-hack/container_digest.sh`) + +5. **Load constants** from `plugins/konflux-release/team-docs/constants.md` (resolve the absolute path within the network-edge-tools repo). + +6. **Follow the `edo-release` skill** at `plugins/konflux-release/skills/edo-release/SKILL.md`. Execute each phase sequentially, updating the state file after each phase checkpoint. + +### Prerequisites + +- EDO repo must be cloned locally with `upstream` remote → `openshift/external-dns-operator` +- `oc` CLI authenticated to the Konflux cluster +- `gh` CLI authenticated to GitHub +- `kubectl` access to `external-dns-operator-tenant` namespace +- `podman` installed (for Phase 6 verification) +- `jira` CLI (optional, for closing tickets in Phase 6) + +## Arguments + +- **version** *(required)* + The target release version in `X.Y.Z` format. + Example: `1.3.6` + +- **--resume** *(optional)* + Resume an in-progress release from the saved state file without prompting. + +## Examples + +1. **Start a new release**: + ``` + /konflux-release:release 1.3.6 + ``` + +2. 
**Resume an in-progress release**: + ``` + /konflux-release:release 1.3.6 --resume + ``` + +## See Also +- `plugins/konflux-release/skills/edo-release/SKILL.md` — Detailed phase procedures +- `plugins/konflux-release/team-docs/constants.md` — EDO-specific constants +- `/konflux-release:status` — Check release progress +- `/konflux-release:verify` — Run production verification diff --git a/plugins/konflux-release/commands/status.md b/plugins/konflux-release/commands/status.md new file mode 100644 index 0000000..2bcd248 --- /dev/null +++ b/plugins/konflux-release/commands/status.md @@ -0,0 +1,68 @@ +--- +description: Check the current state and progress of an in-progress EDO Konflux release +argument-hint: "[version]" +--- + +## Name +konflux-release:status + +## Synopsis +``` +/konflux-release:status [version] +``` + +## Description + +The `konflux-release:status` command reads the release state file and reports the current phase, completed phases, and any failed Release CRs. If Konflux auth is active, it also polls live Release CR statuses. + +## Implementation + +1. **Find state files**. Look for `.work/konflux-release/release-state-*.json` in the current directory: + - If `version` is provided, read `release-state-{version}.json` directly. + - If no version is provided, list all state files and report on the most recent one. + - If no state files exist, report "No active releases found." + +2. **Read and parse** the state JSON file. + +3. **Display the phase summary**: + ``` + EDO v{version} Konflux Release Status + ====================================== + Phase 1: Code Readiness ✓ Completed + Phase 2: RPA Verification ✓ Completed + Phase 3: Stage Release ✓ Completed + Phase 4: Prod Bundle Release → In Progress + Phase 5: FBC Prod Release · Pending + Phase 6: Verify + Close · Pending + + Current phase: 4 — Prod Bundle Release + Started: 2026-06-17 10:30 UTC + Last updated: 2026-06-17 14:22 UTC + ``` + +4. 
**Show key values** if populated: + - Bundle digest + - Snapshot name + - PR numbers + - Failed Release CRs (with error messages if available) + +5. **Optionally poll live status** if Konflux auth is active (`oc whoami` succeeds): + - Check Release CR status for the current phase + - Report live status alongside saved state + +## Arguments + +- **version** *(optional)* + The release version to check. If omitted, shows the most recent active release. + +## Examples + +1. **Check a specific release**: + ``` + /konflux-release:status 1.3.6 + ``` + +2. **Check the most recent release**: + ``` + /konflux-release:status + ``` diff --git a/plugins/konflux-release/commands/verify.md b/plugins/konflux-release/commands/verify.md new file mode 100644 index 0000000..7b1fe68 --- /dev/null +++ b/plugins/konflux-release/commands/verify.md @@ -0,0 +1,80 @@ +--- +description: Run production verification for an EDO release across all OCP operator indexes +argument-hint: "<version>" +--- + +## Name +konflux-release:verify + +## Synopsis +``` +/konflux-release:verify <version> +``` + +## Description + +The `konflux-release:verify` command verifies that an EDO release bundle is present in all production Red Hat operator indexes (v4.12 through v4.22). It runs `podman` with `--pull=always` to ensure fresh index images are used. + +## Implementation + +1. **Verify podman is available**: + ```bash + which podman + ``` + +2. **Run verification across all 11 OCP versions**: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + RESULT=$(podman run --pull=always --rm \ + registry.redhat.io/redhat/redhat-operator-index:v4.${ver} \ + ls /configs/external-dns-operator/ 2>&1) + EXIT_CODE=$? + if [ $EXIT_CODE -eq 0 ] && [ -n "$RESULT" ]; then + echo "v4.${ver}: FOUND" + else + echo "v4.${ver}: NOT FOUND" + fi + done + ``` + + **CRITICAL**: Always use `--pull=always`. Without it, podman reuses cached index images that may not contain the latest bundle, producing false NOT FOUND results. + +3. 
**Present results as a table**: + ``` + EDO v{version} Production Verification + ======================================= + v4.12: FOUND ✓ + v4.13: FOUND ✓ + v4.14: FOUND ✓ + v4.15: FOUND ✓ + v4.16: FOUND ✓ + v4.17: FOUND ✓ + v4.18: FOUND ✓ + v4.19: FOUND ✓ + v4.20: FOUND ✓ + v4.21: FOUND ✓ + v4.22: FOUND ✓ + + Result: 11/11 verified ✓ + ``` + +4. **If any versions show NOT FOUND**, advise: + - Confirm the FBC prod Release CR for that version succeeded + - Image propagation may take a few minutes after Release CR success + - Re-run with `--pull=always` (this command always does, but warn if running manually) + +## Arguments + +- **version** *(required)* + The release version to verify (e.g., `1.2.2`). + +## Examples + +1. **Verify a completed release**: + ``` + /konflux-release:verify 1.2.2 + ``` + +## See Also +- `/konflux-release:status` — Check release state +- `/konflux-release:release` — Run the full release workflow diff --git a/plugins/konflux-release/skills/edo-release/SKILL.md b/plugins/konflux-release/skills/edo-release/SKILL.md new file mode 100644 index 0000000..42573b2 --- /dev/null +++ b/plugins/konflux-release/skills/edo-release/SKILL.md @@ -0,0 +1,633 @@ +--- +name: EDO Konflux Release Workflow +description: Complete 6-phase Konflux release procedure for ExternalDNS Operator (EDO) with state tracking, error handling, and retry logic +--- + +# EDO Konflux Release Workflow + +This skill defines the complete release procedure for ExternalDNS Operator (EDO) through Konflux. Claude drives each phase end-to-end — creating branches, editing files, opening PRs, creating Release CRs, polling status, and running verification. The human reviewer stays in the loop at natural checkpoints: approving PRs, handling auth, and making judgment calls on failures. 
+ +## When to Use This Skill + +- **Referenced by**: `/konflux-release:release` command +- **Trigger**: When performing a Konflux release of EDO (any version) +- **Covers**: Both rhel8 (1.2.x on `release-1.2` branch) and rhel9 (1.3.x+ on `main` branch) + +## Prerequisites + +Before starting, verify: + +1. **Konflux auth**: Run `oc whoami` — if it fails, ask the human to run: + ```bash + oc login --web https://api.kflux-prd-rh03.nnv1.p1.openshiftapps.com:6443 + ``` +2. **GitHub auth**: Run `gh auth status` — must be authenticated +3. **EDO repo**: Must be cloned locally with `upstream` remote pointing to `openshift/external-dns-operator` +4. **podman**: Must be installed (for Phase 6 verification) +5. **jira CLI**: Optional but needed for closing tickets in Phase 6 + +## Constants + +Load all constants from `team-docs/constants.md`. Key values that vary per release: + +| Variable | How to determine | +|----------|-----------------| +| `VERSION` | The target release version (e.g., `1.2.2`, `1.3.6`) | +| `RHEL_BASE` | `rhel8` if version starts with `1.2`, else `rhel9` | +| `RELEASE_BRANCH` | `release-1.2` for rhel8, `main` for rhel9 | +| `BUNDLE_APP` | `ext-dns-optr-1-2-rhel-8` for rhel8, `ext-dns-optr-1-3-rhel-9` for rhel9 | +| `NE_STORY` | The NE Jira story for this release (ask the human) | +| `OCPBUGS_TICKET` | The OCPBUGS CVE/bug ticket driving this release (ask the human) | +| `BUNDLE_DIGEST` | The bundle image digest — discovered during Stage Release | + +## State Management + +### State File + +Persist release state to `.work/konflux-release/release-state-{VERSION}.json` in the EDO repo working directory. Create the `.work/` directory if it doesn't exist (it should be gitignored). 
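The per-release variables in the Constants table and the state file location described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `derive_release_vars` is hypothetical and not part of the plugin, and it reads "starts with `1.2`" as "minor version 1.2", so anything else falls through to the rhel9 values.

```bash
# Hypothetical helper (not part of the plugin): derive the per-release
# variables from a version string and compute the state file path.
derive_release_vars() {
  version="$1"

  # Require X.Y.Z, matching the pattern the release command expects.
  if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "invalid version: $version (expected X.Y.Z)" >&2
    return 1
  fi

  case "$version" in
    1.2.*)
      # rhel8 variant, released from the release-1.2 branch.
      RHEL_BASE=rhel8
      RELEASE_BRANCH=release-1.2
      BUNDLE_APP=ext-dns-optr-1-2-rhel-8
      ;;
    *)
      # Everything else is treated as the rhel9 variant on main.
      RHEL_BASE=rhel9
      RELEASE_BRANCH=main
      BUNDLE_APP=ext-dns-optr-1-3-rhel-9
      ;;
  esac

  # State file path, relative to the EDO repo working directory.
  STATE_FILE=".work/konflux-release/release-state-${version}.json"
}

# Example use (commented out so sourcing this file has no side effects):
#   derive_release_vars 1.3.6 && mkdir -p "$(dirname "$STATE_FILE")"
```

Keeping the derivation in one place means the release branch, bundle app, and state path cannot drift apart between phases; the `mkdir -p` in the usage comment covers the "create the `.work/` directory if it doesn't exist" step.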
+ +### State Schema + +```json +{ + "version": "1.3.6", + "rhel_base": "rhel9", + "release_branch": "main", + "bundle_app": "ext-dns-optr-1-3-rhel-9", + "ne_story": "NE-2747", + "ocpbugs_ticket": "OCPBUGS-86279", + "started_at": "2026-06-17T10:30:00Z", + "updated_at": "2026-06-17T14:22:00Z", + "current_phase": 3, + "phases": { + "1_code_readiness": { "status": "completed" }, + "2_rpa_verification": { "status": "completed" }, + "3_stage_release": { + "status": "in_progress", + "bundle_stage_digest": "sha256:...", + "fbc_stage_pr": "", + "release_crs": {} + }, + "4_prod_bundle": { "status": "pending" }, + "5_fbc_prod": { "status": "pending" }, + "6_verify_close": { "status": "pending" } + }, + "key_values": { + "bundle_prod_digest": "", + "prod_snapshot": "", + "pr_numbers": [], + "failed_release_crs": [] + } +} +``` + +### Resume Logic + +At the start of each session: +1. Check if `.work/konflux-release/release-state-{VERSION}.json` exists. +2. If it does, load it and resume from `current_phase`. +3. If not, initialize a new state file and start from Phase 1. +4. After each phase completion, update the state file. + +--- + +## Phase 1: Code Readiness Check + +**Goal**: Verify all code changes are merged and the repo is ready for release. + +### What Claude Does + +1. **Check VERSION file** on the release branch: + ```bash + git fetch upstream + git show upstream/{RELEASE_BRANCH}:VERSION + ``` + Confirm it matches `{VERSION}`. + +2. **Check container_digest.sh**: + ```bash + git show upstream/{RELEASE_BRANCH}:bundle-hack/container_digest.sh + ``` + Verify all image pullspecs are present (operator, operand, kube-rbac-proxy). + +3. **Check for open PRs** on the release branch: + ```bash + gh pr list --repo openshift/external-dns-operator --base {RELEASE_BRANCH} --state open --json number,title,author + ``` + Flag any release-related PRs (version bumps, golang bumps, nudging PRs) that haven't merged yet. + +4. 
**Check recent merges** to confirm release prep PRs landed: + ```bash + gh pr list --repo openshift/external-dns-operator --base {RELEASE_BRANCH} --state merged --limit 10 --json number,title,mergedAt + ``` + +5. **Present checklist** to the human: + - [ ] VERSION file shows `{VERSION}` + - [ ] container_digest.sh has all 3 image pullspecs + - [ ] No blocking open PRs on `{RELEASE_BRANCH}` + - [ ] Version bump PR merged + - [ ] Nudging PRs merged (operator digest, operand digest) + +### Human Action + +Review the checklist. Confirm all items are green, or explain exceptions (e.g., "Konflux references PR is open but Andrey said it's not needed before release"). + +### State Update + +Mark `1_code_readiness.status = "completed"`. Advance `current_phase = 2`. + +--- + +## Phase 2: RPA Verification + +**Goal**: Confirm the ReleasePlanAdmission (RPA) exists in the konflux-release-data GitLab repo for this release version. + +### What Claude Does + +1. **Explain the RPA** to the human: + > The RPA (ReleasePlanAdmission) defines how Konflux promotes images from stage to prod. It lives in the `konflux-release-data` GitLab repo. Andrey or Greg typically set this up as part of release prep. + +2. **Provide the link**: + > Check: `https://gitlab.cee.redhat.com/releng/konflux-release-data` + > Look for the EDO tenant's RPA configuration. The `topic` field should reference this release version, and both stage and prod RPAs should exist. + +3. **Ask the human** to confirm the RPA is in place. + +### Human Action + +Navigate to GitLab, check the RPA exists. Report back (e.g., "Yes, Andrey's MR !19004 merged"). + +### State Update + +Mark `2_rpa_verification.status = "completed"`. Advance `current_phase = 3`. + +--- + +## Phase 3: Stage Release + +**Goal**: Update FBC catalogs with the stage bundle digest, create stage Release CRs, and verify they succeed. + +### Sub-step 3a: Get Stage Bundle Digest + +1. 
**Ask the human** for the stage bundle digest, or look it up: + ```bash + kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application={BUNDLE_APP} \ + --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].spec.artifacts.componentDigests}' 2>/dev/null + ``` + The digest looks like: `sha256:b1fed7a0188328e58b56c9681e567eb02d2de6860315478a33a1ffa24dee9ccc` + +2. Save the digest to state: `phases.3_stage_release.bundle_stage_digest`. + +### Sub-step 3b: Update FBC Catalogs with Stage Bundle + +**IMPORTANT**: FBC catalogs live on the `main` branch, NOT the release branch. + +1. **Create a working branch** from `upstream/main`: + ```bash + git fetch upstream + git checkout -b {NE_STORY}-fbc-stage-v{VERSION} upstream/main + ``` + +2. **Determine the new catalog entries** needed. For each of the 11 OCP versions (v4.12–v4.22), update both `catalog-template.yaml` and `catalog.yaml`: + + **In `catalog-template.yaml`**: Add a new entry in the `stable-v1` channel and the appropriate `stable-v1.X` minor channel, plus a new `olm.bundle` entry. The new version must: + - Have `replaces:` pointing to the previous version in the chain + - Have `skipRange: <{VERSION}` + - Use `registry.stage.redhat.io` for the bundle image + + **In `catalog.yaml`**: Add the rendered equivalent entries. + + Examine the existing entries in a reference catalog-template.yaml to understand the exact format: + ```bash + cat catalog/v4.14/catalog-template.yaml + ``` + +3. **Apply edits across all 11 versions**. Use `sed` or direct file editing. The edits are identical across versions — only the file paths differ. + +4. **Verify the changes**: + ```bash + git diff --stat + ``` + Should show 22 files changed (11 catalog-template.yaml + 11 catalog.yaml). + +5. **Commit and push**: + ```bash + git add catalog/ + git commit -m "{NE_STORY}: Update FBCs v4.12-v4.22 with v{VERSION} stage bundle" + git push origin {NE_STORY}-fbc-stage-v{VERSION} + ``` + +6. 
**Create PR**: + ```bash + gh pr create --repo openshift/external-dns-operator \ + --base main \ + --title "{NE_STORY}: Update FBCs v4.12-v4.22 with v{VERSION} stage bundle" \ + --body "Update FBC catalogs (v4.12-v4.22) with v{VERSION} stage bundle digest. + + Bundle: registry.stage.redhat.io/edo/external-dns-operator-bundle@{BUNDLE_DIGEST} + + Channels updated: stable-v1, stable-v1.X + Jira: {OCPBUGS_TICKET}" + ``` + +7. **Present the PR** to the human for review. + +### Sub-step 3c: Create Stage Release CRs + +After the PR is merged and Konflux builds the new images: + +1. **Wait for Konflux pipelines** to complete. Check the Konflux UI or: + ```bash + kubectl get pipelineruns -n external-dns-operator-tenant \ + --sort-by=.metadata.creationTimestamp -o name | tail -5 + ``` + +2. **Look up the bundle stage snapshot**: + ```bash + kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application={BUNDLE_APP} \ + --sort-by=.metadata.creationTimestamp -o name | tail -1 + ``` + +3. **Create the bundle stage Release CR**: + ```yaml + apiVersion: appstudio.redhat.com/v1alpha1 + kind: Release + metadata: + name: edo-bundle-stage-{VERSION}-{TIMESTAMP} + namespace: external-dns-operator-tenant + spec: + releasePlan: + snapshot: + ``` + + Ask the human to apply: `kubectl apply -f /tmp/release-cr.yaml` + +4. **Create FBC stage Release CRs** for all 11 versions. For each version: + ```bash + SNAPSHOT=$(kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application=ext-dns-optr-fbc-v4-{VER} \ + --sort-by=.metadata.creationTimestamp -o name | tail -1) + ``` + Generate a Release CR for each. + +5. **Monitor Release CR status** (see Monitoring section below). + +### Human Action + +- Review and approve the FBC stage PR +- Apply Release CRs when presented +- Handle auth re-login if sessions expire + +### State Update + +Save PR number, snapshot names, Release CR names and statuses. 
Mark `3_stage_release.status = "completed"` when all Release CRs succeed. Advance `current_phase = 4`. + +--- + +## Phase 4: Prod Bundle Release + +**Goal**: Create the prod Release CR for the operator bundle and verify it succeeds. + +### What Claude Does + +1. **Verify container_digest.sh** points to `registry.redhat.io` (not stage): + ```bash + git show upstream/{RELEASE_BRANCH}:bundle-hack/container_digest.sh + ``` + All pullspecs should use `registry.redhat.io`. If any use `registry.stage.redhat.io`, generate the edit to swap them and create a PR. (With the mirror set fix from PR #477, this should already point to prod.) + +2. **Look up the prod bundle snapshot**: + ```bash + kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application={BUNDLE_APP} \ + --sort-by=.metadata.creationTimestamp -o name | tail -1 + ``` + +3. **Generate the prod Release CR**: + ```yaml + apiVersion: appstudio.redhat.com/v1alpha1 + kind: Release + metadata: + name: edo-bundle-prod-{VERSION}-{TIMESTAMP} + namespace: external-dns-operator-tenant + spec: + releasePlan: + snapshot: + ``` + +4. **Present to the human** for `kubectl apply`. + +5. **Monitor** until the Release CR shows `Released: True`. + +### Human Action + +- Apply the Release CR +- Re-login to Konflux if session expired + +### State Update + +Save snapshot name and Release CR status. Mark `4_prod_bundle.status = "completed"`. Advance `current_phase = 5`. + +--- + +## Phase 5: FBC Prod Release + +**Goal**: Swap stage→prod registry in FBC catalogs, merge, create 11 FBC prod Release CRs, and handle failures with retries. + +This is the most complex phase. Take it step by step. + +### Sub-step 5a: Registry Swap + +1. **Create a working branch** from `upstream/main`: + ```bash + git fetch upstream + git checkout -b {NE_STORY}-fbc-prod-v{VERSION} upstream/main + ``` + +2. 
**Swap stage registry to prod** across all 22 catalog files: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + sed -i 's|registry.stage.redhat.io|registry.redhat.io|g' \ + catalog/v4.${ver}/catalog-template.yaml \ + catalog/v4.${ver}/catalog.yaml + done + ``` + +3. **Verify the swap**: + ```bash + # Should return 0 — no stage references remain + grep -r "registry.stage.redhat.io" catalog/ | wc -l + + # Should return matches — prod references present + grep -c "registry.redhat.io/edo/external-dns-operator-bundle@{BUNDLE_DIGEST}" catalog/v4.14/catalog-template.yaml + ``` + +4. **Verify diff**: + ```bash + git diff --stat + ``` + Should show 22 files changed. + +### Sub-step 5b: Create PR + +1. **Commit and push**: + ```bash + git add catalog/ + git commit -m "{NE_STORY}: Update FBCs v4.12-v4.22 with v{VERSION} prod bundle" + git push origin {NE_STORY}-fbc-prod-v{VERSION} + ``` + +2. **Create PR**: + ```bash + gh pr create --repo openshift/external-dns-operator \ + --base main \ + --title "{NE_STORY}: Update FBCs v4.12-v4.22 with v{VERSION} prod bundle" \ + --body "Swap stage → prod registry for v{VERSION} bundle in all FBC catalogs (v4.12-v4.22). + + Bundle: registry.redhat.io/edo/external-dns-operator-bundle@{BUNDLE_DIGEST} + + Jira: {OCPBUGS_TICKET}" + ``` + +3. **Present the PR** to the human for review. + +### Sub-step 5c: Create FBC Prod Release CRs + +After the PR merges and Konflux pipelines build new FBC images: + +1. **Wait for all 11 FBC pipelines** to complete. Monitor: + ```bash + kubectl get pipelineruns -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application=ext-dns-optr-fbc-v4-14 \ + --sort-by=.metadata.creationTimestamp -o name | tail -1 + ``` + +2. 
**Look up all 11 FBC snapshots** and generate Release CRs: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + SNAPSHOT=$(kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application=ext-dns-optr-fbc-v4-${ver} \ + --sort-by=.metadata.creationTimestamp -o name | tail -1) + echo "v4.${ver}: ${SNAPSHOT}" + done + ``` + +3. **Generate all 11 Release CRs** as a single YAML file (separated by `---`). + +4. **Present to the human** for `kubectl apply -f`. + +### Sub-step 5d: Monitor and Retry + +1. **Poll all 11 Release CRs**: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + STATUS=$(kubectl get release -n external-dns-operator-tenant \ + {CR_NAME_FOR_VERSION} \ + -o jsonpath='{.status.conditions[?(@.type=="Released")].status}' 2>/dev/null) + echo "v4.${ver}: ${STATUS:-Pending}" + done + ``` + +2. **Present a status table**: + ``` + FBC Prod Release Status + ----------------------- + v4.12: Running + v4.13: Running + v4.14: Succeeded ✓ + v4.15: Succeeded ✓ + ... + v4.22: Failed ✗ + ``` + +3. **Handle failures** — check the Release CR conditions for error details: + ```bash + kubectl get release -n external-dns-operator-tenant {CR_NAME} \ + -o jsonpath='{.status.conditions[?(@.type=="Released")].message}' + ``` + + **Known failure patterns and recovery**: + + | Pattern in error message | Cause | Recovery | + |--------------------------|-------|----------| + | `PipelineRunTimeout` or `timed out` | IIB timeout (common on v4.12/v4.13 — larger older indexes) | Create a new Release CR with a new name. May need multiple retries. | + | `sign-index-image` or `RADAS` | RADAS signing service outage/degradation | Wait 10-15 minutes, then create a new Release CR. | + | `create-pyxis-image` | Often caused by upstream RADAS issues | Wait and retry, same as signing failures. | + +4. **Generate retry Release CRs** for failed versions. Use a new name (append `-retry-{N}`). + +5. 
**Repeat polling** until all 11 versions show `Succeeded`. + +### Human Action + +- Review and approve the FBC prod PR +- Apply Release CRs and retry CRs when presented +- Handle auth re-login +- Escalate to Andrey/Greg if retries don't resolve failures (especially v4.12/v4.13 IIB timeouts) + +### State Update + +Track each FBC version's Release CR name and status. Save failed CRs to `key_values.failed_release_crs`. Mark `5_fbc_prod.status = "completed"` when all 11 succeed. Advance `current_phase = 6`. + +--- + +## Phase 6: Verify and Close + +**Goal**: Verify the new bundle appears in all production operator indexes, then close Jira tickets. + +### Sub-step 6a: Production Verification + +1. **Generate verification commands**: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + echo "# v4.${ver}" + echo "podman run --pull=always --rm registry.redhat.io/redhat/redhat-operator-index:v4.${ver} ls /configs/external-dns-operator/" + done + ``` + + **CRITICAL**: Always use `--pull=always`. Without it, podman uses cached index images that may not contain the new bundle, leading to false NOT FOUND results. + +2. **Run verification** and collect results: + ```bash + for ver in 12 13 14 15 16 17 18 19 20 21 22; do + RESULT=$(podman run --pull=always --rm \ + registry.redhat.io/redhat/redhat-operator-index:v4.${ver} \ + ls /configs/external-dns-operator/ 2>/dev/null) + if [ -n "$RESULT" ]; then + echo "v4.${ver}: FOUND" + else + echo "v4.${ver}: NOT FOUND" + fi + done + ``` + +3. **Present results table**: + ``` + Production Verification + ----------------------- + v4.12: FOUND ✓ + v4.13: FOUND ✓ + ... + v4.22: FOUND ✓ + ``` + +4. If any show NOT FOUND: + - Confirm `--pull=always` was used + - Check if the Release CR for that version succeeded + - Image propagation can take a few minutes — wait and retry + +### Sub-step 6b: Close Jira Tickets + +1. 
**Close the OCPBUGS ticket** with verification output: + ```bash + jira issue move {OCPBUGS_TICKET} "Closed" --resolution "Done" \ + --comment "EDO v{VERSION} released via Konflux. Bundle verified in all production operator indexes (v4.12-v4.22). All {N} FBC Release CRs succeeded." + ``` + +2. **Close the NE story**: + ```bash + jira issue move {NE_STORY} "Closed" --resolution "Done" \ + --comment "EDO v{VERSION} Konflux release complete. All phases succeeded." + ``` + + Note: NE stories use "Closed" not "Done" as the terminal status. + +### Human Action + +- Review verification results +- Confirm Jira ticket closure is appropriate +- Post completion update to the team Slack thread + +### State Update + +Mark `6_verify_close.status = "completed"`. The release is done. + +--- + +## Monitoring Release CRs + +This is a reusable pattern used in Phases 3, 4, and 5. + +### Polling Loop + +```bash +kubectl get release -n external-dns-operator-tenant {CR_NAME} \ + -o jsonpath='{.status.conditions[?(@.type=="Released")]}' +``` + +Check the `status` field: +- `True` → Succeeded +- `False` → Check `reason` and `message` for failure details +- Empty/missing → Still running + +### Poll Frequency + +- First check: 2 minutes after creating the Release CR +- Subsequent checks: every 3-5 minutes +- Bundle Release CRs typically complete in 5-15 minutes +- FBC Release CRs take 10-30 minutes (longer for v4.12/v4.13) + +### Timeout Expectations + +| Component | Typical duration | Timeout concern | +|-----------|-----------------|-----------------| +| Bundle prod release | 5-15 min | Rare | +| FBC v4.14–v4.22 | 10-20 min | Uncommon | +| FBC v4.12–v4.13 | 15-45 min | Common — older indexes are larger | + +--- + +## Error Handling + +### Auth Expiry + +Konflux sessions expire frequently. Before any `kubectl`/`oc` command sequence: + +```bash +oc whoami 2>/dev/null || echo "AUTH_EXPIRED" +``` + +If expired, tell the human: +> Konflux session expired. 
Please re-login: +> ``` +> oc login --web https://api.kflux-prd-rh03.nnv1.p1.openshiftapps.com:6443 +> ``` + +After re-auth, re-read any snapshot names or CR statuses since the connection context may have changed. + +### Error Reference + +| Failure | Detection | Recovery | +|---------|-----------|----------| +| Konflux auth expired | `oc whoami` returns error | Human runs `oc login --web` | +| IIB timeout | Release CR message contains `timeout` or `PipelineRunTimeout` | Create new Release CR for failed version | +| RADAS signing failure | Message contains `sign-index-image` or `RADAS` or `create-pyxis-image` | Wait 10-15 min, create new Release CR | +| Snapshot not found | `kubectl get snapshots` returns empty | Verify the push pipeline ran on Konflux UI; may need to trigger rebuild | +| PR merge conflict | `gh pr create` or push fails | Rebase onto latest `upstream/main` and re-push | +| Podman stale cache | Verification shows NOT FOUND despite Release CR success | Re-run with `--pull=always` (mandatory) | +| FBC on wrong branch | Catalog files missing from checkout | FBC catalogs are on `main`, not `release-X.Y` | +| Prow valid-label failure | PR CI check fails | Use NE story number in PR title, not OCPBUGS | + +--- + +## Checkpoint Summary + +After each phase, pause and present a status summary to the human: + +``` +EDO v{VERSION} Konflux Release — Phase {N} Complete +==================================================== +Phase 1: Code Readiness ✓ +Phase 2: RPA Verification ✓ +Phase 3: Stage Release ✓ +Phase 4: Prod Bundle Release ← CURRENT +Phase 5: FBC Prod Release Pending +Phase 6: Verify + Close Pending + +Next: Phase 4 — Create prod Release CR for the operator bundle. +Proceed? (y/n) +``` + +Wait for the human to confirm before advancing to the next phase. This is a guided workflow — never proceed to a new phase without explicit confirmation. 
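
The checkpoint summary above could be rendered by a small helper rather than typed by hand. The following is a minimal sketch in POSIX shell, not something the plugin ships: `print_checkpoint` and its two arguments (the version and the number of the phase that just completed) are hypothetical, and the header uses a colon as a separator for simplicity. The phase names and the `✓` / `← CURRENT` / `Pending` markers are taken from the template.

```shell
# Hypothetical helper: print the checkpoint summary for a release.
#   $1 = EDO version (for example 1.2.2)
#   $2 = number of the phase that just completed (0-6)
print_checkpoint() {
  version="$1"
  completed="$2"
  current=$((completed + 1))   # the phase waiting for confirmation
  phase=0

  echo "EDO v${version} Konflux Release: Phase ${completed} Complete"
  for name in "Code Readiness" "RPA Verification" "Stage Release" \
              "Prod Bundle Release" "FBC Prod Release" "Verify + Close"; do
    phase=$((phase + 1))
    if [ "$phase" -le "$completed" ]; then
      marker="✓"               # already finished
    elif [ "$phase" -eq "$current" ]; then
      marker="← CURRENT"       # next phase, needs human confirmation
    else
      marker="Pending"         # not reached yet
    fi
    # Left-align the phase name so the markers line up in one column.
    printf 'Phase %d: %-22s %s\n' "$phase" "$name" "$marker"
  done
}
```

Calling `print_checkpoint 1.2.2 3` marks Phases 1 to 3 as finished, Phase 4 as current, and Phases 5 and 6 as pending. Deriving every marker from a single "completed" number keeps the summary consistent with the saved `current_phase` value.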
diff --git a/plugins/konflux-release/team-docs/constants.md b/plugins/konflux-release/team-docs/constants.md new file mode 100644 index 0000000..62d1105 --- /dev/null +++ b/plugins/konflux-release/team-docs/constants.md @@ -0,0 +1,86 @@ +# EDO Konflux Release Constants + +Reference data for ExternalDNS Operator (EDO) Konflux releases. + +## Operator Variants + +| Property | rhel8 (1.2.x) | rhel9 (1.3.x+) | +|----------|---------------|-----------------| +| Release branch | `release-1.2` | `main` / `release-1.3` | +| Bundle app name | `ext-dns-optr-1-2-rhel-8` | `ext-dns-optr-1-3-rhel-9` | +| Operator image path | `edo/external-dns-rhel8-operator` | `edo/external-dns-rhel9-operator` | +| Operand image path | `edo/external-dns-rhel8` | `edo/external-dns-rhel9` | +| kube-rbac-proxy max version | v4.15 (rhel8 only) | v4.17+ (rhel9) | + +## Konflux Infrastructure + +| Property | Value | +|----------|-------| +| Konflux cluster | `api.kflux-prd-rh03.nnv1.p1.openshiftapps.com:6443` | +| Konflux UI | `https://konflux-ui.apps.kflux-prd-rh03.nnv1.p1.openshiftapps.com/ns/external-dns-operator-tenant` | +| Namespace | `external-dns-operator-tenant` | +| Login command | `oc login --web https://api.kflux-prd-rh03.nnv1.p1.openshiftapps.com:6443` | + +## FBC Catalogs + +| Property | Value | +|----------|-------| +| FBC app name pattern | `ext-dns-optr-fbc-v4-{VER}` (e.g., `ext-dns-optr-fbc-v4-14`) | +| OCP versions | v4.12, v4.13, v4.14, v4.15, v4.16, v4.17, v4.18, v4.19, v4.20, v4.21, v4.22 | +| Version numbers (for loops) | `12 13 14 15 16 17 18 19 20 21 22` | +| Catalog directory pattern | `catalog/v4.{VER}/` | +| Files per version | `catalog-template.yaml`, `catalog.yaml` | +| FBC branch | `main` (NOT release-X.Y) | +| OLM channels | `stable-v1` (default), `stable-v1.X` (per minor) | + +## Registries + +| Registry | URL | Usage | +|----------|-----|-------| +| Stage | `registry.stage.redhat.io` | Stage release bundles | +| Production | `registry.redhat.io` | Prod release 
bundles | +| Bundle image path | `edo/external-dns-operator-bundle` | Both stage and prod | + +## Repos + +| Repo | URL | +|------|-----| +| Operator | `https://github.com/openshift/external-dns-operator` | +| Operand | `https://github.com/openshift/external-dns` | +| Release data (GitLab) | `https://gitlab.cee.redhat.com/releng/konflux-release-data` | +| Release process doc | `https://github.com/openshift/external-dns-operator/pull/391` | + +## Key Commands + +### Snapshot lookup +```bash +kubectl get snapshots -n external-dns-operator-tenant \ + -l appstudio.openshift.io/application={APP_NAME} \ + --sort-by=.metadata.creationTimestamp -o name | tail -1 +``` + +### Release CR status check +```bash +kubectl get release -n external-dns-operator-tenant {NAME} \ + -o jsonpath='{.status.conditions[?(@.type=="Released")].status}' +``` + +### Auth check +```bash +oc whoami 2>/dev/null || echo "AUTH_EXPIRED" +``` + +### Podman verification +```bash +podman run --pull=always --rm \ + registry.redhat.io/redhat/redhat-operator-index:v4.{VER} \ + ls /configs/external-dns-operator/ +``` + +## PR Title Convention + +PR titles for EDO releases MUST use NE stories (e.g., `NE-2730`), NOT OCPBUGS bug numbers. OCPBUGS numbers fail the `valid-label` Prow check. 
+ +Format: `NE-XXXX: ` + +Example: `NE-2730: Update FBCs v4.12-v4.22 with v1.2.2 prod bundle` From 1771e66bb93508c83b5d4c5be21180f8f60c6c24 Mon Sep 17 00:00:00 2001 From: Ali Syed Date: Wed, 17 Jun 2026 14:45:31 +0100 Subject: [PATCH 2/3] docs: Reference Andrey's release process doc (PR #391) in SKILL.md and README Assisted-By: Claude --- plugins/konflux-release/README.md | 2 +- plugins/konflux-release/skills/edo-release/SKILL.md | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/plugins/konflux-release/README.md b/plugins/konflux-release/README.md index 4a9cd08..3e294cb 100644 --- a/plugins/konflux-release/README.md +++ b/plugins/konflux-release/README.md @@ -4,7 +4,7 @@ Konflux release workflow automation for the ExternalDNS Operator (EDO). ## Overview -This plugin codifies the 6-phase Konflux release process that the NID team follows for EDO releases. Claude drives the workflow end-to-end — opening PRs, creating Release CRs, polling status, running verification — while a human reviewer stays in the loop to approve PRs and handle auth. +This plugin codifies the 6-phase Konflux release process that the NID team follows for EDO releases, based on the [release process documentation](https://github.com/openshift/external-dns-operator/pull/391) by Andrey Lebedev. Claude drives the workflow end-to-end — opening PRs, creating Release CRs, polling status, running verification — while a human reviewer stays in the loop to approve PRs and handle auth. ## Commands diff --git a/plugins/konflux-release/skills/edo-release/SKILL.md b/plugins/konflux-release/skills/edo-release/SKILL.md index 42573b2..73c7825 100644 --- a/plugins/konflux-release/skills/edo-release/SKILL.md +++ b/plugins/konflux-release/skills/edo-release/SKILL.md @@ -5,7 +5,9 @@ description: Complete 6-phase Konflux release procedure for ExternalDNS Operator # EDO Konflux Release Workflow -This skill defines the complete release procedure for ExternalDNS Operator (EDO) through Konflux. 
Claude drives each phase end-to-end — creating branches, editing files, opening PRs, creating Release CRs, polling status, and running verification. The human reviewer stays in the loop at natural checkpoints: approving PRs, handling auth, and making judgment calls on failures. +This skill defines the complete release procedure for ExternalDNS Operator (EDO) through Konflux. It is based on the [Konflux release process documentation](https://github.com/openshift/external-dns-operator/pull/391) created by Andrey Lebedev, combined with operational lessons learned from the EDO v1.2.2 release (OCPBUGS-78658 / NE-2730). + +Claude drives each phase end-to-end — creating branches, editing files, opening PRs, creating Release CRs, polling status, and running verification. The human reviewer stays in the loop at natural checkpoints: approving PRs, handling auth, and making judgment calls on failures. ## When to Use This Skill From 05d133c8da76879207ac81e540410614c2caa01c Mon Sep 17 00:00:00 2001 From: Ali Syed Date: Wed, 17 Jun 2026 14:50:58 +0100 Subject: [PATCH 3/3] docs: Add pre-approved permissions setup to README and SKILL.md Users need bash patterns in .claude/settings.local.json to avoid clicking approve on every git/gh/kubectl command during the workflow. Assisted-By: Claude --- plugins/konflux-release/README.md | 36 +++++++++++++++++++ .../skills/edo-release/SKILL.md | 2 ++ 2 files changed, 38 insertions(+) diff --git a/plugins/konflux-release/README.md b/plugins/konflux-release/README.md index 3e294cb..e8925e3 100644 --- a/plugins/konflux-release/README.md +++ b/plugins/konflux-release/README.md @@ -22,6 +22,42 @@ This plugin codifies the 6-phase Konflux release process that the NID team follo - `podman` installed (for verification) - `jira` CLI (optional, for closing tickets) +## Setup: Pre-approve Permissions + +The release workflow runs many read and write commands. 
To avoid clicking "Yes" on every `git`, `gh`, `kubectl`, and `podman` command, add these patterns to your EDO repo's `.claude/settings.local.json`: + +```json +{ + "permissions": { + "allow": [ + "Bash(git fetch *)", + "Bash(git show *)", + "Bash(git checkout *)", + "Bash(git branch *)", + "Bash(git add *)", + "Bash(git commit *)", + "Bash(git push *)", + "Bash(git diff *)", + "Bash(git log *)", + "Bash(git status*)", + "Bash(gh pr *)", + "Bash(gh auth *)", + "Bash(kubectl get *)", + "Bash(kubectl apply *)", + "Bash(oc whoami*)", + "Bash(grep *)", + "Bash(sed *)", + "Bash(cat *)", + "Bash(wc *)", + "Bash(podman run *)", + "Bash(jira issue *)" + ] + } +} +``` + +This only needs to be done once per repo clone. + ## Process Phases 1. **Code Readiness** — Verify all PRs merged, VERSION file correct diff --git a/plugins/konflux-release/skills/edo-release/SKILL.md b/plugins/konflux-release/skills/edo-release/SKILL.md index 73c7825..e88c27b 100644 --- a/plugins/konflux-release/skills/edo-release/SKILL.md +++ b/plugins/konflux-release/skills/edo-release/SKILL.md @@ -19,6 +19,8 @@ Claude drives each phase end-to-end — creating branches, editing files, openin Before starting, verify: +0. **Permissions**: The EDO repo must have pre-approved bash patterns in `.claude/settings.local.json` (see plugin README). Without this, every `git`, `gh`, `kubectl` command will prompt for manual approval. + 1. **Konflux auth**: Run `oc whoami` — if it fails, ask the human to run: ```bash oc login --web https://api.kflux-prd-rh03.nnv1.p1.openshiftapps.com:6443