5 changes: 1 addition & 4 deletions CLAUDE.md
@@ -52,9 +52,6 @@ ansible-playbook setup.yml -i inventory.ini
ansible-playbook setup.yml -e "topology=arbiter" -e "interactive_mode=false" -i inventory.ini
ansible-playbook setup.yml -e "topology=fencing" -e "interactive_mode=false" -i inventory.ini

# Redfish stonith configuration (for fencing topology)
ansible-playbook redfish.yml -i inventory.ini

# Cleanup
ansible-playbook clean.yml -i inventory.ini
```
@@ -110,7 +107,7 @@ make shellcheck
- `assisted/acm-install`: Install ACM/MCE + assisted service + enable TNF on hub
- `assisted/assisted-spoke`: Deploy spoke TNF cluster via assisted installer + BMH
- `proxy-setup`: Squid proxy for cluster external access
- `redfish`: Automated stonith configuration for fencing topology
- `kcli/kcli-redfish`: ksushy BMC simulator startup for kcli fencing deployments
- `config`: SSH key and git configuration
- `git-user`: Git user configuration for development

2 changes: 1 addition & 1 deletion deploy/aws-hypervisor/scripts/configure.sh
@@ -6,7 +6,7 @@ sudo hostnamectl set-hostname "aws-${STACK_NAME}"

function get_ocp_version() {
local latest_ga_ocp_version
local default_version="${DEFAULT_OCP_VERSION:-4.20}"
local default_version="${DEFAULT_OCP_VERSION:-4.22}"
if latest_ga_ocp_version="$(curl -sL https://sippy.dptools.openshift.org/api/releases | jq -re '.ga_dates | to_entries | max_by(.value) | .key')";
then
echo "${latest_ga_ocp_version:-$default_version}"
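
Side note on the hunk above: the fallback relies on the shell's `${var:-default}` expansion, applied twice. A minimal sketch of just that logic, with the network lookup replaced by a plain argument. `resolve_ocp_version` is a hypothetical helper name used for illustration and is not part of `configure.sh`:

```bash
# Hypothetical helper, for illustration only; not part of configure.sh.
# $1 stands in for the version string the release lookup would return.
resolve_ocp_version() {
    local fetched="${1:-}"
    # A DEFAULT_OCP_VERSION set by the caller wins over the hardcoded 4.22.
    local default_version="${DEFAULT_OCP_VERSION:-4.22}"
    # An empty lookup result falls through to the default.
    echo "${fetched:-$default_version}"
}
```

With `DEFAULT_OCP_VERSION` unset, `resolve_ocp_version ""` prints `4.22`, while `resolve_ocp_version "4.21"` prints the fetched value unchanged.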
42 changes: 7 additions & 35 deletions deploy/openshift-clusters/README-kcli.md
@@ -292,54 +292,26 @@ ansible-playbook kcli-install.yml -i inventory.ini \
jq '.auths | has("registry.ci.openshift.org")' < roles/kcli/kcli-install/files/pull-secret.json
```

## 7. Fencing Configuration (Post-Deployment)
## 7. Fencing Configuration

After a successful 4.19 kcli deployment with fencing topology, STONITH fencing needs to be configured to enable automatic node recovery. *If you are using the kcli-install playbook, this will be done for you automatically via kcli-redfish.yml**. If you're doing it some other way, you can use the kcli-redfish,yml playbook manually.
For kcli deployments with fencing topology, the `kcli/kcli-redfish` role starts the ksushy BMC simulator before cluster installation. The cluster-etcd-operator (CEO) then auto-configures STONITH fencing during installation using the simulated BMC endpoints.

The existing `redfish.yml` playbook **will not work** with kcli deployments because it expects BMH resources that don't exist in virtualized environments.
### ksushy BMC Simulator

### kcli Fencing Configuration

The specialized `kcli-redfish.yml` playbook is designed for kcli deployments. **All configuration is automatically detected** - no manual variables required:

```bash
# Configure fencing for kcli-deployed cluster (fully automatic)
ansible-playbook kcli-redfish.yml -i inventory.ini
```

The kcli-redfish playbook automatically:
1. **Detects cluster name** from running kcli clusters or kcli-install defaults
2. **Uses hypervisor IP** from ansible inventory host
3. **Pulls BMC credentials** from kcli-install role defaults
4. **Discovers cluster nodes** from the OpenShift API
5. **Calculates BMC endpoints** using the ksushy simulator configuration
6. **Configures PCS stonith resources** on each node
7. **Enables stonith globally** in the cluster

### Default Configuration

The playbook uses reasonable defaults that work for typical kcli deployments:
The ksushy service provides Redfish BMC simulation for virtual machines:

| Variable | Default Value | Description |
|----------|---------------|-------------|
| `test_cluster_name` | `tnt-cluster` | From kcli-install defaults |
| `ksushy_ip` | `192.168.122.1` | Standard libvirt network gateway |
| `bmc_user` | `admin` | From kcli-install defaults |
| `bmc_password` | `admin123` | From kcli-install defaults |
| `ksushy_port` | `9000` | From kcli-install defaults |

These defaults work for standard kcli deployments where VMs use the default libvirt network (`192.168.122.x/24`).

### Why Not Use redfish.yml?

**Do not use the `redfish.yml` playbook** with kcli deployments. It will fail because:
The ksushy service is managed automatically by `kcli-install.yml`. To verify it is running:

```bash
# This will fail for kcli deployments
ansible-playbook redfish.yml # Expects BMH resources that don't exist

# Use this instead for kcli deployments
ansible-playbook kcli-redfish.yml # Uses defaults optimized for kcli
systemctl --user status ksushy.service
curl -sk https://192.168.122.1:9000/redfish/v1/Systems/local
```
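
Reviewer-style aside, not part of the diff: the single `curl` check only succeeds once ksushy is already listening. A minimal sketch of a retry wrapper around the same request, assuming the `192.168.122.1` address and port `9000` that appear in this hunk. `ksushy_url` and `wait_for_ksushy` are hypothetical helper names, and the attempt count and delay are arbitrary choices:

```bash
# Hypothetical helpers for illustration; not part of the repository.

# Build the Redfish Systems URL used in the verification command.
ksushy_url() {
    local host="${1:-192.168.122.1}"
    local port="${2:-9000}"
    echo "https://${host}:${port}/redfish/v1/Systems/local"
}

# Poll the endpoint until it returns any HTTP response or attempts run out.
wait_for_ksushy() {
    local url="$1"
    local attempts="${2:-30}"
    local delay="${3:-2}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent, -k: skip certificate verification (as in the README check).
        if curl -sk -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "ksushy did not respond at ${url}" >&2
    return 1
}
```

A caller would run `wait_for_ksushy "$(ksushy_url)"` on the hypervisor before starting anything that depends on the simulated BMC endpoints.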

## 8. Troubleshooting
12 changes: 1 addition & 11 deletions deploy/openshift-clusters/README.md
@@ -183,17 +183,7 @@ For more information on STONITH, go to the [official RHEL HA documentation](http

For clusters using the fencing topology on OpenShift 4.19.x, automatic Redfish stonith configuration is available. This feature configures Pacemaker stonith resources using Redfish fencing for BareMetalHost resources.

Redfish configuration can be applied in two ways:

**Integrated Usage:**
- When running the main deployment playbook in interactive mode with fencing topology, you will be prompted to configure Redfish stonith automatically
- Redfish configuration runs as part of the main deployment workflow

**Standalone Usage:**
- Redfish configuration can be run independently using: `ansible-playbook redfish.yml`
- This allows for running it separately from the main deployment or re-running it if needed

For detailed configuration options, verification commands, and requirements, refer to the [Redfish role documentation](roles/redfish/README.md).
Fencing topology clusters use automatic fencing configuration via the cluster-etcd-operator (CEO). The CEO discovers BareMetalHost resources and configures STONITH automatically during installation. No manual Redfish configuration is required.
Comment on lines 184 to +186

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove the stale 4.19.x qualifier.

This section still limits automatic Redfish STONITH to OpenShift 4.19.x, which conflicts with the new 4.22 flow and makes the docs read as version-locked to an old release. Please update or drop the version reference.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/README.md` around lines 184 - 186, Remove the stale
OpenShift 4.19.x version qualifier from the automatic Redfish STONITH
description in the fencing topology section. Update the wording around the
automatic fencing behavior for cluster-etcd-operator (CEO) and BareMetalHost
resources so it no longer reads as version-locked, and ensure the README text
reflects the current 4.22 flow instead of referencing only 4.19.x.



### Optional: Attaching Extra Disks
11 changes: 1 addition & 10 deletions deploy/openshift-clusters/kcli-install.yml
@@ -75,7 +75,7 @@
tasks_from: prerequisites.yml

roles:
# Start ksushy BEFORE cluster installation (required for 4.20+)
# Start ksushy BMC simulator before cluster installation
- role: kcli/kcli-redfish
when: topology == "fencing"
- kcli/kcli-install
@@ -95,15 +95,6 @@
- name: Update inventory with cluster VMs
include_tasks: roles/common/tasks/update-cluster-inventory.yml

# Configure stonith fencing after cluster installation
- name: Configure Redfish BMC simulation for fencing topology
shell: ansible-playbook kcli-redfish.yml -i {{ inventory_file | default('inventory.ini') }}
args:
chdir: "{{ playbook_dir }}"
delegate_to: localhost
run_once: true
when: topology == "fencing"

- name: "Final verification message"
ansible.builtin.debug:
msg: |-
101 changes: 0 additions & 101 deletions deploy/openshift-clusters/kcli-redfish.yml

This file was deleted.

92 changes: 0 additions & 92 deletions deploy/openshift-clusters/redfish.yml

This file was deleted.

@@ -24,8 +24,6 @@ export AGENT_E2E_TEST_SCENARIO="TNA_IPV4"
## END Agent Specific Install Config Variables
####

# TechPreview FeatureSet not needed for 4.20 and above OCP
# export FEATURE_SET="TechPreviewNoUpgrade"
export OPENSHIFT_CI="true"

# If you want to avoid using the CI_TOKEN, uncomment this variable, but it has side effects.
@@ -35,7 +33,7 @@ export OPENSHIFT_CI="true"
# You can find the latest public images in https://quay.io/repository/openshift-release-dev/ocp-release?tab=tags
# and select your preferred version. Public sources can be found at https://mirror.openshift.com/pub/openshift-v4/

export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.21.0-x86_64
export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-x86_64

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Add the aarch64 override guidance next to this release image.

This example now hardcodes an x86_64 payload image but still omits the ARM64 notes that config_fencing_example.sh already carries. Copying this file onto Graviton/aarch64 hypervisors will still point users at the wrong payload architecture and miss the required Metal3 container overrides.

Suggested update
 export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-x86_64
+# aarch64 (Graviton): switch the payload image to the matching architecture and
+# override Metal3 infrastructure images with arm64 rebuilds.
+# if [ "$(uname -m)" = "aarch64" ]; then
+#     export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-aarch64
+#     export IRONIC_IMAGE=quay.io/rh-edge-enablement/ironic:2026-06
+#     export VBMC_IMAGE=quay.io/rh-edge-enablement/vbmc:2026-06
+#     export SUSHY_TOOLS_IMAGE=quay.io/rh-edge-enablement/sushy-tools:2026-06
+# fi

Based on learnings, "When reviewing dev-scripts config example shell files in two-node-toolbox, ensure that the ARM64 (aarch64/Graviton) image override variables are handled explicitly... Upstream Metal3 images ... are x86_64-only and will fail on aarch64 hosts with Exec format error."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_arbiter_example.sh`
at line 36, The OPENSHIFT_RELEASE_IMAGE example in config_arbiter_example.sh is
x86_64-only and currently lacks the ARM64/aarch64 guidance that
config_fencing_example.sh includes. Update this example to explicitly document
the aarch64/Graviton override variables next to OPENSHIFT_RELEASE_IMAGE,
including the required Metal3 container image overrides, so users copying the
script to ARM64 hosts can switch to the correct payload architecture instead of
the default x86_64 image.

Source: Learnings
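
To make the architecture switch in this comment concrete, here is a minimal sketch that derives the payload tag from `uname -m` instead of hardcoding it. `release_image_for_arch` is a hypothetical helper, and the sketch assumes release tags follow the `<version>-<arch>` pattern seen in this PR (`4.22.0-x86_64`, `4.22.0-aarch64`, `4.22.0-multi`). Falling back to the `multi` payload for other architectures is an assumption of the sketch, and the Metal3 image overrides are not covered:

```bash
# Hypothetical helper, for illustration only; not part of the example configs.
# $1: release version (e.g. 4.22.0); $2: architecture, defaults to the host's.
release_image_for_arch() {
    local version="$1"
    local arch="${2:-$(uname -m)}"
    local repo="quay.io/openshift-release-dev/ocp-release"
    case "$arch" in
        x86_64|aarch64)
            echo "${repo}:${version}-${arch}"
            ;;
        *)
            # Assumption: fall back to the multi-arch payload for anything else.
            echo "${repo}:${version}-multi"
            ;;
    esac
}
```

In a config file this could be used as `export OPENSHIFT_RELEASE_IMAGE="$(release_image_for_arch 4.22.0)"`.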

# Unless you need to override the installer image, this is not needed
# export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=""

@@ -5,11 +5,8 @@ export NUM_WORKERS=0
export MASTER_MEMORY=32768
export MASTER_DISK=100
export NUM_MASTERS=2
export FEATURE_SET="DevPreviewNoUpgrade"

# redfish or ipmi, but if not set and using OPENSHIF_CI=true,
# mixed drivers will be used and automatic fencing configuration in 4.19 won't work
export BMC_DRIVER=redfish
# Ensure consistent BMC driver across all hosts for automatic fencing configuration
export BMC_DRIVER=redfish

# If you want to avoid using the CI_TOKEN, uncomment this variable, but it has side effects.
# You can read more on this here: https://github.com/openshift-metal3/dev-scripts/blob/3f070cfd36977381a186cadfb44887856d652bed/config_example.sh#L21
@@ -22,7 +19,7 @@ export CI_TOKEN="sha256~<PASTE_YOUR_CI_TOKEN_HERE>"
# You can find the latest public images in https://quay.io/repository/openshift-release-dev/ocp-release?tab=tags
# and select your preferred version. Public sources can be found at https://mirror.openshift.com/pub/openshift-v4/

export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.21.0-multi
export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-multi
# Unless you need to override the installer image, this is not needed
# export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=""
