Fix ISO end-of-install reboot hang #19

Open
Mondrethos wants to merge 9 commits into main from fix-iso-reboot-hang

@Mondrethos

Owner

Problem

ISOs from the Generate ISO workflow hang on a grey/white screen at the end-of-install reboot. Multiple people have reproduced it, including in VMs, and only a hard power-cycle recovers.

Root cause

Live-ISO overlay shutdown hang (systemd#17988): the installer can't unmount the live medium at shutdown (/run/live/medium + loop device stay busy), so the reboot stalls forever. Reproduces in VMs because it's inherent to the live image, not the firmware. Fedora's own Silverblue ISO behaves the same (fedora-silverblue#560).

Fix

bluebuild generate-iso hides the boot-param knob we need, so this PR calls JasonN3/build-container-installer@v1.4.0 directly, which:

  • Keeps Secure Boot MOK pre-enrollment via the action's native secure_boot_key_url + enrollment_password inputs (same cert + monolith password).
  • Adds extra_boot_params: rd.live.overlay.overlayfs=1 — native overlayfs unmounts cleanly at shutdown where the legacy dm-based overlay leaves the medium busy.

rd.live.ram would also free the medium but needs ~16 GB RAM for our ~7 GB image, so it isn't viable.

Testing

Run the Generate ISO workflow on this branch (workflow_dispatch), install the ISO in a VM, confirm the end-of-install reboot completes instead of hanging.
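For reference, the dispatch step can be sketched with the GitHub CLI. This assumes the workflow's display name is exactly "Generate ISO" and that `gh` is authenticated against this repository; the `dispatch_iso_build` helper and the `GH` override are illustrative names, not part of the repo.

```shell
# Dispatch the ISO workflow on a branch via workflow_dispatch.
# GH can be overridden (for example GH=echo) to preview the command
# without calling the GitHub CLI. The workflow name is an assumption.
dispatch_iso_build() {
    local branch="$1"
    "${GH:-gh}" workflow run "Generate ISO" --ref "$branch"
}
```

Running `GH=echo dispatch_iso_build fix-iso-reboot-hang` prints the arguments instead of triggering a build, which is a cheap way to check the branch name first.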

⚠️ This is an upstream live-env bug; the overlayfs flag is the highest-probability no-cost fix but may need another iteration. The fallback is a forced reboot via a custom Lorax template passed through additional_templates.

…er directly

The bluebuild generate-iso wrapper only exposes variant/secure-boot-url/
enrollment-password, so generated ISOs hung on a grey screen at the
end-of-install reboot: the live medium can't be unmounted at shutdown
(systemd#17988), stalling the reboot until a hard power-cycle. It reproduces
in VMs because it's inherent to the live image, not the firmware.

Call JasonN3/build-container-installer@v1.4.0 directly instead. This keeps the
same Secure Boot MOK pre-enrollment via the action's native secure_boot_key_url
and enrollment_password inputs, and lets us pass extra_boot_params to fix the
hang: rd.live.overlay.overlayfs=1 uses native overlayfs, which unmounts cleanly
at shutdown. (rd.live.ram would also free the medium but needs ~16 GB RAM for
our ~7 GB image, so it's not viable.)

@github-actions

🧪 Test this PR on a real install

Once the build checks on this PR pass, a signed test image is published for each edition. Pick the one matching your hardware and, from an existing Monolith install (which already has the signing policy), rebase onto it:

monolith-gnome

rpm-ostree rebase ostree-image-signed:docker://ghcr.io/mondrethos/monolith-gnome:pr-19-44
systemctl reboot

monolith-gnome-nvidia

rpm-ostree rebase ostree-image-signed:docker://ghcr.io/mondrethos/monolith-gnome-nvidia:pr-19-44
systemctl reboot

The tags are rebuilt on every new commit here, so rpm-ostree upgrade pulls the latest build. When you're done testing, return to your edition's released image (:latest).

The test tags stop updating once this PR is merged or closed.
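
A minimal sketch of the return trip, assuming the released images live under the same ghcr.io/mondrethos/ path as the test tags above. The helper names `monolith_ref` and `return_to_release` are hypothetical, and the second one only prints the commands so they can be reviewed before running.

```shell
# Build the signed image reference for a Monolith edition and tag.
# The registry path mirrors the test-tag commands above; the tag
# defaults to "latest".
monolith_ref() {
    local edition="$1" tag="${2:-latest}"
    printf 'ostree-image-signed:docker://ghcr.io/mondrethos/%s:%s\n' "$edition" "$tag"
}

# Print (rather than run) the commands that return to the released image.
return_to_release() {
    local edition="$1"
    echo "rpm-ostree rebase $(monolith_ref "$edition" latest)"
    echo "systemctl reboot"
}
```

For example, `return_to_release monolith-gnome` prints the rebase onto `:latest` followed by the reboot.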

The ISO pre-staging (build-container-installer secure_boot_key_url/
enrollment_password) only saved users from running one command, and it's a
footgun: if the ISO's key drifts from the image's key, installs black-screen
(cf. ublue/bazzite#688). Secure Boot is fully image-side anyway -- kernel and
modules are signed at build time and the public cert ships in the image -- so
the ISO doesn't need to touch it.

Drop secure_boot_key_url/enrollment_password from the ISO job and the
BB_GENISO_* vars from the justfile. Enrollment is now uniform regardless of
install method: ujust enroll-monolith-secure-boot-key. This also frees the ISO
tooling choice, since Secure Boot no longer constrains it.

Also correct the reboot-fix comment: rd.live.overlay.overlayfs=1 did NOT clear
the end-of-install hang in testing.

The end-of-install reboot hung on a grey screen until a hard power-cycle. The
earlier rd.live.overlay.overlayfs=1 attempt was a no-op -- this is an Anaconda
installer image, not a dmsquash-live image, so rd.live.* params don't apply.

Anaconda reboots with 'systemctl --no-wall reboot' (no --force), which runs the
full systemd shutdown and stalls tearing down the busy installer root. Add a
Lorax template (installer/force-reboot.tmpl, passed via additional_templates)
that patches the installer environment:
- systemd-reboot.service -> 'systemctl --force --force reboot' (reboot(2) syscall,
  skips the unmount loop)
- DefaultTimeoutStopSec=15s so a hung stop job earlier in the transition is
  abandoned fast instead of blocking forever

Re-add actions/checkout (the template must be in the workspace, mounted at
/github/workspace/ in the build container) and pin the artifact upload to the
explicit ISO filename so the checked-out repo isn't uploaded too.

The forced-reboot Lorax template did not clear the end-of-install grey-screen
hang in VM testing, same as the earlier rd.live.overlay boot param. Remove that
machinery (installer/force-reboot.tmpl, the checkout step, additional_templates)
and instead enable Anaconda's WebUI installer (web_ui: true), Fedora's default
since F43, as an alternate installer/reboot path.

The Anaconda installer ISO (build-container-installer) hung on a grey screen at
the end-of-install reboot: its installer environment had no clean way to unmount
the live medium and detach the loop device at shutdown. Replace it with a live
ISO built by ublue-os/titanoboa, whose dracut-live initramfs tears that overlay
down cleanly on reboot.

titanoboa's contract takes only an image reference, so bake everything the live
medium needs into a transient layer (iso/) built on top of the published edition
image: regenerate the initramfs with dmsquash-live, enable the GNOME live
session, stage EFI binaries where the contract expects them, and drop in
iso.yaml. Local 'just generate-iso' mirrors the workflow.

titanoboa's main.sh cd's into its own action directory and resolves iso-dest
relative to that, so the ISO did not land where the workflow expected and the
upload step found no file. Read the real path from the step's iso-dest output,
move it to output/ under a clean name with a checksum, and upload that.
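
A sketch of that relocation step, not the workflow's actual script: the function name `stage_iso`, the `-CHECKSUM` suffix, and the example file name are assumptions, and the source path is whatever the step's iso-dest output reports.

```shell
# Move the ISO that titanoboa produced into an output directory under a
# stable name and write a SHA-256 checksum file next to it.
#   $1: path reported by the step's iso-dest output
#   $2: final file name (hypothetical, e.g. monolith-gnome.iso)
#   $3: destination directory (defaults to output)
stage_iso() {
    local src="$1" name="$2" out_dir="${3:-output}"
    [ -f "$src" ] || { echo "ISO not found: $src" >&2; return 1; }
    mkdir -p "$out_dir"
    mv "$src" "$out_dir/$name"
    # Hash from inside out_dir so the checksum file records a bare file
    # name and can be verified wherever the two files are downloaded.
    ( cd "$out_dir" && sha256sum "$name" > "$name-CHECKSUM" )
}
```

Anyone who downloads both files can then run `sha256sum -c` on the checksum file from the directory that holds them.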

The live session was a zirconium-style minimal image with no installer, so
installing meant hand-typing bootc install. Bake Fedora's Anaconda WebUI into
the live layer (anaconda-live + anaconda-webui) with a Monolith profile, a
default kickstart that installs the edition this ISO was built from, and a dock
launcher, so installing is click-through. The kickstart pulls the image from the
registry (titanoboa squashes only the rootfs) and repoints the installed system
at the cosign-signed image for verified updates. The edition image ref is
threaded in from the Containerfile's BASE_IMAGE. No Secure Boot kickstart --
enrollment stays image-side via ujust.
…utput

titanoboa only masters a UEFI El Torito + GPT ESP, so its ISOs don't boot under
legacy BIOS -- which is why they fail in GNOME Boxes / VirtualBox at their
default firmware and on old CSM-only hardware. Vendor titanoboa's build_iso.sh
(pinned to main @ 5c457c3) and patch only the boot mastering: install
grub2-tools/grub2-pc-modules, build a self-contained i386-pc El Torito core, and
extend the xorriso command with the BIOS boot entry + grub2 hybrid MBR while
keeping the existing UEFI entry. Run it in a Fedora container the same way
titanoboa's main.sh does, dropping the titanoboa action dependency (which also
insulates us from its churn). Local smoke test confirms the ISO carries both a
BIOS and a UEFI El Torito image plus a GPT EFI System partition.
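
One way to script that smoke test is to inspect xorriso's El Torito report. This is a sketch, assuming the report lists each boot image on a line beginning with "El Torito boot img" followed by an index and a platform column reading BIOS or UEFI; `has_bios_and_uefi` is a hypothetical helper, and it does not check the GPT EFI System partition.

```shell
# Read an El Torito report on stdin (as printed by
# `xorriso -indev FILE -report_el_torito plain`) and succeed only if
# both a BIOS and a UEFI boot image are listed.
# The line layout matched here is an assumption about xorriso's output.
has_bios_and_uefi() {
    local report prefix
    report=$(cat)
    prefix='^El Torito boot img[[:space:]]*:[[:space:]]*[0-9]+[[:space:]]+'
    printf '%s\n' "$report" | grep -Eq "${prefix}BIOS" &&
        printf '%s\n' "$report" | grep -Eq "${prefix}UEFI"
}
```

Usage would look like `xorriso -indev path/to/image.iso -report_el_torito plain | has_bios_and_uefi`, with the ISO path filled in by the caller.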

The live session's Install button launched liveinst but nothing appeared:
Anaconda's WebUI draws its interface via /usr/libexec/anaconda/webui-desktop,
which runs firefox + cockpit-ws. anaconda-webui doesn't hard-require firefox and
Monolith removes it from the image, so the installer had no browser to render
in. Add firefox to the live-prep layer only; the installed system stays
firefox-free.
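
A small guard for the live-prep layer could catch this class of regression at build time. The sketch below assumes the usual Fedora locations for firefox (/usr/bin/firefox) and cockpit-ws (/usr/libexec/cockpit-ws); only the webui-desktop path comes from the commit message above, and `missing_webui_deps` is a hypothetical name.

```shell
# Print each piece of the WebUI launch chain that is missing under a root.
#   $1: root directory to inspect (defaults to /)
# The firefox and cockpit-ws paths are assumptions about the Fedora layout.
missing_webui_deps() {
    local root="${1:-/}" path
    for path in usr/libexec/anaconda/webui-desktop usr/bin/firefox usr/libexec/cockpit-ws; do
        [ -e "${root%/}/$path" ] || echo "$path"
    done
}
```

An empty result means all three files exist; a build step could fail when the output is non-empty.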