
feat: NixOS migration infrastructure#433

Open
mrosseel wants to merge 7 commits into brickbots:main from mrosseel:migration

@mrosseel (Collaborator) commented May 24, 2026

Summary

Lets an existing PiFinder install migrate in-place to a NixOS-based PiFinder build, driven from the existing Software-update screen.

What's included

Python

  • PiFinder/ui/software.py — migration-confirm screen + on-device progress flow alongside the existing software update path. Validates the connected display is a supported migration display before allowing migration.
  • PiFinder/sys_utils.py: start_nixos_migration(version_info) plumbing; requires a SHA256 and forwards the display class and resolution to the script.
  • tests/test_software.py — unit coverage for the migration UI flow.
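The early display gate in software.py and the display forwarding in sys_utils.py can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `is_supported_display` and `migration_env` are hypothetical helpers, and the single `SUPPORTED_DISPLAYS` entry is a placeholder, since the real list is not shown here. Only the two environment variable names come from the PR text.

```python
from typing import Dict, FrozenSet, Tuple

# HYPOTHETICAL placeholder entry: the real SUPPORTED_DISPLAYS contents live in
# software.py and nixos_migration_calc.py and are not listed in this PR.
SUPPORTED_DISPLAYS: FrozenSet[Tuple[str, str]] = frozenset({("ssd1351", "128x128")})


def is_supported_display(display_class: str, resolution: str) -> bool:
    """Early UI gate: offer migration only on a known (class, resolution) pair."""
    return (display_class.lower(), resolution.lower()) in SUPPORTED_DISPLAYS


def migration_env(display_class: str, resolution: str) -> Dict[str, str]:
    """Env vars forwarded to the migration script (names taken from the PR text)."""
    if not is_supported_display(display_class, resolution):
        raise ValueError(f"unsupported display: {display_class} {resolution}")
    return {
        "MIGRATION_DISPLAY_CLASS": display_class,
        "MIGRATION_DISPLAY_RESOLUTION": resolution,
    }
```

Raising instead of returning an empty dict means a caller cannot accidentally start the script without a display class that migration_progress knows how to drive.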

Scripts (under python/scripts/)

  • nixos_migration.sh — pre-migration host phase: validates env, downloads + verifies the tarball, assembles the initramfs (busybox + fs tools + SPI kernel modules), configures /boot/config.txt to boot it.
  • nixos_migration_init.sh — initramfs entry point: saves WiFi + user backup to RAM (per-file sizing with pifinder.log truncation allowance), DDs the image, expands the partition, restores data, reboots into NixOS. WiFi keyfile emission hex-encodes the SSID, escapes the PSK, and sanitizes the filename.
  • nixos_migration_calc.py — pre-flight checks (RAM, free space, WiFi mode, supported display, stock SD layout) with a JSON output mode for the shell wrapper. Migration aborts if / is not on /dev/mmcblk0p2 or if the SD has more than the expected two partitions (USB/NVMe boot, custom layouts).
  • migration_progress.c + pre-compiled migration_progress binary — OLED progress UI rendered directly from the initramfs (no Python). Picks the SPI controller and layout from MIGRATION_DISPLAY_CLASS / MIGRATION_DISPLAY_RESOLUTION env vars passed through the initramfs.
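The stock-SD-layout part of the nixos_migration_calc.py pre-flight can be sketched roughly like this. The function names and the JSON keys (`ok`, `errors`) are hypothetical; only the two abort conditions (root not on /dev/mmcblk0p2, more than two partitions) come from the PR. Partitions are enumerated from sysfs, where each partition of a disk appears as a subdirectory such as /sys/block/mmcblk0/mmcblk0p1.

```python
import json
import os
from typing import List


def check_sd_layout(root_device: str, partitions: List[str]) -> List[str]:
    """Return abort reasons; an empty list means the layout looks stock."""
    problems = []
    if root_device != "/dev/mmcblk0p2":
        problems.append(f"root filesystem is on {root_device}, expected /dev/mmcblk0p2")
    if len(partitions) > 2:
        problems.append(f"found {len(partitions)} partitions, expected 2")
    return problems


def list_mmcblk0_partitions(sys_block: str = "/sys/block/mmcblk0") -> List[str]:
    """Enumerate partition entries (mmcblk0p1, mmcblk0p2, ...) under sysfs."""
    try:
        return sorted(n for n in os.listdir(sys_block) if n.startswith("mmcblk0p"))
    except FileNotFoundError:
        return []  # no SD card device at all, e.g. USB/NVMe boot


def preflight_json(root_device: str, partitions: List[str]) -> str:
    """Machine-readable result for the shell wrapper (hypothetical key names)."""
    problems = check_sd_layout(root_device, partitions)
    return json.dumps({"ok": not problems, "errors": problems})
```

Returning every reason rather than stopping at the first one lets the UI show the user all of the blockers at once.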

Other

  • migration_gate.json — server-side config fetched from the release branch of brickbots/PiFinder. Two fields: nixos_for_everyone (the gate — the migration UI only appears when this is true) and nixos_url (the tarball location). The .sha256 URL is derived from nixos_url.
  • noxfile.py — narrows mypy's first run to PiFinder/ (avoids the broken tetra3 symlink in the tree).
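A rough sketch of how the two migration_gate.json fields might be consumed. `migration_offer` is a hypothetical helper, and the rule that the checksum URL is the tarball URL with ".sha256" appended is an assumption: the PR only says that the .sha256 URL is derived from nixos_url. The sketch fails closed, so anything other than a literal `true` hides the migration UI.

```python
import json
from typing import Optional, Tuple


def migration_offer(raw_gate: str) -> Optional[Tuple[str, str]]:
    """Return (tarball_url, sha256_url) when the gate is open, else None."""
    try:
        gate = json.loads(raw_gate)
    except json.JSONDecodeError:
        return None  # fail closed on a malformed gate file
    if not isinstance(gate, dict) or gate.get("nixos_for_everyone") is not True:
        return None  # the gate: the migration UI only appears when this is exactly true
    url = gate.get("nixos_url")
    if not isinstance(url, str) or not url:
        return None
    # ASSUMPTION: the checksum URL is the tarball URL with ".sha256" appended.
    return url, url + ".sha256"
```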

Test plan

  • nox -s lint type_hints smoke_tests unit_tests passes.
  • End-to-end migration on a real device (SD card flash → tarball extract → first boot → OLED progress → completion) on each supported hardware variant.
  • Existing software-update screen on a non-NixOS PiFinder install still works (no regression).

Notes for review

  • The migration_progress binary is pre-compiled and committed (~80 KB) so the initramfs is self-contained; it is a one-shot binary used only by the migration release.
  • SUPPORTED_DISPLAYS is duplicated between software.py (early UI gate) and nixos_migration_calc.py (pre-flight gate). Belt-and-braces; both must be updated together if a new display ships.

@mrosseel changed the title from "feat: NixOS migration infrastructure + observing-list support" to "feat: NixOS migration infrastructure" May 24, 2026
Upstream defines utils.tetra3_dir as the inner package path
(python/PiFinder/tetra3/tetra3), and every sys.path.append(tetra3_dir)
call site relies on that. The migration branch had the short submodule-root
path plus an extra sys.path.append(tetra3_dir / 'tetra3') workaround in
solver_main.py and ui/preview.py, but solver.py never got the
workaround, so 'from tetra3 import cedar_detect_client' fails when
the inner module does the bare 'import cedar_detect_pb2'.

Take upstream's pattern verbatim: long tetra3_dir, single
sys.path.append, no workaround. Fixes PR brickbots#433's nox ui_tests
failure.
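The upstream pattern described above can be sketched as follows. The helper names are hypothetical, and the base directory is passed in rather than computed from utils. The membership check before appending is an addition that keeps repeated calls from duplicating the entry. With the inner package directory on sys.path, a bare `import cedar_detect_pb2` inside that package resolves as a top-level module.

```python
import sys
from pathlib import Path


def tetra3_inner_dir(pifinder_dir: Path) -> Path:
    """Long form used upstream: <...>/PiFinder/tetra3/tetra3 (the inner package)."""
    return pifinder_dir / "tetra3" / "tetra3"


def add_tetra3_to_path(pifinder_dir: Path) -> None:
    """Single sys.path entry for the inner package; no extra '/tetra3' workaround."""
    tetra3_dir = str(tetra3_inner_dir(pifinder_dir))
    if tetra3_dir not in sys.path:  # guard added here so repeated calls stay idempotent
        sys.path.append(tetra3_dir)
```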
Two coupled changes for upstream's new test_all_ui_modules_covered
guard (PR brickbots#438):

- Wire UIMigrationConfirm and UIMigrationProgress into _DYNAMIC_IDS
  with item_definition fixtures. Their __init__ methods use .get()
  with defaults so a stub version_info dict exercises construction
  + key handlers.

- UIReleaseNotes stays in _COVERAGE_SKIP — its active() fetches
  markdown over HTTP and needs a network mock.

- UIMigrationProgress.update() was crashing under the smoke harness
  because the sys_utils mock returned MagicMock objects for percent/status.
  Coerce percent to int and accept status only as str; on bad data
  keep the prior value. This also hardens against a corrupt
  /tmp/nixos_migration_progress JSON file at runtime.
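The construction and update rules above can be sketched with a small stand-in class. `MigrationProgressState` and the `version` key are hypothetical, and clamping to 0..100 is an extra safeguard not mentioned in the commit. The pattern is the one described: `.get()` with defaults in `__init__`, percent coerced with `int()`, status accepted only as `str`, and prior values kept on bad data.

```python
from typing import Any, Mapping


class MigrationProgressState:
    """Hypothetical stand-in for the state UIMigrationProgress keeps between frames."""

    def __init__(self, version_info: Mapping[str, Any]) -> None:
        # .get() with defaults, so a stub dict is enough to construct the screen.
        self.version = version_info.get("version", "unknown")
        self.percent = 0
        self.status = "Starting"

    def update(self, progress: Mapping[str, Any]) -> None:
        """Apply one poll result; keep prior values when a field is malformed."""
        try:
            # Clamping is an addition; the commit only says "coerce to int".
            self.percent = max(0, min(100, int(progress.get("percent"))))
        except (TypeError, ValueError):
            pass  # None, "abc", nested objects, ...: keep the previous percent
        status = progress.get("status")
        if isinstance(status, str):
            self.status = status
```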
mrosseel added 3 commits May 25, 2026 17:41
Dead code dropped from the tree:
  * python/PiFinder/sys_utils_nixos.py (~591 lines): never imported,
    get_sys_utils() has no NixOS dispatch path. The NixOS-side system
    utilities ship inside the migration tarball as
    python/PiFinder/sys_utils.py on the nixos branch.
  * python/pyproject.toml dbus/gi mypy ignores: they only made sense for
    the above and belong on the nixos branch.
  * python/scripts/migration_calc.py (~509 lines): described an
    unimplemented A/B-partition layout; no caller in the active flow
    (nixos_migration.sh uses nixos_migration_calc.py instead).

Out-of-tree (moved to local notes):
  * MIGRATION_BRANCH_STATE.md (107 lines): internal hand-off notes, not
    user-facing docs.
  * python/scripts/test_migration_loopdev.sh (~498 lines): offline test
    harness that re-implemented an older design (tar.gz + magic-header
    staging) rather than invoking nixos_migration_init.sh — it had
    already drifted from the real flow (which is tar.zst + RAM-staged).
    Useful as a future starting point for a real integration test but
    misleading to ship in the repo.

Documentation:
  * migration_gate.txt: add header comment explaining the killswitch
    contract; update _fetch_migration_gate parser to skip "#" lines so
    the file is self-documenting without breaking semantics.
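Skipping "#" lines in the gate parser might look like the sketch below. `strip_comment_lines` and `load_gate` are hypothetical names. The sketch assumes that the non-comment remainder is the same two-field JSON described in the PR summary, which is an assumption because the commit does not show the file body.

```python
import json
from typing import Any, Dict


def strip_comment_lines(text: str) -> str:
    """Drop lines whose first non-blank character is '#'; keep everything else."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    return "\n".join(kept)


def load_gate(text: str) -> Dict[str, Any]:
    """ASSUMPTION: whatever survives comment stripping is a JSON object."""
    return json.loads(strip_comment_lines(text))
```

Because only whole comment lines are removed, the header can document the killswitch contract without changing how the data lines are parsed.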
…-fail)

Three small fixes from the PR review:

- ui/software.py UIMigrationProgress.update: replace the redundant
  `except (AttributeError, Exception)` with a targeted `AttributeError`
  guard around the sys_utils.get_migration_progress() lookup (only
  needed when running against sys_utils_fake). The helper itself
  swallows OS/JSON errors and returns {}, so wrapping everything in
  `except Exception` was hiding real failures from the polling loop.

- ui/software.py get_release_version: add timeout=REQUEST_TIMEOUT to the
  requests.get call (it previously had none and would hang the UI
  thread if GitHub stalled). Widen the except to RequestException so
  Timeout, ReadTimeout, etc. are all caught.

- sys_utils.start_nixos_migration: hard-fail with ValueError when neither
  migration_sha256_url nor migration_sha256 produces a value. Previously
  the helper logged a warning and returned "", which the migration
  script then treated as "skip checksum verification". An in-place OS
  replacement must not run without integrity verification.
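The hard-fail rule for start_nixos_migration can be sketched as follows. `resolve_sha256` and the injected `fetch_text` callable are hypothetical, and trying the URL before the inline value is an assumed order. The key property matches the commit: the function either returns a 64-hex-digit digest or raises ValueError, and never returns "".

```python
import re
from typing import Callable, Mapping

_SHA256_RE = re.compile(r"\b([0-9a-fA-F]{64})\b")


def resolve_sha256(version_info: Mapping[str, str], fetch_text: Callable[[str], str]) -> str:
    """Return the expected tarball digest or raise; never an empty string."""
    url = version_info.get("migration_sha256_url")
    if url:
        try:
            # .sha256 files are commonly "<digest>  <filename>"; take the first digest.
            match = _SHA256_RE.search(fetch_text(url))
            if match:
                return match.group(1).lower()
        except OSError:
            pass  # fetch failed (OSError subclasses); fall through to the inline value
    inline = (version_info.get("migration_sha256") or "").strip()
    if _SHA256_RE.fullmatch(inline):
        return inline.lower()
    raise ValueError("no SHA256 available for the migration tarball; refusing to migrate")
```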
The initramfs WiFi-migration step generated NetworkManager .nmconnection
files by interpolating the SSID and PSK directly from wpa_supplicant.conf
into a heredoc. SSIDs containing characters with semantic meaning in NM
keyfile format (semicolon, brackets, equals, leading/trailing whitespace)
or in the filesystem (slash, NUL, "..") could break the connection file,
the file name, or both.

Failure mode: WiFi config goes missing after migration -> headless device
is unreachable until re-flashed.

Fixes:

- Encode SSID as semicolon-separated hex bytes (ssid=4d;79;...). This
  is NM keyfile's standard binary form and is safe for any byte content
  including non-ASCII and special chars.
- Escape the id= and psk= values for NM keyfile format: backslashes
  doubled, semicolons backslashed.
- Sanitize the filename to [A-Za-z0-9._-]; empty / "." / ".." after
  sanitization fall back to "wifi".
- Use printf %s instead of echo when feeding the parser, so SSIDs
  starting with "-" or containing backslash escapes are not mangled by
  echo's flag interpretation.

Verified end-to-end with a sample wpa_supplicant.conf containing spaces,
slashes, and a semicolon in the PSK -- files generated cleanly with the
expected escaping.
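The four keyfile fixes above can be sketched in Python. The real step is shell inside nixos_migration_init.sh, so this is an illustration rather than the shipped code. The helper names are hypothetical. Replacing disallowed filename characters with "_" (rather than deleting them) is an assumption, and the section layout is a minimal WPA-PSK connection, not necessarily the exact file the script emits. The hex SSID and the id=/psk= escaping follow the rules stated in the commit.

```python
import re
from typing import Tuple


def nm_hex_ssid(ssid: bytes) -> str:
    """SSID as semicolon-separated hex bytes, e.g. b"My" -> "4d;79"."""
    return ";".join(f"{b:02x}" for b in ssid)


def nm_escape(value: str) -> str:
    """Escaping as described: double backslashes first, then backslash semicolons."""
    return value.replace("\\", "\\\\").replace(";", "\\;")


def nm_filename(ssid: str) -> str:
    """Restrict to [A-Za-z0-9._-]; fall back to 'wifi' for '', '.', or '..'."""
    name = re.sub(r"[^A-Za-z0-9._-]", "_", ssid)  # ASSUMPTION: replace, not delete
    return "wifi" if name in ("", ".", "..") else name


def nm_keyfile(ssid: str, psk: str) -> Tuple[str, str]:
    """Return (filename, contents) for one WPA-PSK network."""
    body = "\n".join([
        "[connection]",
        f"id={nm_escape(ssid)}",
        "type=wifi",
        "",
        "[wifi]",
        "mode=infrastructure",
        f"ssid={nm_hex_ssid(ssid.encode('utf-8'))}",
        "",
        "[wifi-security]",
        "key-mgmt=wpa-psk",
        f"psk={nm_escape(psk)}",
        "",
    ])
    return f"{nm_filename(ssid)}.nmconnection", body
```

Backslashes are doubled before semicolons are escaped; reversing the order would double the backslash that escaping a semicolon just introduced.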
@mrosseel mrosseel force-pushed the migration branch 8 times, most recently from 678f552 to 18c22fd Compare May 26, 2026 08:31
ui: render migration status from frame 0, expose underlying error,
and allow back/exit on terminal pre-start failure so the user isn't
trapped on the failure screen.