
feat(packaging): one-command upgrade — auto-migrate the DB safely on dnf/apt update #569

Merged: remyluslosius merged 5 commits into main from feat/auto-upgrade-migrate on Jun 16, 2026

@remyluslosius


Goal

sudo dnf update -y openwatch* (or apt) should be all an operator does — the package handles the app + DB-schema upgrade automatically, with a backup, without breaking the app or losing data. This implements exactly that for the single-instance appliance model.

What happens on upgrade (automatic)

The RPM %post (when $1 -ge 2) / DEB postinst (when the old-version $2 is set) — upgrade only, never a fresh install — runs openwatch-upgrade.sh:

  1. Skip + warn if the DB is unreachable (don't fail the package).
  2. Stop the service (never run a binary against a mismatched schema).
  3. openwatch migrate --backup-dir: pg_dump a restore point, then apply. Each migration is transactional (atomic rollback on failure).
  4. Success → start the new version. Failure → leave the service STOPPED, print the restore path, exit non-zero so dnf/apt flag it. Data stays intact (failed migration rolled back).
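
The four steps above can be read as a small control flow. The sketch below is a hypothetical shell rendering, not the shipped openwatch-upgrade.sh. The `openwatch` unit name, the `pg_isready -q` probe, the decision to abort when the stop fails, and the `*_CMD` override variables are all assumptions, added so the flow can be exercised without systemd or PostgreSQL. Only `openwatch migrate --backup-dir` and the backup directory path come from this PR.

```shell
#!/bin/sh
# Hypothetical sketch of the upgrade flow; NOT the shipped openwatch-upgrade.sh.
# Assumptions: unit name "openwatch", pg_isready as the reachability probe,
# and the *_CMD override variables (added here for testability).

BACKUP_DIR="${BACKUP_DIR:-/var/lib/openwatch/backups}"
DB_CHECK_CMD="${DB_CHECK_CMD:-pg_isready -q}"
SERVICE_STOP_CMD="${SERVICE_STOP_CMD:-systemctl stop openwatch}"
SERVICE_START_CMD="${SERVICE_START_CMD:-systemctl start openwatch}"
MIGRATE_CMD="${MIGRATE_CMD:-openwatch migrate --backup-dir $BACKUP_DIR}"

run_upgrade() {
    # Step 1: an unreachable DB is a warning, not a package failure.
    if ! $DB_CHECK_CMD; then
        echo "WARN: database unreachable; skipping automatic migration" >&2
        return 0
    fi

    # Step 2: never migrate underneath a running binary.
    # (Assumption: if the stop fails, abort before touching the schema.)
    if ! $SERVICE_STOP_CMD; then
        echo "ERROR: could not stop the service; not migrating" >&2
        return 1
    fi

    # Step 3: take the restore point, then apply pending migrations.
    if $MIGRATE_CMD; then
        # Step 4a: success, start the new version.
        $SERVICE_START_CMD
        return $?
    fi

    # Step 4b: failure, leave the service stopped and point at the backups.
    echo "ERROR: migration failed; service left STOPPED" >&2
    echo "Restore points are under: $BACKUP_DIR" >&2
    return 1
}
```

A scriptlet would call `run_upgrade` and propagate its exit status, so a non-zero return is what lets dnf/apt flag the failed upgrade.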

Safety properties

  • Password never in argv: internal/dbbackup passes connection params to pg_dump via PG* env only (unit-pinned, AC-01).
  • Fail-closed backup: if the backup fails, the migration does not run (AC-03).
  • Fail-safe service state: a failed migration never leaves a running binary on a half-migrated schema (AC-05). Each migration is transactional, so a failing migration rolls back instead of applying partially.
  • Never lose the last restore point: the cleanup timer always keeps the most recent dump (AC-06).
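
To illustrate the first property, here is a minimal shell sketch of passing connection parameters to pg_dump through libpq's PG* environment variables instead of command-line arguments. It is not the internal/dbbackup code: the `DB_*` variable names, the `PG_DUMP_BIN` override, and the custom-format flag are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: hand connection parameters to pg_dump through the
# environment. The DB_* names, the PG_DUMP_BIN override, and the
# custom-format choice are assumptions, not the project's actual code.

dump_database() {
    out_file=$1
    # Prefix assignments go into the child's environment only; they are
    # not command-line arguments, so a `ps` listing does not show them.
    PGHOST="$DB_HOST" \
    PGPORT="$DB_PORT" \
    PGUSER="$DB_USER" \
    PGPASSWORD="$DB_PASSWORD" \
    PGDATABASE="$DB_NAME" \
        "${PG_DUMP_BIN:-pg_dump}" --format=custom --file="$out_file"
}
```

The only values that appear in argv are the output format and the output path. Processes running as the same user can still read the environment, so this protects against argv listings rather than against every local observer.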

Operator surface

  • openwatch migrate --status: see pending migrations before upgrading.
  • /etc/openwatch/upgrade.conf: AUTO_BACKUP, BACKUP_DIR, BACKUP_RETENTION_DAYS (config-noreplace).
  • openwatch-backup-cleanup.timer (daily prune), enabled on install/upgrade.
  • docs/runbooks/UPGRADING.md: the one-command flow, recovery, restore-from-backup.
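
The daily prune combines two rules: drop dumps older than BACKUP_RETENTION_DAYS, but never drop the most recent one. A minimal sketch of that logic follows. The function name and the `*.dump` suffix are assumptions, and the shipped cleanup script may be structured differently.

```shell
#!/bin/sh
# Hypothetical sketch of retention pruning; the *.dump suffix and the
# function name are assumptions, not the shipped cleanup script.

prune_backups() {
    dir=$1
    retention_days=$2

    # Newest dump by modification time; it is exempt from pruning.
    newest=$(ls -t "$dir"/*.dump 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 0    # no dumps, nothing to prune
    newest_name=$(basename "$newest")

    # find's "-mtime +N" matches files whose age in whole days exceeds N.
    find "$dir" -maxdepth 1 -type f -name '*.dump' \
        -mtime +"$retention_days" ! -name "$newest_name" -exec rm -f {} +
}
```

Called as `prune_backups "$BACKUP_DIR" "$BACKUP_RETENTION_DAYS"`, this keeps a lone dump even when it is older than the retention window, which is the "never lose the last restore point" behaviour.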

Deliberately out of scope (documented)

  • PostgreSQL ENGINE major-version upgrades (e.g. 15→16) — operator-supervised pg_upgrade, never silently from a scriptlet (it can lose the whole DB). Minor/patch Postgres comes via dnf/apt dependencies.
  • Multi-instance / zero-downtime — the appliance model accepts brief downtime during the migrate.

Verification

  • Built both packages; their scriptlets render the upgrade-only guard; payloads ship the helper, cleanup script, two systemd units, the empty /var/lib/openwatch/backups dir, and upgrade.conf (config).
  • End-to-end against a live DB: migrate --status reports correctly; migrate --backup-dir produced a valid 82 KB pg_dump then migrated; DSN redacted in logs.
  • gofmt/go vet/go build clean; specter check 107 specs; dbbackup + release-upgrade AC-01..07 tests pass.

Spec: new release-upgrade (C-01..05 / AC-01..07).

`dnf update openwatch*` (or apt) now applies pending DB migrations
automatically, with a restore point and a fail-safe service state — for
the single-instance appliance model. The operator runs one command.

Mechanism:
- internal/dbbackup: builds a pg_dump command that takes connection
  params (esp. the password) via PG* env, NEVER argv (no ps leak).
- `openwatch migrate` gains --status (report pending without applying)
  and --backup-dir (pg_dump restore point BEFORE Apply; skipped on a
  fresh DB; FAILS CLOSED — never migrates if the backup fails).
- packaging/common/openwatch-upgrade.sh (RPM %post when $1>=2 / DEB
  postinst when old-version $2 is set — upgrade only, never fresh
  install): stop -> openwatch migrate --backup-dir -> start on success;
  on failure leave the service STOPPED + print the restore path + exit
  non-zero so the package manager surfaces it. Migrations are
  transactional, so a failure rolls back atomically (data intact).
- Backup retention: openwatch-backup-cleanup.timer (daily) prunes dumps
  past BACKUP_RETENTION_DAYS but always keeps the most recent.
  Operator-tunable via /etc/openwatch/upgrade.conf (AUTO_BACKUP, etc.).
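
The upgrade-only guard relies on the arguments each package manager passes to its post-install scriptlet. The helpers below are a hypothetical illustration of those two checks, not the rendered scriptlets. The function names are invented, and the extra test for `configure` on the DEB side is an addition in this sketch.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the upgrade-only guard; the function
# names are invented for this sketch.

# RPM %post receives the number of package instances that will remain
# installed: 1 on a fresh install, 2 or more on an upgrade.
is_rpm_upgrade() {
    [ "${1:-0}" -ge 2 ]
}

# DEB postinst is invoked as: postinst configure <old-version>.
# <old-version> is empty on a fresh install.
# (Checking for "configure" is an addition in this sketch.)
is_deb_upgrade() {
    [ "${1:-}" = "configure" ] && [ -n "${2:-}" ]
}
```

A scriptlet would run the upgrade helper only when the relevant check succeeds, for example `if is_rpm_upgrade "$1"; then ...; fi`, so a fresh install never triggers a migration.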

Deliberately OUT of scope (documented): PostgreSQL ENGINE major-version
upgrades (operator-supervised pg_upgrade, never from a scriptlet), and
multi-instance/zero-downtime upgrades.

Spec release-upgrade (C-01..05 / AC-01..07); docs/runbooks/UPGRADING.md.
Verified end to end: migrate --status + --backup-dir produce a real
pg_dump then migrate against a live DB; both packages ship + render the
upgrade-only guard.

@github-actions bot added the documentation and size/XL labels on Jun 16, 2026

Proves the real upgrade path end to end, beyond the source-inspection +
scriptlet-logic tests: install the OLD openwatch RPM (release 1), stand up
Postgres, roll the schema back one migration to simulate the prior
version, then `rpm -U` the NEW RPM (release 2) and assert the package's
%post scriptlet migrated the DB to head (34 -> 35, host_connection_profile
created), took a pre-upgrade backup, and issued the service stop/start.

- packaging/tests/upgrade-container-test.sh runs inside rockylinux:9 (a
  systemctl shim records stop/start since the container has no systemd).
- packaging/tests/run-upgrade-container-test.sh is the one-command host
  driver: builds the two RPM releases and runs the container.
- openwatch-upgrade.sh: OPENWATCH_UPGRADE_CONF / OPENWATCH_SECRETS_ENV
  overrides (default to the production /etc paths) so the test can point
  config + secrets at a scratch fixture. No production behavior change.
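
Because the container has no systemd, the test needs some stand-in that records service calls. A minimal sketch of such a shim is shown below. The function name, the log format, and the idea of installing it through PATH are assumptions, and the real upgrade-container-test.sh may implement its shim differently.

```shell
#!/bin/sh
# Hypothetical sketch of a systemctl shim for a systemd-less container.
# It writes a fake "systemctl" that appends its arguments to a log file.

install_systemctl_shim() {
    shim_dir=$1
    log_file=$2
    mkdir -p "$shim_dir"
    # The generated script logs e.g. "stop openwatch" and exits 0.
    printf '#!/bin/sh\necho "$*" >> "%s"\n' "$log_file" > "$shim_dir/systemctl"
    chmod +x "$shim_dir/systemctl"
}
```

With the shim directory placed first in PATH, the scriptlet's service calls succeed and the test can assert afterwards that the log contains a stop followed by a start.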

Verified locally: RESULT PASS (34 -> 35, backup taken, stop+start issued).

The two scripts were silently caught by .gitignore's blanket `*test*.sh`
and dropped from the previous commit. Add a `!packaging/tests/*test*.sh`
exception so committed test-harness scripts are tracked, and add the
container upgrade test + its host driver.

Adds an `upgrade` job to package-smoke that exercises the full package
UPGRADE path the per-distro `smoke` (fresh install) job can't: build an
old (release 1) + new (release 2) RPM, then in a rockylinux:9 container
install the old, stand up Postgres, roll the schema back one migration,
and `rpm -U` the new — asserting the %post scriptlet migrates the DB to
head, takes a pre-upgrade backup, and stop/starts the service. Runs the
committed packaging/tests/run-upgrade-container-test.sh driver. Fires on
the same packaging-change / tag / dispatch triggers as the rest of the
workflow, so it runs on this PR.

The command is the constant pg_dump, args are package-controlled flags +
an output path (never user input), and connection params travel via env
not argv. Annotate both exec sites with // #nosec G204 (the repo's
convention) so make lint passes.

@remyluslosius merged commit ca9e440 into main on Jun 16, 2026 (21 checks passed)
@remyluslosius deleted the feat/auto-upgrade-migrate branch on June 16, 2026 04:23