
Fix notifier rc handling and spurious 'session finalized as incomplete' WARN #4

Open
hostarts wants to merge 2 commits into doutsis:main from hostarts:fix/notifier-rc-and-session-finalize

@hostarts

Summary

Two related log-noise / correctness issues observed on every successful run, both visible in the same trailing log block:

[INFO] [main] Sending email report to ...
Skipping email report (EMAIL_ON_SUCCESS=no)
[WARN] [main] Failed to send email report (backup data preserved)
[INFO] [main] Slack notification sent
[INFO] [main] Session ended successfully - exit code 0
[WARN] [cleanup_on_exit] SQLite session finalized as 'incomplete' (exit code 0)

Neither WARN reflects an actual problem.

Bug 1 — Intentional notifier skip logged as failure

send_backup_report() distinguishes:

  • 0 — delivered
  • 1 — transport failure
  • 2 — intentionally skipped (module disabled, EMAIL_ON_SUCCESS=no, EMAIL_ON_FAILURE=no)

But all four email call sites used the binary form if send_backup_report ...; then INFO; else WARN; fi, which collapses rc=2 into the failure log line. Operators running with EMAIL_ON_SUCCESS=no (a very common "only email me when something breaks" setup) get a misleading WARN on every successful run.

This PR adds a small _handle_notifier_rc() helper that interprets the three return codes correctly and is wired into all eight call sites (4× email, 4× Slack from the sister Slack PR).

Side benefit: _EMAIL_SENT / _SLACK_SENT sent-guards are also set on intentional skip, so later code paths (handle_sigterm, cleanup_on_exit) don't try to re-send a notification the operator explicitly suppressed.
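
For readers who want to see the shape of such a helper, the three-way dispatch described above could look roughly like the sketch below. This is not the code from the PR: the argument order, the `log` stand-in, and the skip message wording are assumptions made for illustration. Only the return-code meanings (0, 1, 2) and the `_EMAIL_SENT` / `_SLACK_SENT` flags come from the description.

```shell
# Hypothetical stand-in for the script's logger; the real one is not shown in this PR text.
log() { printf '[%s] [main] %s\n' "$1" "$2"; }

_EMAIL_SENT=0
_SLACK_SENT=0

# _handle_notifier_rc <rc> <email|slack> <label>   (assumed argument order)
_handle_notifier_rc() {
    local rc="$1" kind="$2" label="$3" sent=0
    case "$rc" in
        0) log INFO "$label sent"; sent=1 ;;                  # delivered
        2) log DEBUG "$label intentionally skipped"; sent=1 ;; # suppressed by config
        *) log WARN "Failed to send $label (backup data preserved)" ;;
    esac
    # Leave the sent-guard untouched on failure so later paths may retry.
    [ "$sent" -eq 1 ] || return 0
    case "$kind" in
        email) _EMAIL_SENT=1 ;;
        slack) _SLACK_SENT=1 ;;
    esac
}

_handle_notifier_rc 2 email "email report"         # skip: DEBUG, guard set
_handle_notifier_rc 1 slack "Slack notification"   # failure: WARN, guard not set
echo "_EMAIL_SENT=$_EMAIL_SENT _SLACK_SENT=$_SLACK_SENT"
```

Setting the guard on rc=2 as well as rc=0 is what keeps handle_sigterm and cleanup_on_exit from re-attempting a notification the operator has turned off.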

Bug 2 — cleanup_on_exit overwrites successful session log line

The catch-all sqlite_session_end block in cleanup_on_exit() exists to finalize sessions on early-error paths and signal exits. On a normal successful exit, main() has already called sqlite_session_end with status='success', and _SQLITE_SESSION_ENDED is set. The idempotency guard inside sqlite_session_end() correctly drops the duplicate DB write, but the surrounding code in vmbackup.sh still logs:

[WARN] [cleanup_on_exit] SQLite session finalized as 'incomplete' (exit code 0)

…which suggests the session was downgraded when in fact the DB is fine. Fix: gate the catch-all block on _SQLITE_SESSION_ENDED != 1.
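
A minimal sketch of the gate, assuming simplified stand-ins for the real functions: here `sqlite_session_end` only records a status in a hypothetical `SESSION_STATUS` variable instead of writing to SQLite, and `log` is a placeholder logger. Only the `_SQLITE_SESSION_ENDED` flag and the WARN text are taken from the description.

```shell
# Hypothetical logger and session state; the real script writes to SQLite.
log() { printf '[%s] [cleanup_on_exit] %s\n' "$1" "$2"; }
_SQLITE_SESSION_ENDED=0
SESSION_STATUS=""

sqlite_session_end() {
    # Idempotency guard: a second call must not overwrite the recorded status.
    [ "$_SQLITE_SESSION_ENDED" = "1" ] && return 0
    _SQLITE_SESSION_ENDED=1
    SESSION_STATUS="$1"
}

cleanup_on_exit() {
    local exit_code="$1"
    # Gate both the finalize call and its WARN on the session still being open.
    if [ "$_SQLITE_SESSION_ENDED" != "1" ]; then
        sqlite_session_end incomplete
        log WARN "SQLite session finalized as 'incomplete' (exit code $exit_code)"
    fi
}

# Normal path: main() already ended the session, so cleanup prints nothing.
sqlite_session_end success
quiet_output=$(cleanup_on_exit 0)

# Early-error path: nobody ended the session, so the catch-all still fires.
_SQLITE_SESSION_ENDED=0
cleanup_on_exit 1
```

The guard inside `sqlite_session_end` protects the data either way; the outer check only decides whether the log line is truthful.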

What changed

  • vmbackup.sh:
    • New _handle_notifier_rc() helper (24 lines, defined just above cleanup_on_exit).
    • All 8 notifier call sites converted from if send_X; then INFO; else WARN; fi to send_X || _rc=$?; _handle_notifier_rc ....
    • cleanup_on_exit's SQLite finalize block gated on _SQLITE_SESSION_ENDED != 1 (avoid spurious WARN; existing idempotency guard inside sqlite_session_end() is unchanged and continues to protect data).
  • CHANGELOG.md: [Unreleased] entries under ### Fixed.
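
To make the call-site conversion concrete, the snippet below contrasts the two forms using a hypothetical `send_backup_report` that always reports an intentional skip. It assumes `_rc` is reset to 0 before each call, which the summary above does not show explicitly; without that reset a stale value from an earlier call would leak through. The inline `case` stands in for the helper.

```shell
# Hypothetical notifier that always reports "intentionally skipped".
send_backup_report() { return 2; }

# Old binary form: every non-zero rc, including 2, falls into the else branch.
if send_backup_report; then old_level=INFO; else old_level=WARN; fi

# New form: capture the rc first, then dispatch on its value.
# A command on the left of || does not trigger errexit, so this is also safe under `set -e`.
_rc=0
send_backup_report || _rc=$?
case "$_rc" in
    0) new_level=INFO ;;
    2) new_level=DEBUG ;;
    *) new_level=WARN ;;
esac

echo "old=$old_level new=$new_level rc=$_rc"   # prints: old=WARN new=DEBUG rc=2
```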

Dependency

Touches both the email and Slack call sites. Based on feat/slack-notifications (#3); please merge #3 first or merge both together.

Test plan

  • Helper return-code matrix (rc=0,1,2,3) verified manually: rc=0→INFO+SENT, rc=2→DEBUG+SENT, rc=1/other→WARN+!SENT.
  • bash -n vmbackup.sh passes.
  • Real backup run on a production host: previous "Failed to send email report" + "finalized as 'incomplete'" WARNs no longer fire on a successful single-VM run with EMAIL_ON_SUCCESS=no.

🤖 Generated with Claude Code

Mirrors the existing email module so Slack delivery is wired into the
same four call sites (cleanup_on_exit, handle_sigterm, replicate-only
end, normal session end). This means failure-path notifications fire
even when a run aborts before the normal end-of-main code path.

The two notifiers are independent: SLACK_ENABLED, SLACK_ON_SUCCESS,
and SLACK_ON_FAILURE let operators run Slack-only, email-only, or
both. Session totals (VMs ok/failed/skipped/excluded, total bytes,
duration) are pulled from the same sessions row the email module uses
via sqlite_query_session_summary. curl is the only new dependency.

Two distinct bugs in the post-session notification + cleanup paths:

1. send_backup_report() returns 2 for intentional skip (module disabled,
   EMAIL_ON_SUCCESS=no, etc.) and 1 for real transport failure. All four
   call sites used 'if send_backup_report; then INFO; else WARN; fi',
   collapsing the skip case into the failure log line. New
   _handle_notifier_rc() helper interprets the rc correctly and is wired
   into all email and Slack call sites. As a side benefit, the sent-guard
   flags (_EMAIL_SENT / _SLACK_SENT) are also set on intentional skip so
   later code paths don't retry a notification the operator suppressed.

2. cleanup_on_exit's catch-all sqlite_session_end ran on every exit. On a
   successful run main() had already ended the session as 'success';
   sqlite_session_end's idempotency guard correctly dropped the duplicate
   DB write, but the surrounding log line still claimed the session was
   finalized as 'incomplete'. Catch-all is now gated on
   _SQLITE_SESSION_ENDED != 1 so the misleading WARN no longer fires when
   the normal exit path already finalized the session.