Fix notifier rc handling and spurious 'session finalized as incomplete' WARN#4
Open
hostarts wants to merge 2 commits into
Open
Fix notifier rc handling and spurious 'session finalized as incomplete' WARN#4hostarts wants to merge 2 commits into
hostarts wants to merge 2 commits into
Conversation
Mirrors the existing email module so Slack delivery is wired into the same four call sites (cleanup_on_exit, handle_sigterm, replicate-only end, normal session end). This means failure-path notifications fire even when a run aborts before the normal end-of-main code path. The two notifiers are independent: SLACK_ENABLED, SLACK_ON_SUCCESS, and SLACK_ON_FAILURE let operators run Slack-only, email-only, or both. Session totals (VMs ok/failed/skipped/excluded, total bytes, duration) are pulled from the same sessions row the email module uses via sqlite_query_session_summary. curl is the only new dependency.
Two distinct bugs in the post-session notification + cleanup paths: 1. send_backup_report() returns 2 for intentional skip (module disabled, EMAIL_ON_SUCCESS=no, etc.) and 1 for real transport failure. All four call sites used 'if send_backup_report; then INFO; else WARN; fi', collapsing the skip case into the failure log line. New _handle_notifier_rc() helper interprets the rc correctly and is wired into all email and Slack call sites. As a side benefit, the sent-guard flags (_EMAIL_SENT / _SLACK_SENT) are also set on intentional skip so later code paths don't retry a notification the operator suppressed. 2. cleanup_on_exit's catch-all sqlite_session_end ran on every exit. On a successful run main() had already ended the session as 'success'; sqlite_session_end's idempotency guard correctly dropped the duplicate DB write, but the surrounding log line still claimed the session was finalized as 'incomplete'. Catch-all is now gated on _SQLITE_SESSION_ENDED != 1 so the misleading WARN no longer fires when the normal exit path already finalized the session.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related log-noise / correctness issues observed on every successful run, both visible in the same trailing log block:
Neither WARN reflects an actual problem.
Bug 1 — Intentional notifier skip logged as failure
send_backup_report()distinguishes:0— delivered1— transport failure2— intentionally skipped (module disabled,EMAIL_ON_SUCCESS=no,EMAIL_ON_FAILURE=no)But all four call sites used the binary form
if send_backup_report ...; then INFO; else WARN; fi, collapsing rc=2 into the failure log line. Operators running withEMAIL_ON_SUCCESS=no(very common — "only email me when something breaks") get a misleading WARN on every successful run.This PR adds a small
_handle_notifier_rc()helper that interprets the three return codes correctly and is wired into all eight call sites (4× email, 4× Slack from the sister Slack PR).Side benefit:
_EMAIL_SENT/_SLACK_SENTsent-guards are also set on intentional skip, so later code paths (handle_sigterm,cleanup_on_exit) don't try to re-send a notification the operator explicitly suppressed.Bug 2 —
cleanup_on_exitoverwrites successful session log lineThe catch-all
sqlite_session_endblock incleanup_on_exit()exists to finalize sessions on early-error paths and signal exits. On a normal successful exit,main()has already calledsqlite_session_endwithstatus='success', and_SQLITE_SESSION_ENDEDis set. The idempotency guard insidesqlite_session_end()correctly drops the duplicate DB write — but the surroundingvmbackup.shlog line still emits:…which suggests the session was downgraded when in fact the DB is fine. Fix: gate the catch-all block on
_SQLITE_SESSION_ENDED != 1.What changed
vmbackup.sh:_handle_notifier_rc()helper (24 lines, defined just abovecleanup_on_exit).if send_X; then INFO; else WARN; fitosend_X || _rc=$?; _handle_notifier_rc ....cleanup_on_exit's SQLite finalize block gated on_SQLITE_SESSION_ENDED != 1(avoid spurious WARN; existing idempotency guard insidesqlite_session_end()is unchanged and continues to protect data).CHANGELOG.md:[Unreleased]entries under### Fixed.Dependency
Touches both the email and Slack call sites. Based on
feat/slack-notifications(#3); please merge #3 first or merge both together.Test plan
bash -n vmbackup.shpasses.EMAIL_ON_SUCCESS=no.🤖 Generated with Claude Code