p2 of recovery_pause_on_logical_slot_conflict (auto-resume) #30
Open
NikolayS wants to merge 5 commits into
Add a new GUC, recovery_pause_on_logical_slot_conflict (PGC_SIGHUP, default off). When enabled, WAL replay on a standby pauses instead of invalidating an active logical replication slot whose catalog_xmin would be overtaken by a Heap2/PRUNE_ON_ACCESS record's snapshotConflictHorizon. An operator can then drain the slot via pg_logical_slot_get_changes and call pg_wal_replay_resume() to continue. On resume, the patch advances the drained slot's catalog_xmin past the conflict horizon so the subsequent InvalidateObsoleteReplicationSlots call becomes a no-op; replay continues to the next conflict and the cycle repeats.

This makes logical decoding from an archive-only standby (no streaming replication link to the primary) viable for continuous CDC. Without this GUC, slots on such standbys are invalidated the first time replay applies a catalog vacuum record whose horizon exceeds the slot's catalog_xmin, typically ~2 * autovacuum_naptime after slot creation.

Hooks into ResolveRecoveryConflictWithSnapshot(), the single choke point in the replay path for RS_INVAL_HORIZON conflicts, via a new MaybePauseOnLogicalSlotConflict() function. Reuses the existing SetRecoveryPause / recoveryNotPausedCV machinery, so no new shared-memory state is needed. With the GUC off, the hot path is a single boolean early return.

Edge cases handled:
- Slots still inside DecodingContextFindStartpoint (effective_catalog_xmin not yet valid) are skipped. Pausing for them would deadlock: snapbuild needs WAL to advance, and the pause holds it back. Invalidating an in-progress slot is harmless because the caller retries.
- The pause check uses TransactionIdPrecedesOrEquals to match the semantics of DetermineSlotInvalidationCause. Without that, a slot whose catalog_xmin was just advanced to horizon+1 by a previous pause cycle would fail to re-pause on a subsequent record whose horizon equals that new catalog_xmin, yet would still be invalidated by the fall-through.
- CheckForStandbyTrigger() is called in the wait loop so pg_promote() does not stall while paused. This mirrors the existing recoveryPausesHere escape loop.
- Synced slots (data.synced == true, i.e. managed by the slot-sync worker per sync_replication_slots) are skipped in both the pause-check and advance scans. Writing to their fields from the startup process would race with the slot-sync worker, and ALTER / DROP_REPLICATION_SLOT on a synced slot errors out, so the operator-facing "drain or drop" recipe does not apply.

ConfirmRecoveryPaused() and CheckForStandbyTrigger() are made extern for use by MaybePauseOnLogicalSlotConflict's wait loop. The pause is entered from inside ResolveRecoveryConflictWithSnapshot rather than the main replay loop, so we need to transition RECOVERY_PAUSE_REQUESTED -> RECOVERY_PAUSED ourselves and consume PROMOTE_SIGNAL_FILE ourselves.

Known limitation: the advance marks slots dirty but does not force an immediate SaveSlotToPath. If the standby crashes between resume and the next restartpoint, the advance is lost; on restart, replay re-encounters the same conflict record, re-pauses, and the operator re-drains (idempotent). A future iteration could tighten this.
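To illustrate the TransactionIdPrecedesOrEquals edge case described above, here is a minimal self-contained sketch. It does not use PostgreSQL headers: the typedef, FIRST_NORMAL_XID, and the function names xid_precedes, xid_precedes_or_equals, pause_check_inclusive and pause_check_strict are stand-ins invented for this example, and only normal (non-special) XIDs are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's TransactionId (normal XIDs only). */
typedef uint32_t TransactionId;

#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Circular comparison: true if a is logically older than b (modulo 2^32). */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    assert(a >= FIRST_NORMAL_XID && b >= FIRST_NORMAL_XID);
    return (int32_t) (a - b) < 0;
}

/* Same comparison, but equal XIDs also count. */
bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    assert(a >= FIRST_NORMAL_XID && b >= FIRST_NORMAL_XID);
    return (int32_t) (a - b) <= 0;
}

/*
 * Hypothetical pause check: the slot conflicts when its catalog_xmin is
 * at or before the record's conflict horizon.
 */
bool
pause_check_inclusive(TransactionId catalog_xmin, TransactionId horizon)
{
    return xid_precedes_or_equals(catalog_xmin, horizon);
}

/* Strict variant, shown only for contrast: it misses the equal case. */
bool
pause_check_strict(TransactionId catalog_xmin, TransactionId horizon)
{
    return xid_precedes(catalog_xmin, horizon);
}
```

With a slot advanced to catalog_xmin = 1001 after a pause at horizon 1000, a later record with horizon 1001 is caught by pause_check_inclusive but not by pause_check_strict, which is the gap the bullet above describes.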
10 assertions, ~30 wallclock seconds.

Two-phase flow: Phase 1 sets up an archive-only standby from a clean basebackup + pg_log_standby_snapshot and creates logical slots on TWO standbys while the archive contains no catalog-prune records. One standby has the GUC on, the other off. Phase 2 then runs a catalog-churning workload on the primary (transient tables + VACUUM on pg_class, pg_attribute, pg_type, pg_depend, pg_statistic) and waits for those segments to archive. When the standbys replay through those segments, the GUC-on one pauses; a Perl orchestrator drains the slot with pg_logical_slot_get_changes and calls pg_wal_replay_resume. The GUC-off baseline standby lets its slot invalidate, which is the upstream default behavior, unchanged.

A third standby is created after Phase 2 archives (so its replay will pause quickly on the first conflict record). The test then calls pg_promote(wait=>true, wait_seconds=>30) on the paused standby and asserts that promote returns true in under 10 seconds. This guards the CheckForStandbyTrigger() escape path; without it, pg_promote stalls for the full wait_seconds and returns false.

Assertions:
ok 1 - GUC is registered
ok 2 - slot created cleanly in Phase 1 (GUC on, state: reserved)
ok 3 - baseline slot created cleanly in Phase 1 (GUC off, reserved)
ok 4 - slot survived catalog prune with GUC on (reserved)
ok 5 - at least one pause event was handled
ok 6 - at least 2000 decoded events
ok 7 - baseline (GUC off): slot invalidates as expected (lost)
ok 8 - promote-test standby reached paused state before promotion
ok 9 - pg_promote returned true while standby was paused by GUC
ok 10 - pg_promote completed in under 10s
Extract five named subs so the top-level script reads as a sequence of phases rather than one long procedure. No behavior change: all 10 assertions are preserved verbatim, as are the load-bearing comments (two-phase rationale, double pg_switch_wal rationale, GUC-off baseline rationale, pg_promote escape-path rationale).

Helpers extracted:
* setup_primary_with_clean_archive
* create_archive_standby
* run_catalog_churn
* drain_and_resume_loop
* wait_for_replay_paused
The previous behavior under recovery_pause_on_logical_slot_conflict
required the operator to both drain (or drop / advance) the slot AND
call pg_wal_replay_resume() to continue — two steps, even though the
first step is the one that matters semantically. That split also meant
the feature couldn't underpin a continuous-CDC service without
external orchestration to issue the resume.
Lift the scan predicate ("does any slot in `dboid` still block this
conflict?") out of the initial check into a helper
AnySlotStillBlocksConflict(). Call it again every 1s inside the
existing wait loop. When it returns false, flip the pause state to
NOT_PAUSED and let the loop exit; the existing post-wait advance then
bumps catalog_xmin past the horizon on drained slots so the
fall-through InvalidateObsoleteReplicationSlots() is a no-op.
"No longer blocking" covers every unblock path, not just drain:
* drained past the pause LSN (confirmed_flush >= captured conflict_lsn); this is the main case
* slot dropped (pg_drop_replication_slot), so it is removed from the scan
* slot advanced (pg_replication_slot_advance), so catalog_xmin moves past the horizon
* slot invalidated for another reason (e.g. RS_INVAL_WAL_REMOVED from max_slot_wal_keep_size, applied by the checkpointer, which runs even while the startup process is asleep in our wait loop), so data.invalidated != RS_INVAL_NONE and the scan skips it
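The unblock paths listed above can be sketched as a single scan predicate. This is a simplified model, not the patch's code: MockSlot, its fields, any_slot_still_blocks and xid_precedes_or_equals are hypothetical stand-ins for the real ReplicationSlot data and AnySlotStillBlocksConflict(), and the order of the checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;
typedef uint32_t Oid;

#define INVALID_XID ((TransactionId) 0)

/* Hypothetical, simplified view of one replication slot. */
typedef struct MockSlot
{
    bool          in_use;          /* false once the slot has been dropped */
    bool          synced;          /* managed by the slot-sync worker */
    bool          invalidated;     /* models data.invalidated != RS_INVAL_NONE */
    Oid           database;
    TransactionId catalog_xmin;    /* INVALID_XID while still finding a start point */
    XLogRecPtr    confirmed_flush;
} MockSlot;

/* Circular "a <= b" for normal XIDs. */
bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) <= 0;
}

/* True if at least one slot in dboid still blocks replay of the conflict record. */
bool
any_slot_still_blocks(const MockSlot *slots, size_t nslots, Oid dboid,
                      TransactionId horizon, XLogRecPtr conflict_lsn)
{
    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        const MockSlot *s = &slots[i];

        if (!s->in_use || s->database != dboid)
            continue;           /* dropped, or belongs to another database */
        if (s->synced || s->invalidated)
            continue;           /* not ours to manage, or already invalidated */
        if (s->catalog_xmin == INVALID_XID)
            continue;           /* start point not found yet */
        if (!xid_precedes_or_equals(s->catalog_xmin, horizon))
            continue;           /* advanced past the horizon */
        if (s->confirmed_flush >= conflict_lsn)
            continue;           /* drained past the pause LSN */
        return true;
    }
    return false;
}
```

Each `continue` corresponds to one way a slot stops blocking, so the predicate returns false as soon as every slot has been drained, dropped, advanced, or invalidated.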
Manual pg_wal_replay_resume() still works as the "give up on this
slot and let it invalidate" escape hatch, and CheckForStandbyTrigger
still breaks the loop for pg_promote().
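As a rough model of the three ways out of the wait loop (auto-resume, manual resume, promotion), the sketch below reduces one loop iteration to a pure function. The enum values reuse PostgreSQL's pause-state names, but the types are redefined locally; WaitOutcome and wait_loop_tick are hypothetical names, and the check order is an assumption modeled on the description of recoveryPausesHere above. The real loop would also sleep for about 1s between iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the pause states named in this PR. */
typedef enum
{
    RECOVERY_NOT_PAUSED,
    RECOVERY_PAUSE_REQUESTED,
    RECOVERY_PAUSED
} RecoveryPauseState;

/* Hypothetical result of one wait-loop iteration. */
typedef enum
{
    WAIT_CONTINUE,           /* keep waiting, sleep ~1s and poll again */
    WAIT_EXIT_AUTO_RESUME,   /* no slot blocks the conflict any more */
    WAIT_EXIT_MANUAL_RESUME, /* operator called pg_wal_replay_resume() */
    WAIT_EXIT_PROMOTE        /* promotion was requested */
} WaitOutcome;

WaitOutcome
wait_loop_tick(RecoveryPauseState *state, bool promote_triggered,
               bool slot_still_blocks)
{
    assert(state != NULL);

    /* A manual resume has already flipped the state to NOT_PAUSED. */
    if (*state == RECOVERY_NOT_PAUSED)
        return WAIT_EXIT_MANUAL_RESUME;

    /* Models CheckForStandbyTrigger(): promotion must never stall. */
    if (promote_triggered)
        return WAIT_EXIT_PROMOTE;

    /* Models ConfirmRecoveryPaused(): REQUESTED becomes PAUSED. */
    if (*state == RECOVERY_PAUSE_REQUESTED)
        *state = RECOVERY_PAUSED;

    /* Auto-resume: re-run the scan predicate on every iteration. */
    if (!slot_still_blocks)
    {
        *state = RECOVERY_NOT_PAUSED;
        return WAIT_EXIT_AUTO_RESUME;
    }

    return WAIT_CONTINUE;
}
```

Keeping the iteration free of sleeping and shared state makes each exit path easy to exercise in isolation.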
Capture conflict_lsn once at pause time and reuse it for both the
in-wait predicate and the post-wait advance, replacing the redundant
second GetXLogReplayRecPtr() call.
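The post-wait advance, using the same captured conflict_lsn, might look roughly like the following. This is a sketch under stated assumptions: MockSlot, xid_next, xid_precedes_or_equals and advance_drained_slots are hypothetical names, the database filter is omitted for brevity, and skipping already-invalidated slots is an assumption. The `dirty` flag models the "marked dirty, not saved immediately" limitation noted earlier in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

#define INVALID_XID      ((TransactionId) 0)
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Hypothetical, simplified slot; only the fields the advance touches. */
typedef struct MockSlot
{
    bool          synced;
    bool          invalidated;
    bool          dirty;            /* set when the in-memory state changed */
    TransactionId catalog_xmin;
    XLogRecPtr    confirmed_flush;
} MockSlot;

/* Circular "a <= b" for normal XIDs. */
bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) <= 0;
}

/* Next XID, skipping the special values below FIRST_NORMAL_XID on wraparound. */
TransactionId
xid_next(TransactionId xid)
{
    xid++;
    if (xid < FIRST_NORMAL_XID)
        xid = FIRST_NORMAL_XID;
    return xid;
}

/* Advance drained slots to horizon+1; returns how many slots were advanced. */
size_t
advance_drained_slots(MockSlot *slots, size_t nslots,
                      TransactionId horizon, XLogRecPtr conflict_lsn)
{
    size_t advanced = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        MockSlot *s = &slots[i];

        if (s->synced || s->invalidated || s->catalog_xmin == INVALID_XID)
            continue;
        if (!xid_precedes_or_equals(s->catalog_xmin, horizon))
            continue;           /* already past the horizon */
        if (s->confirmed_flush < conflict_lsn)
            continue;           /* not drained: left to the invalidation path */

        s->catalog_xmin = xid_next(horizon);
        s->dirty = true;        /* marked dirty only, no immediate save */
        advanced++;
    }
    return advanced;
}
```

Slots that were not drained keep their old catalog_xmin, so a manual resume still lets the fall-through invalidate them.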
GUC long_desc, postgresql.conf.sample comment, and the xlogrecovery.c
variable-decl comment updated to describe auto-resume.
69fb1b6 to
39adedd
Combines PR 27 (pause-on-conflict + TAP test) and PR 30 (refactor + auto-resume) into the story we would send to -hackers. Covers motivation, mechanism, edge cases, known limitations, tests, files touched, and open questions (GUC name, single-vs-mode flag, persistence, scope). Draft only — not sent.
ffd897c to
0ce8b52