Test design: end-to-end PoC for archive-fed logical decoding (recovery_pause_on_logical_slot_conflict) #43

@NikolayS

Goal

Empirically prove archive-fed logical decoding works end-to-end: a disposable standby that recovers only from archived WAL (no connection to the primary) keeps its logical slot alive across catalog-prune recovery conflicts — pausing recovery and then auto-resuming — while a consumer receives the complete change stream with no gaps.

This complements the in-tree TAP test src/test/recovery/t/054_recovery_pause_on_slot_conflict.pl. That TAP test uses the framework's own archiving and validates the mechanism inside core. This PoC is the real-world end-to-end validation that cannot live in core because it depends on external tools (WAL-G, object storage, pg_recvlogical + test_decoding/wal2json).

Environment

  • Clean Linux box (e.g. Hetzner Ubuntu).
  • Both nodes built from the same 19devel HEAD = the PR's May-27 base (Tom Lane commit 0f24332) + the 3-commit patch. The primary could be vanilla/unpatched (same major version), but use identical builds to remove variables.
  • Only the decode standby runs the patched build with recovery_pause_on_logical_slot_conflict=on. The primary needs no patch, no slot, and no connection to the standby.

Setup

  • Primary GUCs: wal_level=logical, archive_mode=on, archive_timeout=1s, autovacuum on.
  • Configure WAL-G archiving to S3 / object storage.
  • Create the logical slot FIRST (before loading data) so the full change history lands in the stream. Logical decoding does not include pre-existing rows.
  • Take a WAL-G full backup to object storage.
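
As a rough illustration of the primary-side settings listed above, the sketch below renders them as a postgresql.conf fragment. The `archive_command` value assumes WAL-G's usual `wal-g wal-push %p` invocation with storage credentials supplied through the environment; the helper names are hypothetical and not part of the PoC.

```python
def conf_quote(value: str) -> str:
    """Quote a value for postgresql.conf, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_primary_conf(archive_command: str = "wal-g wal-push %p") -> str:
    """Render the primary GUCs from the Setup list as a config fragment.

    The archive_command default is an assumption about how WAL-G is wired in.
    """
    settings = {
        "wal_level": "logical",
        "archive_mode": "on",
        "archive_timeout": "1s",
        "autovacuum": "on",
        "archive_command": archive_command,
    }
    return "".join(
        f"{name} = {conf_quote(value)}\n" for name, value in settings.items()
    )


if __name__ == "__main__":
    print(render_primary_conf(), end="")
```

The output can be appended to the primary's postgresql.conf (or dropped into an include directory) before the first start.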

Workload (on primary)

  • pgbench at ~10 TPS against the 4 standard pgbench tables (INSERT/UPDATE).
  • An EXTRA table getting ADD COLUMN / DROP COLUMN every ~10s = catalog churn.
  • Force VACUUM of catalog tables (pg_class, pg_attribute, pg_type, pg_statistic) so that catalog prune records are emitted (PRUNE_VACUUM_SCAN from VACUUM, alongside any PRUNE_ON_ACCESS from opportunistic pruning); do not rely on autovacuum timing.
  • Logical decoding / the slot covers the pgbench tables only, NOT the extra (DDL-churned) table.
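
One possible shape for the workload driver is sketched below: it builds the rate-limited pgbench command line and the SQL for one catalog-churn cycle. The table name `ddl_churn`, the column naming scheme, and the function names are placeholders invented for illustration; only the pgbench flags (`--rate`, `--time`, `--client`) and the SQL statements themselves are standard.

```python
from typing import List

# Catalog tables named in the Workload list.
CATALOGS = ("pg_class", "pg_attribute", "pg_type", "pg_statistic")


def pgbench_argv(dbname: str, seconds: int, rate_tps: int = 10) -> List[str]:
    """Command line for a rate-limited pgbench run (about rate_tps TPS)."""
    return [
        "pgbench",
        "--rate", str(rate_tps),
        "--time", str(seconds),
        "--client", "1",
        dbname,
    ]


def churn_cycle(i: int, table: str = "ddl_churn") -> List[str]:
    """SQL for one churn cycle: add a column, drop it, then VACUUM the catalogs.

    `table` is a hypothetical name for the extra, non-decoded table.
    """
    column = f"c_{i}"
    statements = [
        f"ALTER TABLE {table} ADD COLUMN {column} int",
        f"ALTER TABLE {table} DROP COLUMN {column}",
    ]
    # Explicit VACUUM so prune records do not depend on autovacuum timing.
    statements += [f"VACUUM pg_catalog.{catalog}" for catalog in CATALOGS]
    return statements


if __name__ == "__main__":
    print(" ".join(pgbench_argv("postgres", 600)))
    for statement in churn_cycle(1):
        print(statement + ";")
```

A driver would run `churn_cycle(i)` roughly every 10 seconds through psql or a client library while pgbench runs in the background.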

Decode node

  • Provision a standby (patched) from the WAL-G backup + restore_command (archive-only: no primary_conninfo, the equivalent of has_streaming => 0 in the TAP framework; fully decoupled, no primary connection).
  • Consumer: pg_recvlogical + test_decoding (or wal2json) running on the standby; it persists its own flush LSN.
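
A minimal sketch of the decode-node wiring follows, assuming WAL-G's `wal-g wal-fetch %f %p` as the restore_command and a standby.signal file in the data directory. The slot name, database name, and output path are placeholders; the pg_recvlogical options used (`--dbname`, `--slot`, `--plugin`, `--create-slot`, `--if-not-exists`, `--start`, `--file`) are existing ones, but the exact combination is only one reasonable choice.

```python
from typing import List


def standby_conf_lines(restore_command: str = "wal-g wal-fetch %f %p") -> List[str]:
    """Settings for the archive-only decode standby (no primary_conninfo)."""
    return [
        f"restore_command = '{restore_command}'",
        "hot_standby = on",
        # GUC added by the patch under test.
        "recovery_pause_on_logical_slot_conflict = on",
    ]


def recvlogical_argv(
    dbname: str,
    slot: str,
    outfile: str,
    plugin: str = "test_decoding",
) -> List[str]:
    """pg_recvlogical command line that creates the slot if needed and streams."""
    return [
        "pg_recvlogical",
        "--dbname", dbname,
        "--slot", slot,
        "--plugin", plugin,
        "--create-slot",
        "--if-not-exists",
        "--start",
        "--file", outfile,
    ]


if __name__ == "__main__":
    print("\n".join(standby_conf_lines()))
    print(" ".join(recvlogical_argv("postgres", "poc_slot", "changes.out")))
```

The consumer connects to the standby itself, so no connection string in this sketch ever points at the primary.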

Triggering the pause (core mechanism)

The conflict-pause only fires if the slot's catalog_xmin is behind the prune horizon when the prune record is replayed. A fast consumer keeps catalog_xmin advanced → no conflict.

  • Deliberately make the consumer LAG: artificially pause the consumer ~1 minute (do this twice) so the slot falls behind. The catalog-churn + VACUUM replay then conflicts → recovery pauses.
  • Detect the pause deterministically: POLL pg_get_wal_replay_pause_state() until it returns 'paused', rather than relying on a fixed sleep.
  • Then resume the consumer → auto-resume → it catches up.
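
The polling step could look roughly like the sketch below. `get_state` is a caller-supplied function (an assumption of this sketch) that runs `SELECT pg_get_wal_replay_pause_state()` on the decode standby and returns the text result, which is one of 'not paused', 'pause requested', or 'paused'. Keeping the database call outside the helper lets the loop be exercised without a server.

```python
import time
from typing import Callable


def wait_for_pause_state(
    get_state: Callable[[], str],
    target: str = "paused",
    timeout_s: float = 300.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it returns `target` or the timeout expires.

    Returns True if the target state was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the real query, for demonstration only.
    fake_states = iter(["not paused", "pause requested", "paused"])
    print(wait_for_pause_state(lambda: next(fake_states), interval_s=0.01))
```

The same helper can wait for auto-resume after the consumer is restarted by passing `target="not paused"`.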

End marker

  • A final psql command emits a sentinel: pg_logical_emit_message(true, 'test', 'END-<token>') OR a marker row carrying the commit LSN.
  • Test is complete when the consumer decodes that exact marker.
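
A small sketch of the sentinel handling, under the assumption that the token is a random hex string generated per run. The exact line format of a decoded message depends on the output plugin, so detection here is a plain substring match on the consumer's output rather than a parse of any particular format. The helper names are hypothetical.

```python
import secrets


def make_token() -> str:
    """Random per-run token; hex only, so it is safe inside a SQL literal."""
    return secrets.token_hex(8)


def sentinel_sql(token: str, prefix: str = "test") -> str:
    """SQL to run on the primary to emit the transactional end marker."""
    if not token.isalnum() or not prefix.isalnum():
        raise ValueError("token and prefix must be alphanumeric")
    return f"SELECT pg_logical_emit_message(true, '{prefix}', 'END-{token}')"


def is_end_marker(line: str, token: str) -> bool:
    """True if a decoded output line carries this run's sentinel."""
    return f"END-{token}" in line


if __name__ == "__main__":
    token = make_token()
    print(sentinel_sql(token))
```

The consumer loop then stops reading as soon as `is_end_marker` returns True for a line.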

Success criteria

  • (Core) Slot survives. The GUC-on slot survives all conflicts (stays reserved/active); a GUC-off CONTROL standby's slot is invalidated (its wal_status goes to lost).
  • No gaps, ever. Any gap = data loss = bug. No duplicates within a continuous session. Across a standby restart / re-provision, expect at-least-once overlap — dedup by LSN and check on the deduped output.
  • Pause observed (>=1) plus auto-resume after drain.
  • Zero primary footprint: no slot and no walsender on the primary (pg_replication_slots / pg_stat_replication empty on the primary).
  • Fidelity: pause physical replay on the decode node at LSN X (pg_wal_replay_pause), compare its physical table state (ground truth) vs the consumer's applied state up to X; resume; repeat.
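
The "dedup by LSN, then check" rule can be sketched as follows. This assumes the consumer stores each change as an `(lsn, payload)` pair, with the LSN as an integer and the payload as whatever text the output plugin produced; both the record shape and the function names are assumptions of this sketch. Ground truth would come from the decode node's physical table state at the paused LSN.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, str]  # (change LSN as integer, decoded payload)


def dedup_by_lsn(records: Iterable[Record]) -> List[Record]:
    """Keep the first occurrence of each LSN, preserving arrival order."""
    seen = set()
    unique = []
    for lsn, payload in records:
        if lsn in seen:
            continue  # at-least-once overlap after a restart or re-provision
        seen.add(lsn)
        unique.append((lsn, payload))
    return unique


def missing_changes(expected: Iterable[str], records: Iterable[Record]) -> List[str]:
    """Payloads present in the ground truth but absent from the consumer."""
    received = {payload for _, payload in dedup_by_lsn(records)}
    return [payload for payload in expected if payload not in received]


if __name__ == "__main__":
    stream = [(10, "a"), (20, "b"), (20, "b"), (30, "c")]
    print(dedup_by_lsn(stream))
    print(missing_changes(["a", "b", "c", "d"], stream))
```

Any non-empty result from `missing_changes` is a gap and therefore a test failure; duplicates removed by `dedup_by_lsn` are acceptable only across a restart boundary.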

Lag measurement (no streaming → use LSN/time arithmetic)

  • Replay time-lag: now() - pg_last_xact_replay_timestamp()
  • Replay LSN-lag: pg_wal_lsn_diff(<latest archived LSN>, pg_last_wal_replay_lsn())
  • Consumer / slot lag: pg_wal_lsn_diff(pg_last_wal_replay_lsn(), confirmed_flush_lsn)
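
For harness-side checks, the same LSN arithmetic can be done outside the database. The sketch below parses the textual `X/Y` LSN form (two hexadecimal halves, high 32 bits then low 32 bits) and subtracts, which mirrors what pg_wal_lsn_diff returns in bytes. The function names are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Byte distance a - b, analogous to pg_wal_lsn_diff(a, b)."""
    return parse_lsn(a) - parse_lsn(b)


if __name__ == "__main__":
    # Consumer lag: replay position minus the slot's confirmed_flush_lsn.
    print(lsn_diff("0/3000060", "0/3000000"))
```

A positive result means the first LSN is ahead of the second, so consumer lag grows as the consumer falls behind replay.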

Caveats / notes

  • DDL is NOT decoded (DML only). The decode node applies DDL physically, so its catalog stays correct for decoding subsequent DML.
  • TRUNCATE IS decoded (since PG11) — optional coverage on a decoded table.

Deferred to v2 (advanced)

  • Consumer back-pressure / disk-full stress (e.g. 20 GB table, 10 GB consumer disk): exercises the unbounded-pause boundary.
    • Built-in safety valve: max_slot_wal_keep_size still applies during the pause (the checkpointer runs), so past the limit the slot is invalidated out-of-band and replay proceeds.
    • Without it, the pause holds until the standby's own disk fills. The operator escapes via pg_wal_replay_resume (give up the slot) or by growing the disk.
