Releases: valehdba/pgclone

pgclone v4.4.0 - Schema/database-level in-line masking and table subset filters

11 Jun 19:41

Schema/database-level in-line masking and table subset filters, implementing discussion #16.

Until now, in-line masking (masking applied while data streams from source to target) was only available per table. Cloning a whole schema or database meant masking afterwards — either a slow mass UPDATE (followed by re-analyze/vacuum) or masked views that aren't transparent to end users. v4.4.0 removes that gap, and adds regex-based table selection on top.

No SQL signature changes: both features are new keys in the existing JSON options argument of pgclone.schema(), pgclone.database(), and pgclone.database_create().

In-line masking for schema/database clones — "masks"

SELECT pgclone.schema(
    'host=source dbname=prod user=postgres password=...',
    'hr', true,
    '{"masks": {
        "employees":  {"email": "email", "ssn": "null"},
        "candidates": {"full_name": "name", "phone": "phone"}
     }}'
);
  • Keys are table names; values use the exact same format as the single-table "mask" option — all 8 masking strategies supported.
  • Masking happens inside the COPY stream: the mask expressions run on the source as part of the per-table COPY (SELECT ...), so unmasked data never reaches the target. No post-clone UPDATE, no re-vacuum, no view layer.
  • For database clones, keys may be schema-qualified ("hr.employees") to disambiguate identically named tables; qualified keys win over bare names.
  • Tables not listed are cloned unmasked.
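
The key-resolution rule in the bullets above (schema-qualified keys win over bare names, unlisted tables stay unmasked) can be modelled in a few lines of Python. This is an illustrative sketch only: resolve_mask is a hypothetical helper, not a pgclone function, and the real logic lives in the extension's C option parser.

```python
from typing import Optional


def resolve_mask(masks: dict, schema: str, table: str) -> Optional[dict]:
    """Return the mask spec for schema.table, or None if it is cloned unmasked."""
    qualified = f"{schema}.{table}"
    if qualified in masks:       # a schema-qualified key wins over a bare name
        return masks[qualified]
    return masks.get(table)      # a bare name applies in any schema


# Hypothetical "masks" option for a database clone.
masks = {
    "employees": {"email": "email"},
    "hr.employees": {"email": "email", "ssn": "null"},
}

print(resolve_mask(masks, "hr", "employees"))
print(resolve_mask(masks, "sales", "employees"))
print(resolve_mask(masks, "hr", "payroll"))
```

The second call shows why bare keys need care in database clones: they match a same-named table in every schema.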

Table subset filters — "tables" / "exclude_tables"

-- Only order_* partitions plus customers, minus anything archived
SELECT pgclone.schema(
    'host=source dbname=prod user=postgres password=...',
    'sales', true,
    '{"tables": ["order_[0-9]+", "customers"],
      "exclude_tables": [".*_archive"]}'
);
  • Entries are POSIX regular expressions, anchored as ^(pattern)$ (whole-name match), evaluated by the source server against pg_tables.tablename. Excludes apply after includes.
  • The filtered table list flows through the FK-retry and deferred-trigger passes automatically. Sequences, views, materialized views, and functions of the schema are still cloned in full.
  • Patterns are sent as quoted literals — no regex code in the extension and no injection surface; an invalid regex fails the clone with the source's invalid regular expression error.
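
To see how whole-name anchoring and "excludes after includes" interact, here is a small Python model. It is only an approximation: pgclone evaluates POSIX regular expressions on the source server, while this sketch uses Python's re module, which behaves the same for simple patterns like these. select_tables is a hypothetical name, and treating a missing "tables" list as "all tables" is an assumption.

```python
import re


def select_tables(names, include=None, exclude=None):
    """Apply include patterns first, then excludes, each anchored as ^(pattern)$."""
    def matches(name, patterns):
        # fullmatch on "(pattern)" is equivalent to searching for ^(pattern)$
        return any(re.fullmatch(f"({p})", name) for p in patterns)

    selected = [n for n in names if include is None or matches(n, include)]
    return [n for n in selected if not (exclude and matches(n, exclude))]


names = ["order_1", "order_22", "order_x", "customers",
         "customers_archive", "order_3_archive"]

print(select_tables(names, ["order_[0-9]+", "customers"]))
print(select_tables(names, ["order_.*", "customers"], [".*_archive"]))
```

Because of the anchoring, "customers" does not pull in "customers_archive"; a prefix match has to be written explicitly, for example as "customers.*".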

Both options combine freely, and pgclone.database_create() forwards them verbatim into the target database's clone run.

Internals

  • The hand-rolled option parser gained string-aware JSON scanning (pgclone_json_balanced_end, pgclone_json_string_end, pgclone_json_unescape, pgclone_parse_pattern_array), so regex character classes like [0-9] inside option strings no longer terminate arrays early.
  • Per-table mask objects are captured as raw JSON during option parsing and re-emitted verbatim as the "mask" option of each matching per-table sub-call — the existing single-table masking pipeline does all the real work, unchanged.
  • pgclone.database() propagates masks/tables/exclude_tables verbatim to its per-schema sub-calls.
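
The string-aware scanning mentioned in the first bullet can be illustrated with a short Python sketch. It is not a port of the C helpers named above; balanced_end is a hypothetical function that shows the general technique: track whether the scanner is inside a JSON string, and only count brackets outside of one.

```python
def balanced_end(text: str, start: int) -> int:
    """Return the index of the bracket that closes text[start], skipping JSON strings."""
    pairs = {"[": "]", "{": "}"}
    if text[start] not in pairs:
        raise ValueError("start must point at '[' or '{'")
    stack = []
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1               # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in pairs:
            stack.append(pairs[ch])  # remember which closer we expect
        elif ch in "]}":
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched bracket")
            if not stack:
                return i             # outermost value is closed
        i += 1
    raise ValueError("unterminated JSON value")


opts = '{"tables": ["order_[0-9]+", "customers"], "exclude_tables": [".*_archive"]}'
start = opts.index("[")
end = balanced_end(opts, start)
print(opts[start:end + 1])
```

A scanner that ignored string state would stop at the "]" inside [0-9], which is the early-termination bug the release describes.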

Tests & docs

  • pgTAP test group 25 (10 new tests, plan 77 → 87) with a new filter_test seed schema: include/exclude regex filtering, in-line email/null masking during a schema clone, per-table mask scoping, unfiltered data integrity.
  • docs/USAGE.md: new sections Clone a subset of tables and In-line masking during schema/database clone, plus JSON options reference rows.
  • COMMENT ON FUNCTION documentation for pgclone.schema(TEXT, TEXT, BOOLEAN, TEXT) and pgclone.database(TEXT, BOOLEAN, TEXT).

Limitations

  • Synchronous paths only: pgclone.schema_async() and the parallel worker pool do not carry JSON options through shared memory and ignore these keys (as they already do for "mask").
  • Bare-name mask keys apply in every schema with a matching table during a database clone; use schema-qualified keys to scope them.

Upgrade

ALTER EXTENSION pgclone UPDATE TO '4.4.0';

sql/pgclone--4.3.2--4.4.0.sql only updates function comments — no catalog changes. Identical behavior on PostgreSQL 14–18.

Full changelog: v4.3.2...v4.4.0

pgclone v4.3.2 — bgw snapshot-keeper resilience

13 May 22:20
243f3e6

Bug-fix release. Closes the v4.3.1 "Known gap" — ports the snapshot-keeper resilience fix from issue #9 to the background-worker path.

If you ran v4.3.1 and your async clones still fail with ERROR: pgclone: SET TRANSACTION SNAPSHOT … invalid snapshot identifier on networked sources, this release fixes that.

What was broken in v4.3.1

v4.3.1 closed three failure paths on the synchronous code in src/pgclone.c:

  1. Firewall/NAT idle TCP drop on the keeper conninfo
  2. idle_in_transaction_session_timeout on the source server
  3. statement_timeout firing on COMMIT

But src/pgclone_bgw.c carries its own mirror helpers and four direct PQconnectdb() call sites — none of which received the v4.3.1 protections. As a result:

  • pgclone.table_async()
  • pgclone.schema_async() sequential
  • pgclone.schema_async() with {"parallel": N} (parallel pool)

…remained vulnerable on networked sources. We disclosed this honestly in the v4.3.1 release notes and in docs/ASYNC.md rather than ship a doc that misleads users about the actual state of the fix.

v4.3.2 closes the gap.

What's fixed

The full v4.3.1 four-layer protection is now mirrored in the background-worker translation unit:

1. bgw_connect_with_keepalives()

New helper, mirror of pgclone_connect_with_keepalives(). Parses every bgworker source conninfo via PQconninfoParse, injects:

keepalives=1
keepalives_idle=30
keepalives_interval=10
keepalives_count=6

…only when the user did not specify them, and connects via PQconnectdbParams. Both URI (postgresql://…) and keyword-form strings are handled. Replaces three direct PQconnectdb() call sites: pgclone_bgw_main, pgclone_pool_worker_main, pgclone_pool_coordinator_main. ~90 s TCP-drop detection on the async path; keeps perimeter NAT entries warm.

2. SET LOCAL timeouts in bgw_begin_repeatable_read()

After BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY the helper now issues:

SET LOCAL idle_in_transaction_session_timeout = 0;
SET LOCAL statement_timeout = 0;

Both GUCs are PGC_USERSET (no privilege required). SET LOCAL reverts at COMMIT so pooled connections never leak settings.

3. Clearer diagnostic in bgw_begin_with_imported_snapshot()

When SET TRANSACTION SNAPSHOT fails with invalid snapshot identifier, the WARNING now includes a HINT-style line naming the likely causes and the {"consistent": false} emergency opt-out. The bgworker context has no errhint, so the hint goes into the WARNING body.

4. bgw_keeper_ping() in the pool coordinator wait loop

New helper. Cheap SELECT 1 round-trip on the coordinator's keeper connection. Called every ~5 s (every 50th iteration of the 100 ms WaitLatch cycle) inside pgclone_pool_coordinator_main's wait loop. On failure, sets pool.snapshot_failed = true, cascading the failure to every pool worker so they abort cleanly instead of timing out on a snapshot the server already reaped.
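
The ping cadence works out to 50 iterations × 100 ms = 5 s. The following Python sketch simulates that wait loop under stated assumptions: coordinator_wait and its ping callback are hypothetical stand-ins for the C loop, and the latch wait is omitted so the example runs instantly.

```python
WAIT_MS = 100      # WaitLatch timeout per iteration, as described above
PING_EVERY = 50    # ping on every 50th iteration: 50 * 100 ms = 5 s


def coordinator_wait(iterations_until_done, ping):
    """Simulate the wait loop; return (snapshot_failed, pings_sent)."""
    pings = 0
    for i in range(1, iterations_until_done + 1):
        # The real loop would block on a latch for about WAIT_MS here.
        if i % PING_EVERY == 0:
            pings += 1
            if not ping():
                # Keeper is gone: flag the failure so pool workers abort cleanly.
                return True, pings
    return False, pings


print(PING_EVERY * WAIT_MS, "ms between pings")
print(coordinator_wait(120, lambda: True))

replies = iter([True, False])          # the second ping fails
print(coordinator_wait(500, lambda: next(replies)))
```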

Verification

  • Regression test test/test_snapshot_keeper.sh Group 5 sets idle_in_transaction_session_timeout = 1s on the source role and runs pgclone.schema_async() against a 3-table source. Without the v4.3.2 fix the bgworker keeper is killed mid-clone and the async job lands in failed; with the fix the clone completes cleanly.
  • Skipped automatically when pgclone is not in shared_preload_libraries (so local Docker runs without async setup still pass without false positives).
  • The full PG 14 / 15 / 16 / 17 / 18 CI matrix exercises the new test on every supported version.
  • Compiles clean against PG 16; only the two pre-existing warnings at src/pgclone.c:3867 and :1089, both outside this diff.

Upgrade

cd /path/to/pgclone-source
git fetch --tags && git checkout v4.3.2
make clean && make && sudo make install

In every database where the extension is installed:

ALTER EXTENSION pgclone UPDATE;
SELECT pgclone.version();   -- expect 'pgclone 4.3.2' or newer

sql/pgclone--4.3.1--4.3.2.sql is intentionally empty — no SQL signature changes. The ALTER EXTENSION just refreshes metadata.

No configuration changes required. No data is touched. No backward-incompatible behaviour.

Emergency workaround if you can't upgrade yet

The same async-path workarounds from v4.3.1 still apply:

SELECT pgclone.schema_async(
    'postgresql://user@host:5432/sourcedb keepalives=1 keepalives_idle=30',
    'myschema',
    true,
    '{"consistent": false}'
);

Either set keepalives explicitly in the conninfo, ensure source-side idle_in_transaction_session_timeout = 0, or pass {"consistent": false} to bypass cross-table snapshot sharing. v4.3.2 makes all of these unnecessary for the keepalive and timeout paths; you only need {"consistent": false} if you actually want to disable cross-table consistency for other reasons.

Compatibility

  • PostgreSQL: 14, 15, 16, 17, 18 (unchanged)
  • OS: Linux (RHEL, Ubuntu), macOS (unchanged)
  • API: no SQL signature changes; pure C-layer fix in src/pgclone_bgw.c
  • Existing async jobs: continue to work without modification

What's NOT changed in v4.3.2

  • The synchronous-path code in src/pgclone.c — already fixed in v4.3.1, untouched here.
  • bgw_connect_local() (the local-loopback helper at pgclone_bgw.c:159) — localhost has no firewall path so keepalives are irrelevant; we left it alone to keep this diff minimal.
  • Default behaviour: nothing changes for users on healthy LANs / Unix-socket connections. v4.3.2's protections only become observable on networked sources with idle drops or non-zero source-side timeouts.

References

  • Original issue: #9
  • v4.3.1 fix (sync path): #10
  • v4.3.1 test coverage: #11
  • v4.3.1 docs + gap disclosure: #12
  • v4.3.2 bgw mirror (this release): #13
  • Full changelog: git log v4.3.1..v4.3.2
  • Architecture: docs/ARCHITECTURE.md → "Snapshot-keeper resilience (v4.3.1)"
  • Async-path detail: docs/ASYNC.md → "Snapshot-keeper resilience in async paths (v4.3.2)"
  • Operator troubleshooting: docs/USAGE.md → "Troubleshooting invalid snapshot identifier"

pgclone v4.3.1 — snapshot-keeper resilience

13 May 21:30
cfc8057

Bug-fix release. Resolves a class of failures where long-running schema or database clones over remote/networked source connections fail mid-run with the misleading message ERROR: pgclone: could not import snapshot … invalid snapshot identifier.

If you've ever had a multi-hour pgclone.schema() or pgclone.database() succeed on a small test database and fail on a real one, this is for you.

What was broken

pgclone.schema() / pgclone.database() export a snapshot on a "keeper" connection at the start of the clone, then open many new source connections (one per table, plus separate connections for the FK retry, view, materialized view, function, and trigger sub-phases) that each SET TRANSACTION SNAPSHOT against the keeper's exported snapshot ID — the same pattern pg_dump -j uses for cross-table consistency.

The keeper then sits idle in transaction for the bulk of the run — hours, in the original report (#9). Three independent paths could silently terminate it and invalidate the exported snapshot:

  1. Firewall / NAT idle TCP drop on remote source connections. The keeper had no keepalives=* in its conninfo, so the OS default (Linux: 7200 s before the first probe) was far too lazy to keep perimeter equipment warm. Enterprise firewalls and NAT gateways commonly reap idle TCP after 30–60 minutes.
  2. idle_in_transaction_session_timeout on the source server — a frequent production safeguard that killed the keeper without our consent.
  3. statement_timeout firing on the keeper's eventual COMMIT.

When the keeper died, PostgreSQL removed the exported-snapshot file. Every subsequent SET TRANSACTION SNAPSHOT then failed. PostgreSQL emits ERROR: invalid snapshot identifier both for malformed IDs and for IDs whose backing file has been reaped — which is why the failure mode was hard to diagnose: the user's snapshot ID (e.g. 000000A6-000C4299-1) was perfectly well-formed PG 17/18 syntax (procNumber-lxid-counter), but the file behind it was gone.

Affected users

You were vulnerable to this if all three were true:

  • v4.3.0 with consistent (the default) left on,
  • multi-table operations (pgclone.schema() or pgclone.database(), sync or async pool mode),
  • and either: (a) the source DB was reached over a network through a firewall/NAT, or (b) the source had idle_in_transaction_session_timeout > 0.

pgclone.table() (single-table) and clones using '{"consistent": false}' were never affected.

What's fixed

Four layers of protection, all PG 14–18 compatible, no SQL signature changes:

1. TCP keepalives auto-injected into every source connection

pgclone_connect_with_keepalives() parses the user's conninfo with PQconninfoParse, injects sensible defaults only when the user did not specify them, and connects via PQconnectdbParams. Both URI (postgresql://...) and keyword-form conninfo strings are handled.

Defaults:

keepalives=1
keepalives_idle=30
keepalives_interval=10
keepalives_count=6

Total drop detection: ~90 s. Keeps perimeter NAT entries warm. If you've explicitly set any of these in your conninfo, your value wins.
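
The "inject only when unset" rule and the ~90 s figure can be checked with a small Python sketch. It works on an already-parsed parameter dictionary (the extension itself uses PQconninfoParse for that step); inject_keepalives and detection_seconds are hypothetical names used for illustration.

```python
KEEPALIVE_DEFAULTS = {
    "keepalives": "1",
    "keepalives_idle": "30",
    "keepalives_interval": "10",
    "keepalives_count": "6",
}


def inject_keepalives(params: dict) -> dict:
    """Add keepalive defaults without overriding anything the user set."""
    merged = dict(KEEPALIVE_DEFAULTS)
    merged.update(params)            # user-specified values win
    return merged


def detection_seconds(params: dict) -> int:
    """Idle time before the first probe plus the time for all probes to fail."""
    return (int(params["keepalives_idle"])
            + int(params["keepalives_interval"]) * int(params["keepalives_count"]))


defaults = inject_keepalives({"host": "source", "dbname": "prod"})
print(detection_seconds(defaults))   # 30 + 10 * 6

custom = inject_keepalives({"host": "source", "keepalives_idle": "120"})
print(custom["keepalives_idle"], detection_seconds(custom))
```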

2. Source-side timeouts disabled inside the keeper transaction

The keeper's BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY is immediately followed by:

SET LOCAL idle_in_transaction_session_timeout = 0;
SET LOCAL statement_timeout = 0;

Both GUCs are PGC_USERSET (no privilege required). SET LOCAL reverts at COMMIT, so pooled connections never leak the settings.

3. Clearer error on snapshot import failure

When SET TRANSACTION SNAPSHOT does fail with invalid snapshot identifier, pgclone now attaches an errhint naming the most likely causes (firewall idle drop, idle_in_transaction_session_timeout, etc.) and pointing at the {"consistent": false} emergency opt-out. No more silent debugging.

4. Keeper liveness ping between sub-phases

pgclone_keeper_ping() runs a cheap SELECT 1 round-trip on the keeper before each per-table call and before each major sub-phase of pgclone.schema() (FK retry, views, functions, triggers) and pgclone.database() (per-schema loop). If the keeper has been dropped, the error names the keeper directly — you won't waste a multi-GB COPY only to hit a misleading message at the next SET TRANSACTION SNAPSHOT.

Verification

  • Regression test test/test_snapshot_keeper.sh sets idle_in_transaction_session_timeout = 1s on the source role and runs a 5-table schema clone. Without the fix the keeper is killed mid-loop and a later importer fails; with the fix the clone completes cleanly.
  • Runs on the full PG 14 / 15 / 16 / 17 / 18 matrix in CI.
  • Compiles clean with -Wall -Werror=format-security against PG 16.

Upgrade

cd /path/to/pgclone-source
git fetch --tags && git checkout v4.3.1
make clean && make && sudo make install

In every database where the extension is installed:

ALTER EXTENSION pgclone UPDATE;
SELECT pgclone.version();   -- expect 'pgclone 4.3.1'

The upgrade script sql/pgclone--4.3.0--4.3.1.sql is intentionally empty — there are no SQL signature changes, so the ALTER EXTENSION just refreshes the metadata.

No configuration changes are required. No data is touched. No backward-incompatible behavior.

Emergency workaround if you can't upgrade yet

Pass '{"consistent": false}' in the options argument to disable cross-table snapshot sharing:

SELECT pgclone.schema(
    'postgresql://user@host:5432/sourcedb',
    'myschema',
    true,
    '{"consistent": false}'
);

Every per-table copy is still internally consistent. Cross-table referential consistency is no longer guaranteed against concurrent writers, but the clone will not fail with invalid snapshot identifier.

Compatibility

  • PostgreSQL: 14, 15, 16, 17, 18 (unchanged from 4.3.0)
  • OS: Linux (RHEL, Ubuntu), macOS (unchanged)
  • API: no SQL signature changes; pure C-layer fix

Acknowledgements

Thanks to @herwigg for the detailed bug report (#9), including the exact timing (2 h 15 min run length), the PostgreSQL log excerpt, and the snapshot ID — all of which pointed directly at the keeper-lifecycle root cause.

References

  • Issue: #9
  • Pull request: #10
  • Architecture doc (new section): docs/ARCHITECTURE.md → "Snapshot-keeper resilience (v4.3.1)"
  • Full changelog: git log v4.3.0..v4.3.1

What's Changed

  • fix(version): bump pgclone_version() string to 4.3.0 by @valehdba in #8

Full Changelog: v4.3.0...v4.3.1

v4.3.0: Consistent-snapshot clones

08 May 08:27
20fda0c

Every clone now reads the source under one shared BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY snapshot. Cross-table foreign-key consistency on a live source is guaranteed — the same correctness model pg_dump -j uses for parallel dumps.

This fixes the long-standing class of FK-violation and partial-state anomalies reported when cloning OLTP sources that take concurrent writes during the clone.

What's New

Consistent-snapshot clones (default behavior)

Every per-table COPY across the entire clone observes the same point-in-time view of the source. Parent/child rows, FK invariants, and any other application invariants that hold at a single instant on the source are preserved in the clone, even under concurrent writes.

Applies to every clone path:

 Path                                        | Mechanism
---------------------------------------------+----------------------------------------------------------------------------------------------------
 pgclone.table() (sync)                      | Single source connection wrapped in BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY
 pgclone.schema() (sync)                     | Initial source connection becomes the snapshot keeper; ID propagated to every per-table sub-call
 pgclone.database() (sync)                   | Same keeper pattern at one level higher; snapshot ID flows down to every per-schema and per-table call
 pgclone.table_async() (single bgw)          | bgworker source connection wrapped in REPEATABLE READ READ ONLY
 pgclone.schema_async() (single bgw)         | Same as above; one snapshot for the whole schema
 pgclone.schema_async(... '{"parallel": N}') | New pgclone_pool_coordinator_main bgworker exports the snapshot; N pool workers import via SET TRANSACTION SNAPSHOT

Multi-connection paths use pg_export_snapshot() / SET TRANSACTION SNAPSHOT so every libpq connection involved in a single clone — per-table sub-calls, FK retry, views, matviews, functions, triggers, and every parallel pool worker — binds to one shared snapshot.
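
The export/import handshake boils down to two short statement sequences, one for the keeper and one for every other connection. The Python sketch below only builds the SQL text so the ordering is visible; keeper_statements and importer_statements are hypothetical helpers, not pgclone internals, and the snapshot ID is the example value quoted in the v4.3.1 notes.

```python
def keeper_statements():
    """Statements the keeper runs; the SELECT returns the snapshot ID."""
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY",
        "SELECT pg_export_snapshot()",
    ]


def importer_statements(snapshot_id: str):
    """Statements each importing connection runs before reading any data."""
    if "'" in snapshot_id:
        raise ValueError("snapshot ID must not contain a quote")
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY",
        # Must be the first statement of the transaction.
        f"SET TRANSACTION SNAPSHOT '{snapshot_id}'",
    ]


for stmt in keeper_statements() + importer_statements("000000A6-000C4299-1"):
    print(stmt + ";")
```

The keeper has to stay in its transaction until every importer has run SET TRANSACTION SNAPSHOT; once the keeper's transaction ends, the exported snapshot can no longer be imported.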

Opt-out per call

For callers who prefer the v4.2.x behavior — for example because the source can't tolerate a long-running keeper transaction — pass '{"consistent": false}' in any options JSON:

SELECT pgclone.schema('host=...', 'public', true, '{"consistent": false}');
SELECT pgclone.database('host=...', true, '{"consistent": false}');
SELECT pgclone.schema_async('host=...', 'public', true,
                            '{"parallel": 4, "consistent": false}');

Snapshot coordinator background worker

When pgclone.schema_async(..., '{"parallel": N}') runs in consistent mode, pgclone launches one extra bgworker — the snapshot coordinator — visible in pgclone.jobs_view as pgclone: snapshot coordinator (parent <id>). It opens its own source connection, BEGINs, calls pg_export_snapshot(), publishes the ID to shared memory, and stays idle in transaction until every pool worker has bound to the snapshot, then COMMITs and exits.

Highlights

  • No SQL surface change. Existing function signatures and return types are unchanged. The feature is enabled via a flag in the existing options TEXT parameters.
  • Same correctness model as pg_dump -j. Snapshot export from a keeper, snapshot import on every other connection.
  • Hot-standby supported on PostgreSQL ≥ 10 (where pg_export_snapshot() works on a standby).
  • Defence-in-depth preserved. The pre-existing per-table READ ONLY wrap (SQL-injection containment around WHERE clauses) detects the outer transaction and skips its own BEGIN/COMMIT, so the sandbox guarantee holds regardless of the call path.

Tradeoffs

  • Long-running source-side transaction. Clones now hold an open transaction on the source for the full clone duration. On a busy source this delays VACUUM cleanup of dead tuples and WAL recycling in proportion to clone time. For very long (multi-hour) clones on heavily updated sources this can cause table/index bloat and WAL accumulation. Opt out per call with '{"consistent": false}' if cross-table consistency matters less than avoiding that source-side overhead.
  • One extra bgworker per parallel pool clone (the snapshot coordinator). Negligible resource cost; appears in pgclone.jobs_view for full progress visibility.

docs/USAGE.md discusses the REPEATABLE READ vs SERIALIZABLE DEFERRABLE tradeoff for readers who want the strictest possible isolation.

Internal

  • New CloneOptions.consistent (bool, default true) and CloneOptions.snapshot_id[64] plus parser support for "consistent": false and "snapshot_id": "...".
  • New helpers in src/pgclone.c: pgclone_begin_repeatable_read, pgclone_commit_source, pgclone_export_snapshot, pgclone_begin_with_imported_snapshot, pgclone_setup_source_txn, pgclone_setup_source_txn_done. Mirror helpers in src/pgclone_bgw.c.
  • New PgclonePoolQueue shared-memory fields: consistent, snapshot_ready, snapshot_failed, snapshot_imported_count, snapshot_expected_workers, launch_complete, snapshot_id[64], coordinator_job_id.
  • New PgcloneJob.consistent field so the single-worker async path knows whether to wrap in REPEATABLE READ READ ONLY.

CI / Tests

  • New test/test_consistent.sh — runs sync, async sequential, async pool, and opt-out clones against a concurrent parent/child FK writer that continuously inserts rows on the source. Asserts zero orphan child rows in every consistent path. Wired into test/run_tests.sh.
  • All existing test suites unchanged and passing on PG 14, 15, 16, 17, 18.

Upgrade

ALTER EXTENSION pgclone UPDATE TO '4.3.0';
SELECT pgclone.version();

The 4.2.0 → 4.3.0 migration script is a no-op (no DDL changes — the feature is a behavior change activated by an option in the existing options TEXT parameters).

Compatibility

  • PostgreSQL 14, 15, 16, 17, 18
  • No C ABI changes for existing functions
  • No external library dependencies

v4.2.0 — Pre-flight Validator

07 May 19:53
7b8a9b5

A new read-only sanity check that surfaces issues before they fail a clone mid-flight. Companion to pgclone.diff() (v4.1.0): preflight validates that a clone can succeed; diff reports drift after the fact.

What's New

pgclone.preflight(source_conninfo, schema_name) → JSON

Connects read-only to the source and the local target, runs 16 checks, and returns a JSON document with three severity-based summary arrays (errors, warnings, info) plus a per-check breakdown.

SELECT pgclone.preflight(
    'host=prod-db dbname=app user=postgres password=***',
    'public'
)::jsonb;

ready is true only when zero errors are recorded. Quick boolean gate:

SELECT (pgclone.preflight(:src, 'public')::jsonb ->> 'ready')::boolean
       AS clone_safe;
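
A client-side version of the same gate might look like the Python sketch below. The top-level keys (schema, ready, summary, checks) and the summary arrays come from these notes; the shape of the individual array entries is an assumption, and clone_safe is a hypothetical helper.

```python
import json


def clone_safe(preflight_json: str) -> bool:
    """True only when the report says ready and lists zero errors."""
    doc = json.loads(preflight_json)
    # Cross-check "ready" against the errors array instead of trusting one field.
    return bool(doc["ready"]) and len(doc["summary"]["errors"]) == 0


# Hypothetical reports; entries in "errors" are assumed to be plain strings.
blocked = json.dumps({
    "schema": "public",
    "ready": False,
    "summary": {"errors": ["schema_exists_source"], "warnings": [],
                "info": [], "checks_run": 16},
    "checks": [],
})
clean = json.dumps({
    "schema": "public",
    "ready": True,
    "summary": {"errors": [], "warnings": ["missing_roles"],
                "info": [], "checks_run": 16},
    "checks": [],
})

print(clone_safe(blocked), clone_safe(clean))
```

Warnings do not block the gate in this sketch, matching the rule that ready depends only on the error count.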

Checks performed

 Check                | Severity | What it catches
----------------------+----------+------------------------------------------------------------
 source_connection    | error    | bad conninfo, unreachable host, auth failure
 target_connection    | error    | local socket / pg_hba misconfiguration
 source_version       | info     | source PostgreSQL version
 target_version       | info     | target PostgreSQL version
 version_compat       | warn     | source major version > target major version
 schema_exists_source | error    | source schema does not exist
 schema_exists_target | info     | target schema absent → will be created
 source_permissions   | error    | role lacks USAGE on schema or SELECT on tables
 target_permissions   | error    | role lacks CREATE on target schema (or DB if creating it)
 estimated_size       | info     | sum of pg_total_relation_size on source
 target_database_size | info     | current target DB size for capacity reasoning
 object_counts        | info     | tables / views / sequences / indexes counts
 name_conflicts       | warn     | object names already present in target schema
 missing_extensions   | warn     | extensions installed on source but not target
 missing_roles        | warn     | owner / grantee roles not present on target
 missing_tablespaces  | warn     | non-default tablespaces referenced by source

Highlights

  • Read-only on both sides. Both source and target connections wrap every query in BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY. The function never executes DDL or DML.
  • Self-contained C module (src/pgclone_preflight.c). Like src/pgclone_diff.c (v4.1.0), this file does not share helpers with src/pgclone.c — the feature is fully additive and trivially auditable. See docs/ARCHITECTURE.md for the design rationale.
  • No new permissions required. The pre-flight uses has_*_privilege() to test the same privileges the clone itself would need — no destructive probes.
  • Composes with pgclone.diff(). Pre-flight before a fresh clone; diff after to catch drift over time.

CI / Test Infrastructure

CI now invokes test/test_loopback.sh directly instead of the previous hand-rolled inline subset in .github/workflows/ci.yml. Going forward, every assertion added to the loopback script runs in CI automatically — closing the gap that was hiding v4.1.0 schema-diff tests from the matrix.

13 new pre-flight assertions were added to test/test_loopback.sh:

  • function registration with documented (text, text) signature
  • top-level JSON keys: schema, ready, summary, checks
  • ready is a boolean
  • summary contains errors / warnings / info / checks_run
  • checks includes the connection / version / schema_exists_* entries
  • schema name is echoed back
  • missing source schema → ready = false, errors[] non-empty
  • name_conflicts.items is an array
  • object_counts.tables is a non-negative integer
  • version_compat status is pass or warn
  • read-only invariant on the target catalog
  • STRICT (NULL schema_name argument yields no row)

Upgrade

ALTER EXTENSION pgclone UPDATE TO '4.2.0';
SELECT pgclone.version();

No SQL signature changes for any existing function. The new pgclone.preflight() is purely additive.

Compatibility

  • PostgreSQL 14, 15, 16, 17, 18
  • No C ABI changes for existing functions
  • No external library dependencies (the JSON escaper is shipped locally to avoid utils/jsonapi.h cross-version drift)

v4.1.0

05 May 13:52
e29019b

v4.1.0 — Schema Diff (pgclone.diff)

v4.0.1 - Bugfix release

03 May 18:28

Fixes issue #3: schema clone fails on databases whose default search_path includes an application schema.

What was broken

Calling pgclone.schema(...) against a source database with a non-default search_path produced three classes of failure, all triggered by the same clone:

WARNING: pgclone: local exec failed: ERROR: relation "itiniris.inbox_messages" does not exist
  (query: CREATE OR REPLACE FUNCTION itiniris.inbox_messages_fct_ck(p_id bigint) ...)

WARNING: pgclone: local exec failed: ERROR: relation "city_street" does not exist
  (query: CREATE TRIGGER city_street_insert_trigger BEFORE INSERT ON city_street ...)

ERROR: pgclone: failed to create table locally: ERROR: relation "documents_to_resend_id_seq" does not exist
LINE 1: ...ts_to_resend (id integer NOT NULL DEFAULT nextval('documents...

Root cause

Two separate bugs interacted:

A. Functions were cloned before tables. pgclone_schema() Step 2 ran CREATE FUNCTION for every function in the schema before Step 4 created the tables. SQL-language functions (LANGUAGE sql) parse and validate object references at CREATE FUNCTION time — unlike LANGUAGE plpgsql which defers to call-time — so any function whose body referenced a same-schema table failed when that table didn't yet exist.

B. Source-side search_path leaked into extracted DDL. pg_get_triggerdef(), pg_get_expr() (used for column DEFAULTs), and pg_get_indexdef() all call generate_relation_name() internally, which omits the schema prefix when the relation's namespace is on the source session's search_path. Production source DBs commonly have ALTER DATABASE x SET search_path = app_schema, public, which caused the extracted DDL to come back with bare relation names. Replaying that DDL on the target loopback connection, whose search_path doesn't include the app schema, failed to resolve the references.

Fix

  1. Force fully-qualified DDL output — every source libpq connection now runs SET search_path = pg_catalog immediately after connect (pgclone_normalize_session() helper invoked from pgclone_connect(), plus equivalent inline calls at the two PQconnectdb(...source_conninfo) sites in the bgworker). Built-in types stay unqualified because pg_catalog is on search_path; everything else gets schema-qualified. This matches pg_dump --no-search-path behaviour.

  2. Reorder pgclone_schema() to respect dependencies:

    schema → sequences → tables (no triggers) → FK retry
           → views → matviews → functions → triggers (deferred pass)
    

    Per-table pgclone_table() calls now receive triggers=false from the schema driver; the schema driver runs a single trigger pass at the end after all functions exist.

Both sync paths and both async bgworker paths (sequential and pool) are covered.

Verification

A new regression test (TEST GROUP 21 in test/pgclone_test.sql with fixtures in test/fixtures/seed.sql) reproduces all three failure modes by cloning a dep_test schema using a conninfo with options='-c search_path=dep_test,public'. Locally verified that the pre-fix .so produces exactly 5 failures matching the bug-report errors, and the post-fix .so passes 77/77 assertions.

Compatibility

  • No public API changes
  • No breaking changes to the JSON options format
  • Upgrade path: ALTER EXTENSION pgclone UPDATE (no-op SQL upgrade script — the fix is entirely in the compiled library)
  • Verified to compile cleanly on PostgreSQL 16; PostgreSQL 14–18 compatibility maintained (only SET search_path is added, which is stable across all supported versions)

Credits

Reported by [@herwigg](https://github.com/herwigg). Thanks for the detailed reproduction.

v4.0.0 — Schema Namespace

15 Apr 12:17

⚠️ BREAKING CHANGE — All pgclone functions now live under the pgclone schema.

What Changed

The pgclone schema is created automatically when you run CREATE EXTENSION pgclone. All functions have been moved into this schema and the pgclone_ prefix has been removed:

 Before (v3.x)                      | After (v4.0.0)
------------------------------------+------------------------------------
 pgclone_table(...)                 | pgclone.table(...)
 pgclone_schema(...)                | pgclone.schema(...)
 pgclone_database(...)              | pgclone.database(...)
 pgclone_database_create(...)       | pgclone.database_create(...)
 pgclone_table_async(...)           | pgclone.table_async(...)
 pgclone_schema_async(...)          | pgclone.schema_async(...)
 pgclone_progress(...)              | pgclone.progress(...)
 pgclone_cancel(...)                | pgclone.cancel(...)
 pgclone_resume(...)                | pgclone.resume(...)
 pgclone_jobs()                     | pgclone.jobs()
 pgclone_clear_jobs()               | pgclone.clear_jobs()
 pgclone_progress_detail()          | pgclone.progress_detail()
 pgclone_jobs_view                  | pgclone.jobs_view
 pgclone_discover_sensitive(...)    | pgclone.discover_sensitive(...)
 pgclone_mask_in_place(...)         | pgclone.mask_in_place(...)
 pgclone_create_masking_policy(...) | pgclone.create_masking_policy(...)
 pgclone_drop_masking_policy(...)   | pgclone.drop_masking_policy(...)
 pgclone_clone_roles(...)           | pgclone.clone_roles(...)
 pgclone_verify(...)                | pgclone.verify(...)
 pgclone_masking_report(...)        | pgclone.masking_report(...)
 pgclone_table_ex(...)              | pgclone.table_ex(...)
 pgclone_schema_ex(...)             | pgclone.schema_ex(...)
 pgclone_functions(...)             | pgclone.functions(...)
 pgclone_version()                  | pgclone.version()

Upgrade Path

This is a breaking change. To upgrade from v3.x:

DROP EXTENSION pgclone;
CREATE EXTENSION pgclone;
SELECT pgclone.version();

All application queries must be updated to use the pgclone. prefix.
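
For codebases with many call sites, the rename can be scripted. The Python sketch below is one possible approach, not a tool shipped with pgclone: it rewrites only the v3.x names listed in the table above and leaves other identifiers alone. rewrite_v3_calls is a hypothetical helper; it also rewrites matches inside string literals and comments, so review the result before committing it.

```python
import re

# v3.x function and view names from the rename table, without the pgclone_ prefix.
V3_NAMES = [
    "table", "schema", "database", "database_create", "table_async",
    "schema_async", "progress", "cancel", "resume", "jobs", "clear_jobs",
    "progress_detail", "jobs_view", "discover_sensitive", "mask_in_place",
    "create_masking_policy", "drop_masking_policy", "clone_roles", "verify",
    "masking_report", "table_ex", "schema_ex", "functions", "version",
]

# Longest names first, with word boundaries, so whole identifiers are matched.
_PATTERN = re.compile(
    r"\bpgclone_(" + "|".join(sorted(V3_NAMES, key=len, reverse=True)) + r")\b"
)


def rewrite_v3_calls(sql: str) -> str:
    """Replace pgclone_<name> with pgclone.<name> for known v3.x names."""
    return _PATTERN.sub(r"pgclone.\1", sql)


print(rewrite_v3_calls("SELECT pgclone_database_create('host=src', 'copy');"))
print(rewrite_v3_calls("SELECT * FROM pgclone_jobs_view;"))
print(rewrite_v3_calls("SELECT my_pgclone_table(1);"))
```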

Quick Start

CREATE EXTENSION pgclone;

SELECT pgclone.table(
    'host=source-server dbname=mydb user=postgres password=secret',
    'public', 'customers', true
);

SELECT pgclone.schema(
    'host=source-server dbname=mydb user=postgres password=secret',
    'sales', true
);

Compatibility

  • PostgreSQL 14, 15, 16, 17, 18
  • No C API changes — only SQL function names affected
  • All 115 tests pass across all supported versions

v3.6.0 - GDPR/Compliance Masking

10 Apr 04:35

Generate audit reports proving that sensitive data is properly masked in your cloned databases. Essential for GDPR, HIPAA, and SOX compliance.

Usage

SELECT * FROM pgclone_masking_report('public');
 schema_name | table_name | column_name | sensitivity  | mask_status   | recommendation
-------------+------------+-------------+--------------+---------------+--------------------------------------
 public      | employees  | full_name   | PII - Name   | UNMASKED      | Apply mask strategy: name
 public      | employees  | email       | Email        | UNMASKED      | Apply mask strategy: email
 public      | employees  | phone       | Phone        | UNMASKED      | Apply mask strategy: phone
 public      | employees  | salary      | Financial    | UNMASKED      | Apply mask strategy: random_int
 public      | employees  | ssn         | National ID  | UNMASKED      | Apply mask strategy: null
 public      | users      | email       | Email        | MASKED (view) | OK - masked via users_masked view
 public      | users      | password    | Credential   | MASKED (view) | OK - masked via users_masked view

Sensitivity Categories

About 40 patterns across 10 categories: Email, PII - Name, Phone, National ID, Financial, Credential, Address, Date of Birth, Credit Card, IP Address.
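
To give a feel for pattern-based classification, here is a minimal sketch that tags columns by name with a handful of regular expressions. The patterns are hypothetical and chosen only for illustration; they are not pgclone's actual rule set, and the extension's matching logic may differ.

```sql
-- Hypothetical patterns for illustration only; not pgclone's rules.
SELECT *
FROM (
    SELECT table_schema,
           table_name,
           column_name,
           CASE
               WHEN column_name::text ~* 'e?mail'            THEN 'Email'
               WHEN column_name::text ~* '(phone|mobile)'    THEN 'Phone'
               WHEN column_name::text ~* '(ssn|national_id)' THEN 'National ID'
               WHEN column_name::text ~* '(salary|iban)'     THEN 'Financial'
               WHEN column_name::text ~* '(password|passwd)' THEN 'Credential'
           END AS sensitivity
    FROM information_schema.columns
    WHERE table_schema = 'public'
) AS c
WHERE sensitivity IS NOT NULL
ORDER BY table_name, column_name;
```

Note that information_schema.columns also lists view columns, so a real check would likely restrict the result to base tables.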

Mask Status

 Status        | Meaning
---------------+--------------------------------------------------------------
 MASKED (view) | A _masked view exists; data is protected via dynamic masking
 UNMASKED      | Sensitive column has no masking; action needed
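
The sample output above refers to a users_masked view for the users table. Assuming the convention is always the table name followed by _masked (an inference from that output, not a documented rule), a quick catalog check for which tables have such a view could look like this:

```sql
-- For each table in 'public', report whether a <table>_masked view exists.
-- The naming convention is assumed from the sample report output.
SELECT t.schemaname,
       t.tablename,
       (v.viewname IS NOT NULL) AS has_masked_view
FROM pg_tables AS t
LEFT JOIN pg_views AS v
       ON v.schemaname = t.schemaname
      AND v.viewname   = t.tablename || '_masked'
WHERE t.schemaname = 'public'
ORDER BY t.tablename;
```

This only checks that a view with the expected name exists; it does not confirm which columns the view actually masks.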

Compliance Workflow

-- 1. Clone production data
SELECT pgclone_database('host=prod dbname=myapp user=postgres', true);

-- 2. Find unmasked PII
SELECT * FROM pgclone_masking_report('public') WHERE mask_status = 'UNMASKED';

-- 3. Apply masking
SELECT pgclone_create_masking_policy('public', 'employees',
    '{"email": "email", "full_name": "name", "ssn": "null"}', 'dba_team');

-- 4. Confirm — all sensitive columns now masked
SELECT * FROM pgclone_masking_report('public');

Also in This Release

  • Refactored sensitivity rules into shared SensitivityRule struct and pgclone_match_sensitivity() helper — used by both pgclone_discover_sensitive and pgclone_masking_report

Install

git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version();  -- pgclone 3.6.0

v3.5.0 — Clone Verification

09 Apr 18:28

Compare row counts between source and target databases after cloning to verify completeness.

Usage

-- Verify a specific schema
SELECT * FROM pgclone_verify(
    'host=source-server dbname=prod user=postgres',
    'app_schema'
);
 schema_name |  table_name  | source_rows | target_rows | match
-------------+--------------+-------------+-------------+--------------
 app_schema  | customers    |       15230 |       15230 | ✓
 app_schema  | orders       |      148920 |      148920 | ✓
 app_schema  | payments     |       98100 |       97855 | ✗
 app_schema  | audit_log    |     1204500 |           0 | ✗ (missing)

-- Verify all schemas
SELECT * FROM pgclone_verify(
    'host=source-server dbname=prod user=postgres'
);

Match Indicators

 Indicator   | Meaning
-------------+------------------------------------------
 ✓           | Row counts are equal
 ✗           | Row counts differ
 ✗ (missing) | Table exists on source but not on target

How It Works

  • Uses pg_class.reltuples for fast approximate counts — no full table scans needed
  • Connects to both the source (via libpq) and the local database (via loopback) in a single call
  • Returns a standard PostgreSQL table result — can be filtered, sorted, joined
  • Works with regular and partitioned tables
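
For reference, the kind of catalog lookup that reltuples-based counting relies on can be sketched as follows. This is an illustrative query, not the extension's internal SQL; the schema name app_schema is taken from the usage example above.

```sql
-- Approximate row counts from planner statistics, no table scan required.
-- relkind 'r' = ordinary table, 'p' = partitioned table.
SELECT n.nspname           AS schema_name,
       c.relname           AS table_name,
       c.reltuples::bigint AS estimated_rows
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'app_schema'
  AND c.relkind IN ('r', 'p')
ORDER BY c.relname;
```

Because reltuples is an estimate maintained by VACUUM, ANALYZE, and a few DDL commands, a small difference between source and target can come from stale statistics rather than missing rows. On PostgreSQL 14 and later, a value of -1 means the table has never been vacuumed or analyzed. Running ANALYZE on both sides, or comparing SELECT count(*) for a flagged table, helps tell the two cases apart.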

Typical Workflow

-- 1. Clone a schema
SELECT pgclone_schema('host=prod dbname=myapp user=postgres', 'sales', true);

-- 2. Verify the clone
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales');

-- 3. Check for mismatches
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales')
WHERE match != '✓';

What's New

  • pgclone_verify(conninfo) — all user schemas
  • pgclone_verify(conninfo, schema) — single schema
  • Implemented as a set-returning function (SRF), with VerifyState kept in multi_call_memory_ctx
  • 5 new pgTAP tests (84 total, 23 test groups, 22 C functions)

Install

git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version();  -- pgclone 3.5.0