Releases: valehdba/pgclone
pgclone v4.4.0 - Schema/database-level in-line masking and table subset filters
Implements [discussion #16](#16).
Until now, in-line masking (masking applied while data streams from source to target) was only available per table. Cloning a whole schema or database meant masking afterwards — either a slow mass UPDATE (followed by re-analyze/vacuum) or masked views that aren't transparent to end users. v4.4.0 removes that gap, and adds regex-based table selection on top.
No SQL signature changes: both features are new keys in the existing JSON options argument of pgclone.schema(), pgclone.database(), and pgclone.database_create().
In-line masking for schema/database clones — "masks"
SELECT pgclone.schema(
'host=source dbname=prod user=postgres password=...',
'hr', true,
'{"masks": {
"employees": {"email": "email", "ssn": "null"},
"candidates": {"full_name": "name", "phone": "phone"}
}}'
);
- Keys are table names; values use the exact same format as the single-table "mask" option, and all 8 masking strategies are supported.
- Masking happens inside the COPY stream: the mask expressions run on the source as part of the per-table COPY (SELECT ...), so unmasked data never reaches the target. No post-clone UPDATE, no re-vacuum, no view layer.
- For database clones, keys may be schema-qualified ("hr.employees") to disambiguate identically named tables; qualified keys win over bare names.
- Tables not listed are cloned unmasked.
Table subset filters — "tables" / "exclude_tables"
-- Only order_* partitions plus customers, minus anything archived
SELECT pgclone.schema(
'host=source dbname=prod user=postgres password=...',
'sales', true,
'{"tables": ["order_[0-9]+", "customers"],
"exclude_tables": [".*_archive"]}'
);
- Entries are POSIX regular expressions, anchored as ^(pattern)$ (whole-name match), evaluated by the source server against pg_tables.tablename. Excludes apply after includes.
- The filtered table list flows through the FK-retry and deferred-trigger passes automatically. Sequences, views, materialized views, and functions of the schema are still cloned in full.
- Patterns are sent as quoted literals: there is no regex code in the extension and no injection surface; an invalid regex fails the clone with the source's invalid regular expression error.
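The include-then-exclude rule can be sketched in a few lines of Python. This is only an illustration of the matching semantics described above, not pgclone's code: the real evaluation happens on the source server with POSIX regexes, whereas this sketch uses Python's `re` module (which agrees with POSIX for simple patterns like these). The function name `select_tables` and the "no include list means every table" behaviour are assumptions.

```python
import re


def select_tables(names, include=None, exclude=None):
    """Apply anchored include patterns first, then anchored exclude patterns."""
    def matches(name, patterns):
        # Each pattern is wrapped as ^(pattern)$ so it must match the whole name.
        return any(re.search(f"^({p})$", name) for p in patterns)

    # Assumption: without an include list, every table is a candidate.
    selected = [n for n in names if include is None or matches(n, include)]
    return [n for n in selected if not (exclude and matches(n, exclude))]


tables = ["order_2023", "order_2024", "order_items",
          "order_1999_archive", "customers", "customers_archive"]

# Same patterns as the example above.
print(select_tables(tables, ["order_[0-9]+", "customers"], [".*_archive"]))
# A broader include shows the exclude pass removing a table that was included.
print(select_tables(tables, ["order_.*"], [".*_archive"]))
```

Because of the anchoring, `order_[0-9]+` does not pick up `order_items`, and `customers` does not pick up `customers_archive`.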
Both options combine freely, and pgclone.database_create() forwards them verbatim into the target database's clone run.
Internals
- The hand-rolled option parser gained string-aware JSON scanning (pgclone_json_balanced_end, pgclone_json_string_end, pgclone_json_unescape, pgclone_parse_pattern_array), so regex character classes like [0-9] inside option strings no longer terminate arrays early.
- Per-table mask objects are captured as raw JSON during option parsing and re-emitted verbatim as the "mask" option of each matching per-table sub-call; the existing single-table masking pipeline does all the real work, unchanged.
- pgclone.database() propagates masks/tables/exclude_tables verbatim to its per-schema sub-calls.
Tests & docs
- pgTAP test group 25 (10 new tests, plan 77 → 87) with a new filter_test seed schema: include/exclude regex filtering, in-line email/null masking during a schema clone, per-table mask scoping, unfiltered data integrity.
- docs/USAGE.md: new sections Clone a subset of tables and In-line masking during schema/database clone, plus JSON options reference rows.
- COMMENT ON FUNCTION documentation for pgclone.schema(TEXT, TEXT, BOOLEAN, TEXT) and pgclone.database(TEXT, BOOLEAN, TEXT).
Limitations
- Synchronous paths only: pgclone.schema_async() and the parallel worker pool do not carry JSON options through shared memory and ignore these keys (as they already do for "mask").
- Bare-name mask keys apply in every schema with a matching table during a database clone; use schema-qualified keys to scope them.
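The scoping rule for "masks" keys in a database clone (qualified keys win, bare names match in every schema, unlisted tables stay unmasked) can be illustrated with a small Python sketch. The function name `resolve_mask` is hypothetical and this is not how the C code is structured; it only models the lookup order stated in these notes.

```python
from typing import Optional


def resolve_mask(masks: dict, schema: str, table: str) -> Optional[dict]:
    """Pick the mask object for schema.table, or None to clone it unmasked."""
    qualified = f"{schema}.{table}"
    if qualified in masks:        # schema-qualified keys win over bare names
        return masks[qualified]
    return masks.get(table)       # bare names match the table in any schema


masks = {
    "employees": {"email": "email"},
    "hr.employees": {"email": "email", "ssn": "null"},
}

print(resolve_mask(masks, "hr", "employees"))     # the qualified entry
print(resolve_mask(masks, "sales", "employees"))  # falls back to the bare entry
print(resolve_mask(masks, "sales", "orders"))     # None: cloned unmasked
```

If the bare "employees" entry should not leak into other schemas, dropping it and keeping only "hr.employees" scopes the mask to that one table.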
Upgrade
ALTER EXTENSION pgclone UPDATE TO '4.4.0';
sql/pgclone--4.3.2--4.4.0.sql only updates function comments; there are no catalog changes. Identical behavior on PostgreSQL 14–18.
Full changelog: v4.3.2...v4.4.0
pgclone v4.3.2 — bgw snapshot-keeper resilience
Bug-fix release. Closes the v4.3.1 "Known gap" — ports the snapshot-keeper resilience fix from issue #9 to the background-worker path.
If you ran v4.3.1 and your async clones still fail with ERROR: pgclone: SET TRANSACTION SNAPSHOT … invalid snapshot identifier on networked sources, this release fixes that.
What was broken in v4.3.1
v4.3.1 closed three failure paths on the synchronous code in src/pgclone.c:
- Firewall/NAT idle TCP drop on the keeper conninfo
- idle_in_transaction_session_timeout on the source server
- statement_timeout firing on COMMIT
But src/pgclone_bgw.c carries its own mirror helpers and four direct PQconnectdb() call sites — none of which received the v4.3.1 protections. As a result:
- pgclone.table_async()
- pgclone.schema_async() sequential
- pgclone.schema_async() with {"parallel": N} (parallel pool)
…remained vulnerable on networked sources. We disclosed this honestly in the v4.3.1 release notes and in docs/ASYNC.md rather than ship a doc that misleads users about the actual state of the fix.
v4.3.2 closes the gap.
What's fixed
The full v4.3.1 four-layer protection is now mirrored in the background-worker translation unit:
1. bgw_connect_with_keepalives()
New helper, mirror of pgclone_connect_with_keepalives(). Parses every bgworker source conninfo via PQconninfoParse, injects:
keepalives=1
keepalives_idle=30
keepalives_interval=10
keepalives_count=6
…only when the user did not specify them, and connects via PQconnectdbParams. Both URI (postgresql://…) and keyword-form strings are handled. Replaces three direct PQconnectdb() call sites: pgclone_bgw_main, pgclone_pool_worker_main, pgclone_pool_coordinator_main. ~90 s TCP-drop detection on the async path; keeps perimeter NAT entries warm.
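The "inject only when the user did not specify them" rule amounts to a defaults merge over the parsed connection parameters. A minimal Python sketch, assuming the conninfo has already been parsed into a keyword/value mapping (the step PQconninfoParse performs in the real helper); the name `with_keepalives` is hypothetical.

```python
KEEPALIVE_DEFAULTS = {
    "keepalives": "1",
    "keepalives_idle": "30",
    "keepalives_interval": "10",
    "keepalives_count": "6",
}


def with_keepalives(params: dict) -> dict:
    """Return a copy of params with keepalive defaults added where missing."""
    merged = dict(params)
    for key, value in KEEPALIVE_DEFAULTS.items():
        merged.setdefault(key, value)   # a user-supplied value always wins
    return merged


user_params = {"host": "source", "dbname": "prod", "keepalives_idle": "120"}
print(with_keepalives(user_params))
```

Here the user's `keepalives_idle=120` is preserved while the three missing settings are filled in.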
2. SET LOCAL timeouts in bgw_begin_repeatable_read()
After BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY the helper now issues:
SET LOCAL idle_in_transaction_session_timeout = 0;
SET LOCAL statement_timeout = 0;
Both GUCs are PGC_USERSET (no privilege required). SET LOCAL reverts at COMMIT so pooled connections never leak settings.
3. Clearer diagnostic in bgw_begin_with_imported_snapshot()
When SET TRANSACTION SNAPSHOT fails with invalid snapshot identifier, the WARNING now includes a HINT-style line naming the likely causes and the {"consistent": false} emergency opt-out. The bgworker context has no errhint, so the hint goes into the WARNING body.
4. bgw_keeper_ping() in the pool coordinator wait loop
New helper. Cheap SELECT 1 round-trip on the coordinator's keeper connection. Called every ~5 s (every 50th iteration of the 100 ms WaitLatch cycle) inside pgclone_pool_coordinator_main's wait loop. On failure, sets pool.snapshot_failed = true, cascading the failure to every pool worker so they abort cleanly instead of timing out on a snapshot the server already reaped.
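The ping cadence is simple arithmetic: 50 iterations of a 100 ms wait is roughly 5 s. The loop shape can be sketched in Python as below. This is a simulation with no real waiting or database access; `coordinator_wait` and the `keeper_alive` callback are hypothetical stand-ins for the coordinator loop and `bgw_keeper_ping()`.

```python
def coordinator_wait(ticks_until_done, keeper_alive, ping_every=50):
    """Simulate the wait loop; returns (pings_sent, snapshot_failed)."""
    pings = 0
    for tick in range(1, ticks_until_done + 1):
        # One tick stands in for one 100 ms WaitLatch cycle.
        if tick % ping_every == 0:
            pings += 1
            if not keeper_alive(tick):
                # Stands in for setting pool.snapshot_failed = true.
                return pings, True
    return pings, False


# Healthy keeper: 120 ticks (about 12 s) produce two pings and no failure.
print(coordinator_wait(120, lambda tick: True))
# Keeper lost before tick 100: the second ping notices and the loop stops early.
print(coordinator_wait(300, lambda tick: tick < 100))
```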
Verification
- Regression test test/test_snapshot_keeper.sh Group 5 sets idle_in_transaction_session_timeout = 1s on the source role and runs pgclone.schema_async() against a 3-table source. Without the v4.3.2 fix the bgworker keeper is killed mid-clone and the async job lands in failed; with the fix the clone completes cleanly.
- Skipped automatically when pgclone is not in shared_preload_libraries (so local Docker runs without async setup still pass without false positives).
- The full PG 14 / 15 / 16 / 17 / 18 CI matrix exercises the new test on every supported version.
- Compiles clean against PG 16; only the two pre-existing warnings at src/pgclone.c:3867 and :1089, both outside this diff.
Upgrade
cd /path/to/pgclone-source
git fetch --tags && git checkout v4.3.2
make clean && make && sudo make install
In every database where the extension is installed:
ALTER EXTENSION pgclone UPDATE;
SELECT pgclone.version(); -- expect 'pgclone 4.3.2' or newer
sql/pgclone--4.3.1--4.3.2.sql is intentionally empty because there are no SQL signature changes. The ALTER EXTENSION just refreshes metadata.
No configuration changes required. No data is touched. No backward-incompatible behaviour.
Emergency workaround if you can't upgrade yet
The same async-path workarounds from v4.3.1 still apply:
SELECT pgclone.schema_async(
'postgresql://user@host:5432/sourcedb?keepalives=1&keepalives_idle=30',
'myschema',
true,
'{"consistent": false}'
);
Either set keepalives explicitly in the conninfo, ensure source-side idle_in_transaction_session_timeout = 0, or pass {"consistent": false} to bypass cross-table snapshot sharing. v4.3.2 makes all of these unnecessary for the keepalive and timeout paths; you only need {"consistent": false} if you actually want to disable cross-table consistency for other reasons.
Compatibility
- PostgreSQL: 14, 15, 16, 17, 18 (unchanged)
- OS: Linux (RHEL, Ubuntu), macOS (unchanged)
- API: no SQL signature changes; pure C-layer fix in src/pgclone_bgw.c
- Existing async jobs: continue to work without modification
What's NOT changed in v4.3.2
- The synchronous-path code in src/pgclone.c: already fixed in v4.3.1, untouched here.
- bgw_connect_local() (the local-loopback helper at pgclone_bgw.c:159): localhost has no firewall path so keepalives are irrelevant; we left it alone to keep this diff minimal.
- Default behaviour: nothing changes for users on healthy LANs / Unix-socket connections. v4.3.2's protections only become observable on networked sources with idle drops or non-zero source-side timeouts.
References
- Original issue: #9
- v4.3.1 fix (sync path): #10
- v4.3.1 test coverage: #11
- v4.3.1 docs + gap disclosure: #12
- v4.3.2 bgw mirror (this release): #13
- Full changelog: git log v4.3.1..v4.3.2
- Architecture: docs/ARCHITECTURE.md → "Snapshot-keeper resilience (v4.3.1)"
- Async-path detail: docs/ASYNC.md → "Snapshot-keeper resilience in async paths (v4.3.2)"
- Operator troubleshooting: docs/USAGE.md → "Troubleshooting invalid snapshot identifier"
pgclone v4.3.1 — snapshot-keeper resilience
Bug-fix release. Resolves a class of failures where long-running schema or database clones over remote/networked source connections fail mid-run with the misleading message ERROR: pgclone: could not import snapshot … invalid snapshot identifier.
If you've ever had a multi-hour pgclone.schema() or pgclone.database() succeed on a small test database and fail on a real one, this is for you.
What was broken
pgclone.schema() / pgclone.database() export a snapshot on a "keeper" connection at the start of the clone, then open many new source connections (one per table, plus separate connections for the FK retry, view, materialized view, function, and trigger sub-phases) that each SET TRANSACTION SNAPSHOT against the keeper's exported snapshot ID — the same pattern pg_dump -j uses for cross-table consistency.
The keeper then sits idle in transaction for the bulk of the run — hours, in the original report (#9). Three independent paths could silently terminate it and invalidate the exported snapshot:
- Firewall / NAT idle TCP drop on remote source connections. The keeper had no keepalives=* in its conninfo, so the OS default (Linux: 7200 s before the first probe) was far too lazy to keep perimeter equipment warm. Enterprise firewalls and NAT gateways commonly reap idle TCP after 30–60 minutes.
- idle_in_transaction_session_timeout on the source server, a frequent production safeguard that killed the keeper without our consent.
- statement_timeout firing on the keeper's eventual COMMIT.
When the keeper died, PostgreSQL removed the exported-snapshot file. Every subsequent SET TRANSACTION SNAPSHOT then failed. PostgreSQL emits ERROR: invalid snapshot identifier both for malformed IDs and for IDs whose backing file has been reaped — which is why the failure mode was hard to diagnose: the user's snapshot ID (e.g. 000000A6-000C4299-1) was perfectly well-formed PG 17/18 syntax (procNumber-lxid-counter), but the file behind it was gone.
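The diagnostic trap is that two different conditions produce the same message. A toy Python model makes the point; it is not PostgreSQL's implementation, the ID shape in the regex is inferred from the example ID above (an assumption), and `import_snapshot` / `live_snapshots` are hypothetical names.

```python
import re

# Assumed shape: two 8-digit hex fields and a counter, as in 000000A6-000C4299-1.
SNAPSHOT_ID_SHAPE = re.compile(r"[0-9A-F]{8}-[0-9A-F]{8}-[0-9]+")


class SnapshotError(Exception):
    pass


def import_snapshot(snapshot_id, live_snapshots):
    if not SNAPSHOT_ID_SHAPE.fullmatch(snapshot_id):
        raise SnapshotError("invalid snapshot identifier")   # malformed ID
    if snapshot_id not in live_snapshots:
        raise SnapshotError("invalid snapshot identifier")   # well-formed, but reaped
    return snapshot_id


def error_for(snapshot_id, live_snapshots):
    try:
        import_snapshot(snapshot_id, live_snapshots)
        return None
    except SnapshotError as exc:
        return str(exc)


keeper_id = "000000A6-000C4299-1"
print(error_for(keeper_id, {keeper_id}))   # None while the keeper is alive
print(error_for(keeper_id, set()))         # same text as for a malformed ID
print(error_for("not-an-id", set()))
```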
Affected users
You were vulnerable to this if all three were true:
- v4.3.0 with consistent (the default) left on,
- multi-table operations (pgclone.schema() or pgclone.database(), sync or async pool mode),
- and either: (a) the source DB was reached over a network through a firewall/NAT, or (b) the source had idle_in_transaction_session_timeout > 0.
pgclone.table() (single-table) and clones using '{"consistent": false}' were never affected.
What's fixed
Four layers of protection, all PG 14–18 compatible, no SQL signature changes:
1. TCP keepalives auto-injected into every source connection
pgclone_connect_with_keepalives() parses the user's conninfo with PQconninfoParse, injects sensible defaults only when the user did not specify them, and connects via PQconnectdbParams. Both URI (postgresql://...) and keyword-form conninfo strings are handled.
Defaults:
keepalives=1
keepalives_idle=30
keepalives_interval=10
keepalives_count=6
Total drop detection: ~90 s. Keeps perimeter NAT entries warm. If you've explicitly set any of these in your conninfo, your value wins.
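The ~90 s figure follows from the three numeric settings: the idle time before the first probe plus the time for every probe to go unanswered. A short Python check, using the stock Linux sysctl defaults (7200 s idle, 75 s interval, 9 probes) for comparison; the helper name `detection_seconds` is just for illustration.

```python
def detection_seconds(idle, interval, count):
    """Idle time before the first probe plus the time for all probes to fail."""
    return idle + interval * count


pgclone_defaults = detection_seconds(idle=30, interval=10, count=6)
linux_defaults = detection_seconds(idle=7200, interval=75, count=9)

print(pgclone_defaults)   # 90
print(linux_defaults)     # 7875
```

With a probe going out after every 30 s of silence, a NAT entry that expires after 30 to 60 idle minutes never gets the chance to age out.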
2. Source-side timeouts disabled inside the keeper transaction
The keeper's BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY is immediately followed by:
SET LOCAL idle_in_transaction_session_timeout = 0;
SET LOCAL statement_timeout = 0;
Both GUCs are PGC_USERSET (no privilege required). SET LOCAL reverts at COMMIT, so pooled connections never leak the settings.
3. Clearer error on snapshot import failure
When SET TRANSACTION SNAPSHOT does fail with invalid snapshot identifier, pgclone now attaches an errhint naming the most likely causes (firewall idle drop, idle_in_transaction_session_timeout, etc.) and pointing at the {"consistent": false} emergency opt-out. No more silent debugging.
4. Keeper liveness ping between sub-phases
pgclone_keeper_ping() runs a cheap SELECT 1 round-trip on the keeper before each per-table call and before each major sub-phase of pgclone.schema() (FK retry, views, functions, triggers) and pgclone.database() (per-schema loop). If the keeper has been dropped, the error names the keeper directly — you won't waste a multi-GB COPY only to hit a misleading message at the next SET TRANSACTION SNAPSHOT.
Verification
- Regression test test/test_snapshot_keeper.sh sets idle_in_transaction_session_timeout = 1s on the source role and runs a 5-table schema clone. Without the fix the keeper is killed mid-loop and a later importer fails; with the fix the clone completes cleanly.
- Runs on the full PG 14 / 15 / 16 / 17 / 18 matrix in CI.
- Compiles clean with -Wall -Werror=format-security against PG 16.
Upgrade
cd /path/to/pgclone-source
git fetch --tags && git checkout v4.3.1
make clean && make && sudo make install
In every database where the extension is installed:
ALTER EXTENSION pgclone UPDATE;
SELECT pgclone.version(); -- expect 'pgclone 4.3.1'
The upgrade script sql/pgclone--4.3.0--4.3.1.sql is intentionally empty: there are no SQL signature changes, so the ALTER EXTENSION just refreshes the metadata.
No configuration changes are required. No data is touched. No backward-incompatible behavior.
Emergency workaround if you can't upgrade yet
Pass '{"consistent": false}' in the options argument to disable cross-table snapshot sharing:
SELECT pgclone.schema(
'postgresql://user@host:5432/sourcedb',
'myschema',
true,
'{"consistent": false}'
);
Every per-table copy is still internally consistent. Cross-table referential consistency is no longer guaranteed against concurrent writers, but the clone will not fail with invalid snapshot identifier.
Compatibility
- PostgreSQL: 14, 15, 16, 17, 18 (unchanged from 4.3.0)
- OS: Linux (RHEL, Ubuntu), macOS (unchanged)
- API: no SQL signature changes; pure C-layer fix
Acknowledgements
Thanks to @herwigg for the detailed bug report (#9), including the exact timing (2 h 15 min run length), the PostgreSQL log excerpt, and the snapshot ID — all of which pointed directly at the keeper-lifecycle root cause.
References
- Issue: #9
- Pull request: #10
- Architecture doc (new section): docs/ARCHITECTURE.md → "Snapshot-keeper resilience (v4.3.1)"
- Full changelog: git log v4.3.0..v4.3.1
Full Changelog: v4.3.0...v4.3.1
v4.3.0: Consistent-snapshot clones
Every clone now reads the source under one shared BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY snapshot. Cross-table foreign-key consistency on a live source is guaranteed — the same correctness model pg_dump -j uses for parallel dumps.
This fixes the long-standing class of FK-violation and partial-state anomalies reported when cloning OLTP sources that take concurrent writes during the clone.
What's New
Consistent-snapshot clones (default behavior)
Every per-table COPY across the entire clone observes the same point-in-time view of the source. Parent/child rows, FK invariants, and any other application invariant that holds at a single instant on the source is preserved in the clone — even under concurrent writes.
Applies to every clone path:
| Path | Mechanism |
|---|---|
| pgclone.table() (sync) | Single source connection wrapped in BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY |
| pgclone.schema() (sync) | Initial source connection becomes the snapshot keeper; ID propagated to every per-table sub-call |
| pgclone.database() (sync) | Same keeper pattern at one level higher; snapshot ID flows down to every per-schema and per-table call |
| pgclone.table_async() (single bgw) | bgworker source connection wrapped in REPEATABLE READ READ ONLY |
| pgclone.schema_async() (single bgw) | Same as above; one snapshot for the whole schema |
| pgclone.schema_async(... '{"parallel": N}') | New pgclone_pool_coordinator_main bgworker exports the snapshot; N pool workers import via SET TRANSACTION SNAPSHOT |
Multi-connection paths use pg_export_snapshot() / SET TRANSACTION SNAPSHOT so every libpq connection involved in a single clone — per-table sub-calls, FK retry, views, matviews, functions, triggers, and every parallel pool worker — binds to one shared snapshot.
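Why a shared snapshot matters can be shown with a toy model of a commit log, written here in Python. It is purely illustrative: a "snapshot" is modelled as the number of commits visible to a reader, and all names (`commits`, `read`, `orphans`) are hypothetical. Reading the parent and child tables at different points in time yields an orphan child row; reading both at the same point never does.

```python
# Each committed transaction adds parent ids and (child_id, parent_id) rows.
commits = [
    {"parents": [1], "children": []},
    {"parents": [2], "children": [(10, 1)]},
    {"parents": [3], "children": [(11, 3)]},
]


def read(table, snapshot):
    """Rows of `table` visible to a reader that sees the first `snapshot` commits."""
    rows = []
    for txn in commits[:snapshot]:
        rows.extend(txn[table])
    return rows


def orphans(parent_snapshot, child_snapshot):
    parents = set(read("parents", parent_snapshot))
    return [c for c in read("children", child_snapshot) if c[1] not in parents]


# Per-table snapshots taken at different times: child 11 arrives without parent 3.
print(orphans(parent_snapshot=2, child_snapshot=3))   # [(11, 3)]
# One shared snapshot for both tables: no orphans, whichever instant is chosen.
print(orphans(parent_snapshot=2, child_snapshot=2))   # []
print(orphans(parent_snapshot=3, child_snapshot=3))   # []
```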
Opt-out per call
For callers who prefer the v4.2.x behavior — for example because the source can't tolerate a long-running keeper transaction — pass '{"consistent": false}' in any options JSON:
SELECT pgclone.schema('host=...', 'public', true, '{"consistent": false}');
SELECT pgclone.database('host=...', true, '{"consistent": false}');
SELECT pgclone.schema_async('host=...', 'public', true,
'{"parallel": 4, "consistent": false}');
Snapshot coordinator background worker
When pgclone.schema_async(..., '{"parallel": N}') runs in consistent mode, pgclone launches one extra bgworker — the snapshot coordinator — visible in pgclone.jobs_view as pgclone: snapshot coordinator (parent <id>). It opens its own source connection, BEGINs, calls pg_export_snapshot(), publishes the ID to shared memory, and stays idle in transaction until every pool worker has bound to the snapshot, then COMMITs and exits.
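The coordinator's lifecycle (export, publish, wait for every worker to bind, commit) can be mimicked with Python threads and a condition variable. This is a sketch of the handshake only: there is no database involved, the snapshot ID is a placeholder string, and the class and function names are hypothetical. The field names echo the shared-memory fields listed under Internal.

```python
import threading


class PoolState:
    def __init__(self, expected_workers):
        self.cond = threading.Condition()
        self.snapshot_id = None
        self.snapshot_ready = False
        self.snapshot_imported_count = 0
        self.snapshot_expected_workers = expected_workers
        self.events = []


def coordinator(pool):
    with pool.cond:
        pool.snapshot_id = "demo-snapshot-id"   # stands in for pg_export_snapshot()
        pool.snapshot_ready = True
        pool.events.append("export")
        pool.cond.notify_all()
        # Stay "idle in transaction" until every worker has imported.
        pool.cond.wait_for(
            lambda: pool.snapshot_imported_count == pool.snapshot_expected_workers)
        pool.events.append("commit")


def pool_worker(pool):
    with pool.cond:
        pool.cond.wait_for(lambda: pool.snapshot_ready)
        pool.snapshot_imported_count += 1       # stands in for SET TRANSACTION SNAPSHOT
        pool.events.append("import")
        pool.cond.notify_all()


def run(n_workers):
    pool = PoolState(n_workers)
    threads = [threading.Thread(target=pool_worker, args=(pool,))
               for _ in range(n_workers)]
    threads.append(threading.Thread(target=coordinator, args=(pool,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.events


print(run(3))   # ['export', 'import', 'import', 'import', 'commit']
```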
Highlights
- No SQL surface change. Existing function signatures and return types are unchanged. The feature is enabled via a flag in the existing options TEXT parameters.
- Same correctness model as pg_dump -j. Snapshot export from a keeper, snapshot import on every other connection.
- Hot-standby supported on PostgreSQL ≥ 10 (where pg_export_snapshot() works on a standby).
- Defence-in-depth preserved. The pre-existing per-table READ ONLY wrap (SQL-injection containment around WHERE clauses) detects the outer transaction and skips its own BEGIN/COMMIT, so the sandbox guarantee holds regardless of the call path.
Tradeoffs
- Long-running source-side transaction. Clones now hold an open transaction on the source for the full clone duration. On a busy source this delays VACUUM cleanup of dead tuples and WAL recycling proportional to clone time. For very long clones (multi-hour) on heavily updated sources this can cause table/index bloat and WAL accumulation. Opt out per call with '{"consistent": false}' if cross-table consistency is less important than source-side lock pressure.
- One extra bgworker per parallel pool clone (the snapshot coordinator). Negligible resource cost; appears in pgclone.jobs_view for full progress visibility.
docs/USAGE.md discusses the REPEATABLE READ vs SERIALIZABLE DEFERRABLE tradeoff for readers who want the strictest possible isolation.
Internal
- New CloneOptions.consistent (bool, default true) and CloneOptions.snapshot_id[64] plus parser support for "consistent": false and "snapshot_id": "...".
- New helpers in src/pgclone.c: pgclone_begin_repeatable_read, pgclone_commit_source, pgclone_export_snapshot, pgclone_begin_with_imported_snapshot, pgclone_setup_source_txn, pgclone_setup_source_txn_done. Mirror helpers in src/pgclone_bgw.c.
- New PgclonePoolQueue shared-memory fields: consistent, snapshot_ready, snapshot_failed, snapshot_imported_count, snapshot_expected_workers, launch_complete, snapshot_id[64], coordinator_job_id.
- New PgcloneJob.consistent field so the single-worker async path knows whether to wrap in REPEATABLE READ READ ONLY.
CI / Tests
- New test/test_consistent.sh: runs sync, async sequential, async pool, and opt-out clones against a concurrent parent/child FK writer that continuously inserts rows on the source. Asserts zero orphan child rows in every consistent path. Wired into test/run_tests.sh.
- All existing test suites unchanged and passing on PG 14, 15, 16, 17, 18.
Upgrade
ALTER EXTENSION pgclone UPDATE TO '4.3.0';
SELECT pgclone.version();
The 4.2.0 → 4.3.0 migration script is a no-op: there are no DDL changes, and the feature is a behavior change activated by an option in the existing options TEXT parameters.
Compatibility
- PostgreSQL 14, 15, 16, 17, 18
- No C ABI changes for existing functions
- No external library dependencies
v4.2.0 — Pre-flight Validator
A new read-only sanity check that surfaces issues before they fail a clone mid-flight. Companion to pgclone.diff() (v4.1.0): preflight validates that a clone can succeed; diff reports drift after the fact.
What's New
pgclone.preflight(source_conninfo, schema_name) → JSON
Connects read-only to the source and the local target, runs 16 checks, and returns a JSON document with three severity-based summary arrays (errors, warnings, info) plus a per-check breakdown.
SELECT pgclone.preflight(
'host=prod-db dbname=app user=postgres password=***',
'public'
)::jsonb;
ready is true only when zero errors are recorded. Quick boolean gate:
SELECT (pgclone.preflight(:src, 'public')::jsonb ->> 'ready')::boolean
AS clone_safe;
Checks performed
| Check | Severity | What it catches |
|---|---|---|
| source_connection | error | bad conninfo, unreachable host, auth failure |
| target_connection | error | local socket / pg_hba misconfiguration |
| source_version | info | source PostgreSQL version |
| target_version | info | target PostgreSQL version |
| version_compat | warn | source major version > target major version |
| schema_exists_source | error | source schema does not exist |
| schema_exists_target | info | target schema absent → will be created |
| source_permissions | error | role lacks USAGE on schema or SELECT on tables |
| target_permissions | error | role lacks CREATE on target schema (or DB if creating it) |
| estimated_size | info | sum of pg_total_relation_size on source |
| target_database_size | info | current target DB size for capacity reasoning |
| object_counts | info | tables / views / sequences / indexes counts |
| name_conflicts | warn | object names already present in target schema |
| missing_extensions | warn | extensions installed on source but not target |
| missing_roles | warn | owner / grantee roles not present on target |
| missing_tablespaces | warn | non-default tablespaces referenced by source |
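The gating rule (ready is true only when no error-severity check fails, while warnings never block) can be sketched in Python as below. The input shape, a list of (name, severity, passed) tuples, and the function name `summarize` are hypothetical; the actual JSON layout of the per-check breakdown is not spelled out in these notes.

```python
def summarize(checks):
    """checks: list of (name, severity, passed) tuples (a hypothetical shape)."""
    summary = {"errors": [], "warnings": [], "info": []}
    bucket = {"error": "errors", "warn": "warnings", "info": "info"}
    for name, severity, passed in checks:
        # Info checks are always reported; error/warn checks only when they fail.
        if severity == "info" or not passed:
            summary[bucket[severity]].append(name)
    return {"ready": not summary["errors"], "summary": summary}


clean = [
    ("source_connection", "error", True),
    ("source_version", "info", True),
    ("name_conflicts", "warn", False),
]
broken = clean + [("schema_exists_source", "error", False)]

print(summarize(clean)["ready"])    # True: a warning does not block the clone
print(summarize(broken)["ready"])   # False: one failed error-severity check
```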
Highlights
- Read-only on both sides. Both source and target connections wrap every query in BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY. The function never executes DDL or DML.
- Self-contained C module (src/pgclone_preflight.c). Like src/pgclone_diff.c (v4.1.0), this file does not share helpers with src/pgclone.c, so the feature is fully additive and trivially auditable. See docs/ARCHITECTURE.md for the design rationale.
- No new permissions required. The pre-flight uses has_*_privilege() to test the same privileges the clone itself would need, with no destructive probes.
- Composes with pgclone.diff(). Pre-flight before a fresh clone; diff after to catch drift over time.
CI / Test Infrastructure
CI now invokes test/test_loopback.sh directly instead of the previous hand-rolled inline subset in .github/workflows/ci.yml. Going forward, every assertion added to the loopback script runs in CI automatically — closing the gap that was hiding v4.1.0 schema-diff tests from the matrix.
13 new pre-flight assertions were added to test/test_loopback.sh:
- function registration with documented (text, text) signature
- top-level JSON keys: schema, ready, summary, checks
- ready is a boolean
- summary contains errors / warnings / info / checks_run
- checks includes the connection / version / schema_exists_* entries
- schema name is echoed back
- missing source schema → ready = false, errors[] non-empty
- name_conflicts.items is an array
- object_counts.tables is a non-negative integer
- version_compat status is pass or warn
- read-only invariant on the target catalog
- STRICT (NULL schema_name argument yields no row)
Upgrade
ALTER EXTENSION pgclone UPDATE TO '4.2.0';
SELECT pgclone.version();
No SQL signature changes for any existing function. The new pgclone.preflight() is purely additive.
Compatibility
- PostgreSQL 14, 15, 16, 17, 18
- No C ABI changes for existing functions
- No external library dependencies (the JSON escaper is shipped locally to avoid utils/jsonapi.h cross-version drift)
v4.1.0
v4.0.1 - Bugfix release
Fixes [issue #3](#3): schema clone fails on databases whose default search_path includes an application schema.
What was broken
Calling pgclone.schema(...) against a source database with a non-default search_path produced three classes of failure, all triggered by the same clone:
WARNING: pgclone: local exec failed: ERROR: relation "itiniris.inbox_messages" does not exist
(query: CREATE OR REPLACE FUNCTION itiniris.inbox_messages_fct_ck(p_id bigint) ...)
WARNING: pgclone: local exec failed: ERROR: relation "city_street" does not exist
(query: CREATE TRIGGER city_street_insert_trigger BEFORE INSERT ON city_street ...)
ERROR: pgclone: failed to create table locally: ERROR: relation "documents_to_resend_id_seq" does not exist
LINE 1: ...ts_to_resend (id integer NOT NULL DEFAULT nextval('documents...
Root cause
Two separate bugs interacted:
A. Functions were cloned before tables. pgclone_schema() Step 2 ran CREATE FUNCTION for every function in the schema before Step 4 created the tables. SQL-language functions (LANGUAGE sql) parse and validate object references at CREATE FUNCTION time — unlike LANGUAGE plpgsql which defers to call-time — so any function whose body referenced a same-schema table failed when that table didn't yet exist.
B. Source-side search_path leaked into extracted DDL. pg_get_triggerdef(), pg_get_expr() (used for column DEFAULTs), and pg_get_indexdef() all call generate_relation_name() internally, which omits the schema prefix when the relation's namespace is on the source session's search_path. Production source DBs commonly have ALTER DATABASE x SET search_path = app_schema, public, which caused every extracted DDL to come back with bare relation names. Replaying that DDL on the target loopback connection, whose search_path doesn't include the app schema, failed to resolve the references.
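Bug B can be reproduced with a deliberately simplified Python model of the naming rule. The real check in PostgreSQL is visibility-based (whether the relation is the first match along the path); this sketch only tests schema membership in the path, and the function name `relation_name` is hypothetical.

```python
def relation_name(schema, relation, search_path):
    """Simplified model: omit the schema prefix when it is on the search_path."""
    return relation if schema in search_path else f"{schema}.{relation}"


# Source session with the application schema on its search_path: bare name.
print(relation_name("dep_test", "city_street", ["dep_test", "public"]))
# Session restricted to pg_catalog: user relations come back qualified,
print(relation_name("dep_test", "city_street", ["pg_catalog"]))
# while built-in objects stay unqualified.
print(relation_name("pg_catalog", "int4", ["pg_catalog"]))
```

A bare `city_street` only resolves on a target whose search_path also contains the application schema, which is exactly what the loopback connection lacked.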
Fix
- Force fully-qualified DDL output: every source libpq connection now runs SET search_path = pg_catalog immediately after connect (pgclone_normalize_session() helper invoked from pgclone_connect(), plus equivalent inline calls at the two PQconnectdb(...source_conninfo) sites in the bgworker). Built-in types stay unqualified because pg_catalog is on search_path; everything else gets schema-qualified. pg_dump relies on the same idea: it restricts search_path in its own session so that extracted definitions come back schema-qualified.
- Reorder pgclone_schema() to respect dependencies:
  schema → sequences → tables (no triggers) → FK retry → views → matviews → functions → triggers (deferred pass)
  Per-table pgclone_table() calls now receive triggers=false from the schema driver; the schema driver runs a single trigger pass at the end after all functions exist.
Both sync paths and both async bgworker paths (sequential and pool) are covered.
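The new phase order can be checked against the dependencies behind the three reported failures. The Python sketch below is illustrative: the `depends_on` edges for sequences, functions, and triggers come from the failure descriptions, the view and matview edges are assumptions, and `old_order` is a hypothetical stand-in for the pre-fix sequence in which functions ran before tables.

```python
# "phase: phases that must already have run"
depends_on = {
    "sequences": {"schema"},
    "tables": {"sequences"},             # DEFAULT nextval(...) needs the sequence
    "fk_retry": {"tables"},
    "views": {"tables"},                 # assumption
    "matviews": {"tables", "views"},     # assumption
    "functions": {"tables"},             # LANGUAGE sql bodies are validated at CREATE time
    "triggers": {"tables", "functions"},
}

new_order = ["schema", "sequences", "tables", "fk_retry",
             "views", "matviews", "functions", "triggers"]
# Hypothetical pre-fix order: functions created before the tables they reference.
old_order = ["schema", "functions", "sequences", "tables",
             "fk_retry", "views", "matviews", "triggers"]


def respects_dependencies(order):
    position = {phase: i for i, phase in enumerate(order)}
    return all(position[dep] < position[phase]
               for phase, deps in depends_on.items()
               for dep in deps)


print(respects_dependencies(new_order))   # True
print(respects_dependencies(old_order))   # False
```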
Verification
A new regression test (TEST GROUP 21 in test/pgclone_test.sql with fixtures in test/fixtures/seed.sql) reproduces all three failure modes by cloning a dep_test schema using a conninfo with options='-c search_path=dep_test,public'. Locally verified that the pre-fix .so produces exactly 5 failures matching the bug-report errors, and the post-fix .so passes 77/77 assertions.
Compatibility
- No public API changes
- No breaking changes to the JSON options format
- Upgrade path: ALTER EXTENSION pgclone UPDATE (no-op SQL upgrade script; the fix is entirely in the compiled library)
- Verified to compile cleanly on PostgreSQL 16; PostgreSQL 14–18 compatibility maintained (only SET search_path is added, which is stable across all supported versions)
Credits
Reported by [@herwigg](https://github.com/herwigg). Thanks for the detailed reproduction.
v4.0.0 — Schema Namespace
All functions now live in a dedicated pgclone schema.
What Changed
The pgclone schema is created automatically when you run CREATE EXTENSION pgclone. All functions have been moved into this schema and the pgclone_ prefix has been removed:
| Before (v3.x) | After (v4.0.0) |
|---|---|
| pgclone_table(...) | pgclone.table(...) |
| pgclone_schema(...) | pgclone.schema(...) |
| pgclone_database(...) | pgclone.database(...) |
| pgclone_database_create(...) | pgclone.database_create(...) |
| pgclone_table_async(...) | pgclone.table_async(...) |
| pgclone_schema_async(...) | pgclone.schema_async(...) |
| pgclone_progress(...) | pgclone.progress(...) |
| pgclone_cancel(...) | pgclone.cancel(...) |
| pgclone_resume(...) | pgclone.resume(...) |
| pgclone_jobs() | pgclone.jobs() |
| pgclone_clear_jobs() | pgclone.clear_jobs() |
| pgclone_progress_detail() | pgclone.progress_detail() |
| pgclone_jobs_view | pgclone.jobs_view |
| pgclone_discover_sensitive(...) | pgclone.discover_sensitive(...) |
| pgclone_mask_in_place(...) | pgclone.mask_in_place(...) |
| pgclone_create_masking_policy(...) | pgclone.create_masking_policy(...) |
| pgclone_drop_masking_policy(...) | pgclone.drop_masking_policy(...) |
| pgclone_clone_roles(...) | pgclone.clone_roles(...) |
| pgclone_verify(...) | pgclone.verify(...) |
| pgclone_masking_report(...) | pgclone.masking_report(...) |
| pgclone_table_ex(...) | pgclone.table_ex(...) |
| pgclone_schema_ex(...) | pgclone.schema_ex(...) |
| pgclone_functions(...) | pgclone.functions(...) |
| pgclone_version() | pgclone.version() |
Upgrade Path
This is a breaking change. To upgrade from v3.x:
DROP EXTENSION pgclone;
CREATE EXTENSION pgclone;
SELECT pgclone.version();
All application queries must be updated to use the pgclone. prefix.
Quick Start
CREATE EXTENSION pgclone;
SELECT pgclone.table(
'host=source-server dbname=mydb user=postgres password=secret',
'public', 'customers', true
);
SELECT pgclone.schema(
'host=source-server dbname=mydb user=postgres password=secret',
'sales', true
);
Compatibility
- PostgreSQL 14, 15, 16, 17, 18
- No C API changes — only SQL function names affected
- All 115 tests pass across all supported versions
v3.6.0 - GDPR/Compliance Masking
Generate audit reports proving that sensitive data is properly masked in your cloned databases. Essential for GDPR, HIPAA, and SOX compliance.
Usage
SELECT * FROM pgclone_masking_report('public');
schema_name | table_name | column_name | sensitivity | mask_status | recommendation
-------------+------------+-------------+--------------+---------------+--------------------------------------
public | employees | full_name | PII - Name | UNMASKED | Apply mask strategy: name
public | employees | email | Email | UNMASKED | Apply mask strategy: email
public | employees | phone | Phone | UNMASKED | Apply mask strategy: phone
public | employees | salary | Financial | UNMASKED | Apply mask strategy: random_int
public | employees | ssn | National ID | UNMASKED | Apply mask strategy: null
public | users | email | Email | MASKED (view) | OK - masked via users_masked view
public | users | password | Credential | MASKED (view) | OK - masked via users_masked view
Sensitivity Categories
Email, PII - Name, Phone, National ID, Financial, Credential, Address, Date of Birth, Credit Card, IP Address — ~40 patterns across 10 categories.
Mask Status
| Status | Meaning |
|---|---|
| MASKED (view) | A _masked view exists — data is protected via dynamic masking |
| UNMASKED | Sensitive column has no masking — action needed |
Compliance Workflow
-- 1. Clone production data
SELECT pgclone_database('host=prod dbname=myapp user=postgres', true);
-- 2. Find unmasked PII
SELECT * FROM pgclone_masking_report('public') WHERE mask_status = 'UNMASKED';
-- 3. Apply masking
SELECT pgclone_create_masking_policy('public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null"}', 'dba_team');
-- 4. Confirm — all sensitive columns now masked
SELECT * FROM pgclone_masking_report('public');
Also in This Release
- Refactored sensitivity rules into shared SensitivityRule struct and pgclone_match_sensitivity() helper, used by both pgclone_discover_sensitive and pgclone_masking_report
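A shared rule table of this kind might look like the Python sketch below. It is a guess at the general shape rather than a transcription of the C struct: the five rules are taken from the sample report above, matching by substring is an assumption, and the real extension ships roughly 40 patterns across 10 categories.

```python
# (column-name fragment, sensitivity category, recommended mask strategy)
RULES = [
    ("email", "Email", "email"),
    ("name", "PII - Name", "name"),
    ("phone", "Phone", "phone"),
    ("salary", "Financial", "random_int"),
    ("ssn", "National ID", "null"),
]


def match_sensitivity(column):
    """Return (category, strategy) for the first matching rule, else None."""
    lowered = column.lower()
    for fragment, category, strategy in RULES:
        if fragment in lowered:
            return category, strategy
    return None


for column in ["full_name", "email", "ssn", "created_at"]:
    print(column, match_sensitivity(column))
```

Keeping the rules in one table lets a discovery function and a reporting function agree on what counts as sensitive.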
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.6.0
v3.5.0 — Clone Verification
Compare row counts between source and target databases after cloning to verify completeness.
Usage
-- Verify a specific schema
SELECT * FROM pgclone_verify(
'host=source-server dbname=prod user=postgres',
'app_schema'
);
schema_name | table_name | source_rows | target_rows | match
-------------+--------------+-------------+-------------+--------------
app_schema | customers | 15230 | 15230 | ✓
app_schema | orders | 148920 | 148920 | ✓
app_schema | payments | 98100 | 97855 | ✗
app_schema | audit_log | 1204500 | 0 | ✗ (missing)
-- Verify all schemas
SELECT * FROM pgclone_verify(
'host=source-server dbname=prod user=postgres'
);
Match Indicators
| Indicator | Meaning |
|---|---|
| ✓ | Row counts are equal |
| ✗ | Row counts differ |
| ✗ (missing) | Table exists on source but not on target |
How It Works
- Uses pg_class.reltuples for fast approximate counts, so no full table scans are needed
- Connects to both source (via libpq) and local (via loopback) in a single call
- Returns a standard PostgreSQL table result — can be filtered, sorted, joined
- Works with regular and partitioned tables
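The comparison step reduces to a per-table lookup. A minimal Python sketch using the numbers from the sample output above; the function name `verify` and the dictionary inputs are hypothetical, and reporting 0 target rows for a missing table mirrors the sample rather than a documented rule.

```python
def verify(source_counts, target_counts):
    """Yield (table, source_rows, target_rows, indicator) for every source table."""
    for table, source_rows in source_counts.items():
        if table not in target_counts:
            yield table, source_rows, 0, "✗ (missing)"
        else:
            target_rows = target_counts[table]
            mark = "✓" if source_rows == target_rows else "✗"
            yield table, source_rows, target_rows, mark


source = {"customers": 15230, "orders": 148920, "payments": 98100, "audit_log": 1204500}
target = {"customers": 15230, "orders": 148920, "payments": 97855}

for row in verify(source, target):
    print(row)
```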
Typical Workflow
-- 1. Clone a schema
SELECT pgclone_schema('host=prod dbname=myapp user=postgres', 'sales', true);
-- 2. Verify the clone
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales');
-- 3. Check for mismatches
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales')
WHERE match != '✓';
What's New
- pgclone_verify(conninfo): all user schemas
- pgclone_verify(conninfo, schema): single schema
- Set-returning function using the SRF pattern with VerifyState in multi_call_memory_ctx
- 5 new pgTAP tests (84 total, 23 test groups, 22 C functions)
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.5.0