-
-
Notifications
You must be signed in to change notification settings - Fork 0
pgclone Wiki
pgclone is a PostgreSQL extension that clones databases, schemas, tables, and functions between PostgreSQL instances — directly from SQL. No pg_dump, no pg_restore, no shell scripts.
- Repository: github.com/valehdba/pgclone
-
Current version:
4.3.0 - Supported PostgreSQL: 14, 15, 16, 17, 18
- License: PostgreSQL License
- What pgclone Does
- Quick Start
- Installation
- Core Cloning
- Selective Cloning: Columns, WHERE, Conflict Resolution
- Async Operations & Parallel Worker Pool
- Data Masking
- Roles & Permissions
- Verification & Drift Detection
- Consistent-Snapshot Clones
- Function Reference
- JSON Options Reference
- Architecture & Internals
- Testing
- Security Notes
- Contributing
- Further Reading
| Capability | Notes |
|---|---|
| Clone tables, schemas, full databases | One SQL call per operation |
| High-throughput data transfer | PostgreSQL COPY protocol, not row-at-a-time |
| Full DDL fidelity | Indexes, PK/UNIQUE/CHECK/FK/EXCLUDE constraints, triggers, views, matviews, sequences |
| Data masking | 8 strategies — email, name, phone, partial, hash, null, random_int, constant |
| Auto-discovery | Scans source for ~40 sensitive column patterns, returns suggested mask rules |
| Static & dynamic masking | In-place UPDATE masking, or role-based masked views |
| Async + parallel | Background workers with shared-memory job tracking, visual progress bar |
| Conflict resolution | error, skip, replace, rename |
| Clone roles | Attributes, encrypted passwords, memberships, grants |
| Verification | Row-count diff, schema DDL drift detection, pre-flight validator |
| Consistent snapshots | REPEATABLE READ READ ONLY on source + exported snapshot for multi-connection paths |
| Cross-version | Tested on PG 14–18 in CI on every push |
pgclone is a PostgreSQL extension that clones databases, schemas, tables, and functions between PostgreSQL instances — directly from SQL. No pg_dump, no pg_restore, no shell scripts.
- Repository: [github.com/valehdba/pgclone](https://github.com/valehdba/pgclone)
-
Current version:
4.3.0 - Supported PostgreSQL: 14, 15, 16, 17, 18
- License: PostgreSQL License
- [What pgclone Does](#what-pgclone-does)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Core Cloning](#core-cloning)
- [Selective Cloning: Columns, WHERE, Conflict Resolution](#selective-cloning-columns-where-conflict-resolution)
- [Async Operations & Parallel Worker Pool](#async-operations--parallel-worker-pool)
- [Data Masking](#data-masking)
- [Roles & Permissions](#roles--permissions)
- [Verification & Drift Detection](#verification--drift-detection)
- [Consistent-Snapshot Clones](#consistent-snapshot-clones-v430)
- [Function Reference](#function-reference)
- [JSON Options Reference](#json-options-reference)
- [Architecture & Internals](#architecture--internals)
- [Testing](#testing)
- [Security Notes](#security-notes)
- [Contributing](#contributing)
- [Further Reading](#further-reading)
| Capability | Notes |
|---|---|
| Clone tables, schemas, full databases | One SQL call per operation |
| High-throughput data transfer | PostgreSQL COPY protocol, not row-at-a-time |
| Full DDL fidelity | Indexes, PK/UNIQUE/CHECK/FK/EXCLUDE constraints, triggers, views, matviews, sequences |
| Data masking | 8 strategies — email, name, phone, partial, hash, null, random_int, constant
|
| Auto-discovery | Scans source for ~40 sensitive column patterns, returns suggested mask rules |
| Static & dynamic masking | In-place UPDATE masking, or role-based masked views |
| Async + parallel | Background workers with shared-memory job tracking, visual progress bar |
| Conflict resolution |
error, skip, replace, rename
|
| Clone roles | Attributes, encrypted passwords, memberships, grants |
| Verification | Row-count diff, schema DDL drift detection, pre-flight validator |
| Consistent snapshots |
REPEATABLE READ READ ONLY on source + exported snapshot for multi-connection paths |
| Cross-version | Tested on PG 14–18 in CI on every push |
-- Enable
CREATE EXTENSION pgclone;
-- Clone a single table
SELECT pgclone.table(
'host=source-server dbname=mydb user=postgres password=secret',
'public', 'customers', true
);
-- Clone an entire schema
SELECT pgclone.schema(
'host=source-server dbname=mydb user=postgres password=secret',
'sales', true
);
-- Clone a full database (all user schemas)
SELECT pgclone.database(
'host=source-server dbname=mydb user=postgres password=secret',
true
);
-- Check the version
SELECT pgclone.version();| Platform | Install |
|---|---|
| Debian / Ubuntu | sudo apt-get install postgresql-server-dev-18 libpq-dev build-essential |
| RHEL / Rocky / Alma | sudo dnf install postgresql18-devel libpq-devel gcc make |
| macOS (Homebrew) | brew install postgresql@18 |
git clone https://github.com/valehdba/pgclone.git
cd pgclone
make
sudo make install
# Or pin to a specific PG version:
make PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config
sudo make install PG_CONFIG=/usr/lib/postgresql/18/bin/pg_configCREATE EXTENSION pgclone;For async operations (background workers), add this to postgresql.conf and restart:
shared_preload_libraries = 'pgclone'
max_worker_processes = 32 -- recommended for parallel cloning
pgclone uses Unix domain sockets for local loopback DDL — the default local all all peer line in pg_hba.conf is sufficient; no trust entry needed.
-- Same name on target (default conflict = error)
SELECT pgclone.table(:src, 'public', 'customers', true);
-- Different target name
SELECT pgclone.table(:src, 'public', 'customers', true, 'customers_copy');
-- Structure only (no data)
SELECT pgclone.table(:src, 'public', 'customers', false);
-- JSON options
SELECT pgclone.table(:src, 'public', 'customers', true, 'customers_copy',
'{"indexes": true, "constraints": true, "triggers": false}');Clones every table, view, materialized view, sequence, function, and (in dependency-respecting order) every constraint and trigger in the schema.
SELECT pgclone.schema(:src, 'sales', true);
-- With options
SELECT pgclone.schema(:src, 'sales', true,
'{"matviews": false, "triggers": false, "conflict": "replace"}');Clones all user schemas (system schemas are excluded).
SELECT pgclone.database(:src, true);
-- With options
SELECT pgclone.database(:src, true,
'{"indexes": false, "constraints": false}');Create a new local database and clone into it (idempotent — works even if the database already exists):
SELECT pgclone.database_create(:src, 'clone_target_db', true);The extension auto-installs pgclone into the newly created database.
SELECT pgclone.functions(:src, 'public');SELECT pgclone.table(:src, 'public', 'employees', true, 'emp_public',
'{"columns": ["id", "full_name", "department"]}');SELECT pgclone.table(:src, 'public', 'users', true, 'active_users',
'{"where": "status = ''active''"}');The where filter is validated against DDL/DML keywords and semicolons and runs inside a READ ONLY transaction on the source (v2.2.1+).
SELECT pgclone.table(:src, 'public', 'orders', true, 'recent_orders',
'{"columns": ["id", "customer_id", "total", "created_at"],
"where": "created_at > ''2024-01-01''",
"triggers": false}');| Strategy | Behavior |
|---|---|
error (default) |
Fail if target exists |
skip |
Leave the existing target untouched, return without error |
replace |
DROP existing target, then clone |
rename |
Rename existing to <name>_old, then clone |
SELECT pgclone.table(:src, 'public', 'orders', true, NULL,
'{"conflict": "replace"}');All four strategies also work with pgclone.table_async and pgclone.schema_async.
Async functions submit clone jobs to PostgreSQL background workers and return immediately with a job_id. Requires shared_preload_libraries = 'pgclone'.
-- Returns job_id (integer)
SELECT pgclone.table_async(:src, 'public', 'large_table', true);
SELECT pgclone.schema_async(:src, 'sales', true);Launches exactly N background workers that pull tables from a shared-memory queue — dynamic load balancing without exhausting max_worker_processes:
SELECT pgclone.schema_async(:src, 'sales', true,
'{"parallel": 8, "conflict": "replace", "triggers": false}');-- Visual progress bar view
SELECT job_id, status, schema_name, progress_bar FROM pgclone.jobs_view; job_id | status | schema_name | progress_bar
--------+-----------+-------------+------------------------------------------------------------
1 | running | sales | [████████████░░░░░░░░] 60.0% | 450000 rows | 00:08:30 elapsed
2 | completed | analytics | [████████████████████] 100.0% | 1200000 rows | 00:25:18 elapsed
-- JSON snapshot of a single job
SELECT pgclone.progress(1);
-- All jobs as JSON array
SELECT pgclone.jobs();SELECT pgclone.cancel(1); -- cancel a running job
SELECT pgclone.resume(1); -- resume from last checkpoint; returns new job_id
SELECT pgclone.clear_jobs(); -- free completed/cancelled slots in shared memorySee [ASYNC.md](https://github.com/valehdba/pgclone/blob/main/docs/ASYNC.md) for full async details.
Three complementary modes covering clone-time, post-clone, and view-based masking.
Mask expressions are applied server-side as SQL during the COPY stream — no per-row overhead.
SELECT pgclone.table(:src, 'public', 'employees', true, 'emp_masked',
'{"mask": {
"email": "email",
"full_name": "name",
"phone": "phone",
"ssn": "null",
"salary": {"type": "random_int", "min": 40000, "max": 200000},
"notes": {"type": "constant", "value": "REDACTED"}
}}');| Strategy | Effect |
|---|---|
email |
Replace local-part, preserve domain |
name |
Replace with XXXX
|
phone |
Replace with +X-XXX-XXX-XXXX
|
partial |
Keep prefix/suffix characters, mask the middle |
hash |
Deterministic MD5 (32 hex chars) — preserves referential integrity |
null |
NULL the column |
random_int |
Uniform integer in [min, max]
|
constant |
Fixed value (string or numeric) |
SELECT pgclone.discover_sensitive(:src, 'public');Returns a JSON object grouped by table with suggested mask strategies — paste directly into the "mask" option:
{"employees": {"email": "email", "full_name": "name", "phone": "phone", "salary": "random_int", "ssn": "null"}}Detects ~40 patterns across email, name, phone, SSN/national ID, financials, secrets/credentials, addresses, DOB, credit card, IP.
Apply masking to an already-cloned local table without re-connecting to source:
SELECT pgclone.mask_in_place(
'public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null"}'
);
-- → OK: masked 1000 rows in public.employees (3 columns)Keep the base table intact; expose a masked view to unprivileged roles:
SELECT pgclone.create_masking_policy(
'public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null"}',
'data_admin' -- this role can still see raw data
);
-- Later
SELECT pgclone.drop_masking_policy('public', 'employees');This creates public.employees_masked, revokes SELECT on the base table from PUBLIC, and grants raw access only to data_admin.
Clone database roles from source — attributes, encrypted passwords (copied as-is from pg_authid), memberships, schema/table/sequence/function grants. Requires superuser on both sides.
-- All non-system roles
SELECT pgclone.clone_roles(:src);
-- Single role
SELECT pgclone.clone_roles(:src, 'app_user');
-- Comma-separated subset
SELECT pgclone.clone_roles(:src, 'reader,writer,etl_runner');Existing target roles get their passwords and attributes synced — pgclone does not drop existing roles.
Three read-only inspection functions. All run inside BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY on both source and target and never execute DDL or DML.
-- Single schema
SELECT * FROM pgclone.verify(:src, 'sales');
-- All user schemas
SELECT * FROM pgclone.verify(:src); schema_name | table_name | source_rows | target_rows | match
-------------+------------+-------------+-------------+-------
sales | customers | 1000 | 1000 | ✓
sales | orders | 5240 | 5238 | ✗
sales | audit_log | 812 | 0 | ✗ (missing)
Read-only DDL drift detector between source and the local target — JSON with per-category only_in_source / only_in_target / modified arrays for tables (with per-column type/nullability/default drift), indexes, constraints, triggers, views, sequences.
SELECT pgclone.diff(:src, 'app_schema');
-- Quick boolean gate
SELECT (pgclone.diff(:src, 'app_schema')::jsonb ->> 'in_sync')::boolean AS in_sync;Validate that a clone is likely to succeed before you run it: connections, permissions, version compatibility, capacity, name conflicts, missing roles, missing extensions, missing tablespaces.
SELECT pgclone.preflight(:src, 'app_schema');
-- Gate the clone on the result
SELECT (pgclone.preflight(:src, 'app_schema')::jsonb ->> 'ready')::boolean
AS clone_safe;SELECT * FROM pgclone.masking_report('public');Lists every column, detects sensitivity, reports whether a _masked view exists, and gives a recommendation.
Every clone — sync and async, single-connection and parallel-pool — wraps the source-side reads in a BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY transaction.
Multi-connection paths (schema clone, database clone, parallel pool) export a snapshot from one keeper connection and import it on every other connection via SET TRANSACTION SNAPSHOT — the same correctness model as pg_dump -j.
Result: every per-table COPY across the entire clone reads the same point-in-time view of the source. Cross-table FKs and any other invariant that holds at a single instant on source are preserved in the clone even when the source is taking concurrent writes.
| Function | Returns | Description |
|---|---|---|
pgclone.version() |
text |
Extension version string |
pgclone.table(conninfo, schema, table, include_data) |
text |
Clone a single table |
pgclone.table(conninfo, schema, table, include_data, target_name) |
text |
Clone with different target name |
pgclone.table(conninfo, schema, table, include_data, target_name, options) |
text |
Clone with JSON options |
pgclone.table_ex(conninfo, schema, table, data, target, idx, constr, trig) |
text |
Clone with explicit boolean flags |
pgclone.schema(conninfo, schema, include_data) |
text |
Clone an entire schema |
pgclone.schema(conninfo, schema, include_data, options) |
text |
Clone schema with options |
pgclone.schema_ex(conninfo, schema, data, idx, constr, trig) |
text |
Clone schema with boolean flags |
pgclone.functions(conninfo, schema) |
text |
Clone all functions/procedures from a schema |
pgclone.database(conninfo, include_data) |
text |
Clone all user schemas |
pgclone.database(conninfo, include_data, options) |
text |
Clone database with options |
pgclone.database_create(conninfo, target_db, include_data) |
text |
Create a new database and clone into it |
| Function | Returns | Description |
|---|---|---|
pgclone.table_async(conninfo, schema, table, include_data [, target [, options]]) |
int4 |
Submit table-clone job; returns job_id
|
pgclone.schema_async(conninfo, schema, include_data [, options]) |
int4 |
Submit schema-clone job (supports "parallel": N) |
pgclone.progress(job_id) |
text |
JSON snapshot of one job |
pgclone.jobs() |
text |
JSON array of all jobs |
pgclone.progress_detail() |
setof record |
Underlying table-returning function for jobs_view
|
pgclone.jobs_view |
view | Live progress with visual bar + elapsed time |
pgclone.cancel(job_id) |
bool |
Cancel a running job |
pgclone.resume(job_id) |
int4 |
Resume a failed job from last checkpoint; returns new job_id
|
pgclone.clear_jobs() |
int4 |
Free completed/cancelled job slots in shared memory |
| Function | Returns | Description |
|---|---|---|
pgclone.discover_sensitive(conninfo, schema) |
text |
Scan source for sensitive columns; returns suggested mask JSON |
pgclone.mask_in_place(schema, table, mask_json) |
text |
Apply masking to an existing local table via UPDATE |
pgclone.create_masking_policy(schema, table, mask_json, privileged_role) |
text |
Create <table>_masked view + role-based grants |
pgclone.drop_masking_policy(schema, table) |
text |
Drop masked view, restore base table grants |
| Function | Returns | Description |
|---|---|---|
pgclone.clone_roles(conninfo [, role_filter]) |
text |
Clone roles, passwords, memberships, grants from source |
pgclone.verify(conninfo [, schema]) |
setof record |
Row-count diff between source and local target |
pgclone.diff(conninfo, schema) |
text |
JSON DDL drift report (tables/indexes/constraints/triggers/views/sequences) |
pgclone.preflight(conninfo, schema) |
text |
JSON pre-flight validation report |
pgclone.masking_report(schema) |
setof record |
GDPR audit: sensitivity + mask status + recommendation per column |
| Option | Type | Default | Description |
|---|---|---|---|
indexes |
bool | true |
Clone indexes |
constraints |
bool | true |
Clone constraints (PK / UNIQUE / CHECK / FK / EXCLUDE) |
triggers |
bool | true |
Clone triggers |
matviews |
bool | true |
Clone materialized views (schema/db only) |
columns |
array | all | Whitelist of columns to clone (table only) |
where |
string | none | Row filter (table only) — validated against DDL/DML keywords |
conflict |
string | "error" |
error, skip, replace, rename
|
parallel |
int | 1 | Number of background workers (schema_async only) |
mask |
object | none | Column masking rules: {"col":"strategy"} or {"col":{"type":"...","min":...,"max":...}}
|
pgclone uses loopback libpq connections to the local target database for all DDL operations instead of PostgreSQL's SPI. SPI executes within the calling transaction's snapshot, so DDL statements like CREATE TABLE aren't visible to subsequent SPI calls within the same function invocation until commit. By connecting via libpq (even to localhost), each DDL statement executes in its own transaction and is immediately visible.
- Direct access to the COPY protocol via
PQgetCopyData/PQputCopyDatafor high-throughput data transfer - Background-worker registration requires C (
RegisterDynamicBackgroundWorker) - Shared-memory allocation for progress tracking requires C hooks (
shmem_request_hookon PG 15+,RequestAddinShmemSpaceon PG 14) - Fine-grained error handling with
PG_TRY/PG_CATCHfor guaranteed resource cleanup
Read-only feature additions live in their own .c files and do not share helpers with src/pgclone.c:
| File | Feature | Version |
|---|---|---|
src/pgclone_diff.c |
pgclone.diff() |
4.1.0 |
src/pgclone_preflight.c |
pgclone.preflight() |
4.2.0 |
Each re-implements the few helpers it needs (connect, READ ONLY wrappers, quote_literal_cstr, RFC 8259 JSON escaper). Duplication is intentional — a regression in pgclone.c cannot break these features, and vice versa.
pgclone/
├── src/
│ ├── pgclone.c # Main extension (~3800 lines)
│ ├── pgclone_bgw.c # Background worker (~800 lines)
│ ├── pgclone_bgw.h # Shared definitions and structs
│ ├── pgclone_diff.c # Schema diff (isolated TU)
│ └── pgclone_preflight.c # Pre-flight validator (isolated TU)
├── sql/pgclone--*.sql # SQL function definitions per version
├── test/
│ ├── fixtures/seed.sql # Test data
│ ├── pgclone_test.sql # 66 pgTAP tests across 20 groups
│ ├── run_tests.sh # Test orchestrator
│ ├── test_loopback.sh # Loopback-DDL tests (roles/verify/diff/preflight)
│ ├── test_async.sh # Async bgworker tests
│ └── test_database_create.sh
├── .github/workflows/ci.yml # PG 14–18 matrix
├── Dockerfile / docker-compose.yml # Local multi-version testing
├── Makefile # PGXS-based build
├── pgclone.control # Extension metadata
├── META.json # PGXN metadata
├── pre_deploy_checks.sh # Pre-release validation (22 checks)
└── docs/ # USAGE, ASYNC, TESTING, ARCHITECTURE
See [ARCHITECTURE.md](https://github.com/valehdba/pgclone/blob/main/docs/ARCHITECTURE.md) for the deep dive.
pgclone has four test suites running across PG 14–18 in CI on every push:
| Suite | Location | Count | Scope |
|---|---|---|---|
| pgTAP sync | test/pgclone_test.sql |
66 | Core clone, columns/WHERE, masking, auto-discovery, in-place mask, SQL-injection guards |
| Loopback-DDL | test/test_loopback.sh |
21 |
clone_roles, verify, masking_report, dynamic masking, diff, preflight
|
| Database create | test/test_database_create.sh |
7 |
database_create, idempotent re-clone |
| Async bgworker | test/test_async.sh |
— |
table_async, schema_async, progress, clear_jobs
|
# One PG version
docker compose down -v
docker compose up source-db -d
docker compose run --rm test-pg17
# All versions
for pg in 14 15 16 17 18; do
echo "=== PostgreSQL $pg ==="
docker compose down -v
docker compose up source-db -d
docker compose run --rm test-pg$pg 2>&1 | tail -5
doneSee [TESTING.md](https://github.com/valehdba/pgclone/blob/main/docs/TESTING.md) for details.
- pgclone requires superuser to install and to use (catalog access +
RegisterDynamicBackgroundWorker). - All identifier interpolation goes through
quote_identifier(); all literal interpolation throughquote_literal_cstr()— there is no raw string concatenation in dynamic SQL. - The
"where"filter is validated against DDL/DML keywords and semicolons, and the source query runs insideBEGIN ... READ ONLY(v2.2.1+). - Connection strings can contain passwords — prefer
.pgpassorPGPASSFILE. pgclone never logs connection strings aboveDEBUG1. - For vulnerability reports see [SECURITY.md](https://github.com/valehdba/pgclone/blob/main/SECURITY.md).
PRs welcome — see [CONTRIBUTING.md](https://github.com/valehdba/pgclone/blob/main/CONTRIBUTING.md).
Branch naming:
-
feature/<description>— new functionality -
fix/<description>— bug fixes
CI runs the full PG 14–18 matrix on main, fix/**, feature/**, and on PRs to main. All four test suites must pass on every PG version before merge.
| Document | Purpose |
|---|---|
| Usage Guide | Complete option-by-option reference with examples |
| Async Operations | Async functions, parallel pool, progress tracking |
| Architecture | Internal design decisions, codebase layout, version-compat strategy |
| Testing | Test suite layout, how to add a test, CI pipeline |
| Changelog | Per-version history from 0.1.0 to 4.3.0 |
| Security Policy | Vulnerability reporting policy |