Skip to content

pgclone Wiki

Valeh Agayev edited this page May 21, 2026 · 2 revisions

pgclone is a PostgreSQL extension that clones databases, schemas, tables, and functions between PostgreSQL instances — directly from SQL. No pg_dump, no pg_restore, no shell scripts.


Table of Contents

  1. What pgclone Does
  2. Quick Start
  3. Installation
  4. Core Cloning
  5. Selective Cloning: Columns, WHERE, Conflict Resolution
  6. Async Operations & Parallel Worker Pool
  7. Data Masking
  8. Roles & Permissions
  9. Verification & Drift Detection
  10. Consistent-Snapshot Clones
  11. Function Reference
  12. JSON Options Reference
  13. Architecture & Internals
  14. Testing
  15. Security Notes
  16. Contributing
  17. Further Reading

What pgclone Does

Capability Notes
Clone tables, schemas, full databases One SQL call per operation
High-throughput data transfer PostgreSQL COPY protocol, not row-at-a-time
Full DDL fidelity Indexes, PK/UNIQUE/CHECK/FK/EXCLUDE constraints, triggers, views, matviews, sequences
Data masking 8 strategies — email, name, phone, partial, hash, null, random_int, constant
Auto-discovery Scans source for ~40 sensitive column patterns, returns suggested mask rules
Static & dynamic masking In-place UPDATE masking, or role-based masked views
Async + parallel Background workers with shared-memory job tracking, visual progress bar
Conflict resolution error, skip, replace, rename
Clone roles Attributes, encrypted passwords, memberships, grants
Verification Row-count diff, schema DDL drift detection, pre-flight validator
Consistent snapshots REPEATABLE READ READ ONLY on source + exported snapshot for multi-connection paths
Cross-version Tested on PG 14–18 in CI on every push
# pgclone Wiki

pgclone is a PostgreSQL extension that clones databases, schemas, tables, and functions between PostgreSQL instances — directly from SQL. No pg_dump, no pg_restore, no shell scripts.


Table of Contents

  1. [What pgclone Does](#what-pgclone-does)
  2. [Quick Start](#quick-start)
  3. [Installation](#installation)
  4. [Core Cloning](#core-cloning)
  5. [Selective Cloning: Columns, WHERE, Conflict Resolution](#selective-cloning-columns-where-conflict-resolution)
  6. [Async Operations & Parallel Worker Pool](#async-operations--parallel-worker-pool)
  7. [Data Masking](#data-masking)
  8. [Roles & Permissions](#roles--permissions)
  9. [Verification & Drift Detection](#verification--drift-detection)
  10. [Consistent-Snapshot Clones](#consistent-snapshot-clones-v430)
  11. [Function Reference](#function-reference)
  12. [JSON Options Reference](#json-options-reference)
  13. [Architecture & Internals](#architecture--internals)
  14. [Testing](#testing)
  15. [Security Notes](#security-notes)
  16. [Contributing](#contributing)
  17. [Further Reading](#further-reading)

What pgclone Does

Capability Notes
Clone tables, schemas, full databases One SQL call per operation
High-throughput data transfer PostgreSQL COPY protocol, not row-at-a-time
Full DDL fidelity Indexes, PK/UNIQUE/CHECK/FK/EXCLUDE constraints, triggers, views, matviews, sequences
Data masking 8 strategies — email, name, phone, partial, hash, null, random_int, constant
Auto-discovery Scans source for ~40 sensitive column patterns, returns suggested mask rules
Static & dynamic masking In-place UPDATE masking, or role-based masked views
Async + parallel Background workers with shared-memory job tracking, visual progress bar
Conflict resolution error, skip, replace, rename
Clone roles Attributes, encrypted passwords, memberships, grants
Verification Row-count diff, schema DDL drift detection, pre-flight validator
Consistent snapshots REPEATABLE READ READ ONLY on source + exported snapshot for multi-connection paths
Cross-version Tested on PG 14–18 in CI on every push

Quick Start

-- Enable
CREATE EXTENSION pgclone;

-- Clone a single table
SELECT pgclone.table(
    'host=source-server dbname=mydb user=postgres password=secret',
    'public', 'customers', true
);

-- Clone an entire schema
SELECT pgclone.schema(
    'host=source-server dbname=mydb user=postgres password=secret',
    'sales', true
);

-- Clone a full database (all user schemas)
SELECT pgclone.database(
    'host=source-server dbname=mydb user=postgres password=secret',
    true
);

-- Check the version
SELECT pgclone.version();

Installation

Build dependencies

Platform Install
Debian / Ubuntu sudo apt-get install postgresql-server-dev-18 libpq-dev build-essential
RHEL / Rocky / Alma sudo dnf install postgresql18-devel libpq-devel gcc make
macOS (Homebrew) brew install postgresql@18

Build & install

git clone https://github.com/valehdba/pgclone.git
cd pgclone
make
sudo make install

# Or pin to a specific PG version:
make PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config
sudo make install PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config

Enable

CREATE EXTENSION pgclone;

For async operations (background workers), add this to postgresql.conf and restart:

shared_preload_libraries = 'pgclone'
max_worker_processes = 32   -- recommended for parallel cloning

pgclone uses Unix domain sockets for local loopback DDL — the default local all all peer line in pg_hba.conf is sufficient; no trust entry needed.


Core Cloning

Table

-- Same name on target (default conflict = error)
SELECT pgclone.table(:src, 'public', 'customers', true);

-- Different target name
SELECT pgclone.table(:src, 'public', 'customers', true, 'customers_copy');

-- Structure only (no data)
SELECT pgclone.table(:src, 'public', 'customers', false);

-- JSON options
SELECT pgclone.table(:src, 'public', 'customers', true, 'customers_copy',
    '{"indexes": true, "constraints": true, "triggers": false}');

Schema

Clones every table, view, materialized view, sequence, function, and (in dependency-respecting order) every constraint and trigger in the schema.

SELECT pgclone.schema(:src, 'sales', true);

-- With options
SELECT pgclone.schema(:src, 'sales', true,
    '{"matviews": false, "triggers": false, "conflict": "replace"}');

Database

Clones all user schemas (system schemas are excluded).

SELECT pgclone.database(:src, true);

-- With options
SELECT pgclone.database(:src, true,
    '{"indexes": false, "constraints": false}');

Database Create

Create a new local database and clone into it (idempotent — works even if the database already exists):

SELECT pgclone.database_create(:src, 'clone_target_db', true);

The extension auto-installs pgclone into the newly created database.

Functions only

SELECT pgclone.functions(:src, 'public');

Selective Cloning: Columns, WHERE, Conflict Resolution

Selective columns

SELECT pgclone.table(:src, 'public', 'employees', true, 'emp_public',
    '{"columns": ["id", "full_name", "department"]}');

WHERE filter

SELECT pgclone.table(:src, 'public', 'users', true, 'active_users',
    '{"where": "status = ''active''"}');

The where filter is validated against DDL/DML keywords and semicolons and runs inside a READ ONLY transaction on the source (v2.2.1+).

Combined

SELECT pgclone.table(:src, 'public', 'orders', true, 'recent_orders',
    '{"columns": ["id", "customer_id", "total", "created_at"],
      "where": "created_at > ''2024-01-01''",
      "triggers": false}');

Conflict resolution

Strategy Behavior
error (default) Fail if target exists
skip Leave the existing target untouched, return without error
replace DROP existing target, then clone
rename Rename existing to <name>_old, then clone
SELECT pgclone.table(:src, 'public', 'orders', true, NULL,
    '{"conflict": "replace"}');

All four strategies also work with pgclone.table_async and pgclone.schema_async.


Async Operations & Parallel Worker Pool

Async functions submit clone jobs to PostgreSQL background workers and return immediately with a job_id. Requires shared_preload_libraries = 'pgclone'.

Submit a job

-- Returns job_id (integer)
SELECT pgclone.table_async(:src, 'public', 'large_table', true);

SELECT pgclone.schema_async(:src, 'sales', true);

Parallel pool (v2.2.0+)

Launches exactly N background workers that pull tables from a shared-memory queue — dynamic load balancing without exhausting max_worker_processes:

SELECT pgclone.schema_async(:src, 'sales', true,
    '{"parallel": 8, "conflict": "replace", "triggers": false}');

Live progress

-- Visual progress bar view
SELECT job_id, status, schema_name, progress_bar FROM pgclone.jobs_view;
 job_id | status    | schema_name | progress_bar
--------+-----------+-------------+------------------------------------------------------------
      1 | running   | sales       | [████████████░░░░░░░░] 60.0% | 450000 rows | 00:08:30 elapsed
      2 | completed | analytics   | [████████████████████] 100.0% | 1200000 rows | 00:25:18 elapsed
-- JSON snapshot of a single job
SELECT pgclone.progress(1);

-- All jobs as JSON array
SELECT pgclone.jobs();

Job management

SELECT pgclone.cancel(1);       -- cancel a running job
SELECT pgclone.resume(1);       -- resume from last checkpoint; returns new job_id
SELECT pgclone.clear_jobs();    -- free completed/cancelled slots in shared memory

See [ASYNC.md](https://github.com/valehdba/pgclone/blob/main/docs/ASYNC.md) for full async details.


Data Masking

Three complementary modes covering clone-time, post-clone, and view-based masking.

1. Clone-time masking (v3.0.0)

Mask expressions are applied server-side as SQL during the COPY stream — no per-row overhead.

SELECT pgclone.table(:src, 'public', 'employees', true, 'emp_masked',
    '{"mask": {
        "email":     "email",
        "full_name": "name",
        "phone":     "phone",
        "ssn":       "null",
        "salary":    {"type": "random_int", "min": 40000, "max": 200000},
        "notes":     {"type": "constant", "value": "REDACTED"}
    }}');
Strategy Effect
email Replace local-part, preserve domain
name Replace with XXXX
phone Replace with +X-XXX-XXX-XXXX
partial Keep prefix/suffix characters, mask the middle
hash Deterministic MD5 (32 hex chars) — preserves referential integrity
null NULL the column
random_int Uniform integer in [min, max]
constant Fixed value (string or numeric)

2. Auto-discovery of sensitive columns (v3.1.0)

SELECT pgclone.discover_sensitive(:src, 'public');

Returns a JSON object grouped by table with suggested mask strategies — paste directly into the "mask" option:

{"employees": {"email": "email", "full_name": "name", "phone": "phone", "salary": "random_int", "ssn": "null"}}

Detects ~40 patterns across email, name, phone, SSN/national ID, financials, secrets/credentials, addresses, DOB, credit card, IP.

3. Static masking — in-place UPDATE (v3.2.0)

Apply masking to an already-cloned local table without re-connecting to source:

SELECT pgclone.mask_in_place(
    'public', 'employees',
    '{"email": "email", "full_name": "name", "ssn": "null"}'
);
-- → OK: masked 1000 rows in public.employees (3 columns)

4. Dynamic masking — role-based views (v3.3.0)

Keep the base table intact; expose a masked view to unprivileged roles:

SELECT pgclone.create_masking_policy(
    'public', 'employees',
    '{"email": "email", "full_name": "name", "ssn": "null"}',
    'data_admin'   -- this role can still see raw data
);

-- Later
SELECT pgclone.drop_masking_policy('public', 'employees');

This creates public.employees_masked, revokes SELECT on the base table from PUBLIC, and grants raw access only to data_admin.


Roles & Permissions

Clone database roles from source — attributes, encrypted passwords (copied as-is from pg_authid), memberships, schema/table/sequence/function grants. Requires superuser on both sides.

-- All non-system roles
SELECT pgclone.clone_roles(:src);

-- Single role
SELECT pgclone.clone_roles(:src, 'app_user');

-- Comma-separated subset
SELECT pgclone.clone_roles(:src, 'reader,writer,etl_runner');

Existing target roles get their passwords and attributes synced — pgclone does not drop existing roles.


Verification & Drift Detection

Three read-only inspection functions. All run inside BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY on both source and target and never execute DDL or DML.

Row-count verify (v3.5.0)

-- Single schema
SELECT * FROM pgclone.verify(:src, 'sales');

-- All user schemas
SELECT * FROM pgclone.verify(:src);
 schema_name | table_name | source_rows | target_rows | match
-------------+------------+-------------+-------------+-------
 sales       | customers  |        1000 |        1000 | ✓
 sales       | orders     |        5240 |        5238 | ✗
 sales       | audit_log  |         812 |           0 | ✗ (missing)

Schema diff (v4.1.0)

Read-only DDL drift detector between source and the local target — JSON with per-category only_in_source / only_in_target / modified arrays for tables (with per-column type/nullability/default drift), indexes, constraints, triggers, views, sequences.

SELECT pgclone.diff(:src, 'app_schema');

-- Quick boolean gate
SELECT (pgclone.diff(:src, 'app_schema')::jsonb ->> 'in_sync')::boolean AS in_sync;

Pre-flight validator (v4.2.0)

Validate that a clone is likely to succeed before you run it: connections, permissions, version compatibility, capacity, name conflicts, missing roles, missing extensions, missing tablespaces.

SELECT pgclone.preflight(:src, 'app_schema');

-- Gate the clone on the result
SELECT (pgclone.preflight(:src, 'app_schema')::jsonb ->> 'ready')::boolean
       AS clone_safe;

GDPR/compliance masking report (v3.6.0)

SELECT * FROM pgclone.masking_report('public');

Lists every column, detects sensitivity, reports whether a _masked view exists, and gives a recommendation.


Consistent-Snapshot Clones (v4.3.0+)

Every clone — sync and async, single-connection and parallel-pool — wraps the source-side reads in a BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY transaction.

Multi-connection paths (schema clone, database clone, parallel pool) export a snapshot from one keeper connection and import it on every other connection via SET TRANSACTION SNAPSHOT — the same correctness model as pg_dump -j.

Result: every per-table COPY across the entire clone reads the same point-in-time view of the source. Cross-table FKs and any other invariant that holds at a single instant on source are preserved in the clone even when the source is taking concurrent writes.


Function Reference

Core clone

Function Returns Description
pgclone.version() text Extension version string
pgclone.table(conninfo, schema, table, include_data) text Clone a single table
pgclone.table(conninfo, schema, table, include_data, target_name) text Clone with different target name
pgclone.table(conninfo, schema, table, include_data, target_name, options) text Clone with JSON options
pgclone.table_ex(conninfo, schema, table, data, target, idx, constr, trig) text Clone with explicit boolean flags
pgclone.schema(conninfo, schema, include_data) text Clone an entire schema
pgclone.schema(conninfo, schema, include_data, options) text Clone schema with options
pgclone.schema_ex(conninfo, schema, data, idx, constr, trig) text Clone schema with boolean flags
pgclone.functions(conninfo, schema) text Clone all functions/procedures from a schema
pgclone.database(conninfo, include_data) text Clone all user schemas
pgclone.database(conninfo, include_data, options) text Clone database with options
pgclone.database_create(conninfo, target_db, include_data) text Create a new database and clone into it

Async / progress

Function Returns Description
pgclone.table_async(conninfo, schema, table, include_data [, target [, options]]) int4 Submit table-clone job; returns job_id
pgclone.schema_async(conninfo, schema, include_data [, options]) int4 Submit schema-clone job (supports "parallel": N)
pgclone.progress(job_id) text JSON snapshot of one job
pgclone.jobs() text JSON array of all jobs
pgclone.progress_detail() setof record Underlying table-returning function for jobs_view
pgclone.jobs_view view Live progress with visual bar + elapsed time
pgclone.cancel(job_id) bool Cancel a running job
pgclone.resume(job_id) int4 Resume a failed job from last checkpoint; returns new job_id
pgclone.clear_jobs() int4 Free completed/cancelled job slots in shared memory

Masking

Function Returns Description
pgclone.discover_sensitive(conninfo, schema) text Scan source for sensitive columns; returns suggested mask JSON
pgclone.mask_in_place(schema, table, mask_json) text Apply masking to an existing local table via UPDATE
pgclone.create_masking_policy(schema, table, mask_json, privileged_role) text Create <table>_masked view + role-based grants
pgclone.drop_masking_policy(schema, table) text Drop masked view, restore base table grants

Inspection / drift

Function Returns Description
pgclone.clone_roles(conninfo [, role_filter]) text Clone roles, passwords, memberships, grants from source
pgclone.verify(conninfo [, schema]) setof record Row-count diff between source and local target
pgclone.diff(conninfo, schema) text JSON DDL drift report (tables/indexes/constraints/triggers/views/sequences)
pgclone.preflight(conninfo, schema) text JSON pre-flight validation report
pgclone.masking_report(schema) setof record GDPR audit: sensitivity + mask status + recommendation per column

JSON Options Reference

Option Type Default Description
indexes bool true Clone indexes
constraints bool true Clone constraints (PK / UNIQUE / CHECK / FK / EXCLUDE)
triggers bool true Clone triggers
matviews bool true Clone materialized views (schema/db only)
columns array all Whitelist of columns to clone (table only)
where string none Row filter (table only) — validated against DDL/DML keywords
conflict string "error" error, skip, replace, rename
parallel int 1 Number of background workers (schema_async only)
mask object none Column masking rules: {"col":"strategy"} or {"col":{"type":"...","min":...,"max":...}}

Architecture & Internals

Why libpq instead of SPI

pgclone uses loopback libpq connections to the local target database for all DDL operations instead of PostgreSQL's SPI. SPI executes within the calling transaction's snapshot, so DDL statements like CREATE TABLE aren't visible to subsequent SPI calls within the same function invocation until commit. By connecting via libpq (even to localhost), each DDL statement executes in its own transaction and is immediately visible.

Why C instead of PL/pgSQL

  • Direct access to the COPY protocol via PQgetCopyData / PQputCopyData for high-throughput data transfer
  • Background-worker registration requires C (RegisterDynamicBackgroundWorker)
  • Shared-memory allocation for progress tracking requires C hooks (shmem_request_hook on PG 15+, RequestAddinShmemSpace on PG 14)
  • Fine-grained error handling with PG_TRY / PG_CATCH for guaranteed resource cleanup

Isolated translation units for read-only features (v4.1.0+)

Read-only feature additions live in their own .c files and do not share helpers with src/pgclone.c:

File Feature Version
src/pgclone_diff.c pgclone.diff() 4.1.0
src/pgclone_preflight.c pgclone.preflight() 4.2.0

Each re-implements the few helpers it needs (connect, READ ONLY wrappers, quote_literal_cstr, RFC 8259 JSON escaper). Duplication is intentional — a regression in pgclone.c cannot break these features, and vice versa.

Codebase layout

pgclone/
├── src/
│   ├── pgclone.c             # Main extension (~3800 lines)
│   ├── pgclone_bgw.c         # Background worker (~800 lines)
│   ├── pgclone_bgw.h         # Shared definitions and structs
│   ├── pgclone_diff.c        # Schema diff (isolated TU)
│   └── pgclone_preflight.c   # Pre-flight validator (isolated TU)
├── sql/pgclone--*.sql        # SQL function definitions per version
├── test/
│   ├── fixtures/seed.sql     # Test data
│   ├── pgclone_test.sql      # 66 pgTAP tests across 20 groups
│   ├── run_tests.sh          # Test orchestrator
│   ├── test_loopback.sh      # Loopback-DDL tests (roles/verify/diff/preflight)
│   ├── test_async.sh         # Async bgworker tests
│   └── test_database_create.sh
├── .github/workflows/ci.yml  # PG 14–18 matrix
├── Dockerfile / docker-compose.yml  # Local multi-version testing
├── Makefile                  # PGXS-based build
├── pgclone.control           # Extension metadata
├── META.json                 # PGXN metadata
├── pre_deploy_checks.sh      # Pre-release validation (22 checks)
└── docs/                     # USAGE, ASYNC, TESTING, ARCHITECTURE

See [ARCHITECTURE.md](https://github.com/valehdba/pgclone/blob/main/docs/ARCHITECTURE.md) for the deep dive.


Testing

pgclone has four test suites running across PG 14–18 in CI on every push:

Suite Location Count Scope
pgTAP sync test/pgclone_test.sql 66 Core clone, columns/WHERE, masking, auto-discovery, in-place mask, SQL-injection guards
Loopback-DDL test/test_loopback.sh 21 clone_roles, verify, masking_report, dynamic masking, diff, preflight
Database create test/test_database_create.sh 7 database_create, idempotent re-clone
Async bgworker test/test_async.sh table_async, schema_async, progress, clear_jobs

Run locally (Docker Compose)

# One PG version
docker compose down -v
docker compose up source-db -d
docker compose run --rm test-pg17

# All versions
for pg in 14 15 16 17 18; do
    echo "=== PostgreSQL $pg ==="
    docker compose down -v
    docker compose up source-db -d
    docker compose run --rm test-pg$pg 2>&1 | tail -5
done

See [TESTING.md](https://github.com/valehdba/pgclone/blob/main/docs/TESTING.md) for details.


Security Notes

  • pgclone requires superuser to install and to use (catalog access + RegisterDynamicBackgroundWorker).
  • All identifier interpolation goes through quote_identifier(); all literal interpolation through quote_literal_cstr() — there is no raw string concatenation in dynamic SQL.
  • The "where" filter is validated against DDL/DML keywords and semicolons, and the source query runs inside BEGIN ... READ ONLY (v2.2.1+).
  • Connection strings can contain passwords — prefer .pgpass or PGPASSFILE. pgclone never logs connection strings above DEBUG1.
  • For vulnerability reports see [SECURITY.md](https://github.com/valehdba/pgclone/blob/main/SECURITY.md).

Contributing

PRs welcome — see [CONTRIBUTING.md](https://github.com/valehdba/pgclone/blob/main/CONTRIBUTING.md).

Branch naming:

  • feature/<description> — new functionality
  • fix/<description> — bug fixes

CI runs the full PG 14–18 matrix on main, fix/**, feature/**, and on PRs to main. All four test suites must pass on every PG version before merge.


Further Reading

Document Purpose
Usage Guide Complete option-by-option reference with examples
Async Operations Async functions, parallel pool, progress tracking
Architecture Internal design decisions, codebase layout, version-compat strategy
Testing Test suite layout, how to add a test, CI pipeline
Changelog Per-version history from 0.1.0 to 4.3.0
Security Policy Vulnerability reporting policy