pgfusion-benchmark

Status: Work in progress. Numbers and tuning defaults shift as pgfusion evolves — treat results as snapshots in time, not steady-state claims.

Benchmark suite comparing pgfusion against PostgreSQL on industry-standard analytical workloads.

Structure

pgfusion-benchmark/
├── justfile            # task runner — preferred entry point
├── bench_lib.sh        # shared runner library (sourced by run.sh scripts)
├── tune_postgres.sh    # PostgreSQL tuning shared across all benchmarks
├── clickbench/         # ClickBench: 43 analytical queries on the hits dataset
│   ├── setup.sh        # download dataset and load into PostgreSQL
│   ├── run.sh          # run benchmark and produce results
│   ├── queries.sql
│   └── checkpoints/    # saved run results
└── tpch/               # TPC-H: 22 queries at configurable scale factor
    ├── setup.sh        # generate data with dbgen and load into PostgreSQL
    ├── run.sh          # run benchmark and produce results
    ├── queries.sql
    └── checkpoints/    # saved run results

Prerequisites

just (recommended — every recipe is exposed there)
PostgreSQL installed and configured — run just pg-setup <version> from pgfusion/ or pg_arrow/
PG_HARNESS_DIR set to your pg-test-harness clone (scripts derive pg-test-config.toml from it)
PGFUSION_ROOT set to the pgfusion/ crate root (scripts build pgfusion_cli from source; falls back to PATH)
timeout / gtimeout (from coreutils) — required to enforce the per-query timeout on the pgfusion side

Configuration

export PG_HARNESS_DIR=/path/to/utils/pg-test-harness
export PGFUSION_ROOT=/path/to/pgfusion

Scripts read $PG_HARNESS_DIR/pg-test-config.toml to locate psql, pg_ctl, and $PGDATA. The config is generated by just pg-setup <version> in pgfusion/ or pg_arrow/.

PGFUSION_ROOT is used to build pgfusion_cli --release before each run. If unset, scripts fall back to pgfusion_cli on PATH.

Quick start (via `just`)

just                           # list all recipes
just pg-tune pg18              # apply analytics-friendly PG tuning (run once)

# ── TPC-H ────────────────────────────────────────────────────────────────
just tpch-setup pg18 10        # generate + load SF10 dataset (~10 GB)
just tpch                      # run all 22 queries, 3 runs each
just tpch-query 7              # run a single query
just tpch-checkpoint pg18 3 baseline    # run + archive to checkpoints/<hash>-baseline/

# ── ClickBench ──────────────────────────────────────────────────────────
just clickbench-setup pg18     # download hits dataset (~75 GB)
just clickbench                # run all 43 queries
just clickbench pg18 3 query=13
just clickbench-checkpoint pg18 3 before-opt

Direct script usage

The run.sh and setup.sh scripts work standalone if just is unavailable:

# ClickBench
cd clickbench && ./setup.sh [pg_version]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]

# TPC-H
cd tpch && ./setup.sh [pg_version] [scale_factor]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]

Flags

Flag	Description
`--timeout=<sec>`	Per-query timeout in seconds (default `300`, use `0` to disable). PG uses `statement_timeout`; pgfusion is wrapped with `timeout`.
`--checkpoint`	Save results to `checkpoints/<git-short-hash>[-label]/` after the run
`--checkpoint-only`	Archive current results without re-running
`--label=<text>`	Tag appended to the checkpoint folder name
`--query=N`	Run only query N (1-based)
`--skip=N`	Skip query N

Results are written to checkpoints/current/ on every run (results.csv, results.json, heatmap.html, per-query output samples).

PostgreSQL tuning

tune_postgres.sh (run via just pg-tune <pg_version>) applies an analytics-friendly profile via ALTER SYSTEM SET:

Parallelism matched to pgfusion's 10 partitions (max_parallel_workers_per_gather, zeroed parallel_*_cost)
Memory: shared_buffers = 8GB, work_mem = 4GB, effective_cache_size = 24GB
JIT always on (jit_above_cost = 0)
I/O: effective_io_concurrency = 200

All changes are reversible with ALTER SYSTEM RESET ALL followed by a restart. Tune values are tuned for a 32 GB workstation — edit tune_postgres.sh for larger boxes.

Checkpoint system

Every run writes to <bench>/checkpoints/current/ (CSV + JSON + per-query output samples + heatmap HTML). Named checkpoints save under checkpoints/<git-short-hash>[-label]/.

just tpch-save my-label             # archive current without re-running
just tpch-report                    # open latest heatmap in browser
just tpch-report-checkpoint <slug>  # open a specific checkpoint heatmap

Status semantics

Each query gets a status per engine:

Status	Meaning
`OK`	Completed successfully; `Time:` parsed
`ERROR`	Engine returned an error (parse failure, planner error, missing table, etc.)
`TIMEOUT`	Cancelled by `statement_timeout` (PG) or wall-clock `timeout` (pgfusion)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pgfusion-benchmark

Structure

Prerequisites

Configuration

Quick start (via `just`)

Direct script usage

Flags

PostgreSQL tuning

Checkpoint system

Status semantics

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
clickbench		clickbench
tpch		tpch
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
bench_lib.sh		bench_lib.sh
justfile		justfile
tune_postgres.sh		tune_postgres.sh

Folders and files

Latest commit

History

Repository files navigation

pgfusion-benchmark

Structure

Prerequisites

Configuration

Quick start (via just)

Direct script usage

Flags

PostgreSQL tuning

Checkpoint system

Status semantics

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Quick start (via `just`)

Packages