Status: Work in progress. Numbers and tuning defaults shift as pgfusion evolves — treat results as snapshots in time, not steady-state claims.
Benchmark suite comparing pgfusion against PostgreSQL on industry-standard analytical workloads.
pgfusion-benchmark/
├── justfile # task runner — preferred entry point
├── bench_lib.sh # shared runner library (sourced by run.sh scripts)
├── tune_postgres.sh # PostgreSQL tuning shared across all benchmarks
├── clickbench/ # ClickBench: 43 analytical queries on the hits dataset
│ ├── setup.sh # download dataset and load into PostgreSQL
│ ├── run.sh # run benchmark and produce results
│ ├── queries.sql
│ └── checkpoints/ # saved run results
└── tpch/ # TPC-H: 22 queries at configurable scale factor
├── setup.sh # generate data with dbgen and load into PostgreSQL
├── run.sh # run benchmark and produce results
├── queries.sql
└── checkpoints/ # saved run results
just(recommended — every recipe is exposed there)- PostgreSQL installed and configured — run
just pg-setup <version>frompgfusion/orpg_arrow/ PG_HARNESS_DIRset to yourpg-test-harnessclone (scripts derivepg-test-config.tomlfrom it)PGFUSION_ROOTset to thepgfusion/crate root (scripts buildpgfusion_clifrom source; falls back toPATH)timeout/gtimeout(from coreutils) — required to enforce the per-query timeout on the pgfusion side
export PG_HARNESS_DIR=/path/to/utils/pg-test-harness
export PGFUSION_ROOT=/path/to/pgfusionScripts read $PG_HARNESS_DIR/pg-test-config.toml to locate psql, pg_ctl, and $PGDATA. The config is generated by just pg-setup <version> in pgfusion/ or pg_arrow/.
PGFUSION_ROOT is used to build pgfusion_cli --release before each run. If unset, scripts fall back to pgfusion_cli on PATH.
just # list all recipes
just pg-tune pg18 # apply analytics-friendly PG tuning (run once)
# ── TPC-H ────────────────────────────────────────────────────────────────
just tpch-setup pg18 10 # generate + load SF10 dataset (~10 GB)
just tpch # run all 22 queries, 3 runs each
just tpch-query 7 # run a single query
just tpch-checkpoint pg18 3 baseline # run + archive to checkpoints/<hash>-baseline/
# ── ClickBench ──────────────────────────────────────────────────────────
just clickbench-setup pg18 # download hits dataset (~75 GB)
just clickbench # run all 43 queries
just clickbench pg18 3 query=13
just clickbench-checkpoint pg18 3 before-optThe run.sh and setup.sh scripts work standalone if just is unavailable:
# ClickBench
cd clickbench && ./setup.sh [pg_version]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]
# TPC-H
cd tpch && ./setup.sh [pg_version] [scale_factor]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]| Flag | Description |
|---|---|
--timeout=<sec> |
Per-query timeout in seconds (default 300, use 0 to disable). PG uses statement_timeout; pgfusion is wrapped with timeout. |
--checkpoint |
Save results to checkpoints/<git-short-hash>[-label]/ after the run |
--checkpoint-only |
Archive current results without re-running |
--label=<text> |
Tag appended to the checkpoint folder name |
--query=N |
Run only query N (1-based) |
--skip=N |
Skip query N |
Results are written to checkpoints/current/ on every run (results.csv, results.json, heatmap.html, per-query output samples).
tune_postgres.sh (run via just pg-tune <pg_version>) applies an analytics-friendly profile via ALTER SYSTEM SET:
- Parallelism matched to pgfusion's 10 partitions (
max_parallel_workers_per_gather, zeroedparallel_*_cost) - Memory:
shared_buffers = 8GB,work_mem = 4GB,effective_cache_size = 24GB - JIT always on (
jit_above_cost = 0) - I/O:
effective_io_concurrency = 200
All changes are reversible with ALTER SYSTEM RESET ALL followed by a restart. Tune values are tuned for a 32 GB workstation — edit tune_postgres.sh for larger boxes.
Every run writes to <bench>/checkpoints/current/ (CSV + JSON + per-query output samples + heatmap HTML).
Named checkpoints save under checkpoints/<git-short-hash>[-label]/.
just tpch-save my-label # archive current without re-running
just tpch-report # open latest heatmap in browser
just tpch-report-checkpoint <slug> # open a specific checkpoint heatmapEach query gets a status per engine:
| Status | Meaning |
|---|---|
OK |
Completed successfully; Time: parsed |
ERROR |
Engine returned an error (parse failure, planner error, missing table, etc.) |
TIMEOUT |
Cancelled by statement_timeout (PG) or wall-clock timeout (pgfusion) |