Skip to content

pg-arrow/pgfusion-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pgfusion-benchmark

Status: Work in progress. Numbers and tuning defaults shift as pgfusion evolves — treat results as snapshots in time, not steady-state claims.

Benchmark suite comparing pgfusion against PostgreSQL on industry-standard analytical workloads.

Structure

pgfusion-benchmark/
├── justfile            # task runner — preferred entry point
├── bench_lib.sh        # shared runner library (sourced by run.sh scripts)
├── tune_postgres.sh    # PostgreSQL tuning shared across all benchmarks
├── clickbench/         # ClickBench: 43 analytical queries on the hits dataset
│   ├── setup.sh        # download dataset and load into PostgreSQL
│   ├── run.sh          # run benchmark and produce results
│   ├── queries.sql
│   └── checkpoints/    # saved run results
└── tpch/               # TPC-H: 22 queries at configurable scale factor
    ├── setup.sh        # generate data with dbgen and load into PostgreSQL
    ├── run.sh          # run benchmark and produce results
    ├── queries.sql
    └── checkpoints/    # saved run results

Prerequisites

  • just (recommended — every recipe is exposed there)
  • PostgreSQL installed and configured — run just pg-setup <version> from pgfusion/ or pg_arrow/
  • PG_HARNESS_DIR set to your pg-test-harness clone (scripts derive pg-test-config.toml from it)
  • PGFUSION_ROOT set to the pgfusion/ crate root (scripts build pgfusion_cli from source; falls back to PATH)
  • timeout / gtimeout (from coreutils) — required to enforce the per-query timeout on the pgfusion side

Configuration

export PG_HARNESS_DIR=/path/to/utils/pg-test-harness
export PGFUSION_ROOT=/path/to/pgfusion

Scripts read $PG_HARNESS_DIR/pg-test-config.toml to locate psql, pg_ctl, and $PGDATA. The config is generated by just pg-setup <version> in pgfusion/ or pg_arrow/.

PGFUSION_ROOT is used to build pgfusion_cli --release before each run. If unset, scripts fall back to pgfusion_cli on PATH.

Quick start (via just)

just                           # list all recipes
just pg-tune pg18              # apply analytics-friendly PG tuning (run once)

# ── TPC-H ────────────────────────────────────────────────────────────────
just tpch-setup pg18 10        # generate + load SF10 dataset (~10 GB)
just tpch                      # run all 22 queries, 3 runs each
just tpch-query 7              # run a single query
just tpch-checkpoint pg18 3 baseline    # run + archive to checkpoints/<hash>-baseline/

# ── ClickBench ──────────────────────────────────────────────────────────
just clickbench-setup pg18     # download hits dataset (~75 GB)
just clickbench                # run all 43 queries
just clickbench pg18 3 query=13
just clickbench-checkpoint pg18 3 before-opt

Direct script usage

The run.sh and setup.sh scripts work standalone if just is unavailable:

# ClickBench
cd clickbench && ./setup.sh [pg_version]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]

# TPC-H
cd tpch && ./setup.sh [pg_version] [scale_factor]
./run.sh [pg_version] [runs] [--timeout=<sec>] [--checkpoint] [--label=<text>] [--query=N]

Flags

Flag Description
--timeout=<sec> Per-query timeout in seconds (default 300, use 0 to disable). PG uses statement_timeout; pgfusion is wrapped with timeout.
--checkpoint Save results to checkpoints/<git-short-hash>[-label]/ after the run
--checkpoint-only Archive current results without re-running
--label=<text> Tag appended to the checkpoint folder name
--query=N Run only query N (1-based)
--skip=N Skip query N

Results are written to checkpoints/current/ on every run (results.csv, results.json, heatmap.html, per-query output samples).

PostgreSQL tuning

tune_postgres.sh (run via just pg-tune <pg_version>) applies an analytics-friendly profile via ALTER SYSTEM SET:

  • Parallelism matched to pgfusion's 10 partitions (max_parallel_workers_per_gather, zeroed parallel_*_cost)
  • Memory: shared_buffers = 8GB, work_mem = 4GB, effective_cache_size = 24GB
  • JIT always on (jit_above_cost = 0)
  • I/O: effective_io_concurrency = 200

All changes are reversible with ALTER SYSTEM RESET ALL followed by a restart. Tune values are tuned for a 32 GB workstation — edit tune_postgres.sh for larger boxes.

Checkpoint system

Every run writes to <bench>/checkpoints/current/ (CSV + JSON + per-query output samples + heatmap HTML). Named checkpoints save under checkpoints/<git-short-hash>[-label]/.

just tpch-save my-label             # archive current without re-running
just tpch-report                    # open latest heatmap in browser
just tpch-report-checkpoint <slug>  # open a specific checkpoint heatmap

Status semantics

Each query gets a status per engine:

Status Meaning
OK Completed successfully; Time: parsed
ERROR Engine returned an error (parse failure, planner error, missing table, etc.)
TIMEOUT Cancelled by statement_timeout (PG) or wall-clock timeout (pgfusion)

About

pgfusion benchmark setup and reports

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors