pgRDF

A Rust-native PostgreSQL extension for RDF, SPARQL, SHACL and OWL reasoning.

Treat Postgres as the storage + execution engine for your knowledge graph. Load Turtle, query via SPARQL, validate via SHACL, materialize inferences via OWL 2 RL — all addressable from any Postgres client.

Status: v0.5.40 is the current advertised release (LATEST.md). The engine surface is unchanged from v0.5.0. The v0.5.10..v0.5.40 cycle ships PGXN packaging, OCI distribution with SLSA Build Provenance v1 attestations, a 5-gate release-pipeline contract (PROVENANCE.md Rule 7), Phase-0 ingest instrumentation (parse_ms/dict_ms/insert_ms), and additive Track A perf-spike UDFs: parse_turtle_dict_batched (-17% e2e) and shmem_cache_prewarm (-54% e2e). Both sit behind explicit opt-in surfaces; the default parse_turtle path is unchanged. Pin via oras pull ghcr.io/styk-tv/pgrdf-bundle:0.5.40, or whatever LATEST.md advertises at audit time.

Query: SPARQL SELECT/ASK over N-pattern BGPs · FILTER · DISTINCT/LIMIT/OFFSET · type-aware ORDER BY · multi-triple OPTIONAL · UNION · MINUS · aggregates (COUNT/SUM/AVG/type-aware MIN-MAX/GROUP_CONCAT/SAMPLE) incl. over UNION · HAVING · downstream BIND · VALUES · named-graph scoping (GRAPH <iri> + GRAPH ?g + composition) · CONSTRUCT (constant/variable/blank-node/multi-triple templates · WHERE-shorthand · round-trip ingest) · DESCRIBE (W3C §16.4 CBD via pgrdf.describe) · property paths (^ + * ? · | alternation · materialised-closure no-CTE fallback · pgrdf.path_max_depth guard).
Update: full SPARQL UPDATE (INSERT/DELETE DATA · INSERT/DELETE WHERE · DELETE+INSERT WHERE · WITH <iri> scoping · lifecycle algebra: DROP/CLEAR/CREATE GRAPH × DEFAULT/NAMED/ALL).
Storage: CRUD + Turtle / TriG / N-Quads ingest (parse_turtle / parse_trig / parse_nquads) · per-graph LIST partitions · lifecycle UDFs (drop/clear/copy/move_graph, BIGINT + IRI overloads) · shmem dict cache (§4.1) + prepared-plan cache (§4.2) + prepared bulk-INSERT (§4.3 phase A).
Inference: pgrdf.materialize(graph_id, profile) with owl-rl and rdfs profiles.
Validation: pgrdf.validate(data, shapes, mode) → a real W3C sh:ValidationReport JSONB; SHACL Core is native (genuine W3C SHACL Core 25/25); mode=>'sparql' is shipped + honest, upstream-gated (ERRATA E-012).
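Property paths such as `+` and `*` reduce to a transitive closure, which is why a depth guard like pgrdf.path_max_depth matters on deep or cyclic data. A minimal Python sketch of a breadth-first closure with a hop limit; `reachable` is a hypothetical helper, and stopping silently at the limit is an assumption (pgRDF's real guard may raise an error or behave differently).

```python
from collections import deque

# Hypothetical sketch of a depth-guarded path closure; not pgRDF's path executor.
def reachable(edges, start, max_depth, include_start=False):
    """Nodes reachable from `start` in one or more hops (the `+` path),
    or zero or more hops (the `*` path) when include_start is True.
    Expansion stops after `max_depth` hops (assumed guard semantics)."""
    adjacency = {}
    for s, o in edges:
        adjacency.setdefault(s, []).append(o)

    seen = {start} if include_start else set()
    expanded = {start}              # each node is expanded once, at its minimal depth
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue                # depth guard: do not follow further edges
        for nxt in adjacency.get(node, []):
            seen.add(nxt)
            if nxt not in expanded:  # cycles terminate here
                expanded.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```

Because BFS visits nodes in order of hop count, each node is expanded at its shortest distance, so the guard counts hops along shortest paths.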

Shipped on the v0.4/v0.5 countdown: v0.4.0 SHACL · v0.4.1 named-graph §3 · v0.4.2 lifecycle UDFs §5 · v0.4.3 SPARQL UPDATE §4 · v0.4.4 CONSTRUCT §6 · v0.4.5 property paths §7 · v0.4.6 §11 SPARQL backlog · v0.5.0 — the complete surface (DESCRIBE, TriG/N-Quads, IRI lifecycle overloads, rdfs+owl-rl profiles, native SHACL Core 25/25).
Documented upstream gates (honest, not defects): E-011 — RDF 1.2 triple terms + crates.io publish gated on gtfierro/reasonable#50; E-012 — SHACL-SPARQL constraint execution gated on rudof (#21/#94); the mode=>'sparql' surface ships honest.
Deferred → v0.6-FUTURE: executor.rs core-BGP carve · heap_multi_insert phase B · real SHACL-SPARQL engine · federated SERVICE · incremental materialisation · RDF 1.2 (see SPEC.pgRDF.LLD.v0.6-FUTURE).
Supported: PG 14, 15, 16, 17. PG 18 adoption stays deferred because of the pgrx 0.16 pin: pgrx 0.18.0 still fails to build locally and changes the schema-gen model. See ERRATA E-006.
Install: three paths, per SPEC.pgRDF.INSTALL.v0.2.
GitHub-release tarball: per-file :ro bind-mount of .so/.control/.sql into stock postgres:17.4-bookworm (8 tarballs: pg14-17 × amd64/arm64, plus SHA256SUMS).
Anonymous OCI: oras pull ghcr.io/styk-tv/pgrdf-bundle:0.5.40 (public, zero credentials; pin to whatever LATEST.md advertises at audit time). Every advertised digest carries an attested SLSA Build Provenance v1, verifiable via gh attestation verify oci://ghcr.io/styk-tv/pgrdf-bundle:<tag> --repo styk-tv/pgRDF.
PGXN source install: pgxn install pgrdf --pg_config /path/to/pg_config on a host with Rust 1.91 + cargo-pgrx 0.16 (see INSTALL.md).
Repo: styk-tv/pgRDF
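Before bind-mounting a GitHub-release tarball, its digest can be checked against SHA256SUMS. From a shell, GNU `sha256sum --ignore-missing -c SHA256SUMS` does this. The Python sketch below does the same, assuming SHA256SUMS uses the standard `sha256sum` line format (`<hex digest>  <filename>`); the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers; assumes sha256sum-style lines in SHA256SUMS.
def parse_sums(text):
    """Map filename -> expected hex digest."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'
        expected[name.lstrip("*")] = digest.lower()
    return expected

def file_sha256(path, chunk_size=1 << 20):
    """Stream the file so large tarballs are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, sums_text):
    """True if the file's digest matches its SHA256SUMS entry."""
    expected = parse_sums(sums_text)
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"{name} not listed in SHA256SUMS")
    return file_sha256(path) == expected[name]
```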

What you can do today

-- One-time install
CREATE EXTENSION pgrdf;

-- Load any Turtle file from the server-side filesystem
SELECT pgrdf.load_turtle('/fixtures/ontologies/foaf.ttl', 100);
--  → 631

-- See structured ingest stats (timing, cache hits, batches)
SELECT pgrdf.load_turtle_verbose('/fixtures/ontologies/prov.ttl', 200, 'http://www.w3.org/ns/prov#');
--  → {"triples": 1789, "dict_cache_hits": 4612, "dict_db_calls": 783, "quad_batches": 2, "elapsed_ms": 142.7}
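The dict_cache_hits and dict_db_calls counters reflect dictionary encoding: RDF terms are stored in pgrdf._pgrdf_dictionary and quads reference them by id, so ingest is dominated by term lookups. The Python sketch below illustrates that general pattern, with an in-process cache in front of a slower backing store. The class and its id scheme are hypothetical and do not reproduce pgRDF's shared-memory cache.

```python
# Hypothetical illustration of dictionary encoding with a lookup cache.
class TermDictionary:
    """Interns RDF terms as integer ids and counts cache hits vs. store calls."""

    def __init__(self):
        self._store = {}          # stands in for the on-disk dictionary table
        self._cache = {}          # stands in for a fast in-memory cache
        self.dict_cache_hits = 0
        self.dict_db_calls = 0

    def intern(self, term):
        if term in self._cache:
            self.dict_cache_hits += 1
            return self._cache[term]
        # Cache miss: consult (and, if needed, extend) the backing store.
        self.dict_db_calls += 1
        term_id = self._store.setdefault(term, len(self._store) + 1)
        self._cache[term] = term_id
        return term_id

    def encode_triple(self, s, p, o):
        return (self.intern(s), self.intern(p), self.intern(o))
```

Repeated predicates such as foaf:name hit the cache after their first occurrence, which is why the hit counter usually far exceeds the store-call counter on real ontologies.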

-- Manage per-graph LIST partitions for cheap whole-graph drops
SELECT pgrdf.add_graph(42);
SELECT pgrdf.count_quads(42);

-- Inspect the dictionary directly
SELECT * FROM pgrdf._pgrdf_dictionary WHERE term_type = 1 LIMIT 5;

SPARQL

-- Multi-pattern BGP, shared variables become joins
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?p ?n ?m
     WHERE { ?p foaf:name ?n .
             ?p foaf:mbox ?m }'
);
--  → {"p": "http://example.com/alice", "n": "Alice", "m": "mailto:a@x"}
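Shared variables across triple patterns act as join keys. A minimal Python sketch of that idea: each pattern is matched against an in-memory triple list, and a solution is extended only where existing bindings agree. This is a naive nested-loop illustration, not how pgRDF compiles BGPs to SQL; the function names and data are hypothetical.

```python
# Hypothetical nested-loop BGP evaluator; variables start with '?'.
def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in new and new[term] != value:
                return None  # shared variable already bound to another value
            new[term] = value
        elif term != value:
            return None      # constant term does not match
    return new

def eval_bgp(patterns, triples):
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions
```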

-- FILTER over the BGP — identity, boolean composition, term-type tests
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?o
     WHERE { ?s ?p ?o FILTER(isIRI(?o) && ?p = foaf:knows) }'
);

-- Numeric ordering + REGEX in a single query
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n
     WHERE { ?s foaf:name ?n .
             ?s <http://example.com/age> ?age
             FILTER(?age >= 30 && REGEX(?n, "^A", "i")) }'
);
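FILTER prunes solutions after the BGP has bound them. In SPARQL, a filter expression that raises an error (an unbound ?age, or a non-numeric one) eliminates the solution instead of failing the query. A Python sketch over already-bound solutions, with a hypothetical `keep` predicate mirroring the numeric + REGEX query above.

```python
import re

# Hypothetical sketch of FILTER semantics over dict-shaped solutions.
def apply_filter(solutions, predicate):
    """Keep rows where `predicate` is true; evaluation errors drop the row."""
    kept = []
    for row in solutions:
        try:
            if predicate(row):
                kept.append(row)
        except (KeyError, TypeError, ValueError):
            pass  # unbound variable or type error: the row is filtered out
    return kept

def keep(row):
    # Mirrors FILTER(?age >= 30 && REGEX(?n, "^A", "i"))
    return float(row["?age"]) >= 30 and re.search("^A", row["?n"], re.IGNORECASE) is not None
```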

-- OPTIONAL — mbox stays NULL when the person has no foaf:mbox
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?m
     WHERE { ?s foaf:name ?n
             OPTIONAL { ?s foaf:mbox ?m } }'
);
--  → {"s": "http://example.com/alice", "n": "Alice", "m": "mailto:a@x"}
--  → {"s": "http://example.com/bob",   "n": "Bob",   "m": null}
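OPTIONAL behaves like a left outer join: every solution from the required part survives, extended with the optional bindings when a compatible match exists and with the variable left unbound (null in the JSON rows) otherwise. A minimal Python sketch, assuming solutions are dicts and the optional group has already been evaluated; the helper names are hypothetical.

```python
# Hypothetical sketch of SPARQL LeftJoin (OPTIONAL) over dict solutions.
def compatible(a, b):
    """Solutions are compatible when every shared variable has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def left_join(required, optional, variables):
    out = []
    for left in required:
        matches = [{**left, **right} for right in optional if compatible(left, right)]
        if not matches:
            matches = [dict(left)]  # no optional match: keep the row as-is
        for row in matches:
            # Project onto the SELECT list; unbound variables become None (JSON null).
            out.append({v: row.get(v) for v in variables})
    return out
```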

-- UNION — either branch contributes solutions; unbound vars come as null
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?m
     WHERE { { ?s foaf:name ?n }
             UNION
             { ?s foaf:mbox ?m } }'
);
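UNION concatenates the solutions of both branches with bag semantics, so duplicates survive unless DISTINCT is added, and a variable bound only in the other branch comes back as null. A Python sketch under the same dict-per-solution assumption; `union` is a hypothetical helper.

```python
# Hypothetical sketch of SPARQL UNION plus optional DISTINCT.
def union(left, right, variables, distinct=False):
    """Concatenate both branches and project onto `variables`."""
    rows = [tuple(row.get(v) for v in variables) for row in [*left, *right]]
    if distinct:
        rows = list(dict.fromkeys(rows))  # de-duplicate, keeping first-seen order
    return [dict(zip(variables, r)) for r in rows]
```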

-- Aggregates with GROUP BY — count of triples per predicate
SELECT * FROM pgrdf.sparql(
  'SELECT ?p (COUNT(?o) AS ?n)
     WHERE { ?s ?p ?o }
   GROUP BY ?p ORDER BY DESC(?n)'
);
--  → {"p": "http://xmlns.com/foaf/0.1/name", "n": "4"}
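GROUP BY ?p partitions the solutions by predicate, COUNT(?o) counts each group's rows, and ORDER BY DESC(?n) sorts the groups by that count. A Python sketch of the same computation over an in-memory triple list; the function name is hypothetical, and the predicate tie-breaker is an addition for deterministic output (SPARQL leaves tie order unspecified).

```python
from collections import Counter

# Hypothetical equivalent of:
#   SELECT ?p (COUNT(?o) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p ORDER BY DESC(?n)
def count_per_predicate(triples):
    counts = Counter(p for _s, p, _o in triples)
    # Count descending; ties broken by predicate (not required by SPARQL).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"p": p, "n": n} for p, n in ordered]
```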

-- Named-graph SPARQL — GRAPH ?g binds the graph IRI per match
SELECT pgrdf.add_graph(101::bigint, 'http://example.org/g1');
SELECT pgrdf.add_graph(102::bigint, 'http://example.org/g2');
-- (load foaf:name triples into graphs 101 and 102 first, e.g. with pgrdf.parse_turtle)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?g (COUNT(*) AS ?n)
     WHERE { GRAPH ?g { ?s foaf:name ?name } }
   GROUP BY ?g ORDER BY ?g'
);
--  → {"g": "http://example.org/g1", "n": "3"}
--  → {"g": "http://example.org/g2", "n": "2"}
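With GRAPH ?g, each match binds ?g to the IRI of the graph it came from; the id-to-IRI mapping is what add_graph(id, iri) registers. A Python sketch of the per-graph count above, assuming quads carry a numeric graph id and a hypothetical `graph_iris` dict stands in for that registry. Treating only registered graphs as visible to GRAPH ?g mirrors SPARQL's rule that GRAPH ranges over named graphs, but is an assumption about pgRDF's default-graph handling.

```python
from collections import Counter

# Hypothetical sketch: GRAPH ?g { ?s foaf:name ?name } grouped and ordered by ?g.
def names_per_graph(quads, graph_iris, predicate="foaf:name"):
    counts = Counter(
        graph_iris[g]
        for _s, p, _o, g in quads
        # Assumption: only graphs registered with an IRI are named graphs for ?g.
        if p == predicate and g in graph_iris
    )
    return [{"g": g, "n": n} for g, n in sorted(counts.items())]
```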

-- Inspect the parsed shape without executing
SELECT pgrdf.sparql_parse('SELECT ?s WHERE { ?s ?p ?o OPTIONAL { ?s <http://x/n> ?n } }');
--  → {"form": "SELECT", ..., "unsupported_algebra": ["LeftJoin (OPTIONAL)"]}

OWL 2 RL inference

-- Load an ontology + some assertions
SELECT pgrdf.add_graph(100);
SELECT pgrdf.parse_turtle('
@prefix ex:   <http://example.com/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Engineer rdfs:subClassOf ex:Person .
ex:Person   rdfs:subClassOf ex:Agent .
ex:alice    rdf:type        ex:Engineer .
', 100);

-- Materialize OWL 2 RL entailments. Idempotent — call as often as
-- you like; the prior is_inferred=TRUE rows are dropped first.
SELECT pgrdf.materialize(100);
--  → {"base_triples": 3, "inferred_triples_written": 11, ...}

-- The 2-hop entailment is now in the table:
SELECT * FROM pgrdf.sparql(
  'PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX ex:   <http://example.com/>
   SELECT ?c WHERE { ex:alice rdf:type ?c }'
);
--  → {"c": "http://example.com/Engineer"}   ← base
--  → {"c": "http://example.com/Person"}     ← inferred
--  → {"c": "http://example.com/Agent"}      ← inferred
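The 2-hop result follows from two rules that both RDFS and OWL 2 RL contain: subClassOf is transitive (rdfs11 / scm-sco), and an instance of a class is an instance of its superclasses (rdfs9 / cax-sco). A naive Python fixpoint over just those two rules is sketched below, including the "drop prior inferred rows first" step that keeps re-runs idempotent. The row layout with an is_inferred flag is an assumption modelled on the README's wording. The sketch derives only 3 triples, fewer than the 11 reported above, because the real owl-rl profile applies many more rules.

```python
# Hypothetical two-rule materializer; not pgRDF's OWL 2 RL engine.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def materialize(rows):
    """rows: list of (s, p, o, is_inferred). Returns (new rows, inferred count)."""
    # Drop prior inferences first, so repeated calls give the same result.
    base = {(s, p, o) for s, p, o, inferred in rows if not inferred}
    closure = set(base)
    while True:
        sub = [(c1, c2) for c1, p, c2 in closure if p == SUBCLASS]
        new = set()
        for c1, c2 in sub:
            for c2b, c3 in sub:
                if c2 == c2b:
                    new.add((c1, SUBCLASS, c3))   # scm-sco / rdfs11
            for x, p, c in closure:
                if p == TYPE and c == c1:
                    new.add((x, TYPE, c2))        # cax-sco / rdfs9
        new -= closure
        if not new:
            break                                 # fixpoint reached
        closure |= new
    inferred = closure - base
    result = [(s, p, o, False) for s, p, o in sorted(base)]
    result += [(s, p, o, True) for s, p, o in sorted(inferred)]
    return result, len(inferred)
```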

See guide/03-querying.md for the full SELECT/ASK surface (BGPs with N patterns, FILTER expressions, solution modifiers, OPTIONAL / UNION / MINUS, aggregates with HAVING, BIND for projection, combining with regular SQL). For operator-facing observability — pgrdf.stats(), pgrdf.shmem_reset(), pgrdf.plan_cache_clear() — see docs/02-storage.md.

Quickstart for users

Full walkthrough lives under guide/. Five-minute path:

# 1. Boot stock postgres:17.4 with the extension files bind-mounted
just build-ext        # builds pgrdf.so/.control/.sql in a Linux container
just compose-up       # podman compose up -d
just psql             # opens a psql shell to the pgrdf database

# 2. Inside psql
pgrdf=# CREATE EXTENSION pgrdf;
pgrdf=# SELECT pgrdf.version();
        --  → 0.5.40   (whatever LATEST.md currently advertises)
pgrdf=# SELECT pgrdf.parse_turtle('@prefix ex: <http://e.com/> . ex:a ex:p ex:b .', 1);
        --  → 1

Required postgresql.conf changes

pgRDF MUST be in shared_preload_libraries for _PG_init() to run in the postmaster context. Without it, the extension's shared-memory atomics (dict cache + plan-cache stats) are never registered, and the first call to any pgRDF function panics with PgAtomic was not initialized.

# postgresql.conf
shared_preload_libraries = 'pgrdf'         # pgRDF alone
# or:
shared_preload_libraries = 'pgrdf,pgck'    # if pgCK is also installed
                                           # — order matters: pgrdf first

A server restart (not just a reload) is required after editing this — preload happens at postmaster startup. Verify after restart:

SHOW shared_preload_libraries;             -- must contain 'pgrdf'
SELECT pgrdf.parse_turtle(
  'PREFIX ex: <http://example.org/> ex:t a ex:T .', 1::bigint, 'http://example.org/');
                                           -- returns a row count, not a panic
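The rules above (pgrdf must be present, and must precede pgck when both are listed) can be checked mechanically, for example in a deployment script that has captured the output of SHOW shared_preload_libraries. A small Python sketch; `check_preload` is hypothetical, and the parsing assumes the usual comma-separated value with optional quotes.

```python
# Hypothetical pre-flight check for a shared_preload_libraries value.
def check_preload(value):
    """Return a list of problems; an empty list means the setting looks right."""
    libs = [part.strip().strip('"').strip("'") for part in value.split(",") if part.strip()]
    problems = []
    if "pgrdf" not in libs:
        problems.append("pgrdf missing: pgRDF functions will panic (PgAtomic was not initialized)")
    elif "pgck" in libs and libs.index("pgck") < libs.index("pgrdf"):
        problems.append("pgck is listed before pgrdf: pgrdf must come first")
    return problems
```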

The just compose-up Quickstart above bakes this into the bundled image; only installs on your own Postgres need postgresql.conf edited manually.

Want to integrate from your application?

Documentation

Two parallel doc tracks:

Use documentation — guide/

For people running pgRDF in their applications.

Engineering / build plan — docs/

For people working on pgRDF itself.

Authoritative specs

Tests

| Layer | What it gates | Run |
|---|---|---|
| pgrx integration | UDF correctness inside a managed PG | just test |
| pg_regress-style | UDF correctness over the wire to compose Postgres | just test-regression |
| Artifact parity | Mounted extension bytes match a fresh build and the live container | just test-artifact-parity |
| W3C-shape SPARQL | Per-test data.ttl + query.rq vs expected.jsonl | just test-w3c |
| LUBM-shape | LUBM-style correctness gates against a hand-authored fixture | just test-lubm |
| Ontology smoke | Real-world Turtle parses cleanly | tests/perf/smoke-ontologies.sh |
| Narrow bar | just test + just test-regression (back-compat shape) | just test-all |
| Compose-based bar | regression + W3C-shape + LUBM-shape | just test-conformance |
| Full bar | pgrx integration + test-conformance (the broadest sweep) | just test-everything |
| Cold-compose smoke | Wipe compose, rebuild, re-up, run test-conformance | just smoke-cold |
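The W3C-shape layer compares query output against an expected.jsonl file. One plausible way to do that comparison is sketched in Python below, assuming one JSON object per line and order-insensitive (multiset) matching for queries without ORDER BY. The helper names are hypothetical, and the real harness's rules may differ.

```python
import json
from collections import Counter

# Hypothetical comparison of result rows against JSON Lines expectations.
def load_jsonl(text):
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def canonical(row):
    # Sorted keys make {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
    return json.dumps(row, sort_keys=True)

def results_match(actual_rows, expected_text, ordered=False):
    actual = [canonical(r) for r in actual_rows]
    expected = [canonical(r) for r in load_jsonl(expected_text)]
    if ordered:
        return actual == expected                 # ORDER BY queries: sequence equality
    return Counter(actual) == Counter(expected)  # bag equality: duplicates matter
```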

just test-everything is the comprehensive entry point. just smoke-cold is the cold-compose verification; it now runs the artifact-parity check after the rebuild and before the compose-based test bar. Run just smoke-cold after touching anything in compose/, fixtures/, or the test SQL fixtures.

Current bar — 274 pgrx + 85 pg_regress + 51 W3C-sparql + 25 W3C SHACL Core + 3 LUBM-shape green across the full pgrx PG 14-17 matrix and the compose-based regression runtime (PG 17). Covers:

  • Storage CRUD + Turtle / TriG / N-Quads ingest.
  • The full SPARQL 1.1 SELECT/ASK/CONSTRUCT/DESCRIBE surface (type-aware ORDER BY, multi-triple OPTIONAL, UNION, MINUS, VALUES, downstream BIND, aggregates incl. over UNION, HAVING, property paths).
  • SPARQL UPDATE (INSERT/DELETE DATA + WHERE, DELETE+INSERT, WITH scoping, lifecycle algebra).
  • Storage performance (shmem dict cache, prepared-plan cache, prepared bulk-INSERT).
  • OWL 2 RL + RDFS inference (pgrdf.materialize, owl-rl / rdfs profiles) + the materialize → SPARQL round-trip.
  • Genuine W3C SHACL Core validation (pgrdf.validate) — 25/25 SHACL Core conformance, emitting a W3C sh:ValidationReport JSONB; mode=>'sparql' shipped + honest, upstream-gated (ERRATA E-012).
  • Named-graph surface (LLD v0.4 §3): _pgrdf_graphs system table + pg_extension_config_dump registration for pg_dump round-trip; the five-UDF surface (add_graph(id) / add_graph(iri) / add_graph(id, iri) / graph_id(iri) / graph_iri(id)); SPARQL GRAPH <iri> literal + GRAPH ?g variable forms with per-pattern scope composition over OPTIONAL / UNION / MINUS. Covered by pg_regress fixtures 72-79 + 87, pgrx tests in src/storage/graphs.rs + src/query/executor.rs, W3C-shape fixtures 24-graph-named-iri / 25-graph-var-projection / 26-graph-var-groupby, and the shell-driven end-to-end round-trip in tests/regression/scripts/pg-dump-roundtrip.sh.
  • Operator surface (pgrdf.stats() JSONB shape contract).
  • 7 negative regression signals locking the error-message contract for unsupported SPARQL shapes (80-unsupported-shapes.sql).
  • Error-path signals locking the stable error prefixes that UDFs emit on invalid input (81-error-paths.sql); the first lock-in is load_turtle: failed to open on a missing path.
  • Edge-case correctness signals (62-materialize-empty.sql → forward): pgrdf.materialize() on an empty graph returns base_triples = 0, non-negative inferred-count, and stays idempotent across two calls.

External smoke covers 24 well-known ontologies, 17,134 triples in total (W3C, Apache Jena, ValueFlows, ConceptKernel v3.7), and runs via tests/perf/smoke-ontologies.sh. Per-ontology triple counts are locked in tests/perf/smoke-ontologies.expected.tsv; tests/perf/smoke-ontologies.sh --check re-runs the smoke and diffs against the lock-file (not yet gated in CI, because the fetched payloads are gitignored). Workflow.ttl is held out due to a non-RFC IRI in the source; see ERRATA E-007 / TEST.ONTOLOGY-SET.md.
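The --check mode amounts to diffing fresh per-ontology triple counts against the lock-file. A Python sketch of that diff, assuming a tab-separated `<ontology>\t<triples>` layout with an optional header row (the real TSV columns may differ); the helper names are hypothetical, and the counts in the usage are borrowed from the load_turtle examples above purely for illustration.

```python
# Hypothetical lock-file diff; assumes "<ontology>\t<triples>" rows.
def parse_tsv(text):
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, triples = line.split("\t")[:2]
        if not triples.strip().isdigit():
            continue  # e.g. a header row
        counts[name] = int(triples)
    return counts

def diff_counts(expected, actual):
    """Human-readable mismatches; an empty list means the smoke matches the lock-file."""
    problems = []
    for name in sorted(expected.keys() | actual.keys()):
        want, got = expected.get(name), actual.get(name)
        if want != got:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems
```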

License

Copyright 2026 Peter Styk. Licensed under the MIT License — see LICENSE for the canonical attribution.

Project home: https://github.com/styk-tv/pgRDF.

About

Rust-native PostgreSQL extension for RDF, SPARQL, SHACL and OWL reasoning. pgrx 0.16, PG 14-17.
