OpenDuck makes DuckDB work like a cloud database without giving up its embedded-DB feel. You attach a remote database in one line — `ATTACH 'openduck:mydb'` — tables resolve transparently, a single query can split its work across your laptop and a remote worker, and the storage underneath is layered, snapshot-based, and concurrency-safe. It's a DuckDB extension plus a small Rust gateway/worker speaking an open gRPC + Arrow IPC protocol, so you can self-host the whole thing or plug in your own backend.
The architecture follows the path MotherDuck pioneered: differential storage, dual execution, and the `md:` attach scheme. OpenDuck reimplements those ideas as an open protocol and an open backend you can run yourself.
```python
import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute("LOAD '/path/to/openduck.duckdb_extension';")
con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=xxx' AS cloud;")
con.sql("SELECT * FROM cloud.users").show()                   # remote, transparent
con.sql("SELECT * FROM local.t JOIN cloud.t2 ON ...").show()  # hybrid, one query
```

Append-only layers with PostgreSQL metadata. DuckDB sees a normal file; OpenDuck persists data as immutable sealed layers addressable from object storage. Snapshots give you consistent reads. One serialized write path, many concurrent readers.
A single query can run partly on your machine and partly on a remote worker. The gateway splits the plan, labels each operator LOCAL or REMOTE, and inserts bridge operators at the boundaries. Only intermediate results cross the wire.
```
[LOCAL]  HashJoin(l.id = r.id)
  [LOCAL]  Scan(products)        ← your laptop
  [LOCAL]  Bridge(R→L)
    [REMOTE] Scan(sales)         ← remote worker
```
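A rough illustration of that pass (the real splitter runs in the Rust gateway over DuckDB plans; the node shapes and table set below are invented for the sketch):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)
    site: str = "LOCAL"

REMOTE_TABLES = {"sales"}  # pretend these resolve to the remote catalog

def label(node):
    """Mark each operator LOCAL or REMOTE, bottom-up."""
    for child in node.children:
        label(child)
    if node.op.startswith("Scan("):
        node.site = "REMOTE" if node.op[5:-1] in REMOTE_TABLES else "LOCAL"
    elif {c.site for c in node.children} == {"REMOTE"}:
        node.site = "REMOTE"  # push fully-remote subtrees to the worker

def insert_bridges(node):
    """Wrap every LOCAL→REMOTE edge in a bridge operator, so only that
    subtree's intermediate results cross the wire."""
    for i, child in enumerate(node.children):
        insert_bridges(child)
        if node.site == "LOCAL" and child.site == "REMOTE":
            node.children[i] = Node("Bridge(R→L)", [child])

plan = Node("HashJoin(l.id = r.id)",
            [Node("Scan(products)"), Node("Scan(sales)")])
label(plan)
insert_bridges(plan)  # yields the annotated tree shown above
```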
The extension implements DuckDB's StorageExtension and Catalog interfaces. Remote tables are first-class catalog entries: they participate in JOINs, CTEs, and the optimizer just like local tables.
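For example, a remote table can sit inside a CTE and join against local data like any other catalog entry. Continuing from the quick-start connection above; the table names here are illustrative:

```python
con.sql("""
    WITH recent AS (                 -- CTE over a remote table
        SELECT user_id, count(*) AS n
        FROM cloud.events
        GROUP BY user_id
    )
    SELECT u.name, r.n
    FROM recent r
    JOIN my_local_users u            -- local table, same query
      ON u.id = r.user_id
    ORDER BY r.n DESC
""").show()
```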
OpenDuck's protocol is intentionally minimal and defined in execution.proto. The data plane is two RPCs: one to execute a query and stream Arrow IPC batches back, another to cancel a running execution. Two additional RPCs handle worker lifecycle (registration and heartbeat) so the gateway can route queries by database affinity and compute context.
Because the protocol is open and simple, you're not locked into a single backend. Any service that speaks gRPC and returns Arrow can serve as an OpenDuck-compatible backend. Run the included Rust gateway, replace it with your own implementation, or plug in an entirely different execution engine — the client and extension don't care what's on the other side.
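As a taste of what the data plane carries, the sketch below round-trips a record batch through the Arrow IPC stream format with pyarrow; the bytes a backend writes into its execute-RPC response stream are this kind of payload (the surrounding gRPC plumbing is omitted):

```python
import io

import pyarrow as pa
import pyarrow.ipc as ipc

batch = pa.RecordBatch.from_pydict({"id": [1, 2], "name": ["a", "b"]})

# Backend side: serialize results as an Arrow IPC stream.
sink = io.BytesIO()
with ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
wire_bytes = sink.getvalue()

# Client side (what the extension does): rehydrate the stream.
table = ipc.open_stream(wire_bytes).read_all()
assert table.num_rows == 2
```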
```
┌─────────────────────────────────────────────┐
│ DuckDB process (client)                     │
│                                             │
│ LOAD openduck                               │
│ ATTACH 'openduck:mydb' AS cloud             │
│                                             │
│ ┌─────────────────────────────────────┐     │
│ │ OpenDuckCatalog                     │     │
│ │  └─ OpenDuckSchemaEntry             │     │
│ │      └─ OpenDuckTableEntry (users)  │     │
│ │      └─ OpenDuckTableEntry (events) │     │
│ └──────────────┬──────────────────────┘     │
│                │ gRPC + Arrow IPC           │
└────────────────┼────────────────────────────┘
                 │
     ┌───────────▼───────────┐
     │    Gateway (Rust)     │
     │ - token auth          │
     │ - worker registry     │
     │ - affinity routing    │     ┌──────────────┐
     │ - plan splitting      │────▶│   Worker 1   │
     │ - backpressure        │◀────│   (DuckDB)   │
     │                       │     │RegisterWorker│
     │                       │     └──────────────┘
     │                       │     ┌──────────────┐
     │                       │────▶│   Worker N   │
     │                       │◀────│   (DuckDB)   │
     │                       │     │  Heartbeat   │
     └───────────────────────┘     └──────────────┘
                 │
       ┌─────────┴─────────┐
       ▼                   ▼
  ┌──────────┐      ┌──────────────┐
  │ Postgres │      │ Object store │
  │ metadata │      │ sealed layers│
  └──────────┘      └──────────────┘
```
```bash
cargo build --workspace
```

The openduck extension is not yet published to DuckDB's extension repository, so you need to build it from source. See extensions/openduck/README.md for full prerequisites (vcpkg, bison on macOS).

```bash
cd extensions/openduck && make
```

This produces the loadable binary at:

```
extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension
```

```bash
export OPENDUCK_TOKEN=your-token
cargo run -p openduck -- -d mydb --token your-token
```

Because the extension is unsigned, every DuckDB connection needs `allow_unsigned_extensions` enabled and an explicit `LOAD` with the full path to the built binary.
Python (DuckDB SDK directly):
```python
import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute("LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';")
con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=your-token' AS cloud;")
con.sql("SELECT * FROM cloud.users LIMIT 10").show()
```

You can also set `OPENDUCK_EXTENSION_PATH` to avoid hard-coding the path:

```bash
export OPENDUCK_EXTENSION_PATH=extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension
```

Python (openduck wrapper — auto-detects the local build):
```bash
pip install -e clients/python
export OPENDUCK_TOKEN=your-token
```

```python
import openduck

con = openduck.connect("mydb")
con.sql("SELECT 1 AS x").show()
```

The wrapper finds the extension automatically from the build tree or `OPENDUCK_EXTENSION_PATH`.
CLI:

```bash
duckdb -unsigned -c "
LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';
ATTACH 'openduck:mydb?token=your-token' AS cloud;
SELECT * FROM cloud.users LIMIT 10;
"
```

Rust:
```rust
use duckdb::{Config, Connection};

// allow_unsigned_extensions is a startup-only option, so it is set through
// Config at open time rather than with a SET statement afterwards.
let config = Config::default().allow_unsigned_extensions()?;
let conn = Connection::open_in_memory_with_flags(config)?;
conn.execute_batch(r"
    LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';
    ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=xxx' AS cloud;
")?;
let mut stmt = conn.prepare("SELECT * FROM cloud.users LIMIT 10")?;
```

Note: once the extension is published to the DuckDB extension repository,

```sql
INSTALL openduck; LOAD openduck;
```

will work without building from source or enabling unsigned extensions.
See examples/python/duckdb_sdk_ducklake.py for a comprehensive walkthrough including DuckLake integration and hybrid local+remote queries.
```
crates/
  exec-gateway/       Gateway — auth, worker registry, routing, hybrid plan splitting
  exec-worker/        Worker — embedded DuckDB, Arrow IPC streaming
  exec-proto/         Protobuf/tonic codegen + shared auth module
  openduck-cli/       Unified CLI (openduck [default]|gateway|worker|query|cancel|status|snapshot|gc)
  openduck-metrics/   OpenTelemetry metrics (optional OTLP exporter)
  diff-core/          Core types and StorageBackend trait
  diff-metadata/      Postgres metadata repo, GC, PgStorageBackend
  diff-layer-fs/      Append-only on-disk segment files
  diff-blob/          Sealed layer upload to S3-compatible object storage
  diff-bridge/        C ABI static library for the DuckDB extension
  diff-fuse/          Linux FUSE adapter over StorageBackend
extensions/
  openduck/           DuckDB C++ extension (StorageExtension + Catalog)
clients/
  python/             openduck Python package (pip install -e clients/python)
proto/
  openduck/v1/        Protocol definition (execution.proto)
```
MotherDuck is a commercial cloud service. OpenDuck is an open-source project inspired by its architecture.
| | MotherDuck | OpenDuck |
|---|---|---|
| What | Managed cloud service | Self-hosted open-source |
| Attach scheme | `md:` | `openduck:` / `od:` |
| Auth | `motherduck_token` | `OPENDUCK_TOKEN` |
| Differential storage | Proprietary | Open (Postgres metadata + object store) |
| Hybrid execution | Proprietary planner | Open (gateway + plan splitting) |
| Protocol | Private wire format | Open gRPC + Arrow IPC |
| Backend | MotherDuck's cloud | Anything implementing `ExecutionService` |
| Extension | Bundled in DuckDB | Separate loadable extension |
OpenDuck is not wire-compatible with MotherDuck. It reimplements the same architectural ideas as an open protocol.
Arrow Flight SQL is a generic database protocol — "JDBC/ODBC over Arrow." OpenDuck is a DuckDB-specific system with a narrower scope but deeper integration.
| | Arrow Flight SQL | OpenDuck |
|---|---|---|
| Scope | Any SQL database | DuckDB-specific |
| Integration | Separate client driver | DuckDB StorageExtension + Catalog |
| Catalog | Server-side (`GetTables`, etc.) | Extension-side (DuckDB catalog entries) |
| Execution | Full query on server | Hybrid — split across local and remote |
| Protocol surface | ~15 RPCs | 4 RPCs (2 data plane + 2 worker lifecycle) |
| Plan format | SQL only | SQL (M2), structured plan IR (M3) |
| Optimizer | Client-side, unaware | DuckDB optimizer sees remote tables natively |
OpenDuck doesn't replace DuckLake — you use them together. They operate at different layers entirely.
DuckLake is a lakehouse catalog: it manages tables as Parquet files in object storage with transactional metadata in Postgres (or SQLite/DuckDB). It decides where data lives and how tables are organized.
OpenDuck is a storage and execution layer for DuckDB's own compute engine. It provides differential storage (append-only layers with snapshot isolation), hybrid query execution (split a single query across local and remote), and transparent remote attach (ATTACH 'openduck:mydb').
If you're using DuckLake but still fall back to a .duckdb file for things DuckLake doesn't support yet (e.g. indexes, full-text search, or workloads that need DuckDB-native storage), OpenDuck makes that file concurrency-safe with snapshot isolation. And when you want to query a DuckLake catalog running on a remote server, OpenDuck is the transport — a worker backed by DuckLake serves queries over gRPC, and clients attach via the openduck extension without knowing or caring what the backend storage is.
| | DuckLake | OpenDuck |
|---|---|---|
| Layer | Catalog (table → Parquet in S3) | Storage + execution (DuckDB file I/O, gRPC) |
| What it manages | Table metadata, Parquet data files | DuckDB pages, layers, snapshots |
| Concurrency | Parquet files are immutable | Snapshot isolation on .duckdb files |
| Remote access | Not built-in | `ATTACH 'openduck:...'` + hybrid execution |
| Together | DuckLake catalog on a remote worker → OpenDuck streams results to the client | |
Full docs live in docs/:
- Overview — what OpenDuck is, problems it solves, comparisons.
- Architecture — components, protocol, data flow, security model.
- Configuration — every CLI flag, env var, TOML key, and DuckDB secret.
- Guides:
  - Getting started — clone → build → first query.
  - Python client — the `openduck` package API and patterns.
  - DuckDB extension — `LOAD`, `ATTACH`, URI format, secrets, table functions.
  - Differential storage — append-only layers, snapshots, the three storage modes.
  - Hybrid execution — `--hybrid`, `openduck_run`, plan splitting.
  - Snapshots and garbage collection — sealing, point-in-time reads, retention.
- Deployment — single-process, multi-worker, Docker, observability.
- Troubleshooting — common errors and fixes.
OpenDuck's architecture draws heavily from MotherDuck's published work on differential storage, dual execution, and cloud-native DuckDB. Credit to the MotherDuck team for pioneering these ideas.
MIT