OpenDuck

OpenDuck makes DuckDB work like a cloud database without giving up its embedded-DB feel. You attach a remote database in one line (ATTACH 'openduck:mydb'), tables resolve transparently, a single query can split its work across your laptop and a remote worker, and the storage underneath is layered, snapshot-based, and concurrency-safe. It's a DuckDB extension plus a small Rust gateway/worker speaking an open gRPC + Arrow IPC protocol, so you can self-host the whole thing or plug in your own backend.

The architecture follows the path MotherDuck pioneered with differential storage, dual execution, and the md: attach scheme. OpenDuck reimplements those ideas as an open protocol and an open backend you can run yourself.

import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute("LOAD '/path/to/openduck.duckdb_extension';")
con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=xxx' AS cloud;")

con.sql("SELECT * FROM cloud.users").show()                    # remote, transparent
con.sql("SELECT * FROM local.t JOIN cloud.t2 ON ...").show()   # hybrid, one query

What OpenDuck does

Differential storage

Append-only layers with PostgreSQL metadata. DuckDB sees a normal file; OpenDuck persists data as immutable sealed layers addressable from object storage. Snapshots give you consistent reads. One serialized write path, many concurrent readers.
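
To make the layering idea concrete, here is a minimal in-memory sketch of sealed layers and snapshot reads. It is a conceptual model only, not OpenDuck's implementation: the names `SealedLayer`, `LayeredFile`, `seal`, and the use of a layer count as the snapshot id are all assumptions made for illustration, and there is no Postgres or object storage involved.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class SealedLayer:
    """An immutable set of page writes (page number -> bytes)."""
    pages: Mapping[int, bytes]


@dataclass
class LayeredFile:
    """Toy model: sealed layers below, one mutable write buffer on top."""
    sealed: List[SealedLayer] = field(default_factory=list)
    active: Dict[int, bytes] = field(default_factory=dict)

    def write(self, page_no: int, data: bytes) -> None:
        # All writes go through the single active buffer (one write path).
        self.active[page_no] = data

    def seal(self) -> int:
        """Freeze the active buffer into a new layer and return a snapshot id."""
        self.sealed.append(SealedLayer(MappingProxyType(dict(self.active))))
        self.active = {}
        return len(self.sealed)  # snapshot id = number of layers it can see

    def read(self, page_no: int, snapshot: int) -> Optional[bytes]:
        """Read a page as of a snapshot: the newest visible layer wins."""
        for layer in reversed(self.sealed[:snapshot]):
            if page_no in layer.pages:
                return layer.pages[page_no]
        return None


f = LayeredFile()
f.write(0, b"v1")
snap1 = f.seal()
f.write(0, b"v2")
f.write(1, b"new")
snap2 = f.seal()

print(f.read(0, snap1))  # b'v1'  (the older snapshot is unaffected by later writes)
print(f.read(0, snap2))  # b'v2'
print(f.read(1, snap1))  # None   (page 1 did not exist at snap1)
```

Because sealed layers never change, a reader holding `snap1` keeps seeing the same data no matter how many layers are appended afterwards, which is what allows many readers to run alongside a single writer.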

Hybrid (dual) execution

A single query can run partly on your machine and partly on a remote worker. The gateway splits the plan, labels each operator LOCAL or REMOTE, and inserts bridge operators at the boundaries. Only intermediate results cross the wire.

[LOCAL]  HashJoin(l.id = r.id)
  [LOCAL]  Scan(products)          ← your laptop
  [LOCAL]  Bridge(R→L)
    [REMOTE] Scan(sales)           ← remote worker
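
The labeling step can be sketched in a few lines of Python. This is an illustration of the idea, not the gateway's planner: the `Op` type, the `place` and `insert_bridges` functions, and the rule "an operator is REMOTE only if all of its inputs are REMOTE" are assumptions chosen to reproduce the plan shown above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)
    side: str = ""  # filled in by place()


def place(op: Op, remote_tables: Set[str]) -> Op:
    """Label each operator LOCAL or REMOTE, bottom-up."""
    for child in op.children:
        place(child, remote_tables)
    if not op.children:
        # Leaf scans are named "Scan(<table>)"; remote if the table is remote.
        table = op.name[len("Scan("):-1]
        op.side = "REMOTE" if table in remote_tables else "LOCAL"
    else:
        # Assumed rule: run remotely only if every input is already remote.
        sides = {c.side for c in op.children}
        op.side = "REMOTE" if sides == {"REMOTE"} else "LOCAL"
    return op


def insert_bridges(op: Op) -> Op:
    """Wrap every REMOTE child of a LOCAL operator in a Bridge(R->L)."""
    new_children = []
    for child in op.children:
        child = insert_bridges(child)
        if op.side == "LOCAL" and child.side == "REMOTE":
            child = Op("Bridge(R->L)", [child], side="LOCAL")
        new_children.append(child)
    op.children = new_children
    return op


def render(op: Op, depth: int = 0) -> str:
    line = "  " * depth + f"[{op.side}] {op.name}"
    return "\n".join([line] + [render(c, depth + 1) for c in op.children])


plan = Op("HashJoin(l.id = r.id)", [Op("Scan(products)"), Op("Scan(sales)")])
plan = insert_bridges(place(plan, remote_tables={"sales"}))
print(render(plan))
```

Running it prints the same four-operator tree as above: a local join over a local scan and a bridge that pulls the remote scan's rows across the wire.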

DuckDB-native catalog

The extension implements DuckDB's StorageExtension and Catalog interfaces. Remote tables are first-class catalog entries: they participate in JOINs, CTEs, and the optimizer like local tables.

Open protocol

OpenDuck's protocol is intentionally minimal and defined in execution.proto. The data plane is two RPCs: one to execute a query and stream Arrow IPC batches back, another to cancel a running execution. Two additional RPCs handle worker lifecycle (registration and heartbeat) so the gateway can route queries by database affinity and compute context.

Because the protocol is open and simple, you're not locked into a single backend. Any service that speaks gRPC and returns Arrow can serve as an OpenDuck-compatible backend. Run the included Rust gateway, replace it with your own implementation, or plug in an entirely different execution engine — the client and extension don't care what's on the other side.
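
The shape of the two data-plane calls (a streaming execute and a cancel) can be modeled in plain Python. This sketch uses no gRPC and no Arrow: `ToyBackend`, its method signatures, and the list-of-tuples batches are hypothetical stand-ins, and the authoritative definitions remain those in execution.proto.

```python
import itertools
import threading
from typing import Dict, Iterator, List, Tuple

Batch = List[Tuple[int]]  # stand-in for one Arrow IPC record batch


class ToyBackend:
    """In-memory stand-in for a backend with a streaming execute and a cancel."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._cancelled: Dict[int, threading.Event] = {}

    def execute(self, total_rows: int, batch_size: int) -> Tuple[int, Iterator[Batch]]:
        """Start an execution; return its id and a lazy stream of batches."""
        execution_id = next(self._ids)
        flag = threading.Event()
        self._cancelled[execution_id] = flag

        def stream() -> Iterator[Batch]:
            for start in range(0, total_rows, batch_size):
                if flag.is_set():  # stop producing once cancel() was called
                    return
                stop = min(start + batch_size, total_rows)
                yield [(i,) for i in range(start, stop)]

        return execution_id, stream()

    def cancel(self, execution_id: int) -> bool:
        """Request cancellation; returns False for an unknown execution id."""
        flag = self._cancelled.get(execution_id)
        if flag is None:
            return False
        flag.set()
        return True


backend = ToyBackend()
execution_id, batches = backend.execute(total_rows=10, batch_size=4)
first = next(batches)          # the client consumes one batch...
backend.cancel(execution_id)   # ...then cancels the rest
remaining = list(batches)

print(first)      # [(0,), (1,), (2,), (3,)]
print(remaining)  # []
```

The point of the sketch is the contract rather than the transport: results arrive as a stream the client pulls from, and cancellation is a separate call keyed by an execution id.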

Architecture

┌─────────────────────────────────────────────┐
│  DuckDB process (client)                    │
│                                             │
│  LOAD openduck                              │
│  ATTACH 'openduck:mydb' AS cloud            │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │ OpenDuckCatalog                     │    │
│  │  └─ OpenDuckSchemaEntry             │    │
│  │      └─ OpenDuckTableEntry (users)  │    │
│  │      └─ OpenDuckTableEntry (events) │    │
│  └──────────────┬──────────────────────┘    │
│                 │ gRPC + Arrow IPC          │
└─────────────────┼───────────────────────────┘
                  │
      ┌───────────▼───────────┐
      │  Gateway (Rust)       │
      │  - token auth         │
      │  - worker registry    │
      │  - affinity routing   │     ┌──────────────┐
      │  - plan splitting     │────▶│  Worker 1    │
      │  - backpressure       │◀────│  (DuckDB)    │
      │                       │     │  RegisterWorker
      │                       │     └──────────────┘
      │                       │     ┌──────────────┐
      │                       │────▶│  Worker N    │
      │                       │◀────│  (DuckDB)    │
      │                       │     │  Heartbeat   │
      └───────────────────────┘     └──────────────┘
              │
    ┌─────────┴─────────┐
    ▼                   ▼
┌──────────┐    ┌──────────────┐
│ Postgres │    │ Object store │
│ metadata │    │ sealed layers│
└──────────┘    └──────────────┘
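
The worker registry and affinity routing boxes in the diagram can be illustrated with a small sketch. Everything here is an assumption made for the example: the `Registry` class, the 10-second heartbeat timeout, and the fallback to any live worker when none has affinity are not taken from the OpenDuck source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Worker:
    worker_id: str
    database: str          # the database this worker has affinity for
    last_heartbeat: float  # timestamp of the most recent heartbeat


class Registry:
    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self.workers: Dict[str, Worker] = {}

    def register(self, worker_id: str, database: str, now: float) -> None:
        self.workers[worker_id] = Worker(worker_id, database, now)

    def heartbeat(self, worker_id: str, now: float) -> bool:
        worker = self.workers.get(worker_id)
        if worker is None:
            return False  # unknown workers must register first
        worker.last_heartbeat = now
        return True

    def route(self, database: str, now: float) -> Optional[str]:
        """Pick a live worker, preferring one with affinity for the database."""
        live = [w for w in self.workers.values()
                if now - w.last_heartbeat <= self.timeout_s]
        for worker in live:
            if worker.database == database:
                return worker.worker_id
        # Assumed fallback: any live worker, or None if all have timed out.
        return live[0].worker_id if live else None


reg = Registry(timeout_s=10.0)
reg.register("w1", "mydb", now=0.0)
reg.register("w2", "other", now=0.0)
reg.heartbeat("w2", now=12.0)

print(reg.route("mydb", now=5.0))   # w1   (live and has affinity)
print(reg.route("mydb", now=15.0))  # w2   (w1 missed its heartbeat window)
print(reg.route("mydb", now=30.0))  # None (no live workers)
```

Timestamps are passed in explicitly so the example is deterministic; a real registry would read a clock instead.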

Quick start

1. Build the backend

cargo build --workspace

2. Build the DuckDB extension

The openduck extension is not yet published to DuckDB's extension repository, so you need to build it from source. See extensions/openduck/README.md for full prerequisites (vcpkg, bison on macOS).

cd extensions/openduck && make

This produces the loadable binary at:

extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension

3. Start the server

export OPENDUCK_TOKEN=your-token
cargo run -p openduck -- -d mydb --token your-token

4. Connect

Because the extension is unsigned, every DuckDB connection needs allow_unsigned_extensions enabled and an explicit LOAD with the full path to the built binary.

Python (DuckDB SDK directly):

import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute("LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';")
con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=your-token' AS cloud;")

con.sql("SELECT * FROM cloud.users LIMIT 10").show()

You can also set OPENDUCK_EXTENSION_PATH to avoid hard-coding the path:

export OPENDUCK_EXTENSION_PATH=extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension
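
A script that talks to DuckDB directly might resolve the path and build its LOAD and ATTACH statements as in the sketch below. The helper names (`extension_path`, `attach_statements`, `sql_quote`) are hypothetical and not part of the openduck package. The sketch also assumes the attach URI takes `endpoint` and `token` as ordinary query parameters, as in the examples above, and leaves `:` and `/` unescaped so the result matches them.

```python
import os
from typing import List, Mapping
from urllib.parse import urlencode

# Build output path from the Quick start; used when the env var is unset.
DEFAULT_EXTENSION_PATH = (
    "extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension"
)


def extension_path(env: Mapping[str, str] = os.environ) -> str:
    """Prefer OPENDUCK_EXTENSION_PATH, fall back to the local build tree."""
    return env.get("OPENDUCK_EXTENSION_PATH") or DEFAULT_EXTENSION_PATH


def sql_quote(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def attach_statements(database: str, endpoint: str, token: str,
                      alias: str = "cloud",
                      env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the LOAD and ATTACH statements for one connection."""
    query = urlencode({"endpoint": endpoint, "token": token}, safe=":/")
    uri = f"openduck:{database}?{query}"
    return [
        f"LOAD {sql_quote(extension_path(env))};",
        f"ATTACH {sql_quote(uri)} AS {alias};",
    ]


for statement in attach_statements("mydb", "http://localhost:7878", "your-token"):
    print(statement)
```

Each returned string can be passed to `con.execute(...)` on a connection opened with `allow_unsigned_extensions` enabled.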

Python (openduck wrapper — auto-detects the local build):

pip install -e clients/python
export OPENDUCK_TOKEN=your-token

import openduck

con = openduck.connect("mydb")
con.sql("SELECT 1 AS x").show()

The wrapper finds the extension automatically from the build tree or OPENDUCK_EXTENSION_PATH.

CLI:

duckdb -unsigned -c "
  LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';
  ATTACH 'openduck:mydb?token=your-token' AS cloud;
  SELECT * FROM cloud.users LIMIT 10;
"

Rust:

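use duckdb::{Config, Connection};

fn main() -> duckdb::Result<()> {
    // allow_unsigned_extensions has to be set when the database is opened;
    // DuckDB rejects enabling it with SET on a running database.
    let config = Config::default().allow_unsigned_extensions()?;
    let conn = Connection::open_in_memory_with_flags(config)?;

    // Load the locally built extension, then attach the remote database.
    conn.execute_batch(r"
        LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';
        ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=your-token' AS cloud;
    ")?;

    let mut stmt = conn.prepare("SELECT * FROM cloud.users LIMIT 10")?;
    Ok(())
}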

Note: Once the extension is published to the DuckDB extension repository, INSTALL openduck; LOAD openduck; will work without building from source or enabling unsigned extensions.

See examples/python/duckdb_sdk_ducklake.py for a comprehensive walkthrough including DuckLake integration and hybrid local+remote queries.

Layout

crates/
  exec-gateway/     Gateway — auth, worker registry, routing, hybrid plan splitting
  exec-worker/      Worker — embedded DuckDB, Arrow IPC streaming
  exec-proto/       Protobuf/tonic codegen + shared auth module
  openduck-cli/     Unified CLI (openduck [default]|gateway|worker|query|cancel|status|snapshot|gc)
  openduck-metrics/ OpenTelemetry metrics (optional OTLP exporter)
  diff-core/        Core types and StorageBackend trait
  diff-metadata/    Postgres metadata repo, GC, PgStorageBackend
  diff-layer-fs/    Append-only on-disk segment files
  diff-blob/        Sealed layer upload to S3-compatible object storage
  diff-bridge/      C ABI static library for the DuckDB extension
  diff-fuse/        Linux FUSE adapter over StorageBackend

extensions/
  openduck/         DuckDB C++ extension (StorageExtension + Catalog)

clients/
  python/           openduck Python package (pip install -e clients/python)

proto/
  openduck/v1/      Protocol definition (execution.proto)

OpenDuck vs MotherDuck

MotherDuck is a commercial cloud service. OpenDuck is an open-source project inspired by its architecture.

| | MotherDuck | OpenDuck |
|---|---|---|
| What | Managed cloud service | Self-hosted open-source |
| Attach scheme | `md:` | `openduck:` / `od:` |
| Auth | `motherduck_token` | `OPENDUCK_TOKEN` |
| Differential storage | Proprietary | Open (Postgres metadata + object store) |
| Hybrid execution | Proprietary planner | Open (gateway + plan splitting) |
| Protocol | Private wire format | Open gRPC + Arrow IPC |
| Backend | MotherDuck's cloud | Anything implementing `ExecutionService` |
| Extension | Bundled in DuckDB | Separate loadable extension |

OpenDuck is not wire-compatible with MotherDuck. It reimplements the same architectural ideas as an open protocol.

OpenDuck vs Arrow Flight SQL

Arrow Flight SQL is a generic database protocol — "JDBC/ODBC over Arrow." OpenDuck is a DuckDB-specific system with a narrower scope but deeper integration.

| | Arrow Flight SQL | OpenDuck |
|---|---|---|
| Scope | Any SQL database | DuckDB-specific |
| Integration | Separate client driver | DuckDB StorageExtension + Catalog |
| Catalog | Server-side (`GetTables`, etc.) | Extension-side (DuckDB catalog entries) |
| Execution | Full query on server | Hybrid: split across local and remote |
| Protocol surface | ~15 RPCs | 4 RPCs (2 data plane + 2 worker lifecycle) |
| Plan format | SQL only | SQL (M2), structured plan IR (M3) |
| Optimizer | Client-side, unaware | DuckDB optimizer sees remote tables natively |

OpenDuck vs DuckLake

OpenDuck doesn't replace DuckLake — you use them together. They operate at different layers entirely.

DuckLake is a lakehouse catalog: it manages tables as Parquet files in object storage with transactional metadata in Postgres (or SQLite/DuckDB). It decides where data lives and how tables are organized.

OpenDuck is a storage and execution layer for DuckDB's own compute engine. It provides differential storage (append-only layers with snapshot isolation), hybrid query execution (split a single query across local and remote), and transparent remote attach (ATTACH 'openduck:mydb').

If you're using DuckLake but still fall back to a .duckdb file for things DuckLake doesn't support yet (e.g. indexes, full-text search, or workloads that need DuckDB-native storage), OpenDuck makes that file concurrency-safe with snapshot isolation. And when you want to query a DuckLake catalog running on a remote server, OpenDuck is the transport — a worker backed by DuckLake serves queries over gRPC, and clients attach via the openduck extension without knowing or caring what the backend storage is.

| | DuckLake | OpenDuck |
|---|---|---|
| Layer | Catalog (table → Parquet in S3) | Storage + execution (DuckDB file I/O, gRPC) |
| What it manages | Table metadata, Parquet data files | DuckDB pages, layers, snapshots |
| Concurrency | Parquet files are immutable | Snapshot isolation on `.duckdb` files |
| Remote access | Not built-in | `ATTACH 'openduck:...'` + hybrid execution |

Together: DuckLake catalog on a remote worker → OpenDuck streams results to the client.

Documentation

Full docs live in docs/:

- Overview: what OpenDuck is, problems it solves, comparisons.
- Architecture: components, protocol, data flow, security model.
- Configuration: every CLI flag, env var, TOML key, and DuckDB secret.
- Guides:
  - Getting started: clone → build → first query.
  - Python client: the openduck package API and patterns.
  - DuckDB extension: LOAD, ATTACH, URI format, secrets, table functions.
  - Differential storage: append-only layers, snapshots, the three storage modes.
  - Hybrid execution: --hybrid, openduck_run, plan splitting.
  - Snapshots and garbage collection: sealing, point-in-time reads, retention.
  - Deployment: single-process, multi-worker, Docker, observability.
  - Troubleshooting: common errors and fixes.

Acknowledgments

OpenDuck's architecture draws heavily from MotherDuck's published work on differential storage, dual execution, and cloud-native DuckDB. Credit to the MotherDuck team for pioneering these ideas.

License

MIT
