
[#749] fix: pin DuckLake extension via Docker bake + catalog preflight (incident 6) #750

Open: mlennie wants to merge 1 commit into main from fix/ducklake-extension-version-pin

mlennie (Collaborator) commented May 18, 2026

Summary

Issue: #749

Fixes incident item #6 from the Slack thread https://credible-data.slack.com/archives/C085VT95H2T/p1778641006406909?thread_ts=1778630898.862899&cid=C085VT95H2T (the May 2026 DuckLake catalog version mismatch).

Pins the DuckLake extension at Docker build time via the duckdb@1.4.4 npm binding, so every worker boots with the same extension binary. The bake lands at /root/.duckdb/extensions/v1.4.4/linux_arm64/, the path the runtime server inspects (not the path the CLI installer in base-deps would use, which belongs to a different DuckDB engine version).

Also adds a Postgres-side preflight in StorageManager.attachDuckLakeCatalog that reads ducklake_metadata.version directly through a plain TYPE postgres ATTACH (not DuckLake) and throws ConnectionAuthError (→ HTTP 422) when the catalog format is newer than the baked extension supports. Failures of the preflight itself (missing table, query timeout, connect failure) return undefined and fall through to the existing ATTACH path, so the preflight is not load-bearing.
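A minimal sketch of the preflight's decision logic, assuming the catalog version comes back as a dotted string such as 1.0.0 and the baked extension's ceiling is 0.3. ConnectionAuthError here is a local stand-in for the server's class, and readCatalogVersion, checkCatalogVersion, preflightCatalogVersion and compareVersions are hypothetical names; the real query and error handling in StorageManager.attachDuckLakeCatalog may differ.

```typescript
// Local stand-in for the server's ConnectionAuthError (mapped to HTTP 422 in the PR).
class ConnectionAuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConnectionAuthError";
  }
}

// Numeric comparison of dotted versions ("1.0.0" vs "0.3").
// Missing or non-numeric components count as 0. Returns -1, 0 or 1.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((p) => parseInt(p, 10) || 0);
  const pb = b.split(".").map((p) => parseInt(p, 10) || 0);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff > 0 ? 1 : -1;
  }
  return 0;
}

// Decision step: throw only when the catalog is known to be too new.
// An unknown version (undefined) is passed through untouched.
function checkCatalogVersion(
  version: string | undefined,
  maxSupported: string,
): string | undefined {
  if (version === undefined) return undefined;
  if (compareVersions(version, maxSupported) > 0) {
    throw new ConnectionAuthError(
      `DuckLake catalog format ${version} is newer than the supported ${maxSupported}`,
    );
  }
  return version;
}

// Preflight wrapper. readCatalogVersion is a hypothetical callback that runs the
// plain postgres-side query. Any error it raises (missing table, timeout,
// connect failure) yields undefined so the caller falls through to the normal ATTACH.
async function preflightCatalogVersion(
  readCatalogVersion: () => Promise<string | undefined>,
  maxSupported: string,
): Promise<string | undefined> {
  let version: string | undefined;
  try {
    version = await readCatalogVersion();
  } catch {
    return undefined;
  }
  // Outside the try block, so a too-new catalog still propagates as an error.
  return checkCatalogVersion(version, maxSupported);
}
```

Keeping the version check outside the try block is the design point: read failures degrade to undefined and the existing ATTACH runs as before, while a catalog that is known to be too new still surfaces as a 422.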

Why

In May 2026, prod's DuckLake catalog had been written at format version 1.0.0, while Publisher workers ran an older extension that supported formats only up to 0.3. The workaround was a manual catalog downgrade. Root cause: nothing pinned the DuckLake extension binary on the worker side, so workers and the catalog could drift independently.

Verified:

  • DuckDB 1.4.4 ships the 0.3-line DuckLake (cross-referenced with the duckdb/ducklake release timeline; 1.0 requires DuckDB 1.5+).
  • Local docker build prints DuckLake baked: [{"extension_version":"3f1b372","install_path":"/root/.duckdb/extensions/v1.4.4/linux_arm64/..."}].
  • docker run --network none ... bun -e "...LOAD ducklake..." succeeds, proving the runtime uses the baked binary (not a network re-fetch).

Out of scope (separate PRs)

  • DuckLake 1.0 upgrade (bump duckdb to 1.5+, bump @malloydata/db-duckdb, add AUTOMATIC_MIGRATION=TRUE). The migration is irreversible on prod data, so it deserves its own deliberate rollout.
  • README + public docs sync: deferred until #743 (docs: document PG_CONNECT_TIMEOUT_SECONDS env var) merges, to avoid conflicts on the env-vars table.

Test plan

  • CI passes (typecheck, lint, prettier, 559 server tests).
  • CI's docker_smoke_test build log shows the new DuckLake baked: [...] line with v1.4.4 in the install path.
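The second test-plan item could be automated with a small check over the build log. This is a hypothetical helper (parseBakedLine and checkBakedPath are illustrative names, not existing CI code), assuming the log contains the DuckLake baked: [...] line exactly as the Dockerfile's console.log prints it, possibly behind a BuildKit step prefix.

```typescript
// Shape of each row printed by the bake step's duckdb_extensions() query.
interface BakedExtensionRow {
  extension_version: string;
  install_path: string;
}

const MARKER = "DuckLake baked: ";

// Returns the JSON rows printed after the marker, or undefined if no line has it.
// The marker is searched anywhere in the line to tolerate build-log prefixes.
function parseBakedLine(buildLog: string): BakedExtensionRow[] | undefined {
  for (const line of buildLog.split("\n")) {
    const idx = line.indexOf(MARKER);
    if (idx === -1) continue;
    return JSON.parse(line.slice(idx + MARKER.length)) as BakedExtensionRow[];
  }
  return undefined;
}

// True only if the bake line exists and every row was installed under the
// directory for the expected engine version (e.g. "v1.4.4").
function checkBakedPath(buildLog: string, engineVersion: string): boolean {
  const rows = parseBakedLine(buildLog);
  if (!rows || rows.length === 0) return false;
  return rows.every((r) => r.install_path.includes(`/extensions/${engineVersion}/`));
}
```

A missing line, an empty result, or a path under another engine version all fail the check, which matches the intent of asserting that v1.4.4 appears in the install path.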

Pins the DuckLake extension binary at Docker build time. The bake runs
through the `duckdb@1.4.4` npm binding (not the CLI from base-deps,
which is a different DuckDB engine version), so the extension lands at
/root/.duckdb/extensions/v1.4.4/, the path the runtime server inspects
when it runs `INSTALL ducklake; LOAD ducklake;`. Verified an offline
LOAD in the built image with `docker run --network none`.
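Why the bake has to go through the npm binding: DuckDB installs extensions under a directory keyed by engine version and platform, so a binary installed by one engine version is not found by another. A sketch of that layout, assuming the default extension root under the home directory; extensionPath and bakeIsVisibleToRuntime are hypothetical helper names.

```typescript
// DuckDB's default on-disk layout for installed extensions:
//   <home>/.duckdb/extensions/<engine version>/<platform>/<name>.duckdb_extension
function extensionPath(
  home: string,
  engineVersion: string,
  platform: string,
  name: string,
): string {
  return `${home}/.duckdb/extensions/${engineVersion}/${platform}/${name}.duckdb_extension`;
}

// The baked binary is only reused when the engine that installs it and the
// engine that loads it at runtime agree on version and platform.
function bakeIsVisibleToRuntime(
  bakeEngine: string,
  runtimeEngine: string,
  platform: string,
): boolean {
  return (
    extensionPath("/root", bakeEngine, platform, "ducklake") ===
    extensionPath("/root", runtimeEngine, platform, "ducklake")
  );
}
```

For example, bakeIsVisibleToRuntime("v1.4.4", "v1.4.4", "linux_arm64") is true, while baking with a CLI on any other engine version gives false, so the runtime's INSTALL would go back to the network.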

Adds a postgres-side preflight in StorageManager.attachDuckLakeCatalog
that reads ducklake_metadata.version directly (via plain postgres
ATTACH, not DuckLake) and throws ConnectionAuthError -> HTTP 422 when
the catalog format is newer than the baked extension supports. Reuses
the #741 ConnectionAuthError -> 422 mapping. Failures of the preflight
itself (table missing, query timeout, connect failure) return
undefined and fall through to the existing ATTACH path so the
preflight is not load-bearing for unrelated errors.
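The preflight relies on the existing ConnectionAuthError to 422 mapping from #741. A hypothetical shape for such a mapping, with a local stand-in class and an illustrative toErrorResponse helper (not the server's actual code): known client-actionable errors keep their message and status, and anything else becomes a generic 500.

```typescript
// Local stand-in for the server's ConnectionAuthError.
class ConnectionAuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConnectionAuthError";
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string };
}

// Known error names and the HTTP status each maps to. A Map avoids
// accidentally matching inherited object keys such as "constructor".
const STATUS_BY_ERROR_NAME = new Map<string, number>([["ConnectionAuthError", 422]]);

function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof Error) {
    const status = STATUS_BY_ERROR_NAME.get(err.name);
    if (status !== undefined) {
      // Client-actionable: surface the message (e.g. "catalog too new").
      return { status, body: { error: err.message } };
    }
  }
  // Unknown failures: do not leak internals.
  return { status: 500, body: { error: "Internal error" } };
}
```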

Addresses incident item #6. The deliberate DuckLake 1.0 upgrade
(duckdb -> 1.5+, @malloydata bump, AUTOMATIC_MIGRATION=TRUE) is a
separate follow-up PR.

README/public docs updates are intentionally deferred to a follow-up
PR after #743 (the PG_CONNECT_TIMEOUT_SECONDS docs) merges, to avoid
conflicts on the env-vars table.

Signed-off-by: Monty Lennie <montylennie@gmail.com>
Comment thread on Dockerfile:
# the extension at the path the runtime server uses, so its INSTALL
# ducklake; LOAD ducklake; at startup finds it and skips the network
# fetch (the actual pin).
RUN bun -e "const d=require('duckdb');const db=new d.Database(':memory:');await new Promise((r,j)=>db.exec('INSTALL ducklake; LOAD ducklake;',e=>e?j(e):r()));const rows=await new Promise((r,j)=>db.all(\"SELECT extension_version, install_path FROM duckdb_extensions() WHERE extension_name='ducklake'\",(e,x)=>e?j(e):r(x)));console.log('DuckLake baked:',JSON.stringify(rows));" || \
Collaborator comment:

Can we get rid of the duckdb node bindings install then?

