feat: Add support for DuckLake as a new warehouse type #23309

Open
AlphaJack wants to merge 12 commits into lightdash:main from AlphaJack:main

@AlphaJack AlphaJack commented May 20, 2026

Closes: #23310

Description:

Add first-class support for DuckLake as a new warehouse type which reuses the existing DuckdbWarehouseClient and the dbt-duckdb adapter under the hood.

Main features:

  • Supports every combination of metadata catalog (DuckDB file, SQLite file, PostgreSQL) and Parquet store (local filesystem, S3-compatible, GCS, Azure). S3-compatible stores work with both path and vhost URL styles.
  • Connectivity to the different catalog-store combinations was verified with Docker containers (MinIO for S3, fake-gcs-server for GCS, Azurite for Azure, lvh.me for the vhost URL style).
  • DuckLake is attached to an in-memory DuckDB instance with ATTACH '…' (READ_ONLY)
  • ENCRYPTED was omitted as it is not needed for read-only access
  • Nested catalog/dataPath credentials bypass the top-level sensitiveCredentialsFieldNames filter, so I added
    stripDucklakeNestedSensitive() in common and called it after the flat strip in both ProjectModel.get and OrganizationWarehouseCredentialsModel.stripSensitiveCredentials
  • @duckdb/node-api was bumped to 1.5.2 to support DuckLake 1.0 format
  • dbt-duckdb was bumped to 1.10.0 to use the three named secrets connection pattern when generating the dbt profile
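The nested-secret scrubbing described above could look roughly like the following. This is a minimal sketch, not the PR's stripDucklakeNestedSensitive(): the credential shape, the list of secret field names, and the `*Sketch` helper names are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real credential types in the PR may differ.
type NestedSectionSketch = { type: string; [key: string]: unknown };

interface DucklakeCredentialsSketch {
    type: 'ducklake';
    catalog: NestedSectionSketch;
    dataPath: NestedSectionSketch;
}

// Assumed names of nested secret fields (illustrative only).
const NESTED_SENSITIVE_KEYS: readonly string[] = [
    'password',
    'accessKeyId',
    'secretAccessKey',
    'connectionString',
];

// Copy one nested section, leaving out any assumed-secret field.
const stripSectionSketch = (
    section: NestedSectionSketch,
): NestedSectionSketch => {
    const copy: NestedSectionSketch = { type: section.type };
    for (const key of Object.keys(section)) {
        if (NESTED_SENSITIVE_KEYS.indexOf(key) === -1) {
            copy[key] = section[key];
        }
    }
    return copy;
};

// Return new credentials whose catalog/dataPath sections carry no secrets.
// The input object is left untouched.
const stripNestedSensitiveSketch = (
    credentials: DucklakeCredentialsSketch,
): DucklakeCredentialsSketch => ({
    ...credentials,
    catalog: stripSectionSketch(credentials.catalog),
    dataPath: stripSectionSketch(credentials.dataPath),
});
```

A flat, top-level filter never descends into `catalog` or `dataPath`, which is why a second pass of this kind is needed after the existing strip.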

Not in scope:

  • MySQL / MariaDB as metadata catalogs: not recommended anymore
  • Quack client/server protocol: still in beta
  • Time travel in the UI: not implemented for any other warehouses either
  • Per-user credential overrides: the nested catalog/dataPath shape doesn't fit the existing per-user override model, TBD if requested.

Unrelated changes:

  • packages/backend/src/@types/express-account.d.ts: ts-node seed runner couldn't see the Express.Request.account augmentation declared inline in App.ts. It was lifted into a standalone ambient declaration.

AI disclosure:

  • Claude Opus was used for code generation

Screenshots:

DuckDB catalog + local filesystem store
image

PostgreSQL catalog + MinIO store
image

SQL generated by Lightdash
image

Generated SQL used in DuckDB UI
image

Underlying catalog entry in PostgreSQL
image

Underlying data file in MinIO console
image


socket-security Bot commented May 20, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff   Package                     Supply Chain Security  Vulnerability  Quality  Maintenance  License
Added  @duckdb/node-api@1.5.2-r.2  100                    100            100      96           100

View full report

@sdolidze
Contributor

Hi @AlphaJack 👋, Thank you for the contribution!

I reviewed the PR and checked it out locally, and it does work with the DuckLake setup.

One issue I identified during testing is that the CLI fails when calling lightdash deploy --create. The error comes from convertDuckdbSchema, which is responsible for parsing the dbt profile into a Lightdash config. The current parser only handles the MotherDuck config.

I talked to the team about how to approach this PR. The MotherDuck and DuckLake implementations are almost identical except for configuration differences. Our preference is to have one DuckDB warehouse instead of separate MotherDuck and DuckLake ones.

There are two ways we can approach this issue:

  • If you prefer, you can rework this PR to have one DuckDB warehouse with two configurations
  • I can raise a feature request that will go through our roadmap process

Tell me what you think, and again, thanks for taking the time to help Lightdash improve!

AlphaJack added 11 commits May 21, 2026 22:51

Adds WarehouseTypes.DUCKLAKE with discriminated catalog (postgres / sqlite /
duckdb-file) and dataPath (s3 / gcs / azure / local) credential unions, plus
stripDucklakeNestedSensitive() for scrubbing nested secrets that the
top-level sensitiveCredentialsFieldNames filter can't reach.

Wires DUCKLAKE through the SQL dialect map (uses the DuckDB dialect) and
the legacy field-quote helper.

Constructor now accepts CreateDucklakeCredentials. Bootstrap emits the
three named secrets recommended by the DuckLake docs (catalog +
data-path + ducklake-typed wrapping secret) and runs
ATTACH 'ducklake:<secret>' AS <alias> (READ_ONLY) for postgres catalogs,
or an inline ATTACH for sqlite/duckdb-file catalogs. READ_ONLY is
unconditional — Lightdash never writes to the lake.
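The two ATTACH forms mentioned above can be sketched as a small SQL builder. This is an illustration under stated assumptions, not the client's bootstrap code: the `CatalogSketch` type, the helper names, and the quoting helpers are hypothetical, and the secret name is assumed to have been created by earlier bootstrap statements. The `ducklake:sqlite:` prefix and the `DATA_PATH` option follow the DuckLake documentation.

```typescript
// Hypothetical catalog description; the PR's credential types may differ.
type CatalogSketch =
    | { type: 'postgres'; secretName: string }
    | { type: 'sqlite'; path: string; dataPath: string }
    | { type: 'duckdb-file'; path: string; dataPath: string };

// Quote a SQL string literal by doubling single quotes.
const sqlStringSketch = (value: string): string =>
    `'${value.replace(/'/g, "''")}'`;

// Quote a SQL identifier by doubling double quotes.
const sqlIdentSketch = (value: string): string =>
    `"${value.replace(/"/g, '""')}"`;

// Build the ATTACH statement. READ_ONLY is always present, mirroring the
// "Lightdash never writes to the lake" rule from the commit message.
const buildAttachSketch = (catalog: CatalogSketch, alias: string): string => {
    const target = sqlIdentSketch(alias);
    if (catalog.type === 'postgres') {
        // Secret-based form: the data path lives inside the named secret.
        const source = sqlStringSketch(`ducklake:${catalog.secretName}`);
        return `ATTACH ${source} AS ${target} (READ_ONLY)`;
    }
    // Inline form for file-based catalogs.
    const prefix = catalog.type === 'sqlite' ? 'ducklake:sqlite:' : 'ducklake:';
    const source = sqlStringSketch(`${prefix}${catalog.path}`);
    const dataPath = sqlStringSketch(catalog.dataPath);
    return `ATTACH ${source} AS ${target} (DATA_PATH ${dataPath}, READ_ONLY)`;
};
```

Keeping READ_ONLY outside any conditional branch makes it hard to drop by accident when a new catalog type is added.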

hardenInstance() gains an allowKnownExtensionAutoload flag (true only
in DuckLake mode) so the ducklake/postgres/httpfs/azure extensions
autoload at ATTACH time. The user-SQL allowlist (SELECT only) stays
intact.
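As a rough illustration of what such a flag might control, the sketch below emits DuckDB settings statements. It is not the PR's hardenInstance(): the function name and the exact set and order of statements are assumptions. Only the setting names (`autoinstall_known_extensions`, `autoload_known_extensions`, `lock_configuration`) are real DuckDB configuration options.

```typescript
// Hypothetical helper: returns the settings statements to run before any
// user SQL. Extension autoload is enabled only when the flag is true.
const hardeningStatementsSketch = (
    allowKnownExtensionAutoload: boolean,
): string[] => {
    const flag = allowKnownExtensionAutoload ? 'true' : 'false';
    return [
        `SET autoinstall_known_extensions = ${flag}`,
        `SET autoload_known_extensions = ${flag}`,
        // Locking last, so the two settings above cannot be changed later.
        'SET lock_configuration = true',
    ];
};
```

With the flag set, extensions such as ducklake or httpfs can still load on demand at ATTACH time even though the configuration itself is locked.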

Bumps @duckdb/node-api 1.4.4-r.1 -> 1.5.2-r.1 to match the DuckLake
catalog file format.

Wires DUCKLAKE into warehouseClientFromCredentials, sshTunnel (no-op),
and warehouseSqlBuilderFromType (normalises to the DuckDB adapter).

Adds unit tests for the bootstrap SQL ordering and a real-driver
integration test against an ephemeral sqlite+local-fs lake.

- buildDucklakeProfile in dbt/profiles.ts emits the named-secret pattern
  from the DuckLake access-control docs: postgres + s3/gcs/azure +
  ducklake-typed wrapping secret, with metadata_parameters as a dict
  that dbt-duckdb 1.10+ serialises into a DuckDB MAP literal. Credentials
  go through env_var indirection — nothing in the on-disk profiles.yml.
- Migration 20260519130000_add_ducklake_warehouse inserts 'ducklake' into
  warehouse_types so the warehouse_credentials FK accepts it.
- ProjectModel.get + OrganizationWarehouseCredentialsModel.stripSensitive
  call stripDucklakeNestedSensitive after the flat top-level strip to
  scrub secrets nested inside catalog/dataPath.
- ProjectModel.mergeMissingWarehouseSecrets merges nested missing
  catalog/dataPath secrets on update so the form can omit unchanged
  passwords.
- ProjectService.clearSecretsFromCredentials clears the DUCKLAKE branch.
- UserWarehouseCredentialsModel + AI SQL-prompt hints + WarehouseTypes
  switches all updated.
- @types/express-account.d.ts pulls the Express.Request.account
  augmentation into a standalone d.ts so the seed runner's ts-node
  picks it up without importing App.ts. Not DuckLake-specific; surfaced
  while running the dev seed.
- generated/routes.ts + generated/swagger.json regenerated via
  `pnpm -F backend generate-api`.

DucklakeForm renders a discriminated UI: catalog backend picker
(PostgreSQL / SQLite-file / DuckDB-file) -> conditional credential
fields; data path picker (S3 / GCS / Azure / local) -> conditional
credential fields. Registered in WarehouseSettingsForm + DbtSettingsForm
+ defaultValues + validators.

Adds a DuckLake tile in ProjectConnectFlow using the official ducklake
comb svg from ducklake.select.

Per-user credentials are filtered out of CreateCredentialsModal +
EditCredentialsModal + WarehouseFormInputs for DUCKLAKE — the nested
catalog/dataPath shape doesn't fit the existing per-user override
model. ProjectManagementPanel gets a DuckLake label.

MySQL is intentionally absent from the catalog options — DuckLake's
MySQL backend SIGSEGVs on reattach (reproduces on MariaDB too).

Adds dbt-duckdb 1.10.1 to the v1.10 + v1.11 venvs used by
setup-dbt-venvs.sh. No entry in v1.9 — dbt-duckdb 1.9.x can't emit
the wrapping ducklake-typed secret cleanly (it stringifies the MAP
literal, which DuckDB rejects). DuckLake projects in Lightdash
require dbt 1.10+.

Collapse duplicate "DuckLake nests secrets in catalog/dataPath" notes
to a single canonical docblock on stripDucklakeNestedSensitive, drop a
self-evident SQL-dialect comment, and remove the integration test that
was silently skipped without @duckdb/node-api — the unit tests in
DuckdbWarehouseClient.test.ts already cover the ATTACH wiring.

@AlphaJack
Author

Hi @sdolidze, thank you for reviewing the PR in such a short time!

My rationale for a separate warehouse type was to alter the existing MotherDuck implementation as little as possible, but I totally share your concern about code duplication.

I've added a few commits to keep one DuckDB Warehouse with two possible configurations. I've tested these combinations:

  • regular MotherDuck database
  • catalog: DuckDB file, storage: local filesystem
  • catalog: PostgreSQL, storage: local filesystem
  • catalog: PostgreSQL, storage: MinIO
  • catalog: SQLite file, storage: Azurite
  • catalog: MotherDuck, storage: MotherDuck
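One way to model "one DuckDB warehouse with two configurations" is a discriminated union, sketched below. The type, its `mode` discriminator, and the field names are hypothetical and are not taken from the reworked commits. The catalog and storage options simply mirror the combinations listed above.

```typescript
// Hypothetical config type: a single warehouse with a `mode` discriminator.
type DuckdbWarehouseConfigSketch =
    | { mode: 'motherduck'; database: string }
    | {
          mode: 'ducklake';
          catalog: 'duckdb-file' | 'sqlite' | 'postgres' | 'motherduck';
          storage: 'local' | 's3' | 'gcs' | 'azure' | 'motherduck';
      };

// Produce a short label for a configuration. The `never` assignment makes
// the compiler reject any mode that is added later but not handled here.
const describeConfigSketch = (config: DuckdbWarehouseConfigSketch): string => {
    switch (config.mode) {
        case 'motherduck':
            return `MotherDuck database ${config.database}`;
        case 'ducklake':
            return `DuckLake (catalog: ${config.catalog}, storage: ${config.storage})`;
        default: {
            const unreachable: never = config;
            return unreachable;
        }
    }
};
```

With this shape, shared code paths (client, SQL builder) can accept the union, and only the connection bootstrap needs to branch on `mode`.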

Let me know what you think and if I can support further!

MotherDuck configuration:
image

DuckLake configuration:
image
