feat: Add support for DuckLake as a new warehouse type #23309
AlphaJack wants to merge 12 commits into
Hi @AlphaJack 👋, thank you for the contribution! I reviewed the PR, checked it out locally, and it does work with the DuckLake setup. One issue I identified during testing is that the CLI fails when calling

I talked to the team about how to approach this PR. The MotherDuck and DuckLake implementations are almost identical apart from configuration differences. Our preference is to have one DuckDB warehouse instead of separate MotherDuck and DuckLake ones. There are two ways we can approach this issue:

Tell me what you think, and again, thanks for taking the time and helping Lightdash improve!
Adds WarehouseTypes.DUCKLAKE with discriminated catalog (postgres / sqlite / duckdb-file) and dataPath (s3 / gcs / azure / local) credential unions, plus stripDucklakeNestedSensitive() for scrubbing nested secrets that the top-level sensitiveCredentialsFieldNames filter can't reach. Wires DUCKLAKE through the SQL dialect map (uses the DuckDB dialect) and the legacy field-quote helper.
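The nested scrubbing described in this commit could look roughly like the sketch below. Only the `catalog` / `dataPath` nesting and the discriminator values come from the commit message; the field names (`password`, `secretAccessKey`, and so on), the key list, and the function name `stripDucklakeNestedSensitiveSketch` are assumptions for illustration, and the real `stripDucklakeNestedSensitive()` may have a different signature.

```typescript
// Hypothetical credential shapes: the discriminators mirror the commit message,
// but every other field name here is an illustrative assumption.
type Catalog =
    | { type: 'postgres'; host: string; user: string; password?: string }
    | { type: 'sqlite'; path: string }
    | { type: 'duckdb-file'; path: string };

type DataPath =
    | { type: 's3'; bucket: string; accessKeyId?: string; secretAccessKey?: string }
    | { type: 'gcs'; bucket: string; hmacSecret?: string }
    | { type: 'azure'; container: string; connectionString?: string }
    | { type: 'local'; path: string };

type DucklakeCredentials = { catalog: Catalog; dataPath: DataPath };

// Assumed list of nested keys that a flat, top-level filter would never see.
const NESTED_SENSITIVE_KEYS = [
    'password',
    'secretAccessKey',
    'hmacSecret',
    'connectionString',
];

// Return a shallow copy of one nested object without its sensitive keys.
function omitSensitive<T extends object>(value: T): T {
    const copy = { ...value } as unknown as Record<string, unknown>;
    for (const key of NESTED_SENSITIVE_KEYS) {
        delete copy[key];
    }
    return copy as unknown as T;
}

// Scrub both nested branches; the input object is left untouched.
function stripDucklakeNestedSensitiveSketch(
    creds: DucklakeCredentials,
): DucklakeCredentials {
    return {
        catalog: omitSensitive(creds.catalog),
        dataPath: omitSensitive(creds.dataPath),
    };
}
```

Working on copies rather than mutating the input keeps the stored credentials usable for the warehouse client while the scrubbed version goes to the API response.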
Constructor now accepts CreateDucklakeCredentials. Bootstrap emits the three named secrets recommended by the DuckLake docs (catalog + data-path + ducklake-typed wrapping secret) and runs ATTACH 'ducklake:<secret>' AS <alias> (READ_ONLY) for postgres catalogs, or an inline ATTACH for sqlite/duckdb-file catalogs. READ_ONLY is unconditional — Lightdash never writes to the lake. hardenInstance() gains an allowKnownExtensionAutoload flag (true only in DuckLake mode) so the ducklake/postgres/httpfs/azure extensions autoload at ATTACH time. The user-SQL allowlist (SELECT only) stays intact. Bumps @duckdb/node-api 1.4.4-r.1 -> 1.5.2-r.1 to match the DuckLake catalog file format. Wires DUCKLAKE into warehouseClientFromCredentials, sshTunnel (no-op), and warehouseSqlBuilderFromType (normalises to the DuckDB adapter). Adds unit tests for the bootstrap SQL ordering and a real-driver integration test against an ephemeral sqlite+local-fs lake.
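To make the bootstrap ordering concrete, here is a minimal sketch of how the three named secrets and the read-only ATTACH might be assembled for a PostgreSQL catalog with S3 storage. It is not the PR's code: the credential shapes, the secret names (`lake_catalog`, `lake_storage`, `lake`), and the helper names are hypothetical, and the SQL follows my reading of the DuckLake secrets pattern rather than the statements `DuckdbWarehouseClient` actually emits.

```typescript
// Hypothetical credential shapes; the real CreateDucklakeCredentials may differ.
interface PostgresCatalog {
    host: string;
    port: number;
    database: string;
    user: string;
    password: string;
}

interface S3DataPath {
    url: string; // e.g. 's3://bucket/prefix/'
    keyId: string;
    secret: string;
    region: string;
}

// Quote a value as a SQL string literal, doubling embedded single quotes.
const sqlString = (value: string): string => `'${value.replace(/'/g, "''")}'`;

// Quote an identifier, doubling embedded double quotes.
const sqlIdent = (value: string): string => `"${value.replace(/"/g, '""')}"`;

// Build the statements in the order they need to run:
// catalog secret, data-path secret, wrapping ducklake secret, then ATTACH.
function buildDucklakeBootstrapSql(
    catalog: PostgresCatalog,
    dataPath: S3DataPath,
    alias: string,
): string[] {
    return [
        `CREATE OR REPLACE SECRET lake_catalog (TYPE postgres, ` +
            `HOST ${sqlString(catalog.host)}, PORT ${catalog.port}, ` +
            `DATABASE ${sqlString(catalog.database)}, ` +
            `USER ${sqlString(catalog.user)}, ` +
            `PASSWORD ${sqlString(catalog.password)})`,
        `CREATE OR REPLACE SECRET lake_storage (TYPE s3, ` +
            `KEY_ID ${sqlString(dataPath.keyId)}, ` +
            `SECRET ${sqlString(dataPath.secret)}, ` +
            `REGION ${sqlString(dataPath.region)})`,
        `CREATE OR REPLACE SECRET lake (TYPE ducklake, METADATA_PATH '', ` +
            `DATA_PATH ${sqlString(dataPath.url)}, ` +
            `METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'lake_catalog'})`,
        // READ_ONLY is unconditional, matching the commit message.
        `ATTACH 'ducklake:lake' AS ${sqlIdent(alias)} (READ_ONLY)`,
    ];
}
```

The wrapping secret refers to the catalog secret by name, which is why the ATTACH statement has to come last.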
- buildDucklakeProfile in dbt/profiles.ts emits the named-secret pattern from the DuckLake access-control docs: postgres + s3/gcs/azure + ducklake-typed wrapping secret, with metadata_parameters as a dict that dbt-duckdb 1.10+ serialises into a DuckDB MAP literal. Credentials go through env_var indirection, so no credentials are written to the on-disk profiles.yml.
- Migration 20260519130000_add_ducklake_warehouse inserts 'ducklake' into warehouse_types so the warehouse_credentials FK accepts it.
- ProjectModel.get + OrganizationWarehouseCredentialsModel.stripSensitive call stripDucklakeNestedSensitive after the flat top-level strip to scrub secrets nested inside catalog/dataPath.
- ProjectModel.mergeMissingWarehouseSecrets merges nested missing catalog/dataPath secrets on update so the form can omit unchanged passwords.
- ProjectService.clearSecretsFromCredentials clears the DUCKLAKE branch.
- UserWarehouseCredentialsModel + AI SQL-prompt hints + WarehouseTypes switches all updated.
- @types/express-account.d.ts pulls the Express.Request.account augmentation into a standalone d.ts so the seed runner's ts-node picks it up without importing App.ts. Not DuckLake-specific; surfaced while running the dev seed.
- generated/routes.ts + generated/swagger.json regenerated via `pnpm -F backend generate-api`.
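The merge-on-update behaviour for nested secrets can be sketched as follows. This is an illustration under stated assumptions, not the code in `ProjectModel.mergeMissingWarehouseSecrets`: the `NestedCredentials` shape, the secret key list, and the rule that stored secrets are reused only when the backend `type` is unchanged are all hypothetical.

```typescript
// Hypothetical shape of one nested branch (catalog or dataPath).
interface NestedCredentials {
    type: string;
    [key: string]: unknown;
}

// Assumed names of secret fields the form may leave blank on update.
const SECRET_KEYS = ['password', 'secretAccessKey'];

// Fill in secrets the form omitted, using the previously stored values.
function mergeMissingNestedSecrets(
    incoming: NestedCredentials,
    stored: NestedCredentials | undefined,
): NestedCredentials {
    // Assumption: if the backend type changed, old secrets no longer apply.
    if (!stored || stored.type !== incoming.type) {
        return incoming;
    }
    const merged: NestedCredentials = { ...incoming };
    for (const key of SECRET_KEYS) {
        const value = merged[key];
        const isMissing = value === undefined || value === '';
        if (isMissing && stored[key] !== undefined) {
            merged[key] = stored[key];
        }
    }
    return merged;
}
```

For example, an update that sends `{ type: 'postgres', host: 'new-host' }` without a password would keep the stored password, while switching the catalog to `sqlite` would not carry it over.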
DucklakeForm renders a discriminated UI: catalog backend picker (PostgreSQL / SQLite-file / DuckDB-file) -> conditional credential fields; data path picker (S3 / GCS / Azure / local) -> conditional credential fields. Registered in WarehouseSettingsForm + DbtSettingsForm + defaultValues + validators. Adds a DuckLake tile in ProjectConnectFlow using the official ducklake comb svg from ducklake.select. Per-user credentials are filtered out of CreateCredentialsModal + EditCredentialsModal + WarehouseFormInputs for DUCKLAKE — the nested catalog/dataPath shape doesn't fit the existing per-user override model. ProjectManagementPanel gets a DuckLake label. MySQL is intentionally absent from the catalog options — DuckLake's MySQL backend SIGSEGVs on reattach (reproduces on MariaDB too).
Adds dbt-duckdb 1.10.1 to the v1.10 + v1.11 venvs used by setup-dbt-venvs.sh. No entry in v1.9 — dbt-duckdb 1.9.x can't emit the wrapping ducklake-typed secret cleanly (it stringifies the MAP literal, which DuckDB rejects). DuckLake projects in Lightdash require dbt 1.10+.
Collapse duplicate "DuckLake nests secrets in catalog/dataPath" notes to a single canonical docblock on stripDucklakeNestedSensitive, drop a self-evident SQL-dialect comment, and remove the integration test that was silently skipped without @duckdb/node-api — the unit tests in DuckdbWarehouseClient.test.ts already cover the ATTACH wiring.
…m DuckDB instance
Hi @sdolidze, thank you for reviewing the PR so quickly! My rationale for a separate warehouse type was to alter the existing MotherDuck implementation as little as possible, but I fully share your concern about code duplication. I've added a few commits to keep a single DuckDB warehouse with two possible configurations. I've tested these combinations:
Let me know what you think and whether I can support further!


Closes: #23310
Description:
Add first-class support for DuckLake as a new warehouse type, which reuses the existing `DuckdbWarehouseClient` and the `dbt-duckdb` adapter under the hood.

Main features:
- `ATTACH '…' (READ_ONLY)`
- `ENCRYPTED` was omitted, as it is not needed for read-only access
- `catalog/dataPath` credentials bypass the top-level `sensitiveCredentialsFieldNames` filter, so I added `stripDucklakeNestedSensitive()` in `common` and called it after the flat strip in both `ProjectModel.get` and `OrganizationWarehouseCredentialsModel.stripSensitiveCredentials`
- `@duckdb/node-api` was bumped to 1.5.2 to support the DuckLake 1.0 format
- `dbt-duckdb` was bumped to 1.10.0 to use the three-named-secrets connection pattern when generating the dbt profile

Not in scope:
- Per-user credentials: the nested `catalog/dataPath` shape doesn't fit the existing per-user override model. TBD if requested.

Unrelated changes:
- `packages/backend/src/@types/express-account.d.ts`: the `ts-node` seed runner couldn't see the `Express.Request.account` augmentation declared inline in `App.ts`, so it was lifted into a standalone ambient declaration.

AI disclosure:
Screenshots:
DuckDB catalog + local filesystem store

PostgreSQL catalog + MinIO store

SQL generated by Lightdash

Generated SQL used in DuckDB UI

Underlying catalog entry in PostgreSQL

Underlying data file in MinIO console
