Multischema datasets by burnash · Pull Request #3770 · dlt-hub/dlt

burnash · 2026-03-23T13:12:21Z

cloudflare-workers-and-pages · 2026-03-23T13:18:54Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | docs | `68ddbab` | Commit Preview URL, Branch Preview URL | Apr 12 2026, 05:35 PM |

rudolfix

Please take a look at the original issue: we want to unify the SQLGlot schema, not dlt schemas. There are many reasons for that, one of them being the ability to support foreign schemas with the same code. Ping me for details.
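A minimal sketch of what unifying on the SQLGlot side could look like. It assumes the SQLGlot schema is a plain nested dict of table -> column -> type (a shape sqlglot's MappingSchema can be built from), so the same merge works whether the dict came from a dlt schema or from a foreign one. `unify_sqlglot_schemas` is a hypothetical helper for illustration, not dlt's API.

```python
from typing import Dict, Iterable

# Hypothetical SQLGlot-style schema: {table_name: {column_name: data_type}}.
# Both dlt schemas and foreign schemas can be converted to this shape first.
SqlglotSchemaDict = Dict[str, Dict[str, str]]


def unify_sqlglot_schemas(schemas: Iterable[SqlglotSchemaDict]) -> SqlglotSchemaDict:
    """Merge table/column mappings; fail if one column has two different types."""
    unified: SqlglotSchemaDict = {}
    for schema in schemas:
        for table_name, columns in schema.items():
            target = unified.setdefault(table_name, {})
            for column_name, data_type in columns.items():
                existing = target.get(column_name)
                if existing is not None and existing != data_type:
                    raise ValueError(
                        f"{table_name}.{column_name}: conflicting types "
                        f"{existing!r} and {data_type!r}"
                    )
                target[column_name] = data_type
    return unified


# usage: overlapping "users" tables with non-conflicting columns are merged
github = {"users": {"id": "BIGINT", "login": "TEXT"}, "_dlt_loads": {"load_id": "TEXT"}}
slack = {"users": {"id": "BIGINT", "email": "TEXT"}, "_dlt_loads": {"load_id": "TEXT"}}
unified = unify_sqlglot_schemas([github, slack])
# the result could then be handed to sqlglot, e.g. sqlglot.schema.MappingSchema(unified)
```

Because the merge only sees the nested dict, nothing in it depends on where the schema came from, which is the property the comment asks for.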

rudolfix · 2026-03-24T21:23:48Z

        union_all_expr: Optional[sge.Query] = None

        for table_name in selected_tables:
            counts_expr = build_row_counts_expr(


Note: this reimplements filtering by _dlt_load_id, which is already available on the relation.
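The snippet under review loops over the selected tables and combines per-table row counts with UNION ALL. The sketch below shows the same shape with the optional _dlt_load_id filter kept in one helper rather than rebuilt at each call site. It uses plain SQL strings on sqlite3 so it runs standalone; `build_row_counts_sql` is hypothetical and is not dlt's relation API or the PR's build_row_counts_expr.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def build_row_counts_sql(
    table_names: Sequence[str], load_ids: Optional[Sequence[str]] = None
) -> Tuple[str, List[str]]:
    """Hypothetical helper: one UNION ALL query yielding (table_name, row_count) rows.

    The optional _dlt_load_id filter is applied here, in one place, instead of
    being rebuilt by every caller that counts rows.
    """
    selects: List[str] = []
    params: List[str] = []
    for table_name in table_names:
        sql = f'SELECT ? AS table_name, COUNT(*) AS row_count FROM "{table_name}"'
        params.append(table_name)
        if load_ids:
            placeholders = ", ".join("?" for _ in load_ids)
            sql += f" WHERE _dlt_load_id IN ({placeholders})"
            params.extend(load_ids)
        selects.append(sql)
    # parameters are collected in the same order as the placeholders appear
    return " UNION ALL ".join(selects), params


conn = sqlite3.connect(":memory:")
for name in ("users", "orders"):
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER, _dlt_load_id TEXT)')
conn.executemany('INSERT INTO "users" VALUES (?, ?)', [(1, "load_1"), (2, "load_2")])
conn.executemany('INSERT INTO "orders" VALUES (?, ?)', [(1, "load_2"), (2, "load_2")])

sql, params = build_row_counts_sql(["users", "orders"], load_ids=["load_2"])
counts = dict(conn.execute(sql, params).fetchall())
```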

rudolfix · 2026-03-24T21:45:25Z

@@ -505,12 +591,12 @@
 def _get_latest_load_id(dataset: dlt.Dataset) -> Optional[str]:
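_get_latest_load_id in this hunk works on a whole dataset, while the commit list mentions moving to per-schema load_ids. A sketch of the per-schema variant, assuming a loads table modeled on dlt's _dlt_loads with load_id, schema_name and status columns (status 0 taken to mean a completed load) and load ids that sort in creation order, as equal-length timestamp strings do. The helper name is hypothetical.

```python
import sqlite3
from typing import Dict


# Assumption: a loads table modeled on dlt's _dlt_loads (load_id, schema_name,
# status), where status 0 marks a completed load and load ids sort by time.
def latest_load_ids_per_schema(conn: sqlite3.Connection) -> Dict[str, str]:
    """Hypothetical helper: the latest completed load id for each schema."""
    rows = conn.execute(
        "SELECT schema_name, MAX(load_id) FROM _dlt_loads "
        "WHERE status = 0 GROUP BY schema_name"
    ).fetchall()
    return {schema_name: load_id for schema_name, load_id in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _dlt_loads (load_id TEXT, schema_name TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO _dlt_loads VALUES (?, ?, ?)",
    [
        ("1711000000.1", "github", 0),
        ("1711000500.2", "github", 0),
        ("1711000300.0", "slack", 0),
        ("1711000900.9", "slack", 1),  # not completed, so it is ignored
    ],
)
latest = latest_load_ids_per_schema(conn)
```

With a single "latest" id across the whole dataset, the newer github load would hide slack's only completed load; grouping by schema keeps one id per schema.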


rudolfix

Looks pretty good! But the code could be simpler. We could also use unify_schemas when generating the sqlglot schema.

rudolfix

Good find with WithTableScanners supporting just one schema! I suggest some improvements to the instance check. I just looked at the tests; these cases are missing:

Schemas that have overlapping tables which are not identical (the _dlt tables are) but have non-conflicting columns (different names, or the same names with the same types).
A column conflict that actually prevents schema unification (i.e. same column name, different data type); that should fail.
Schemas with different naming conventions, i.e. the case-sensitive and case-insensitive SQL naming conventions. I'd like unify_schemas to work with different naming conventions; for now the test should expect unify_schemas to fail.
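The three requested test cases can be sketched against a toy unification function. `ToySchema` and this `unify_schemas` are hypothetical stand-ins, not dlt's Schema.unify_schemas; the naming convention strings are only labels here, and the toy simply refuses any mix of conventions, matching the "expect it to fail for now" request.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Tables = Dict[str, Dict[str, str]]  # {table: {column: data_type}}


@dataclass
class ToySchema:
    """Hypothetical stand-in for a schema: a naming convention plus tables."""

    name: str
    naming: str  # a label, e.g. "sql_cs_v1" (case sensitive) or "sql_ci_v1"
    tables: Tables = field(default_factory=dict)


def unify_schemas(schemas: List[ToySchema]) -> Tables:
    """Toy version of the unification rules the review asks tests for."""
    if len({schema.naming for schema in schemas}) > 1:
        # identifiers may normalize differently, so refuse for now
        raise ValueError("schemas use different naming conventions")
    unified: Tables = {}
    for schema in schemas:
        for table, columns in schema.tables.items():
            target = unified.setdefault(table, {})
            for column, data_type in columns.items():
                # setdefault returns the already known type if the column exists
                if target.setdefault(column, data_type) != data_type:
                    raise ValueError(f"{table}.{column}: {target[column]} vs {data_type}")
    return unified


def raises_value_error(schemas: List[ToySchema]) -> bool:
    try:
        unify_schemas(schemas)
    except ValueError:
        return True
    return False


# 1. overlapping, non-identical tables with non-conflicting columns: merged
a = ToySchema("a", "sql_cs_v1", {"users": {"id": "bigint", "name": "text"}})
b = ToySchema("b", "sql_cs_v1", {"users": {"id": "bigint", "email": "text"}})
assert unify_schemas([a, b])["users"] == {"id": "bigint", "name": "text", "email": "text"}

# 2. same column name, different data type: unification fails
c = ToySchema("c", "sql_cs_v1", {"users": {"id": "text"}})
assert raises_value_error([a, c])

# 3. different naming conventions: expected to fail for now
d = ToySchema("d", "sql_ci_v1", {"orders": {"id": "bigint"}})
assert raises_value_error([a, d])
```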

rudolfix · 2026-04-06T22:19:42Z

        """Tells if a view for a table `table_schema` can be created"""
        pass

+    def set_schemas(self, schemas: Sequence[Schema]) -> None:


Good find, I forgot about table scanners. I wanted to make it a mixin class (instead of deriving from duckdb); then adding set_schemas to it would be trivial.

Could you clarify whether the mixin refactor of WithTableScanners should be part of this PR or a separate one? The coupling to duckdb is deep right now.

Not in this PR; this is old tech debt that won't be fixed now.
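A minimal sketch of why a mixin would make set_schemas trivial: the schema bookkeeping lives in a small class that any client can inherit next to its real base class, with no dependency on a duckdb client. All class names here are hypothetical; this is not dlt's WithTableScanners or WithSchemas code.

```python
from abc import ABC
from typing import List, Sequence


class ToySchema:  # hypothetical stand-in for dlt.Schema
    def __init__(self, name: str, tables: Sequence[str]) -> None:
        self.name = name
        self.tables = list(tables)


class WithSchemasMixin(ABC):
    """Hypothetical mixin: holds the schemas a client resolves tables against."""

    schemas: List[ToySchema]

    def set_schemas(self, schemas: Sequence[ToySchema]) -> None:
        if not schemas:
            raise ValueError("at least one schema is required")
        self.schemas = list(schemas)

    @property
    def default_schema(self) -> ToySchema:
        # by convention here, the first schema is the default one
        return self.schemas[0]


class BaseSqlClient:  # stands in for any concrete client base, duckdb or not
    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name


class TableScannerClient(WithSchemasMixin, BaseSqlClient):
    """Gets set_schemas() from the mixin without inheriting from a duckdb client."""

    def __init__(self, dataset_name: str, schemas: Sequence[ToySchema]) -> None:
        super().__init__(dataset_name)  # resolves to BaseSqlClient.__init__ via the MRO
        self.set_schemas(schemas)


client = TableScannerClient("ds", [ToySchema("github", ["issues"]), ToySchema("slack", ["messages"])])
```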

rudolfix

There was still a problem with the filesystem sql_client, which I discovered by running the test with conflicting user tables on it:

We need to UNION views that point to separate data locations.
We need to merge the columns of views over the same data location.
This is now done. create_view now just returns SQL, which allows the SQL to be manipulated to handle the cases above.
I promoted a few tests so they run on all destinations, including lance and delta/iceberg. This surfaced a problem with ibis() creation (schemas were not passed there), which I fixed.
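The two view cases described in this comment can be sketched as a function that returns view SQL instead of executing it. Copies of a table that share a data location become one scan with the merged column list; copies at separate locations become a UNION ALL, padding missing columns with NULL. read_parquet is DuckDB's Parquet table function and is used here as an assumption about how a location is scanned; the function names are hypothetical, and type conflicts are assumed to have been rejected earlier by schema unification.

```python
from typing import Dict, List, Set, Tuple

# One schema's copy of a table: (data_location, {column: data_type}).
TableSource = Tuple[str, Dict[str, str]]


def merged_columns(sources: List[TableSource]) -> List[str]:
    """All column names across copies, in first-seen order."""
    columns: List[str] = []
    for _, cols in sources:
        for name in cols:
            if name not in columns:
                columns.append(name)
    return columns


def create_view_sql(view_name: str, sources: List[TableSource]) -> str:
    """Return CREATE VIEW SQL so callers can still adjust it before running it."""
    columns = merged_columns(sources)
    locations = list(dict.fromkeys(location for location, _ in sources))
    selects: List[str] = []
    for location in locations:
        # columns known for this location across every schema that points at it
        present: Set[str] = set()
        for loc, cols in sources:
            if loc == location:
                present.update(cols)
        projection = ", ".join(f'"{c}"' if c in present else f'NULL AS "{c}"' for c in columns)
        selects.append(f"SELECT {projection} FROM read_parquet('{location}')")
    return f'CREATE VIEW "{view_name}" AS ' + " UNION ALL ".join(selects)


# same data location: columns are merged into a single scan
same_location = create_view_sql(
    "users",
    [("s3://bucket/users/*.parquet", {"id": "bigint", "name": "text"}),
     ("s3://bucket/users/*.parquet", {"id": "bigint", "email": "text"})],
)
# separate data locations: one SELECT per location, combined with UNION ALL
separate_locations = create_view_sql(
    "users",
    [("github/users/*.parquet", {"id": "bigint", "name": "text"}),
     ("slack/users/*.parquet", {"id": "bigint", "email": "text"})],
)
```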

I think the ticket is complete. Docs still remain:
docs/website/docs/general-usage/dataset-access/dataset.md
should explain how we deal with multiple schemas in a dataset. It could be a NOTE admonition, since we do not recommend having such datasets.

rudolfix · 2026-04-10T08:08:05Z

Not in this PR; this is old tech debt that won't be fixed now.

rudolfix

LGTM!

rudolfix

LGTM!

rudolfix added the "breaking" label ("This issue introduces breaking change") Mar 23, 2026

rudolfix reviewed Mar 23, 2026

Comment thread dlt/dataset/dataset.py Outdated

rudolfix requested changes Mar 24, 2026

rudolfix requested changes Mar 30, 2026

Comment thread dlt/dataset/dataset.py Outdated

Comment thread dlt/dataset/dataset.py Outdated

Comment thread dlt/dataset/dataset.py Outdated

Comment thread dlt/dataset/dataset.py

Comment thread dlt/dataset/lineage.py

Comment thread dlt/dataset/lineage.py Outdated

burnash added 10 commits April 1, 2026 13:58

add multiple schemas to Dataset and Pipeline

c63bc34

add tests for multi-schema support

df4f716

refactor multiple schema storage in Dataset

1716d27

Refactor Dataset: dict-based sqlglot schema for foreign schema support, lazy all-schema resolution via pipeline state, per-schema load_ids, remove Dataset.from_pipeline()

8cb853f

Fix lint

19a0aea

Refactor Dataset init for lazy resolution; use unify_schemas in create_sqlglot_schema

c784279

Fix schema type check

f330f29

Fix extend_list_deduplicated not deduping within the incoming iterable

102adfd

Use extend_list_deduplicated for table name collection

93bb743

Update Dataset string repr to better output multi schema + fix tests

27ad7b5
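The commit fixing extend_list_deduplicated describes a classic dedupe bug: checking incoming items only against the original list, not against items already appended from the same iterable. A minimal sketch of the corrected behavior, applied to collecting table names across schemas; `extend_list_deduplicated_sketch` and its `key` parameter are hypothetical and do not reproduce dlt's actual signature.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def extend_list_deduplicated_sketch(
    original: List[T],
    extension: Iterable[T],
    key: Callable[[T], Hashable] = lambda item: item,
) -> List[T]:
    """Append items from `extension` that are not present yet, keeping order.

    Keys of appended items are recorded as we go, so duplicates *within*
    `extension` are dropped too, not only duplicates of `original`.
    """
    seen = {key(item) for item in original}
    for item in extension:
        item_key = key(item)
        if item_key not in seen:
            seen.add(item_key)
            original.append(item)
    return original


# collecting table names from several schemas that all carry _dlt tables
table_names: List[str] = []
for schema_tables in (["issues", "_dlt_loads"], ["messages", "_dlt_loads", "messages"]):
    extend_list_deduplicated_sketch(table_names, schema_tables)
```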

burnash force-pushed the feat/3746-multischema-datasets branch from 1bb04a2 to 27ad7b5 April 1, 2026 11:59

burnash added 2 commits April 2, 2026 15:00

Fix multi-schema view creation in WithTableScanners for filesystem destination. row_counts() collects tables from all schemas, but WithTableScanners only resolved tables against the default schema causing table not found errors

ebec948

Fix using wrong capabilities in prepare_load_table

0fcc87d
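The WithTableScanners fix in this batch of commits is about table lookup: row_counts() collects table names from all schemas, so resolving them against only the default schema fails for tables that live elsewhere. A sketch of resolving a table across every schema; the dict shape and `resolve_table_schemas` are hypothetical, not dlt's code.

```python
from typing import Dict, List, Sequence

# Hypothetical shape: schema name -> table names defined in that schema.
SchemaTables = Dict[str, Sequence[str]]


def resolve_table_schemas(table_name: str, schemas: SchemaTables) -> List[str]:
    """Return every schema that defines `table_name`, in schema order.

    Checking only the first (default) schema is the kind of lookup that fails
    for tables that exist only in another schema.
    """
    found = [name for name, tables in schemas.items() if table_name in tables]
    if not found:
        raise LookupError(f"table {table_name!r} not found in schemas {list(schemas)}")
    return found


schemas: SchemaTables = {
    "github": ["issues", "_dlt_loads"],
    "slack": ["messages", "_dlt_loads"],
}
# a row_counts()-style caller collects tables from all schemas ...
all_tables = sorted({t for tables in schemas.values() for t in tables})
# ... and each of them resolves, including tables outside the default schema
resolved = {t: resolve_table_schemas(t, schemas) for t in all_tables}
```

Returning every matching schema (rather than the first hit) leaves room for the caller to decide whether overlapping copies should be merged or unioned.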

rudolfix requested changes Apr 6, 2026

burnash added 4 commits April 7, 2026 11:55

Replace SupportsMultiSchema with WithSchemas mixin

3b1c7b3

ensure type compatibility in Schema.unify_schemas + tests: overlapping tables, compat conflict, naming convention conflict

428e378

Declare schemas attribute on WithSchemas ABC

0a00d0b

More dataset tests

f13ec05

burnash requested a review from rudolfix April 8, 2026 19:46

burnash and others added 6 commits April 9, 2026 15:42

Merge remote-tracking branch 'origin/devel' into feat/3746-multischema-datasets

f70513d

Merge remote-tracking branch 'origin/devel' into feat/3746-multischema-datasets

27ae4cb

allows passing multiple schemas to ibis backend

5d43cd9

allows filesystem sql client to unify or union overlapping tables in multiple schemas

7618d44

Merge branch 'feat/3746-multischema-datasets' of github.com:dlt-hub/dlt into feat/3746-multischema-datasets

f40d12c

uses internal conn to create lance views

71382fe

rudolfix requested changes Apr 10, 2026

rudolfix and others added 3 commits April 10, 2026 14:09

adds missing tests

6d7d272

Adds docs for multi-schema datasets + breaking change notes

cd2f54d

Fix broken create_view call in filesystem sql client test

eb92a9e

rudolfix previously approved these changes Apr 12, 2026

rudolfix added 2 commits April 12, 2026 19:26

fixes multi dataset parquet format

4971af2

Merge branch 'feat/3746-multischema-datasets' of github.com:dlt-hub/dlt into feat/3746-multischema-datasets

68ddbab

rudolfix dismissed their stale review via 68ddbab April 12, 2026 17:27

rudolfix approved these changes Apr 12, 2026

rudolfix merged commit 6c91afc into devel Apr 12, 2026
75 of 76 checks passed

rudolfix deleted the feat/3746-multischema-datasets branch April 12, 2026 19:22

burnash commented Mar 23, 2026

cloudflare-workers-and-pages bot commented Mar 23, 2026 (edited)

rudolfix left a comment

rudolfix Mar 24, 2026

rudolfix Mar 24, 2026

rudolfix left a comment

rudolfix left a comment

rudolfix Apr 6, 2026

burnash Apr 7, 2026

rudolfix Apr 10, 2026

rudolfix left a comment

rudolfix Apr 10, 2026

rudolfix left a comment

rudolfix left a comment

2 participants
