Conversation
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | docs | 68ddbab | Commit Preview URL, Branch Preview URL | Apr 12 2026, 05:35 PM |
rudolfix
left a comment
pls take a look at the original issue: we want to unify the SQLGlot schema, not dlt schemas - many reasons for that, one of them being the ability to support foreign schemas with the same code. ping me for details
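To illustrate the direction the reviewer suggests, here is a minimal sketch of unifying at the SQLGlot-schema level: each dlt schema is reduced to a plain table → column → type mapping (the shape sqlglot schemas use), and the mappings are merged into one dict. All names and types below are illustrative assumptions, not dlt's actual export format, and `merge_sqlglot_schemas` is a hypothetical helper.

```python
from typing import Dict

# Hypothetical per-dlt-schema table mappings (table -> column -> SQL type).
# Real dlt schemas carry much more; this keeps only what sqlglot needs.
schema_a = {"users": {"id": "BIGINT", "name": "TEXT"}}
schema_b = {
    "events": {"id": "BIGINT", "payload": "TEXT"},
    "users": {"id": "BIGINT", "name": "TEXT"},  # identical overlap is fine
}

def merge_sqlglot_schemas(
    *schemas: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Merge table mappings into one sqlglot-style schema dict,
    failing loudly when the same column appears with a different type."""
    merged: Dict[str, Dict[str, str]] = {}
    for schema in schemas:
        for table, columns in schema.items():
            target = merged.setdefault(table, {})
            for column, dtype in columns.items():
                if column in target and target[column] != dtype:
                    raise ValueError(
                        f"type conflict on {table}.{column}: "
                        f"{target[column]} != {dtype}"
                    )
                target[column] = dtype
    return merged

unified = merge_sqlglot_schemas(schema_a, schema_b)
print(sorted(unified))  # ['events', 'users']
```

Because the merge happens on the neutral mapping rather than on dlt `Schema` objects, the same code path can serve foreign (non-dlt) schemas as the comment asks.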
```py
union_all_expr: Optional[sge.Query] = None

for table_name in selected_tables:
    counts_expr = build_row_counts_expr(
```
note: this reimplements filtering by _dlt_load_id available on the relation
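For context on what the diff above is building, here is a simplified, self-contained stand-in for the row-counts expression with the `_dlt_load_id` filter the reviewer mentions. `build_row_counts_sql` and `union_row_counts` are hypothetical names; the real code builds sqlglot expressions rather than SQL strings, and the relation already exposes this filtering.

```python
from typing import List, Optional

def build_row_counts_sql(table_name: str, load_id: Optional[str] = None) -> str:
    """Count rows in one table, optionally restricted to a single load
    via the _dlt_load_id column dlt adds to every root table."""
    sql = (
        f"SELECT '{table_name}' AS table_name, COUNT(*) AS row_count "
        f"FROM {table_name}"
    )
    if load_id is not None:
        sql += f" WHERE _dlt_load_id = '{load_id}'"
    return sql

def union_row_counts(tables: List[str], load_id: Optional[str] = None) -> str:
    # Combine the per-table counts into one result set, mirroring the
    # loop over selected_tables in the diff above.
    return " UNION ALL ".join(build_row_counts_sql(t, load_id) for t in tables)

print(union_row_counts(["users", "events"], load_id="1745000000.123"))
```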
```diff
@@ -505,12 +591,12 @@
 def _get_latest_load_id(dataset: dlt.Dataset) -> Optional[str]:
```
rudolfix
left a comment
looks pretty good! but the code could be simpler. we could also use unify_schemas when generating the sqlglot schema
…t, lazy all-schema resolution via pipeline state, per-schema load_ids, remove Dataset.from_pipeline()
force-pushed 1bb04a2 to 27ad7b5
…stination. row_counts() collects tables from all schemas, but WithTableScanners only resolved tables against the default schema causing table not found errors
rudolfix
left a comment
good find with WithTableScanners supporting just one schema! I suggest some improvements with an instance check. I just looked at the tests:
- no test with schemas that have overlapping tables which are not identical (`_dlt` tables are) but have non-conflicting columns (different names or same types)
- a test that shows a column conflict that actually prevents schema unification (i.e. same column name, different data type - that should fail)
- schemas with different naming conventions, i.e. sql naming convention case sensitive and case insensitive. I'd like `unify_schema` to work with different naming conventions; for now the test should expect `unify_schema` to fail
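The first two requested test cases can be sketched as below. `unify_columns` is a hypothetical minimal unifier working on plain column → type dicts; the real `unify_schema` in dlt operates on full `Schema` objects, so this only illustrates the merge/conflict behavior the tests should pin down.

```python
def unify_columns(a: dict, b: dict) -> dict:
    """Merge two column->type mappings; identical overlap and disjoint
    columns unify, a same-named column with a different type must fail."""
    merged = dict(a)
    for name, dtype in b.items():
        if name in merged and merged[name] != dtype:
            raise ValueError(f"column conflict: {name} ({merged[name]} vs {dtype})")
        merged[name] = dtype
    return merged

# overlapping tables with non-conflicting columns - must merge
assert unify_columns({"id": "bigint"}, {"id": "bigint", "name": "text"}) == {
    "id": "bigint",
    "name": "text",
}

# same column name, different data type - must fail
try:
    unify_columns({"id": "bigint"}, {"id": "text"})
except ValueError as exc:
    print("rejected:", exc)
else:
    raise AssertionError("conflicting types should not unify")
```

The naming-convention case (case-sensitive vs case-insensitive identifiers) is not covered here; per the comment, the test should simply expect `unify_schema` to fail for now.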
| """Tells if a view for a table `table_schema` can be created""" | ||
| pass | ||
|
|
||
| def set_schemas(self, schemas: Sequence[Schema]) -> None: |
good find, I forgot about table scanners. I wanted to make it a mixin class (not derived from duckdb); then adding set_schemas to it would be trivial
Could you clarify: should the mixin refactor of WithTableScanners be part of this PR or a separate one? The coupling to duckdb is deep now.
not this PR, this is old tech debt not to be fixed now
…g tables, compat conflict, naming convention conflict
…lt into feat/3746-multischema-datasets
rudolfix
left a comment
there was still a problem with the filesystem sql_client that I discovered by running a test with conflicting user tables on it:
- we need to UNION views that point to separate data locations
- we need to merge columns in views on the same data location

this is now done. create_view just returns sql - this allows manipulating the SQL to handle the cases above.
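The two cases can be sketched as plain SQL-text manipulation, which is what returning sql from create_view enables. Everything below is illustrative: the helper names, the `read_parquet` locations, and the column sets are assumptions, not the actual filesystem sql_client code.

```python
from typing import List, Set

def view_over_locations(selects: List[str]) -> str:
    """Case 1: the same table name maps to separate data locations -
    UNION ALL the per-location selects into one view body."""
    return " UNION ALL ".join(selects)

def select_merged_columns(location_sql: str, present: Set[str], all_columns: List[str]) -> str:
    """Case 2: schemas describe overlapping columns on one location -
    select the merged column set, padding missing columns with NULL
    so every branch has the same shape."""
    cols = ", ".join(c if c in present else f"NULL AS {c}" for c in all_columns)
    return f"SELECT {cols} FROM {location_sql}"

all_cols = ["id", "name", "payload"]
sql = view_over_locations([
    select_merged_columns("read_parquet('loc_a/*.parquet')", {"id", "name"}, all_cols),
    select_merged_columns("read_parquet('loc_b/*.parquet')", {"id", "payload"}, all_cols),
])
print(sql)
```

Padding with NULL keeps both branches union-compatible, which is why merging the column list has to happen before the UNION is assembled.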
I promoted a few tests so they run on all destinations incl. lance and delta/iceberg. this surfaced an ibis() creation problem (schemas were not passed there), which I fixed.
I think the ticket is complete. docs still remain:
docs/website/docs/general-usage/dataset-access/dataset.md
should explain how we deal with multiple schemas in a dataset. it could be a NOTE admonition - we do not recommend having such datasets.
| """Tells if a view for a table `table_schema` can be created""" | ||
| pass | ||
|
|
||
| def set_schemas(self, schemas: Sequence[Schema]) -> None: |
There was a problem hiding this comment.
not this PR, this is old tech debt not to be fixed now
…lt into feat/3746-multischema-datasets
Closes #3746