dlt version
1.24.0
Describe the problem
The three dlt internal metadata tables (_dlt_version, _dlt_pipeline_state, _dlt_loads) are created on ClickHouse with MergeTree ORDER BY tuple() — no primary key, no ordering, no deduplication. Combined with dlt's append-only write pattern (rows are inserted but never deleted), these tables grow linearly with the number of pipeline runs.
dlt queries these tables at the start of every pipeline run but only needs the latest row. Without an ORDER BY key, ClickHouse performs a full table scan on every read.
In the ClickHouse destination (clickhouse.py), table engine selection skips ReplacingMergeTree for tables with write_disposition: "skip" (which covers _dlt_version and _dlt_loads). All three tables end up as plain MergeTree with no column hints for sort or primary_key, resulting in ORDER BY tuple().
The queries themselves are efficient (ORDER BY inserted_at DESC LIMIT 1), but without an index on the underlying table, ClickHouse must scan every row to satisfy the sort.
Expected behavior
Internal metadata tables should use proper ORDER BY keys matching dlt's query patterns so reads are index lookups, not full table scans. Superseded rows should be cleaned up after writes so tables stay bounded regardless of how many pipeline runs occur.
| Table | Suggested ORDER BY | Query pattern |
| --- | --- | --- |
| `_dlt_version` | `(schema_name, inserted_at)` | `WHERE schema_name = ? ORDER BY inserted_at DESC` |
| `_dlt_pipeline_state` | `(pipeline_name, _dlt_load_id)` | `WHERE pipeline_name = ? ORDER BY load_id DESC LIMIT 1` |
| `_dlt_loads` | `(load_id)` | Joined on `load_id` |
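As an illustration, the suggested key for `_dlt_version` could look like the DDL below. This is a sketch only: the column list is abbreviated, and the exact columns come from dlt's metadata schema.

```sql
-- Sketch: _dlt_version with a sorting key matching the read pattern,
-- so "latest schema for a given schema_name" becomes an index lookup.
CREATE TABLE <dataset>._dlt_version
(
    schema_name String,
    inserted_at DateTime64(6),
    version_hash String,
    schema String
)
ENGINE = MergeTree
ORDER BY (schema_name, inserted_at);

-- Alternative: ReplacingMergeTree deduplicates superseded rows at merge time.
-- The sorting key must then identify the logical row (the schema name),
-- with inserted_at as the version column deciding which row survives:
-- ENGINE = ReplacingMergeTree(inserted_at) ORDER BY schema_name;
```

Note that with `ReplacingMergeTree`, `inserted_at` must stay out of the sorting key, otherwise every insert is a distinct key and nothing deduplicates.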
Steps to reproduce
- Create a dlt pipeline with the ClickHouse destination
- Run the pipeline repeatedly (the more runs, the more pronounced the issue)
- Inspect the internal tables:

```sql
SHOW CREATE TABLE <dataset>._dlt_version;
-- Observe: ENGINE = MergeTree ORDER BY tuple()

SELECT count() FROM <dataset>._dlt_version;
-- Row count grows with every pipeline run

SELECT formatReadableSize(sum(length(schema))) FROM <dataset>._dlt_version;
-- Total size of schema JSON blobs grows unbounded
```
- Check `system.query_log` to observe full table scans on every pipeline run:

```sql
SELECT
    normalized_query_hash,
    count() AS executions,
    formatReadableSize(sum(read_bytes)) AS total_read,
    left(any(query), 200) AS sample_query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%_dlt_version%'
GROUP BY normalized_query_hash
ORDER BY sum(read_bytes) DESC
LIMIT 5;
```
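To confirm that a given read is a full scan rather than an index lookup, `EXPLAIN indexes = 1` shows whether the primary key prunes any granules (the `schema_name` value below is a placeholder):

```sql
EXPLAIN indexes = 1
SELECT schema_name, inserted_at
FROM <dataset>._dlt_version
WHERE schema_name = 'my_schema'
ORDER BY inserted_at DESC
LIMIT 1;
-- With ORDER BY tuple() the plan shows no PrimaryKey pruning:
-- the read covers all parts and granules of the table.
```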
The issue is not immediately visible on small deployments. It becomes significant when pipelines run frequently (hundreds to thousands of runs/day) over weeks/months, as each run appends rows that all subsequent runs must scan.
Operating system
Linux
Runtime environment
Kubernetes
Python version
3.13
dlt data source
Affects all sources
dlt destination
No response
Other deployment details
No response
Additional information
The current workaround is a daily job that prunes the metadata tables.
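For reference, the pruning job boils down to a mutation of the kind below. This is a sketch; the retention predicate (keep only the latest row per schema) is one possible policy, and mutations in ClickHouse run asynchronously.

```sql
-- Keep only the most recent schema version per schema_name; delete the rest.
-- ALTER ... DELETE is an asynchronous mutation; schedule it off-peak.
ALTER TABLE <dataset>._dlt_version
DELETE WHERE (schema_name, inserted_at) NOT IN
(
    SELECT schema_name, max(inserted_at)
    FROM <dataset>._dlt_version
    GROUP BY schema_name
);
```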