5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -24,9 +24,14 @@ The supported method of passing ClickHouse server settings is to prefix such arg
 ## UNRELEASED

 ### New Features
+- SQLAlchemy: Comprehensive ClickHouse JOIN support via the new `ch_join()` helper. All strictness modifiers (`ALL`, `ANY`, `SEMI`, `ANTI`, `ASOF`), the `GLOBAL` distribution modifier, and explicit `CROSS JOIN` are now available. Use with `select_from()` to generate ClickHouse-specific join syntax like `GLOBAL ALL LEFT OUTER JOIN`. Closes [#635](https://github.com/ClickHouse/clickhouse-connect/issues/635)
+- SQLAlchemy: `array_join()` now supports multiple columns for parallel array expansion. Pass a list of columns and a matching list of aliases to generate `ARRAY JOIN col1 AS a, col2 AS b, col3 AS c`. Single-column usage is unchanged. Closes [#633](https://github.com/ClickHouse/clickhouse-connect/issues/633)
+- SQLAlchemy: `ch_join()` now supports `USING` syntax via the new `using` parameter. Pass a list of column name strings to generate `USING (col1, col2)` instead of `ON`. This is important for `FULL OUTER JOIN` where `USING` merges the join column correctly while `ON` produces default values (0, '') for unmatched sides. Closes [#636](https://github.com/ClickHouse/clickhouse-connect/issues/636)
 - SQLAlchemy: Add missing Replicated table engine variants: `ReplicatedReplacingMergeTree`, `ReplicatedCollapsingMergeTree`, `ReplicatedVersionedCollapsingMergeTree`, and `ReplicatedGraphiteMergeTree`. Closes [#687](https://github.com/ClickHouse/clickhouse-connect/issues/687)

 ### Bug Fixes
+- SQLAlchemy: Fix `.final()` and `.sample()` silently overwriting each other when chained. Both methods now store modifiers as custom attributes on the `Select` instance and render them during compilation, replacing the previous `with_hint()` approach that only allowed one hint per table. Chaining in either order (e.g. `select(t).final().sample(0.1)`) correctly produces `FROM t FINAL SAMPLE 0.1`. Also fixes rendering for aliased tables (`FROM t AS u FINAL`) and supports explicit table targeting in joins. Fixes [#658](https://github.com/ClickHouse/clickhouse-connect/issues/658)
+- SQLAlchemy: Fix `sqlalchemy.values()` to generate ClickHouse's `VALUES` table function syntax. The compiler now emits `VALUES('col1 Type1, col2 Type2', ...)` with the column structure as the first argument, instead of the standard SQL form that places column names after the alias. Generic SQLAlchemy types are mapped to ClickHouse equivalents (e.g. `Integer` to `Int32`, `String` to `String`). Also handles CTE usage by wrapping in `SELECT * FROM VALUES(...)`. Fixes [#681](https://github.com/ClickHouse/clickhouse-connect/issues/681)
 - SQLAlchemy: Fix `GraphiteMergeTree` and `ReplicatedGraphiteMergeTree` to properly single-quote the `config_section` argument as ClickHouse requires.

 ## 0.14.1, 2026-03-11
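
Note on the `sqlalchemy.values()` entry in the CHANGELOG hunk: the `VALUES('col1 Type1, col2 Type2', ...)` shape can be illustrated with a small string-building sketch. The helper `render_values` and its literal formatting are hypothetical and not part of clickhouse-connect; the actual compiler operates on SQLAlchemy constructs and may quote or type values differently.

```python
from typing import List, Sequence, Tuple, Union

Literal = Union[int, str]


def _render_literal(value: Literal) -> str:
    """Render an int as-is and a str as a single-quoted, backslash-escaped literal."""
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_values(columns: Sequence[Tuple[str, str]], rows: Sequence[Sequence[Literal]]) -> str:
    """Build a VALUES(...) table function call (hypothetical helper).

    The first argument is the column structure string, followed by one
    parenthesized tuple per row.
    """
    structure = ", ".join(f"{name} {ch_type}" for name, ch_type in columns)
    parts: List[str] = [_render_literal(structure)]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row must have one value per column")
        parts.append("(" + ", ".join(_render_literal(v) for v in row) + ")")
    return "VALUES(" + ", ".join(parts) + ")"


sql = render_values([("id", "Int32"), ("name", "String")], [(1, "a"), (2, "b")])
print(sql)
```

The point of the sketch is only the argument order: the structure string comes first, unlike the standard SQL form that lists column names after the alias.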
17 changes: 12 additions & 5 deletions README.md
@@ -32,16 +32,23 @@ When creating a Superset Data Source, either use the provided connection dialog,
 ### SQLAlchemy Implementation

 ClickHouse Connect includes a lightweight SQLAlchemy dialect implementation focused on compatibility with **Superset**
-and **SQLAlchemy Core**.
+and **SQLAlchemy Core**. Both SQLAlchemy 1.4 and 2.x are supported. SQLAlchemy 1.4 compatibility is maintained
+because Apache Superset currently requires `sqlalchemy>=1.4,<2`.

 Supported features include:
 - Basic query execution via SQLAlchemy Core
-- `SELECT` queries with `JOIN`s, `ARRAY JOIN`, and `FINAL` modifier
+- `SELECT` queries with `JOIN`s (including ClickHouse-specific strictness, `USING`, and `GLOBAL` modifiers),
+`ARRAY JOIN` (single and multi-column), `FINAL`, and `SAMPLE`
+- `VALUES` table function syntax
 - Lightweight `DELETE` statements

-The implementation does not include ORM support and is not intended as a full SQLAlchemy dialect. While it can support
-a range of Core-based applications beyond Superset, it may not be suitable for more complex SQLAlchemy applications
-that rely on full ORM or advanced dialect functionality.
+A small number of features require SQLAlchemy 2.x: `Values.cte()` and certain literal-rendering behaviors.
+All other dialect features, including those used by Superset, work on both 1.4 and 2.x.
+
+Basic ORM usage works for insert-heavy, read-focused workloads: declarative model definitions, `CREATE TABLE`,
+`session.add()`, `bulk_save_objects()`, and read queries all function correctly. However, full ORM support is not
+provided. UPDATE compilation, foreign key/relationship reflection, autoincrement/RETURNING, and cascade operations
+are not implemented. The dialect is best suited for SQLAlchemy Core usage and Superset connectivity.

 ### Asyncio Support

6 changes: 3 additions & 3 deletions clickhouse_connect/cc_sqlalchemy/__init__.py
@@ -1,10 +1,10 @@
 from clickhouse_connect import driver_name
 from clickhouse_connect.cc_sqlalchemy.datatypes.base import schema_types
-from clickhouse_connect.cc_sqlalchemy.sql import final
-from clickhouse_connect.cc_sqlalchemy.sql.clauses import array_join, ArrayJoin
+from clickhouse_connect.cc_sqlalchemy.sql import final, sample
+from clickhouse_connect.cc_sqlalchemy.sql.clauses import array_join, ArrayJoin, ch_join, ClickHouseJoin

 # pylint: disable=invalid-name
 dialect_name = driver_name
 ischema_names = schema_types

-__all__ = ['dialect_name', 'ischema_names', 'array_join', 'ArrayJoin', 'final']
+__all__ = ['dialect_name', 'ischema_names', 'array_join', 'ArrayJoin', 'ch_join', 'ClickHouseJoin', 'final', 'sample']
113 changes: 64 additions & 49 deletions clickhouse_connect/cc_sqlalchemy/sql/__init__.py
@@ -5,6 +5,10 @@

 from clickhouse_connect.driver.binding import quote_identifier

+# Dialect name used for non-rendering statement hints that only serve to
+# differentiate cache keys when FINAL/SAMPLE modifiers are applied.
+_CH_MODIFIER_DIALECT = "_ch_modifier"
+

 def full_table(table_name: str, schema: Optional[str] = None) -> str:
     if table_name.startswith('(') or '.' in table_name or not schema:
@@ -16,38 +20,61 @@ def format_table(table: Table):
     return full_table(table.name, table.schema)


-def final(select_stmt: Select, table: Optional[FromClause] = None) -> Select:
-    """
-    Apply the ClickHouse FINAL modifier to a select statement.
-
-    Args:
-        select_stmt: The SQLAlchemy Select statement to modify.
-        table: Optional explicit table/alias to apply FINAL to. When omitted the
-            method will use the single FROM element present on the select. A
-            ValueError is raised if the statement has no FROMs or more than one
-            FROM element and table is not provided.
-
-    Returns:
-        A new Select that renders the FINAL modifier for the target table.
-    """
+def _resolve_target(select_stmt: Select, table: Optional[FromClause], method_name: str) -> FromClause:
+    """Resolve the target FROM clause for ClickHouse modifiers (FINAL/SAMPLE)."""
     if not isinstance(select_stmt, Select):
-        raise TypeError("final() expects a SQLAlchemy Select instance")
+        raise TypeError(f"{method_name}() expects a SQLAlchemy Select instance")

     target = table
     if target is None:
         froms = select_stmt.get_final_froms()
         if not froms:
-            raise ValueError("final() requires a table to apply the FINAL modifier.")
+            raise ValueError(f"{method_name}() requires a table to apply the {method_name.upper()} modifier.")
         if len(froms) > 1:
             raise ValueError(
-                "final() is ambiguous for statements with multiple FROM clauses. Specify the table explicitly."
+                f"{method_name}() is ambiguous for statements with multiple FROM clauses. "
+                "Specify the table explicitly."
             )
         target = froms[0]

     if not isinstance(target, FromClause):
         raise TypeError("table must be a SQLAlchemy FromClause when provided")

-    return select_stmt.with_hint(target, "FINAL")
+    return target
+
+
+def _target_cache_key(target: FromClause) -> str:
+    """Stable string identifying a FROM target for cache key differentiation."""
+    if hasattr(target, "fullname"):
+        return target.fullname
+    return target.name
+
+
+# pylint: disable=protected-access
+def final(select_stmt: Select, table: Optional[FromClause] = None) -> Select:
+    """Apply the ClickHouse FINAL modifier to a select statement.
+
+    FINAL forces ClickHouse to merge data parts before returning results,
+    guaranteeing fully collapsed rows for ReplacingMergeTree, CollapsingMergeTree,
+    and similar engines.
+
+    Args:
+        select_stmt: The SELECT statement to modify.
+        table: The target table to apply FINAL to. Required when the query
+            joins multiple tables, optional when there is a single FROM target.
+    """
+    target = _resolve_target(select_stmt, table, "final")
+    ch_final = getattr(select_stmt, "_ch_final", set())
+
+    if target in ch_final:
+        return select_stmt
+
+    # with_statement_hint creates a generative copy and adds a non-rendering
+    # hint that participates in the statement cache key.
+    hint_key = _target_cache_key(target)
+    new_stmt = select_stmt.with_statement_hint(f"FINAL:{hint_key}", dialect_name=_CH_MODIFIER_DIALECT)
+    new_stmt._ch_final = ch_final | {target}
+    return new_stmt


 def _select_final(self: Select, table: Optional[FromClause] = None) -> Select:
@@ -58,39 +85,27 @@ def _select_final(self: Select, table: Optional[FromClause] = None) -> Select:


 def sample(select_stmt: Select, sample_value: Union[str, int, float], table: Optional[FromClause] = None) -> Select:
-    """
-    Apply ClickHouse SAMPLE clause to a select statement.
-    Reference: https://clickhouse.com/docs/sql-reference/statements/select/sample
+    """Apply the ClickHouse SAMPLE modifier to a select statement.
+
     Args:
-        select_stmt: The SQLAlchemy Select statement to modify.
-        sample_value: Controls the sampling behavior. Accepts three forms:
-            - A float in (0, 1) for proportional sampling (e.g., 0.1 for ~10% of data).
-            - A positive integer for row-count sampling (e.g., 10000000 for ~10M rows).
-            - A string for fraction or offset notation (e.g., "1/10" or "1/10 OFFSET 1/2").
-        table: Optional explicit table to apply SAMPLE to. When omitted the
-            method will use the single FROM element present on the select. A
-            ValueError is raised if the statement has no FROMs or more than one
-            FROM element and table is not provided.
-
-    Returns:
-        A new Select that renders the SAMPLE clause for the target table.
+        select_stmt: The SELECT statement to modify.
+        sample_value: The sample expression. Can be a float between 0 and 1
+            for a fractional sample (e.g. 0.1 for 10%), an integer for an
+            approximate row count, or a string for SAMPLE expressions like
+            '1/10 OFFSET 1/2'.
+        table: The target table to sample. Required when the query joins
+            multiple tables, optional when there is a single FROM target.
     """
-    if not isinstance(select_stmt, Select):
-        raise TypeError("sample() expects a SQLAlchemy Select instance")
-
-    target_table = table
-    if target_table is None:
-        froms = select_stmt.get_final_froms()
-        if not froms:
-            raise ValueError("sample() requires a FROM clause to apply the SAMPLE modifier.")
-        if len(froms) > 1:
-            raise ValueError("sample() is ambiguous for statements with multiple FROM clauses. Specify the table explicitly.")
-        target_table = froms[0]
-
-    if not isinstance(target_table, FromClause):
-        raise TypeError("table must be a SQLAlchemy FromClause when provided")
-
-    return select_stmt.with_hint(target_table, f"SAMPLE {sample_value}")
+    target = _resolve_target(select_stmt, table, "sample")
+
+    hint_key = _target_cache_key(target)
+    new_stmt = select_stmt.with_statement_hint(
+        f"SAMPLE:{hint_key}:{sample_value}", dialect_name=_CH_MODIFIER_DIALECT
+    )
+    ch_sample = dict(getattr(select_stmt, "_ch_sample", {}))
+    ch_sample[target] = sample_value
+    new_stmt._ch_sample = ch_sample
+    return new_stmt


 def _select_sample(self: Select, sample_value: Union[str, int, float], table: Optional[FromClause] = None) -> Select:
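
Note on the `final()` / `sample()` change in `sql/__init__.py`: the old code passed both modifiers through `with_hint()` on the same target, so the second call replaced the first, while the new code keeps FINAL targets in a set (`_ch_final`) and SAMPLE values in a dict (`_ch_sample`). The following is a toy model of that difference only, with no SQLAlchemy involved. The classes `HintPerTable` and `SeparateModifiers` are hypothetical names for illustration, and the rendering is a simplified assumption, not the dialect's compiler.

```python
from typing import Dict, FrozenSet, Optional, Union

SampleValue = Union[str, int, float]


class HintPerTable:
    """Toy stand-in for the old approach: a single hint slot per table."""

    def __init__(self, hints: Optional[Dict[str, str]] = None) -> None:
        self.hints: Dict[str, str] = dict(hints or {})

    def with_hint(self, table: str, text: str) -> "HintPerTable":
        # A later hint for the same table replaces the earlier one.
        return HintPerTable({**self.hints, table: text})

    def render(self, table: str) -> str:
        hint = self.hints.get(table)
        return f"FROM {table} {hint}" if hint else f"FROM {table}"


class SeparateModifiers:
    """Toy stand-in for the new approach: FINAL and SAMPLE tracked separately."""

    def __init__(
        self,
        final_targets: FrozenSet[str] = frozenset(),
        sample_values: Optional[Dict[str, SampleValue]] = None,
    ) -> None:
        self.final_targets = frozenset(final_targets)
        self.sample_values: Dict[str, SampleValue] = dict(sample_values or {})

    def final(self, table: str) -> "SeparateModifiers":
        # Return a copy, mirroring the generative style of Select methods.
        return SeparateModifiers(self.final_targets | {table}, self.sample_values)

    def sample(self, table: str, value: SampleValue) -> "SeparateModifiers":
        return SeparateModifiers(self.final_targets, {**self.sample_values, table: value})

    def render(self, table: str) -> str:
        # FINAL is always rendered before SAMPLE, regardless of call order.
        parts = [f"FROM {table}"]
        if table in self.final_targets:
            parts.append("FINAL")
        if table in self.sample_values:
            parts.append(f"SAMPLE {self.sample_values[table]}")
        return " ".join(parts)


old = HintPerTable().with_hint("t", "FINAL").with_hint("t", "SAMPLE 0.1")
new = SeparateModifiers().final("t").sample("t", 0.1)
new_reversed = SeparateModifiers().sample("t", 0.1).final("t")
print(old.render("t"))           # FINAL has been lost
print(new.render("t"))           # both modifiers survive
print(new_reversed.render("t"))  # same result in either order
```

Keeping the two modifiers in separate containers is what makes the result independent of call order: nothing is overwritten, and the render step decides the ordering.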