Feature Clickhouse connection #317

Open
poupou-web3 wants to merge 4 commits into getnao:main from poupou-web3:feature/clickhouse_connection
@poupou-web3

Add ClickHouse support

Summary

  • Added ClickHouse database support to the CLI (ClickHouseConfig, DatabaseType.CLICKHOUSE) using ibis.clickhouse and wired it into AnyDatabaseConfig/parsing.
  • Introduced indexes.md generation via a new DatabaseAccessor.INDEXES and an indexes.md.j2 template so agents can see ORDER BY, PRIMARY KEY, and PARTITION BY when available (ClickHouse only).
  • Improved table selection/preview logic for ClickHouse, including support for AggregateFunction columns and safer handling of engines that disallow direct SELECT (e.g. Kafka, stream-like engines).

Details

ClickHouse context

  • ClickHouseDatabaseContext builds preview queries from the table schema, selecting plain columns directly and using *-Merge (e.g. uniqMerge) for AggregateFunction columns.
  • Added a dependency on ibis-clickhouse.
  • For engines that raise code 620 ("Direct select is not allowed"), the context automatically switches to a no-SELECT path, using SHOW CREATE TABLE + system.columns so sync still completes.
  • indexes() returns full DDL (SHOW CREATE TABLE) used by indexes.md so the agent knows how tables are ordered/partitioned for efficient querying.
  • System tables are not synced.
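
As a rough illustration of the code-620 fallback described above, here is a minimal sketch. `QueryError`, `describe_table`, and the `run_query` callable are hypothetical names invented for this example; the PR's actual `ClickHouseDatabaseContext` may detect the error and structure the fallback differently.

```python
DIRECT_SELECT_NOT_ALLOWED = 620  # error code quoted in the PR description


class QueryError(Exception):
    """Hypothetical driver error that exposes the ClickHouse error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def describe_table(run_query, database, table, limit=5):
    """Return a preview when SELECT works, otherwise metadata only."""
    try:
        rows = run_query(f"SELECT * FROM {database}.{table} LIMIT {int(limit)}")
        return {"source": "select", "preview": rows}
    except QueryError as exc:
        if exc.code != DIRECT_SELECT_NOT_ALLOWED:
            raise  # unrelated failures should still surface

    # No-SELECT path: DDL plus column metadata, neither of which reads table data.
    # A real implementation should bind parameters instead of formatting strings.
    ddl = run_query(f"SHOW CREATE TABLE {database}.{table}")
    columns = run_query(
        "SELECT name, type FROM system.columns "
        f"WHERE database = '{database}' AND table = '{table}'"
    )
    return {"source": "metadata", "preview": None, "ddl": ddl, "columns": columns}
```

The point of the sketch is that only error code 620 triggers the metadata path; any other failure is re-raised so that sync does not silently hide real problems.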

Generic improvements

  • DatabaseConfig.raw_sql_to_dataframe now understands multiple cursor/result shapes, including ClickHouse's result_rows/column_names.
  • New DatabaseContext.indexes() hook powers indexes.md while keeping non-ClickHouse databases unchanged (they simply don't emit indexes docs by default).

Testing

Added ClickHouse integration tests (test_clickhouse.py + clickhouse.sql) covering:

  • Directory structure and markdown outputs (including indexes.md).
  • Engine coverage: MergeTree, SummingMergeTree, ReplacingMergeTree, AggregatingMergeTree, Kafka engine, materialized views, and dictionaries.
  • Include/exclude filters and sync state.

Added docker-compose.test.yml and expanded .env.example for Postgres, ClickHouse, and MSSQL so integration tests can be run locally with:

docker compose -f docker-compose.test.yml up -d
cd cli && cp tests/nao_core/commands/sync/integration/.env.example tests/nao_core/commands/sync/integration/.env
uv run pytest tests/nao_core/commands/sync/integration/test_clickhouse.py -v

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 16 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cli/nao_core/templates/defaults/databases/indexes.md.j2">

<violation number="1" location="cli/nao_core/templates/defaults/databases/indexes.md.j2:16">
P2: **Duplicate database query:** `db.indexes()` is called twice — once in the `{% if %}` guard and once to render the content. Since the ClickHouse implementation executes `SHOW CREATE TABLE` via `raw_sql` on each call (no caching), this doubles the number of queries per table during sync. Use Jinja2's `{% set %}` to call it once and reuse the result.</violation>
</file>

<file name="cli/nao_core/config/databases/clickhouse.py">

<violation number="1" location="cli/nao_core/config/databases/clickhouse.py:228">
P1: Missing `GROUP BY` clause when `AggregateFunction` columns are present. The `-Merge` combinators (e.g. `uniqMerge`, `sumMerge`) are aggregate functions. When mixed with non-aggregated columns in SELECT, ClickHouse requires a `GROUP BY` on all non-aggregated columns; otherwise the query will fail. Consider detecting when aggregate expressions are present and adding `GROUP BY` over the plain columns.</violation>
</file>
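
One way the `GROUP BY` suggestion could look is sketched below. `build_preview_query` and the `_merged` alias suffix are assumptions made for this example, not the PR's code, and the sketch only handles non-parametric aggregate functions with simple identifiers.

```python
import re

# Captures the function name from types such as "AggregateFunction(uniq, UInt64)".
_AGG_TYPE = re.compile(r"^AggregateFunction\(\s*(\w+)")


def build_preview_query(table, columns, limit=10):
    """Build a preview SELECT from (name, clickhouse_type) pairs."""
    select_exprs = []
    plain = []
    for name, ch_type in columns:
        match = _AGG_TYPE.match(ch_type)
        if match:
            # e.g. uniqMerge(`users`); aliased so the output name is readable.
            select_exprs.append(f"{match.group(1)}Merge(`{name}`) AS `{name}_merged`")
        else:
            plain.append(f"`{name}`")
            select_exprs.append(f"`{name}`")

    sql = f"SELECT {', '.join(select_exprs)} FROM {table}"
    has_aggregates = len(plain) < len(select_exprs)
    if has_aggregates and plain:
        # -Merge combinators are aggregates, so plain columns must be grouped.
        sql += f" GROUP BY {', '.join(plain)}"
    return f"{sql} LIMIT {int(limit)}"
```

For a table with `day Date` and `users AggregateFunction(uniq, UInt64)`, this produces a query that groups by `day`; tables without aggregate-state columns get a plain `SELECT ... LIMIT` with no `GROUP BY`.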

<file name=".env.example">

<violation number="1" location=".env.example:53">
P2: Port/secure mismatch: `CLICKHOUSE_PORT=8443` is the HTTPS port but `CLICKHOUSE_SECURE=false`. For a non-secure connection the default HTTP port should be `8123` (consistent with the integration test `.env.example` and code defaults).</violation>
</file>
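
A small consistency check along these lines could catch the mismatch early. This is only a sketch: `check_clickhouse_env` is a hypothetical helper, and the accepted truthy spellings are an assumption. The ports are ClickHouse's default HTTP (8123) and HTTPS (8443) interfaces.

```python
HTTP_PORT = 8123   # default plain HTTP interface
HTTPS_PORT = 8443  # default TLS (HTTPS) interface


def check_clickhouse_env(env):
    """Return warnings for port/secure combinations that look inconsistent."""
    secure = str(env.get("CLICKHOUSE_SECURE", "false")).strip().lower() in {"1", "true", "yes"}
    default_port = HTTPS_PORT if secure else HTTP_PORT
    port = int(env.get("CLICKHOUSE_PORT", default_port))

    warnings = []
    if port == HTTPS_PORT and not secure:
        warnings.append("CLICKHOUSE_PORT=8443 is the HTTPS port but CLICKHOUSE_SECURE is false")
    if port == HTTP_PORT and secure:
        warnings.append("CLICKHOUSE_PORT=8123 is the HTTP port but CLICKHOUSE_SECURE is true")
    return warnings
```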

<file name="cli/nao_core/config/databases/base.py">

<violation number="1" location="cli/nao_core/config/databases/base.py:87">
P2: `hasattr(cursor, "description")` is `True` even when `cursor.description is None` (DB-API 2.0 sets it to `None` for non-returning statements). This will raise `TypeError` when iterating over `None`. Check that `description` is not `None`.</violation>
</file>
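
The `description is None` case is easy to reproduce with the standard library's `sqlite3` cursor, which follows DB-API 2.0. The sketch below uses a hypothetical helper, `columns_and_rows`, to show one way to normalize both result shapes mentioned in the PR while guarding against `None`; the real `raw_sql_to_dataframe` may be organized differently.

```python
import sqlite3
from types import SimpleNamespace


def columns_and_rows(result):
    """Normalize a query result into (column_names, rows)."""
    # Shape 1: objects exposing result_rows/column_names (ClickHouse-style).
    if hasattr(result, "result_rows") and hasattr(result, "column_names"):
        return list(result.column_names), [tuple(r) for r in result.result_rows]

    # Shape 2: DB-API cursor. The attribute exists even for statements that
    # return no rows, but its value is None in that case.
    description = getattr(result, "description", None)
    if description is None:
        return [], []
    return [col[0] for col in description], [tuple(r) for r in result.fetchall()]


conn = sqlite3.connect(":memory:")
ddl_cursor = conn.execute("CREATE TABLE t (a INTEGER)")
select_cursor = conn.execute("SELECT 1 AS x")
clickhouse_like = SimpleNamespace(column_names=["id"], result_rows=[[1], [2]])

ddl_result = columns_and_rows(ddl_cursor)
select_result = columns_and_rows(select_cursor)
clickhouse_result = columns_and_rows(clickhouse_like)
```

Here `hasattr(ddl_cursor, "description")` is `True` even though the value is `None`, which is exactly the situation the review comment describes.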

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@Bl3f
Contributor

Bl3f commented Feb 22, 2026

Hello @poupou-web3, thank you so much for the ClickHouse addition! I will do some tests, run the integration tests once, and do a small review; it should be merged ASAP.

Contributor

@Bl3f Bl3f left a comment

Small question and change around indexes.md, but apart from that, LGTM. Thank you so much, the integration tests worked well!

Welcome as a new contributor ✨

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cli/nao_core/config/databases/clickhouse.py">

<violation number="1" location="cli/nao_core/config/databases/clickhouse.py:199">
P1: Bug: `columns()` returns `[]` when `_direct_select_disallowed` is True, but `column_count()` still returns the actual count from `system.columns` for the same condition. This means stream-like engines (Kafka, RabbitMQ, FileLog) will have no column metadata in generated docs, and `column_count` vs `columns` will be inconsistent. The previous code correctly fell back to `_columns_from_system()` here.</violation>
</file>
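
One way to keep the two methods from drifting apart is to derive `column_count()` from `columns()`, so both always follow the same branch. The class below is a simplified sketch: `TableColumns` and its constructor arguments are hypothetical, and only the names `columns()`, `column_count()`, and `_direct_select_disallowed` come from the review comment.

```python
class TableColumns:
    """Sketch of a context whose column list and count share one source."""

    def __init__(self, schema_columns, system_columns, direct_select_disallowed):
        # Both arguments are zero-argument callables returning (name, type) pairs.
        self._schema_columns = schema_columns
        self._system_columns = system_columns
        self._direct_select_disallowed = direct_select_disallowed

    def columns(self):
        # Stream-like engines cannot be queried directly, so fall back to
        # metadata (system.columns in the PR) instead of returning [].
        if self._direct_select_disallowed:
            return list(self._system_columns())
        return list(self._schema_columns())

    def column_count(self):
        # Derived from columns() so the two can never disagree.
        return len(self.columns())
```

With this shape, a Kafka-engine table reports the same columns in the generated docs as it does in its count, at the cost of computing the column list when only the count is needed.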

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@poupou-web3
Author

I realized that indexes.md also duplicates columns.md, and we could just put everything in columns.md.

@Bl3f
Contributor

Bl3f commented Feb 26, 2026

@poupou-web3 yes, that was more or less what I tried to say. Could you change it in the PR, please?

2 participants