Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new “separate” execution mode that runs SQL directly against each datasource’s native SQLAlchemy engine (instead of routing through DuckDB), including Snowflake engine creation and schema introspection to populate the system prompt.
Changes:
- Introduce
SeparateExecutor+SeparateGraphwithrun_sql_query/submit_resulttools and a dedicated system prompt template. - Add SQLAlchemy-based Snowflake schema inspection (
information_schema.tables/columns) to generateTableInfo/ColumnInfofor prompt schema. - Extend database adapter plumbing with
try_create_sqlalchemy_engine()and a Snowflakecreate_sqlalchemy_engine()implementation.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| databao/agent/sqlalchemy/schema_inspection.py | New SQLAlchemy schema inspector (currently Snowflake-only) producing TableInfo/ColumnInfo. |
| databao/agent/sqlalchemy/init.py | Initializes databao.agent.sqlalchemy package. |
| databao/agent/executors/separate/system_prompt.jinja | New system prompt template for the separate executor (includes Snowflake quoting guidance). |
| databao/agent/executors/separate/separate_executor.py | New executor that builds per-datasource SQLAlchemy engines and injects schema into the prompt. |
| databao/agent/executors/separate/graph.py | New LangGraph tool loop to run SQL via SQLAlchemy engines and submit results. |
| databao/agent/databases/snowflake_adapter.py | Adds Snowflake SQLAlchemy engine creation from connection config (password/keypair/SSO). |
| databao/agent/databases/databases.py | Adds try_create_sqlalchemy_engine() helper to delegate engine creation to adapters. |
| databao/agent/databases/database_adapter.py | Adds optional create_sqlalchemy_engine() hook on adapters (default None). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| url_kwargs: dict[str, str] = {"account": config.account} | ||
| if config.user: | ||
| url_kwargs["user"] = config.user | ||
| if config.database: | ||
| url_kwargs["database"] = config.database | ||
| if config.warehouse: | ||
| url_kwargs["warehouse"] = config.warehouse | ||
| if config.role: | ||
| url_kwargs["role"] = config.role | ||
|
|
||
| connect_args: dict[str, Any] = {} | ||
| auth = config.auth | ||
| if isinstance(auth, SnowflakePasswordAuth): | ||
| url_kwargs["password"] = auth.password | ||
| elif isinstance(auth, SnowflakeKeyPairAuth): | ||
| connect_args["private_key"] = cls._load_private_key_bytes(auth) | ||
| elif isinstance(auth, SnowflakeSSOAuth): | ||
| url_kwargs["authenticator"] = auth.authenticator | ||
| else: | ||
| return None | ||
|
|
||
| if connect_args: | ||
| return create_engine(URL(**url_kwargs), connect_args=connect_args) | ||
| return create_engine(URL(**url_kwargs)) | ||
|
|
There was a problem hiding this comment.
create_sqlalchemy_engine() drops config.additional_properties entirely. Existing Snowflake config parsing stores non-core URL/query settings in additional_properties (and _create_connection_string() includes them), so omitting them here can silently change connection behavior (e.g., session/driver params). Include additional_properties when constructing the SQLAlchemy URL / query params (or document why they must be ignored).
| @classmethod | ||
| def create_sqlalchemy_engine(cls, config: DBConnectionConfig) -> Engine | None: | ||
| if not isinstance(config, SnowflakeConnectionProperties): | ||
| return None | ||
|
|
||
| from snowflake.sqlalchemy import URL # type: ignore[import-untyped] | ||
|
|
||
| url_kwargs: dict[str, str] = {"account": config.account} | ||
| if config.user: | ||
| url_kwargs["user"] = config.user | ||
| if config.database: | ||
| url_kwargs["database"] = config.database | ||
| if config.warehouse: | ||
| url_kwargs["warehouse"] = config.warehouse | ||
| if config.role: | ||
| url_kwargs["role"] = config.role | ||
|
|
||
| connect_args: dict[str, Any] = {} | ||
| auth = config.auth | ||
| if isinstance(auth, SnowflakePasswordAuth): | ||
| url_kwargs["password"] = auth.password | ||
| elif isinstance(auth, SnowflakeKeyPairAuth): | ||
| connect_args["private_key"] = cls._load_private_key_bytes(auth) | ||
| elif isinstance(auth, SnowflakeSSOAuth): | ||
| url_kwargs["authenticator"] = auth.authenticator | ||
| else: | ||
| return None | ||
|
|
||
| if connect_args: | ||
| return create_engine(URL(**url_kwargs), connect_args=connect_args) | ||
| return create_engine(URL(**url_kwargs)) |
There was a problem hiding this comment.
New create_sqlalchemy_engine() behavior is not covered by tests. There is already comprehensive coverage for SnowflakeAdapter helpers in tests/test_snowflake_adapter.py; please add unit tests that validate URL/connect_args generation for password, key-pair, and SSO auth (and that additional_properties are preserved).
| if engine is not None: | ||
| self._sa_engines[name] = engine | ||
| else: | ||
| _LOGGER.warning("Cannot create SQLAlchemy engine for '%s': unsupported config type", name) |
There was a problem hiding this comment.
This warning message is misleading: try_create_sqlalchemy_engine() can return None even for a supported config type (it just means SQLAlchemy engine creation isn't implemented for that adapter yet). Consider changing the log to reflect unsupported engine creation (and ideally include the config/db type) to avoid confusion during debugging.
| _LOGGER.warning("Cannot create SQLAlchemy engine for '%s': unsupported config type", name) | |
| db_type = get_db_type(db_source.config) | |
| _LOGGER.warning( | |
| "SQLAlchemy engine creation not implemented for database '%s' (type '%s'); " | |
| "continuing without SQLAlchemy engine", | |
| name, | |
| db_type, | |
| ) |
| """ | ||
| Call this tool with the ID of the query you want to submit to the user. | ||
| This will return control to the user and must always be the last tool call. | ||
| The user will see the full query result, not just the first 12 rows. Returns a confirmation message. |
There was a problem hiding this comment.
submit_result's docstring promises the user will see the full query result, but run_sql_query executes with limit_max_rows (via fetchmany(limit)), so the stored dataframe is already truncated. Either update the docstring to match the actual behavior or change execution so the submitted result can include the full result set (while still truncating what is echoed back into tool output).
| The user will see the full query result, not just the first 12 rows. Returns a confirmation message. | |
| The user will see the query result up to the configured maximum row limit (which may be larger than the | |
| 12-row preview shown in tool output). Returns a confirmation message. |
| import logging | ||
| from collections import defaultdict | ||
|
|
||
| from sqlalchemy import Connection, Engine, text | ||
|
|
||
| from databao.agent.duckdb.schema_inspection import ColumnInfo, TableInfo | ||
|
|
||
| _LOGGER = logging.getLogger(__name__) | ||
|
|
There was a problem hiding this comment.
_LOGGER is defined but never used in this module. Consider removing it (and import logging) or using it for unexpected dialect / query failures to avoid dead code.
| - Cross joins are allowed only for tables that are guaranteed small (< 5 rows), such as enums or static dictionaries. | ||
| - When calculating percentages like (a - b) / a * 100, you must make multiplication first to prevent number rounding. Use 100 * (a - b) / a. | ||
| - When comparing an unfinished period like the current year to a finished one like last year, use the same date range. Never compare unfinished periods to finished one. | ||
| - Make sure the submitted result answers the user's question and it is not-empty |
There was a problem hiding this comment.
Grammar: "it is not-empty" should be "it is not empty" / "it is non-empty".
| - Make sure the submitted result answers the user's question and it is not-empty | |
| - Make sure the submitted result answers the user's question and it is not empty |
| - Pay attention to SQL dialect specific commands | ||
| - Cross joins are allowed only for tables that are guaranteed small (< 5 rows), such as enums or static dictionaries. | ||
| - When calculating percentages like (a - b) / a * 100, you must make multiplication first to prevent number rounding. Use 100 * (a - b) / a. | ||
| - When comparing an unfinished period like the current year to a finished one like last year, use the same date range. Never compare unfinished periods to finished one. |
There was a problem hiding this comment.
Grammar: "Never compare unfinished periods to finished one" should be plural ("finished ones") to read correctly.
| - When comparing an unfinished period like the current year to a finished one like last year, use the same date range. Never compare unfinished periods to finished one. | |
| - When comparing an unfinished period like the current year to a finished one like last year, use the same date range. Never compare unfinished periods to finished ones. |
|
@kosstbarz please address the copilot's comments somehow :) I personally don't look into the code until AI is happy |
No description provided.