AIP-99: Add LLMSchemaCompareOperator #62793
kaxil left a comment
Thanks for addressing the first round of feedback — DataFusionEngine lazy import, Literal types, exception handling in _is_dbapi_connection, and the prompt duplication fix all look good. A few new issues from the changes.
kaxil left a comment
Much better. Round 2 issues all addressed: log format strings fixed, type-equivalence hints folded into DEFAULT_SYSTEM_PROMPT, conn_id restored in Source labels, AirflowException through compat, example DAGs cleaned up.
Two remaining items — one typo bug and one design question.
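Folding the type-equivalence hints into `DEFAULT_SYSTEM_PROMPT` can be pictured with a minimal sketch. The prompt wording and the `resolve_system_prompt` helper are hypothetical, not the operator's actual code; the sketch assumes a user-supplied `system_prompt` replaces the default rather than being appended to it.

```python
from __future__ import annotations

# Hypothetical default prompt: the type-equivalence hints live in the
# constant itself instead of being concatenated at runtime.
DEFAULT_SYSTEM_PROMPT = (
    "You compare table schemas from different data systems and report "
    "mismatches that would break data loading.\n"
    "Type-equivalence hints:\n"
    "- varchar(n) in a database may appear as string in Parquet or Arrow.\n"
    "- timestamp and timestamptz differ in time-zone handling.\n"
)


def resolve_system_prompt(system_prompt: str | None = None) -> str:
    """Return the prompt sent to the LLM (hypothetical helper).

    A user-supplied prompt replaces the default entirely; it is not appended.
    """
    return system_prompt if system_prompt is not None else DEFAULT_SYSTEM_PROMPT
```

Keeping the hints inside the constant means there is a single place to read or override the full instructions the model receives.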
kaxil left a comment
LGTM with minor nits (already commented inline):
- Missing `)` in the f-string at line 267 of the operator: `({dialect_name}` should be `({dialect_name})`
- The docstring for `system_prompt` says "appended", but the behavior is "replaces"; update the wording
- Warning logs for PK/FK/index failures should include `table_name` for context
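The f-string and logging nits above can be illustrated with a small sketch. The helpers `describe_source` and `warn_introspection_failure` and their message texts are hypothetical; they only show a balanced `({dialect_name})` and a warning that carries `table_name`, using %-style arguments so the standard `logging` module formats the message lazily.

```python
import logging

log = logging.getLogger(__name__)


def describe_source(conn_id: str, dialect_name: str) -> str:
    # Hypothetical label; the bug was a missing ")" after {dialect_name}.
    return f"Source: {conn_id} ({dialect_name})"


def warn_introspection_failure(kind: str, table_name: str, exc: Exception) -> None:
    # %-style args are only formatted if the record is emitted;
    # table_name tells the reader which table's PK/FK/index lookup failed.
    log.warning("Could not fetch %s for table %s: %s", kind, table_name, exc)
```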
Thank you! Would you like to take a look one more time?
closes: #62734
Adds a new operator to the common.ai provider for cross-system schema drift detection powered by LLM reasoning.
LLMSchemaCompareOperator introspects schemas from multiple data sources (databases via DbApiHook, object storage via DataFusionEngine) and uses an LLM to identify mismatches that would break data loading. The LLM handles complex cross-system type mapping that simple equality checks miss (e.g., varchar(255) vs string, timestamp vs timestamptz).
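For contrast, here is a minimal sketch of the kind of simple equality check the operator is meant to improve on. The `naive_schema_diff` function and the sample schemas are hypothetical. It flags `varchar(255)` vs `string` and `timestamp` vs `timestamptz` in exactly the same way, so it cannot tell a representation-only difference from one that changes semantics; that judgment is what the operator delegates to the LLM.

```python
from __future__ import annotations


def naive_schema_diff(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Report columns whose type strings differ or that exist on one side only."""
    issues: list[str] = []
    # dict key views support set union, giving every column seen on either side.
    for column in sorted(source.keys() | target.keys()):
        left, right = source.get(column), target.get(column)
        if left is None or right is None:
            side = "source" if left is None else "target"
            issues.append(f"{column}: missing on {side}")
        elif left.lower() != right.lower():
            issues.append(f"{column}: {left} != {right}")
    return issues


# Hypothetical schemas: a relational table vs. the same data in object storage.
postgres_orders = {"id": "integer", "name": "varchar(255)", "created_at": "timestamp"}
lake_orders = {"id": "integer", "name": "string", "created_at": "timestamptz"}

for issue in naive_schema_diff(postgres_orders, lake_orders):
    print(issue)
```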
[Figure: example output view]