added azureopenai support & extandable architecture for pydanticai #62816
cetingokhan wants to merge 1 commit into apache:main
Conversation

kaxil left a comment
Thanks for the contribution! Adding Azure OpenAI support is a good idea.
My main concern: pydantic-ai already ships with native AzureProvider support — see the docs. You can do:
```python
from pydantic_ai.providers.azure import AzureProvider
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'gpt-5.2',
    provider=AzureProvider(
        azure_endpoint='https://myresource.openai.azure.com',
        api_version='2024-07-01-preview',
        api_key='...',
    ),
)
```

Or even just use `"azure:gpt-5.2"` as the model string (with env vars set). So we don't need to manually construct `AsyncAzureOpenAI` clients from the openai SDK — pydantic-ai handles Azure natively.
The current hook's get_conn() is ~25 lines and delegates to pydantic-ai's infer_model() + provider_factory. Azure support could be a simple conditional branch using AzureProvider, without the 4-file builder pattern. The whole point of using pydantic-ai is that it abstracts provider differences for us — we should lean on that rather than wrapping it in another layer.
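Concretely, the branch could look roughly like this — a sketch only; the helper name `build_model`, the parameter names, and the `api_version` extras key are illustrative assumptions, not the hook's actual API:

```python
from __future__ import annotations

from typing import Any


def build_model(model_name: str, api_key: str | None, base_url: str | None, extras: dict[str, Any]):
    """Sketch of a thin get_conn() branch; names are assumptions, not the hook's API."""
    api_version = extras.get("api_version")
    if base_url and api_version:
        # Azure path: lean on pydantic-ai's native AzureProvider instead of
        # constructing AsyncAzureOpenAI by hand. Lazy imports keep the
        # dependency optional at module import time.
        from pydantic_ai.models.openai import OpenAIChatModel
        from pydantic_ai.providers.azure import AzureProvider

        return OpenAIChatModel(
            model_name,
            provider=AzureProvider(
                azure_endpoint=base_url,
                api_version=api_version,
                api_key=api_key,
            ),
        )
    # Everything else: let pydantic-ai resolve the model string itself.
    from pydantic_ai.models import infer_model

    return infer_model(model_name)
```

That keeps the whole Azure case to one conditional inside the existing function, with no new classes or modules.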
Specific comments inline.
```python
from pydantic_ai.models import KnownModelName, Model


class ProviderBuilder(Protocol):
```
Do we need this abstraction? The current get_conn() is a straightforward if/else that delegates to pydantic-ai's own infer_model(). Adding a Protocol + 3 builder classes + a dispatch loop for what's essentially a single new code path (Azure) feels like premature abstraction for a problem that doesn't exist yet. If/when we genuinely need pluggable resolution, we can introduce it then.
Also — ProviderBuilder is declared as a Protocol (structural typing), but the concrete classes inherit from it (nominal typing). These are two different patterns — if you want an inheritance hierarchy, use ABC; if you want duck typing, don't inherit from the Protocol in the concrete classes. Mixing both is confusing for contributors.
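To illustrate the two patterns side by side (toy names, not the PR's actual classes):

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


# Structural (duck) typing: nothing inherits from the Protocol; any class
# whose shape matches conforms automatically.
@runtime_checkable
class SupportsBuild(Protocol):
    def supports(self, conn_type: str) -> bool: ...


class AzureBuilder:  # note: no base class
    def supports(self, conn_type: str) -> bool:
        return conn_type == "azure"


# Nominal typing: conformance comes from inheritance, enforced by ABC.
class BuilderBase(ABC):
    @abstractmethod
    def supports(self, conn_type: str) -> bool: ...


class DefaultBuilder(BuilderBase):
    def supports(self, conn_type: str) -> bool:
        return True
```

Pick one: either the concrete builders inherit from an ABC, or they stand alone and the Protocol is only used in type annotations.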
```python
    base_url: str | None,
) -> Model:
    try:
        from openai import AsyncAzureOpenAI
```
pydantic-ai already has native Azure support via AzureProvider — no need to drop down to the raw openai SDK:
```python
from pydantic_ai.providers.azure import AzureProvider
from pydantic_ai.models.openai import OpenAIChatModel

provider = AzureProvider(
    azure_endpoint=base_url,
    api_version=api_version,
    api_key=api_key,
)
model = OpenAIChatModel(model_name, provider=provider)
```

See https://ai.pydantic.dev/models/openai/#azure
By constructing AsyncAzureOpenAI directly we're bypassing pydantic-ai's own provider abstraction (which may handle retries, error mapping, etc.) and coupling ourselves to openai SDK internals.
Using AzureProvider would also make a separate builder class unnecessary — it could just be a few lines in get_conn().
```python
        return OpenAIChatModel(slug, provider=OpenAIProvider(openai_client=azure_client))

    @staticmethod
    def _import_callable(dotted_path: str) -> Any:
```
This calls importlib.import_module() on a user-provided string from connection extras — module imports can run arbitrary code at import time. Connection extras are editable by any user with connection-edit permissions, which is a lower-privilege surface than DAG deployment.
If we end up needing a custom token provider path, this should at least be documented as a security-sensitive field. But since pydantic-ai's AzureProvider accepts api_key directly, we may not need this at all for the initial implementation.
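If the custom-callable path is kept, one mitigation is an explicit allowlist instead of importing arbitrary dotted paths. A sketch — the allowlist contents and the function name are hypothetical, with `json.loads` standing in for a vetted token-provider callable:

```python
import importlib

# Hypothetical allowlist: only vetted dotted paths may be imported from
# connection extras. json.loads stands in for a real token-provider callable.
ALLOWED_TOKEN_PROVIDERS = frozenset({"json.loads"})


def import_allowed_callable(dotted_path: str):
    """Import a callable named in connection extras, but only if allowlisted."""
    if dotted_path not in ALLOWED_TOKEN_PROVIDERS:
        raise ValueError(f"token provider {dotted_path!r} is not allowlisted")
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)
```

This keeps connection-edit permissions from becoming an arbitrary-code-import surface, at the cost of a deploy-time registration step.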
```python
                self._model = infer_model(model_name, provider_factory=_provider_factory)
                return self._model
        raise RuntimeError("No suitable ProviderBuilder found to construct the model.")
```
This line is unreachable — DefaultBuilder.supports() always returns True, so the loop will always match on the third iteration. Dead code that implies the loop might not match, which could mislead future readers.
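One way to make the guarantee explicit (toy builder bodies, not the PR's code) is to drop the trailing raise and rely on the always-matching fallback:

```python
class AzureOpenAIBuilder:
    def supports(self, conn: dict) -> bool:
        return bool(conn.get("azure_endpoint"))


class DefaultBuilder:
    def supports(self, conn: dict) -> bool:
        return True  # always matches, so "no builder found" is impossible


BUILDERS = [AzureOpenAIBuilder(), DefaultBuilder()]


def pick_builder(conn: dict):
    # DefaultBuilder guarantees a match, so no unreachable raise is needed
    # after the search.
    return next(b for b in BUILDERS if b.supports(conn))
```

If a "no match" branch genuinely cannot happen, the code should say so by construction rather than with dead error handling.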
```python
    return _factory


class DefaultBuilder(ProviderBuilder):
```
DefaultBuilder.build() is return infer_model(model_name) — a single call. The existing hook does this in two lines: if not api_key and not base_url: return infer_model(model_name). Does this one-liner need its own class with Protocol conformance and a supports() method?
Thanks for the comments.
Guide AI coding tools and contributors toward the right design decisions: delegate to pydantic-ai instead of re-implementing provider-specific logic, keep the hook thin, avoid premature abstractions like builder patterns or registries. Motivated by PR apache#62816 which added ~280 lines of Azure OpenAI builder code that duplicated what pydantic-ai's AzureProvider already handles natively.
This pull request refactors the model resolution logic in the `PydanticAIHook` to use a modular builder pattern, improving extensibility and clarity for handling different AI providers, especially Azure OpenAI. It introduces dedicated builder classes for Azure OpenAI, custom endpoints, and default resolution, and updates documentation and tests to match the new structure.

Builder pattern for model resolution: introduces `AzureOpenAIBuilder`, `CustomEndpointBuilder`, `DefaultBuilder`, and a `ProviderBuilder` protocol in the `builders` package to modularize how models are constructed from Airflow connection details. [1] [2] [3] [4]

Refactoring in `PydanticAIHook`: replaces the direct `infer_model` and provider factory logic with a prioritized builder selection process (`AzureOpenAIBuilder` → `CustomEndpointBuilder` → `DefaultBuilder`), improving support for Azure OpenAI and custom endpoints.

Documentation and UI improvements: updates the docs for `PydanticAIHook` to clarify connection fields for Azure OpenAI and provide examples for required extras like `api_version` and `azure_deployment`. [1] [2]

Testing updates: tests now import `infer_model` and `infer_provider_class` from the new builder modules instead of the hook, ensuring tests match the refactored code structure. [1] [2] [3] [4] [5]

Connection validation enhancements: adds validation for required connection fields (`api_version` and `host`) and clarifies error handling for missing model configuration.

Was generative AI tooling used to co-author this PR?
Claude Sonnet 4.6 & Gemini 3.1 Pro
Some method bodies were filled in and tests were created via Copilot.
`{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in airflow-core/newsfragments.