
Add foundation provider infrastructure for data and compute providers#48

Merged
punit-naik-amp merged 1 commit into CHUCK-10-redshift from CHUCK-10-provider-infrastructure-foundation
Dec 15, 2025

Conversation

@punit-naik-amp
Contributor

This PR establishes the base provider architecture for accessing data from different platforms and running Stitch jobs on different compute backends.

Changes:

  • Add DataProvider protocol defining the interface for data sources (see the sketch after this list)
  • Add DatabricksProviderAdapter stub (implementation in PR 2)
  • Add RedshiftProviderAdapter stub with required AWS credentials, IAM role, and EMR cluster ID (implementation in PR 2)
  • Add DataProviderFactory for creating data providers
  • Add ComputeProvider protocol defining the interface for compute backends
  • Add DatabricksComputeProvider stub (implementation in PR 3)
  • Add EMRComputeProvider stub (implementation in PR 4)
  • Add ProviderFactory with unified interface for both provider types
  • Add comprehensive unit tests (52 tests, all passing)
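
The data-provider side could be sketched roughly as below. This is a minimal illustration only, assuming hypothetical method names (read_table/write_table) and registry keys; the real protocol surface arrives with the implementations in PR 2.

```python
# Hypothetical sketch: method names and registry keys are assumptions for
# illustration, not the actual interface introduced by this PR.
from typing import Any, Dict, Protocol, Type, runtime_checkable


@runtime_checkable
class DataProvider(Protocol):
    """Interface every data source (Databricks, Redshift, ...) implements."""

    def read_table(self, table_name: str) -> Any:
        """Load a table from the underlying platform."""
        ...

    def write_table(self, table_name: str, data: Any) -> None:
        """Write results back to the underlying platform."""
        ...


class DataProviderFactory:
    """Creates a concrete DataProvider from a provider-type key."""

    _registry: Dict[str, Type[Any]] = {}

    @classmethod
    def register(cls, provider_type: str, provider_cls: Type[Any]) -> None:
        cls._registry[provider_type] = provider_cls

    @classmethod
    def create(cls, provider_type: str, **config: Any) -> DataProvider:
        try:
            return cls._registry[provider_type](**config)
        except KeyError:
            raise ValueError(f"Unknown data provider type: {provider_type}") from None
```

Registering the Databricks and Redshift adapters under keys such as "databricks" and "redshift" (names assumed here) keeps callers agnostic of which backend they read from.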

Key design decisions:

  • Data providers handle storage operations (no separate abstraction)
  • EMR uses boto3 credential discovery (aws_profile, IAM roles, env vars)
  • RedshiftProviderAdapter requires AWS credentials and accepts redshift_iam_role for COPY/UNLOAD operations
  • ComputeProvider.prepare_stitch_job() receives a data_provider parameter (see the compute-side sketch after this list)
  • Pure additive changes (no modifications to existing code)
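
On the compute side, the decisions above mean prepare_stitch_job() takes the data provider as an argument and the EMR backend relies on boto3's normal credential chain rather than hard-coded keys. A rough sketch, with StitchJob and the exact signatures assumed purely for illustration:

```python
# Rough sketch; StitchJob and these signatures are assumptions for illustration,
# not the real code from this PR.
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol

import boto3


@dataclass
class StitchJob:
    """Placeholder for a prepared Stitch job specification."""

    spec: Dict[str, Any]


class ComputeProvider(Protocol):
    def prepare_stitch_job(self, data_provider: Any, **options: Any) -> StitchJob:
        """Build a job spec, using the data provider for storage access."""
        ...

    def run_stitch_job(self, job: StitchJob) -> str:
        """Submit the job and return a backend-specific run identifier."""
        ...


class EMRComputeProvider:
    """Compute backend stub that would submit Stitch jobs as EMR steps."""

    def __init__(
        self,
        cluster_id: str,
        aws_profile: Optional[str] = None,
        region: Optional[str] = None,
    ) -> None:
        # boto3's standard credential discovery: an explicit profile if given,
        # otherwise environment variables, shared config files, or IAM roles.
        session = boto3.Session(profile_name=aws_profile, region_name=region)
        self._emr = session.client("emr")
        self._cluster_id = cluster_id

    def prepare_stitch_job(self, data_provider: Any, **options: Any) -> StitchJob:
        # Actual EMR step construction lands in PR 4; this is a stub.
        raise NotImplementedError
```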

Jira: CHUCK-10

This is just the scaffolding (purely additive changes); no existing code is modified. I'm doing it in stages so that review stays manageable, and will fold in the actual Databricks and Redshift implementations in later PRs.

@punit-naik-amp force-pushed the CHUCK-10-provider-infrastructure-foundation branch from b537298 to ec5a0a6 on December 12, 2025 16:28
@punit-naik-amp changed the base branch from main to CHUCK-10-redshift on December 13, 2025 06:57
@punit-naik-amp
Contributor Author

@pragyan-amp Changed the base branch from main to CHUCK-10-redshift so that main stays clean while we merge reviewed and tested PRs into CHUCK-10-redshift in stages (this feature involves too many code changes to review in one single, huge PR). At the end I will create one final PR from CHUCK-10-redshift to main.

Contributor

@pragyan-amp left a comment

Looks Good.. :shipit:

@punit-naik-amp merged commit a7ffd33 into CHUCK-10-redshift on Dec 15, 2025
2 checks passed
@punit-naik-amp deleted the CHUCK-10-provider-infrastructure-foundation branch on December 15, 2025 03:50
punit-naik-amp added a commit that referenced this pull request on Jan 13, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 13, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 23, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 28, 2026