Add foundation provider infrastructure for data and compute providers #48
Merged
punit-naik-amp merged 1 commit into CHUCK-10-redshift on Dec 15, 2025
This PR establishes the base provider architecture for accessing data from different platforms and running Stitch jobs on different compute backends.

Changes:
- Add DataProvider protocol defining the interface for data sources
- Add DatabricksProviderAdapter stub (implementation in PR 2)
- Add RedshiftProviderAdapter stub with required AWS credentials, IAM role, and EMR cluster ID (implementation in PR 2)
- Add DataProviderFactory for creating data providers
- Add ComputeProvider protocol defining the interface for compute backends
- Add DatabricksComputeProvider stub (implementation in PR 3)
- Add EMRComputeProvider stub (implementation in PR 4)
- Add ProviderFactory with unified interface for both provider types
- Add comprehensive unit tests (52 tests, all passing)

Key design decisions:
- Data providers handle storage operations (no separate abstraction)
- EMR uses boto3 credential discovery (aws_profile, IAM roles, env vars)
- RedshiftProviderAdapter requires AWS credentials and accepts redshift_iam_role for COPY/UNLOAD operations
- ComputeProvider.prepare_stitch_job() receives data_provider parameter
- Pure additive changes (no modifications to existing code)

Jira: CHUCK-10
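For reviewers who want a picture of the data-provider side, here is a minimal sketch of how a protocol, two stub adapters, and a factory could fit together. Only the class names and `redshift_iam_role` come from the description above. The `list_tables` method, the `create` method, the constructor parameter names (`workspace_url`, `token`, `aws_access_key_id`, `aws_secret_access_key`, `emr_cluster_id`), and the choice to treat all four Redshift settings as required are assumptions made for illustration, not the code in this PR.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class DataProvider(Protocol):
    """Interface a data source must satisfy (method name is hypothetical)."""

    def list_tables(self, schema: str) -> List[str]:
        ...


class DatabricksProviderAdapter:
    """Stub only: the real behaviour is deferred to a later PR."""

    def __init__(self, workspace_url: str, token: str) -> None:
        # Parameter names are assumptions for this sketch.
        self.workspace_url = workspace_url
        self.token = token

    def list_tables(self, schema: str) -> List[str]:
        raise NotImplementedError("Databricks data access is not implemented yet")


class RedshiftProviderAdapter:
    """Stub only: validates its configuration up front, does no I/O."""

    def __init__(
        self,
        aws_access_key_id: str,
        aws_secret_access_key: str,
        redshift_iam_role: str,
        emr_cluster_id: str,
    ) -> None:
        required = {
            "aws_access_key_id": aws_access_key_id,
            "aws_secret_access_key": aws_secret_access_key,
            "redshift_iam_role": redshift_iam_role,
            "emr_cluster_id": emr_cluster_id,
        }
        # Fail early if any required setting is empty.
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("Missing Redshift settings: " + ", ".join(missing))
        self.aws_access_key_id = aws_access_key_id
        self.aws_secret_access_key = aws_secret_access_key
        self.redshift_iam_role = redshift_iam_role  # intended for COPY/UNLOAD
        self.emr_cluster_id = emr_cluster_id

    def list_tables(self, schema: str) -> List[str]:
        raise NotImplementedError("Redshift data access is not implemented yet")


class DataProviderFactory:
    """Maps a provider name to its adapter class."""

    _registry: Dict[str, type] = {
        "databricks": DatabricksProviderAdapter,
        "redshift": RedshiftProviderAdapter,
    }

    @classmethod
    def create(cls, provider_type: str, **config: str) -> DataProvider:
        adapter_cls = cls._registry.get(provider_type.lower())
        if adapter_cls is None:
            raise ValueError(f"Unknown data provider: {provider_type!r}")
        return adapter_cls(**config)
```

Because the adapters satisfy the protocol structurally, they do not need to inherit from `DataProvider`; callers depend only on the interface, and the stubs can be filled in later without touching call sites.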
b537298 to ec5a0a6
@pragyan-amp Changed the base branch from |
punit-naik-amp added a commit that referenced this pull request on Jan 13, 2026
…#48)

This PR establishes the base provider architecture for accessing data from different platforms and running Stitch jobs on different compute backends.

Changes:
- Add DataProvider protocol defining the interface for data sources
- Add DatabricksProviderAdapter stub (implementation in PR 2)
- Add RedshiftProviderAdapter stub with required AWS credentials, IAM role, and EMR cluster ID (implementation in PR 2)
- Add DataProviderFactory for creating data providers
- Add ComputeProvider protocol defining the interface for compute backends
- Add DatabricksComputeProvider stub (implementation in PR 3)
- Add EMRComputeProvider stub (implementation in PR 4)
- Add ProviderFactory with unified interface for both provider types
- Add comprehensive unit tests (52 tests, all passing)

Key design decisions:
- Data providers handle storage operations (no separate abstraction)
- EMR uses boto3 credential discovery (aws_profile, IAM roles, env vars)
- RedshiftProviderAdapter requires AWS credentials and accepts redshift_iam_role for COPY/UNLOAD operations
- ComputeProvider.prepare_stitch_job() receives data_provider parameter
- Pure additive changes (no modifications to existing code)

Jira: CHUCK-10

This is just the scaffolding/additive changes. No code is modified. Doing it in stages so that reviewing becomes easy. Will fold in the actual implementation of Databricks and Redshift in later PRs.
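The compute side of the commit message can be sketched in the same spirit. The snippet below illustrates the design decision that `prepare_stitch_job()` receives the data provider as a parameter, and that the EMR stub takes an optional `aws_profile` instead of explicit credentials. The `config` argument, the `describe` method, the `cluster_id`, `workspace_url`, and `token` parameters, and `create_compute_provider` are hypothetical names chosen for this example; the actual signatures in the PR may differ.

```python
from typing import Any, Dict, Optional, Protocol


class DataProvider(Protocol):
    """Placeholder for the data-side interface (method is hypothetical)."""

    def describe(self) -> str:
        ...


class ComputeProvider(Protocol):
    """Interface a compute backend must satisfy."""

    def prepare_stitch_job(
        self, data_provider: DataProvider, config: Dict[str, Any]
    ) -> Dict[str, Any]:
        ...


class DatabricksComputeProvider:
    """Stub only: the real behaviour is deferred to a later PR."""

    def __init__(self, workspace_url: str, token: str) -> None:
        self.workspace_url = workspace_url
        self.token = token

    def prepare_stitch_job(
        self, data_provider: DataProvider, config: Dict[str, Any]
    ) -> Dict[str, Any]:
        raise NotImplementedError("Databricks compute is not implemented yet")


class EMRComputeProvider:
    """Stub only: stores settings, performs no AWS calls."""

    def __init__(self, cluster_id: str, aws_profile: Optional[str] = None) -> None:
        self.cluster_id = cluster_id
        # No explicit keys here. A real implementation would presumably hand
        # aws_profile to boto3 and otherwise rely on its default discovery
        # (environment variables, IAM roles).
        self.aws_profile = aws_profile

    def prepare_stitch_job(
        self, data_provider: DataProvider, config: Dict[str, Any]
    ) -> Dict[str, Any]:
        raise NotImplementedError("EMR compute is not implemented yet")


class ProviderFactory:
    """Compute half of a unified factory; the data half is omitted here."""

    _compute: Dict[str, type] = {
        "databricks": DatabricksComputeProvider,
        "emr": EMRComputeProvider,
    }

    @classmethod
    def create_compute_provider(cls, kind: str, **config: Any) -> ComputeProvider:
        provider_cls = cls._compute.get(kind.lower())
        if provider_cls is None:
            raise ValueError(f"Unknown compute provider: {kind!r}")
        return provider_cls(**config)
```

Passing the data provider into `prepare_stitch_job()` keeps the compute backend unaware of where the data lives, so in principle any compute provider can be paired with any data provider.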
punit-naik-amp added a commit that referenced this pull request on Jan 13, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 16, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 23, 2026
punit-naik-amp added a commit that referenced this pull request on Jan 28, 2026