Bring BigQuery FastSync implementation into alignment with Snowflake #901
judahrand wants to merge 20 commits into transferwise:master
Conversation
@Samira-El is this something that you'd be interested in looking at and merging, given that Wise doesn't maintain or test this area of the codebase?
Hey Judah, I would ping @jmriego to get his opinion on this and on whether it would create any conflict with pipelinewise-target-bigquery. I also reckon usage of GCS should be implemented in that target.
It doesn't conflict; we're running it this way currently.
Yup, this might be a good addition, though there are some differences between the FastSync and Singer targets: the FastSync implementation uses CSVs, whereas the Singer version uses Avro. What are your thoughts @jmriego?
Sorry, just seeing this now. I think the GCS implementation makes sense, but I'm a bit worried about the effects of making it mandatory. I know that at my company we would have issues doing that, and it's also a separate service you have to enable. It's not as integrated as it is on Snowflake. Sorry about this; I think GCS support totally makes sense. It would mean that if the file can't be loaded for some reason, you could at least download it locally and check for any issues with the data.
Problem
The BigQuery FastSync mechanism currently works differently from the Snowflake implementation, which uploads the CSVs to S3 before importing them into Snowflake. BigQuery can do the same thing with GCS. This should result in faster operation, assuming PipelineWise is running in GCP.
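For illustration only, here is a minimal sketch of the two load paths being discussed, written against the public google-cloud-storage and google-cloud-bigquery client libraries. It is not the code in this PR: the function names, the optional bucket_name parameter, and the clean-up step are assumptions made for the example, and the target table is assumed to already exist with a matching schema. Leaving bucket_name unset falls back to a direct load from the local file, which is one way the "GCS should not be mandatory" concern above could be handled.

```python
from typing import Optional


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI that a BigQuery load job reads from."""
    return f"gs://{bucket_name}/{blob_name}"


def load_csv(local_path: str, table_id: str,
             bucket_name: Optional[str] = None,
             blob_name: Optional[str] = None) -> int:
    """Load a CSV into an existing BigQuery table (hypothetical helper).

    With bucket_name set, the file is staged in GCS first, mirroring the
    S3 staging step of the Snowflake FastSync. Without it, the file is
    sent to BigQuery directly from local disk.
    """
    # Imported lazily so the pure helper above works without the GCP SDKs.
    from google.cloud import bigquery, storage

    bq_client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
    )

    if bucket_name is None:
        # Direct path: stream the local file to BigQuery.
        with open(local_path, "rb") as csv_file:
            job = bq_client.load_table_from_file(
                csv_file, table_id, job_config=job_config
            )
        job.result()  # wait for the load job to finish
        return job.output_rows

    # Staged path: upload to GCS, then load from the gs:// URI.
    blob_name = blob_name or local_path.rsplit("/", 1)[-1]
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    try:
        job = bq_client.load_table_from_uri(
            gcs_uri(bucket_name, blob_name), table_id, job_config=job_config
        )
        job.result()
        return job.output_rows
    finally:
        # Assumption: staged files are removed after the load attempt.
        # Skipping this would leave the file in GCS for later inspection.
        blob.delete()
```

A call such as load_csv("/tmp/orders.csv", "my-project.my_dataset.orders", bucket_name="my-staging-bucket") would take the staged path; the project, dataset, and bucket names here are placeholders.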
Proposed changes
Types of changes
What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply
Checklist
setup.py is an individual PR and not mixed with feature or bugfix PRs
[AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
AP-NNN (if applicable. AP-NNN = JIRA ID)