Add BigQuery API support with query execution and permission model#275
Add BigQuery API support with query execution and permission model#275wbssbw wants to merge 19 commits intoobelisk:mainfrom
Conversation
6dcf5a0 to
a831dd1
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements a comprehensive BigQuery API that enables Plaid modules to execute read-only SQL queries against configured BigQuery tables. The implementation follows a defense-in-depth security model with service account authentication, a three-level permission system (module → dataset → table), strict SQL identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types in JSON responses.
Changes:
- Added
gcp::bigquerymodule to both plaid-stl and plaid runtime withquery_table()function - Implemented Filter expression tree API for building WHERE clauses with validation
- Added schema configuration to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| runtime/plaid/src/functions/api.rs | Registers gcp_bigquery_query_table function with proper feature gating and test mode restrictions |
| runtime/plaid/src/apis/mod.rs | Adds BigQueryError to ApiError enum with From trait implementation for error conversion |
| runtime/plaid/src/apis/gcp/mod.rs | Integrates BigQuery client into GCP module structure with async initialization |
| runtime/plaid/src/apis/gcp/bigquery.rs | Core implementation: authentication, permission checking, SQL generation with identifier validation, type decoding, and query execution |
| runtime/plaid/Cargo.toml | Adds google-cloud-bigquery dependency and pins chrono to 0.4.39 for compatibility |
| runtime/plaid-stl/src/gcp/mod.rs | Exports bigquery module in standard library |
| runtime/plaid-stl/src/gcp/bigquery.rs | Defines request/response types, Filter expression tree, Operator enum, and query_table() function with 1 MiB return buffer |
| runtime/Cargo.lock | Adds transitive dependencies for BigQuery support including arrow, prost, tonic, and authentication libraries |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
60b6ad5 to
f4b64a4
Compare
7e76ee1 to
fd7131b
Compare
| let mut sql = format!("SELECT {column_list} FROM `{dataset}`.`{table}`"); | ||
|
|
||
| if let Some(f) = filter { | ||
| sql.push_str(" WHERE "); | ||
| sql.push_str(&build_filter_sql(f, 0)?); | ||
| } |
There was a problem hiding this comment.
Format strings for building SQL is highly discouraged. This should use a query builder for safety or we need more information on why we can't use a query builder.
There was a problem hiding this comment.
I looked at full query builder libraries but none have mature BigQuery/GoogleSQL support, so that path would have added a dependency without meaningfully improving security
Instead I switched the WHERE clause to use GoogleSQL named query parameters (@pn). Filter values are now passed as typed QueryParameter entries on QueryRequest rather than being inlined as SQL literals. Now, BigQuery owns value serialization and the hand-rolled string-escaping code (build_value_sql) is gone entirely
See 489bc48
obelisk
left a comment
There was a problem hiding this comment.
This seems like it has some rather large structural issues
|
@wbssbw I think I've fixed all the issues with 1.93. Please try rebasing on main |
obelisk
left a comment
There was a problem hiding this comment.
Please rebase on main where 1.93 should have all the issues resolved
489bc48 to
09bc509
Compare
Summary
Implements a new BigQuery API that allows Plaid modules to execute read-only queries against configured tables. The implementation includes service account authentication, a permission model that maps modules to allowed datasets and tables, SQL query construction with strict identifier validation to prevent injection attacks, and schema-driven type decoding to preserve numeric and boolean types instead of coercing all values to strings.
Changes
Standard library (plaid-stl):
gcp::bigquerymodule withquery_table()functionReadTableRequestandReadTableResponsetypesFilterexpression tree for WHERE clauses withAnd,Or, andConditionnodesOperatorenum (Eq,Ne,Lt,Le,Gt,Ge,Like,IsNull,IsNotNull)DerefandIntoIteratorforReadTableResponseto enable direct iterationRuntime API (plaid):
gcp::bigquerymodule withBigQueryclientrconfig map: module → dataset → allowed tables[A-Za-z_][A-Za-z0-9_]*)schemasconfig) to decode BigQuery string responses into correct JSON types (integer, float, boolean, string)Configuration:
bigqueryfield toGcpConfigcontaining credentials, permissions, schemas, and timeoutr[module][dataset] = [table1, table2, ...]schemas[dataset][table][column] = typeDependencies:
google-cloud-bigqueryv0.15.0 (gated behindgcpfeature)chronoto0.4.39(compatibility problems with bigquery crate)Security Considerations
SQL injection prevention:
'→'')Authorization:
rconfig mapBadRequestbefore any network callData access:
SELECT *is rejectedPerformance/Operational Impact