Credits: Most scripts are adapted from the Fivetran DW Benchmark to suit our particular use case.
- Move `dsdgen` to a GCS bucket, at the specific location mentioned in the bootstrap script
- Create a high-CPU VM, e.g. 16 vCPUs
- Clone this repository: `git clone $REPO_URL`
- Give all script files executable permission: `chmod +x *.sh`
- Run `bootstrap.sh`
  - This pulls the `dsdgen` binary
  - Installs Google Fuse; this is to mount the GCS bucket as a local folder
- Run `data_gen.sh`
  - Usage: `./data_gen.sh $CPU $SCALE`
  - This is responsible for generating data
  - `$CPU` denotes the degree of parallelism; it must be > 1
  - `$SCALE` denotes the scale of data that needs to be generated
  - This creates and mounts a GCS bucket and writes data to it
  - NOTE: Ensure that `$CPU` is close to the number of CPUs in the VM for efficient parallel generation
- Run `load_data.sh`
  - Usage: `./load_data.sh $SCALE`
  - This is responsible for loading data from the GCS bucket created in the data generation step (`data_gen.sh`) into BigQuery
  - `$SCALE` denotes the scale of data that needs to be loaded to BigQuery
  - Note: Before running this step, ensure that data is generated and present in the appropriate GCS bucket
- Run `benchmark.sh`
  - Usage: `./benchmark.sh $SCALE`
  - This is responsible for running TPC-DS queries and measuring query execution time
  - Generates a `csv` file in the `results` folder containing the query `start_time` and `end_time`
  - Saves query statistics in the same directory
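
The three run steps above can be sketched as a small helper that picks a `$CPU` value and prints the commands in order. This is only an illustration, not part of the repository: the function names `pick_parallelism` and `plan_run` are hypothetical, the scale value `100` is an arbitrary example, and raising the parallelism to 2 on a single-CPU machine is an assumption made to satisfy the "must be > 1" rule. It assumes `bootstrap.sh` has already been run and that `nproc` is available on the VM.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.

# Choose a $CPU value for data_gen.sh from a CPU count.
# data_gen.sh requires $CPU > 1, so fall back to 2 on a single-CPU machine
# (assumption: 2 is the smallest accepted value).
pick_parallelism() {
  local cpus="$1"
  if [ "$cpus" -gt 1 ]; then
    echo "$cpus"
  else
    echo 2
  fi
}

# Print the commands for one full run at the given scale.
# Usage: plan_run SCALE [CPUS]
# CPUS defaults to the number of CPUs reported by nproc.
# The commands are printed, not executed, so they can be reviewed first.
plan_run() {
  local scale="$1"
  local cpus="${2:-$(nproc)}"
  local cpu
  cpu="$(pick_parallelism "$cpus")"
  echo "./data_gen.sh $cpu $scale"   # generate data and write it to the GCS bucket
  echo "./load_data.sh $scale"       # load the generated data into BigQuery
  echo "./benchmark.sh $scale"       # run the TPC-DS queries and record timings
}
```

For example, after sourcing the snippet on the VM, `plan_run 100` prints the three commands with `$CPU` set to the machine's CPU count; run them one at a time and check that each step succeeds before starting the next.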