- Java
- Poetry
- Docker
- https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store (selected for the notebook because of its large .csv)
- https://www.kaggle.com/datasets/gsimonx37/letterboxd
- https://www.kaggle.com/datasets/olegshpagin/usa-stocks-prices-ohlcv
- https://www.kaggle.com/datasets/jayitabhattacharyya/hotels-details
- Clone this repository
- Run `poetry install` in the root directory of the project
- Get a dataset of your choice; we use the ecommerce data for its sheer size
- Once the data has been saved in Delta format, be sure to kill the Spark session spawned from the notebook
- Start docker-compose to expose Spark's Thrift server for dbt to use
- Create the external tables within the container's context (steps below)
- Change directory into dbt-spark
- Run `poetry run dbt run` to start dbt
- Connect to Hive with the Beeline client inside the container:

```shell
docker exec -it delta-lake-dbt-spark3-thrift-1 beeline -u "jdbc:hive2://localhost:10000/default" -n root
```

- Create the external tables by importing the data from Delta format:
```sql
CREATE SCHEMA raw;
CREATE SCHEMA rfn;
CREATE SCHEMA ast;

CREATE TABLE raw.ecommerce
USING DELTA
LOCATION '/data/delta/raw/ecommerce';

CREATE TABLE rfn.ecommerce
USING DELTA
LOCATION '/data/delta/rfn/ecommerce';
```
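Once the external tables exist, dbt models can select from them directly. A minimal sketch of such a model (the file name and column list are hypothetical; the columns shown are the ones documented for the Kaggle ecommerce dataset):

```sql
-- models/example/ecommerce_purchases.sql (hypothetical model)
{{ config(materialized='table') }}

select
    event_time,
    product_id,
    brand,
    price,
    user_id
from raw.ecommerce
where event_type = 'purchase'
```

`poetry run dbt run` would then materialize this query as a table through the Thrift server connection configured in the dbt profile.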
Most of the code here takes inspiration from this awesome blog
Most of the Dockerfiles came from this repo