In this tutorial, we will create transformations using Delta Live Tables (DLT) in Databricks.
Make sure you have a Databricks account and a cluster up and running.
To create some real transformations, we need to provide seed (raw) data to DLT.
We'll manually create a few raw tables in Databricks using scripts available in the seed directory.
You can run these scripts directly on a Databricks notebook that is attached to an active cluster.
Note that the seed data can also be provided as Parquet files instead of tables.
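As a minimal sketch of what one of these seed scripts might look like (the table name, columns, and values here are illustrative assumptions, not taken from the seed directory):

```sql
-- Hypothetical raw table; the actual schemas live in the seed directory.
CREATE TABLE IF NOT EXISTS raw_orders (
  order_id    INT,
  customer_id INT,
  amount      DOUBLE,
  order_date  DATE
);

INSERT INTO raw_orders VALUES
  (1, 100, 25.50, DATE'2023-01-05'),
  (2, 101, 80.00, DATE'2023-01-06');
```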
- Open up a Databricks notebook and run the SQL commands found in the transformations directory
- Click on Workflows in the sidebar > Delta Live Tables > Create pipeline
- Select the notebook created earlier
- Select Triggered for the pipeline mode and hit Create
- Click Start on top bar of the pipeline window
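The SQL in the transformations directory is what defines the live tables themselves. A sketch of the typical shape of such a notebook, assuming a `raw_orders` seed table (all names here are illustrative):

```sql
-- Bronze: ingest the seed data as a live table.
CREATE OR REFRESH LIVE TABLE bronze_orders
COMMENT "Raw orders ingested from the seed table"
AS SELECT * FROM raw_orders;

-- Silver: cleaned layer, with a data-quality expectation
-- that drops rows with non-positive amounts.
CREATE OR REFRESH LIVE TABLE silver_orders (
  CONSTRAINT valid_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT order_id, customer_id, amount, order_date
FROM LIVE.bronze_orders;
```

Note the `LIVE.` prefix when reading from another table in the same pipeline; it is how DLT builds the dependency graph between tables.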
Databricks will now create the pipeline, populate your medallion tables, and generate a dependency graph. You can modify the pipeline at any time, including its schedule and target tables.
