This project sets up an Extract & Load (ELT) workflow using Docker and PostgreSQL. The workflow extracts data from a source PostgreSQL database and loads it into a destination PostgreSQL database. This is done without any transformations, simply moving data between the two databases. After loading this data models and Jinja are created in dbt, elt script is orchestrated through airflow!
-
A
source_postgrescontainer is created and configured with a database namedsource_db. -
A custom SQL initialization script (
init.sql) is mounted to the container and executed automatically upon startup. This script inserts sample data into thesource_dbdatabase. -
Another PostgreSQL container is created, this time configured with a database named
destination_db. -
The EL script (elt_script.py) is a Python script located in the
eltfolder, which is built into a Docker container which connects to both PostgreSQL databases using connection parameters defined in the Docker Compose file. -
The script’s purpose is to extract data from the
source_postgrescontainer (fromsource_db) and load it into thedestination_postgrescontainer (intodestination_db). -
The container automatically waits for the
source_postgresanddestination_postgrescontainers to be ready before executing the Python script to perform the extraction and loading process because of the "depends_on" directive. -
The process is triggered automatically when the
elt_scriptcontainer starts and completes the data load once it has finished processing. -
All services (source and destination databases, as well as the EL script) are connected to a Docker network (
elt_network) for seamless communication. -
....
Inspired from freeCodeCamp.org!