
Modern data stack project: orchestrates Jaffle Shop analytics using Astro/Airflow, models data with dbt, and stores it in DuckDB.


shahidmalik4/astro-duckdb-dbt-pipeline


Astro + DuckDB + dbt Pipeline



📊 Project Overview

This repository is an end-to-end analytics pipeline built on the modern data stack, using the Jaffle Shop sample dataset. It demonstrates how to:

  • Orchestrate workflows with Astro/Airflow
  • Transform data with dbt
  • Query and store data using DuckDB (and Postgres for optional storage)
  • Use modular, production-ready analytics patterns
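One common way to wire these pieces together is to have each Airflow task call the dbt CLI against the dbt project in `dbt/jaffle_shop_duckdb`. The sketch below only builds the argument lists. The helper name `dbt_command` and the choice of profiles directory are hypothetical, and the repository's DAGs may invoke dbt differently.

```python
from pathlib import Path
from typing import List, Optional

# Project location taken from the project structure in this README; reusing it
# as the profiles directory is an assumption made for this sketch.
DBT_PROJECT_DIR = Path("dbt") / "jaffle_shop_duckdb"
DBT_PROFILES_DIR = DBT_PROJECT_DIR


def dbt_command(subcommand: str, select: Optional[str] = None) -> List[str]:
    """Build the argument list for one dbt CLI call (hypothetical helper)."""
    args = [
        "dbt",
        subcommand,
        "--project-dir",
        str(DBT_PROJECT_DIR),
        "--profiles-dir",
        str(DBT_PROFILES_DIR),
    ]
    if select:
        # Restrict the call to a subset of models, e.g. a single staging model.
        args += ["--select", select]
    return args


# Load the seed CSVs, build the models, then run the data tests.
PIPELINE = [dbt_command(step) for step in ("seed", "run", "test")]

if __name__ == "__main__":
    for argv in PIPELINE:
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` or joined into a `BashOperator` `bash_command`. Keeping seed, run, and test as separate tasks lets Airflow retry a failed step on its own.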

Think of it as a playground for modern data engineering practices, but one functional enough to be production-ready.


🛠 Tech Stack

Component       Purpose
Astro/Airflow   Workflow orchestration & scheduling
dbt             Data modeling & transformations
DuckDB          Lightweight analytical database
Postgres        Optional production database
Docker          Containerization for reproducibility

⚡ Features

  • Modular Airflow DAGs for ETL orchestration
  • dbt project structured with staging + marts
  • Seed data included for quick experimentation
  • Dockerized environment for zero-hassle setup
  • Tests for both the Airflow DAGs and the dbt transformations
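Splitting the dbt project into staging and marts means models must build in dependency order, which is the same ordering an Airflow DAG encodes for tasks. The sketch below uses Python's standard `graphlib` module (3.9+) to derive one valid build order. The node names follow the public dbt Jaffle Shop example (`raw_*` seeds, `stg_*` staging models, `customers` and `orders` marts) and are an assumption about this repository's exact models.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key depends on the nodes in its set. Names assume the public
# Jaffle Shop example; this repository's models may differ.
LINEAGE: Dict[str, Set[str]] = {
    "stg_customers": {"raw_customers"},
    "stg_orders": {"raw_orders"},
    "stg_payments": {"raw_payments"},
    "customers": {"stg_customers", "stg_orders", "stg_payments"},
    "orders": {"stg_orders", "stg_payments"},
}


def build_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one valid build order: every node comes after all its parents."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_order(LINEAGE))
```

dbt computes this ordering itself from `ref()` calls; the sketch only shows why the staging models must finish before the marts that read from them.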

📂 Project Structure

├── dags/                 # Airflow DAGs
├── dbt/                  # dbt project
│   └── jaffle_shop_duckdb
├── include/              # DuckDB files
├── tests/                # Unit & integration tests
├── Dockerfile
├── docker-compose.yml
├── airflow_settings.yaml
└── README.md
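dbt reaches the DuckDB file through a profile in `profiles.yml`. The sketch below renders a minimal profile for the dbt-duckdb adapter so the database file lands in `include/`. The profile name `jaffle_shop`, the target name `dev`, and the file name `jaffle_shop.duckdb` are assumptions; the profile name must match the `profile:` entry in the project's `dbt_project.yml`.

```python
def render_duckdb_profile(
    profile: str = "jaffle_shop",  # assumed profile name
    db_path: str = "include/jaffle_shop.duckdb",  # assumed file under include/
    threads: int = 4,
) -> str:
    """Return a minimal profiles.yml body for the dbt-duckdb adapter."""
    return (
        f"{profile}:\n"
        f"  target: dev\n"
        f"  outputs:\n"
        f"    dev:\n"
        f"      type: duckdb\n"
        f"      path: {db_path}\n"
        f"      threads: {threads}\n"
    )


if __name__ == "__main__":
    print(render_duckdb_profile())
```

A relative `path` depends on the directory dbt runs from. Inside the Astro containers the project is typically mounted at `/usr/local/airflow`, so an absolute path under that directory avoids surprises.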

Getting Started

  1. Clone the repository
git clone https://github.com/shahidmalik4/astro-duckdb-dbt-pipeline.git
cd astro-duckdb-dbt-pipeline
  2. Set up the Python environment
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
  3. Run Airflow locally with the Astro CLI (requires Docker); the Airflow UI is then served at http://localhost:8080
astro dev start
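`astro dev start` needs the Astro CLI and a container runtime such as Docker. A small, hypothetical preflight script like the one below can catch a missing tool before the first run. It only checks that each executable is on `PATH`, not that the Docker daemon is actually running.

```python
import shutil
import sys
from typing import Iterable, List

# Tools used by the steps above; this list is an assumption for the sketch.
REQUIRED_TOOLS = ("git", "docker", "astro")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
        sys.exit(1)
    print("All required tools found.")
```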
