zukui1984/data-engineer-zoomcamp_2024

DataTalks.Club - Data Engineering Zoomcamp 2024

Overview

  • Module 1: Docker, SQL, Terraform

    • Introduction to GCP
    • Docker and docker-compose
    • Running Postgres locally with Docker
    • Setting up infrastructure on GCP with Terraform
  • Module 2: Orchestration with Mage AI (Airflow alternative - https://www.mage.ai)

    • Data Lake
    • Workflow orchestration
    • Workflow orchestration with Mage AI
  • Workshop 1: Data Ingestion with dlt (https://dlthub.com)

    • Reading from APIs
    • Building scalable pipelines
    • Normalising data
    • Incremental loading
  • Module 3: Data Warehouses with BigQuery

    • Data Warehouse
    • BigQuery
    • Partitioning and clustering
    • BigQuery best practices
    • BigQuery Machine Learning
  • Module 4: Analytics Engineering with dbt (data build tool - https://getdbt.com)

    • Basics of analytics engineering
    • BigQuery and dbt | Postgres and dbt
    • dbt models
    • Testing and documenting
    • Deployment to the cloud and locally
    • Visualizing the data with Google Data Studio and Metabase
  • Module 5: Batch processing with PySpark

    • Spark - DataFrames, SQL
    • Internals: GroupBy and joins
  • Workshop 2: Stream processing (RisingWave - https://risingwave.com & Redpanda - https://redpanda.com)

Course overview

image
