Welcome to my Data Engineering Zoomcamp journey with DataTalks.Club 🌍.
This repo contains my weekly labs, notes, Terraform infrastructure, Python scripts, and architectural diagrams.
| ✅ Week | 🔍 Topic | ⏱ Status | 🔗 Link |
|---|---|---|---|
| 1️⃣ Week 1 | Intro (Docker, Terraform, GCP Setup, IAM, CLI) | ✅ Completed | Week 1 |
| 2️⃣ Week 2 | Data Ingestion (GCS, BigQuery) | ✅ Completed | Week 2 |
| 3️⃣ Week 3 | Data Warehouse (BigQuery, dbt) | ⏳ In Progress | Week 3 |
| 4️⃣ Week 4 | Batch Processing with Spark | ⏭️ Upcoming | |
| 5️⃣ Week 5 | Airflow & Orchestration | ⏭️ Upcoming | |
| 6️⃣ Week 6 | Kafka + Streaming | ⏭️ Upcoming | |
| 🔁 Week 7+ | Capstone Project | ⏭️ Upcoming | |

| Tool/Platform | Purpose |
|---|---|
| ☁️ GCP | Cloud Platform & BigQuery |
| 📦 GCS | Data Lake (Cloud Storage) |
| 🛠 Terraform | Infrastructure-as-Code |
| 🐍 Python | Scripting & Automation |
| 💡 dbt | Data Transformations |
| 🧬 Spark | Distributed Processing |
| 🕹 Airflow | Orchestration |
| 🔊 Kafka & Flink | Streaming Pipelines |
| 🗄 SQL | Queries & Transformation Logic |

data-engineering-zoomcamp/
├── 01_tools_resorces_setup/ # GCP project, service account, auth keys
├── 02_terraform_docker/ # Ingest data into GCS and BigQuery
├── 03_workflow_ochestration/ # Airflow pipeline workflows
├── 04_data_warehouse/ # BigQuery schema & transformations
├── 05_batch/ # Spark jobs and cluster setup
├── 06_streaming/ # Kafka, Spark Streaming, Flink
├── images/ # Screenshots and workflow diagrams
├── notes/ # Markdown-based summaries and key learnings
├── projects/ # Final projects
└── README.md # This file
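
To recreate this folder layout locally, something like the following sketch could work. The folder names are copied verbatim from the tree above; the `scaffold` helper and the `MODULE_DIRS` list are hypothetical names used only for illustration and are not scripts that ship with this repo.

```python
from pathlib import Path

# Folder names copied verbatim from the tree above (spelling kept as-is).
MODULE_DIRS = [
    "01_tools_resorces_setup",
    "02_terraform_docker",
    "03_workflow_ochestration",
    "04_data_warehouse",
    "05_batch",
    "06_streaming",
    "images",
    "notes",
    "projects",
]


def scaffold(root: Path) -> list[Path]:
    """Create any missing folders under root and return the ones created."""
    created = []
    for name in MODULE_DIRS:
        path = root / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Assumes the script is run from the directory that should contain the repo.
    for path in scaffold(Path("data-engineering-zoomcamp")):
        print(f"created {path}")
```

Running it a second time creates nothing, since existing folders are skipped.
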
This project:
- 📘 Documents my learning with hands-on practice
- 💼 Builds a professional portfolio for cloud/data roles
- 🤝 Shares reusable templates, scripts, and patterns
“Document the journey. Build in public. Share the value.” 🧠✨