pushpakumale/AW_Data-Engineering-Project

⚑ Azure End-to-End Data Engineering Project

This repository contains a comprehensive end-to-end data engineering pipeline implemented using the Azure ecosystem. The project simulates a real-world data flow, from ingestion through transformation to analytics-ready serving, using enterprise-grade tools and best practices.

🔍 Overview

The project is divided into three core phases:

🟦 Phase 1: Dynamic Data Ingestion

        Ingest data from various sources (e.g., blob storage and the GitHub API) using Azure Data Factory.
        Handle schema evolution and metadata with Azure Data Lake Storage Gen2.
        Use parameterized, reusable pipelines for flexibility and automation.
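
The parameterized design above can be sketched as a metadata-driven list: one descriptor per source file, which a Data Factory ForEach activity could iterate over while a single generic copy step consumes each item. The Python sketch below only builds that list. The field names (`p_rel_url`, `p_sink_folder`, `p_file_name`), the sample file names, and the `Data` base path are assumptions made for illustration, not taken from the repository's pipelines.

```python
import json

# Hypothetical source files; the real list lives in the repository's pipelines.
SOURCE_FILES = ["Customers.csv", "Products.csv", "Sales_2017.csv"]


def build_copy_parameters(files, base_path="Data"):
    """Return one parameter dict per file for a generic, reusable copy step."""
    params = []
    for name in files:
        stem = name.rsplit(".", 1)[0]  # file name without its extension
        params.append({
            "p_rel_url": f"{base_path}/{name}",  # path relative to the source base URL
            "p_sink_folder": stem,               # one landing folder per dataset
            "p_file_name": name,
        })
    return params


def to_lookup_json(params):
    """Serialize the list in a shape a lookup file for a ForEach loop could use."""
    return json.dumps(params, indent=2)


if __name__ == "__main__":
    print(to_lookup_json(build_copy_parameters(SOURCE_FILES)))
```

Adding a new dataset then means adding one entry to the list rather than building another pipeline, which is the main point of the parameterized approach.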

🔧 Phase 2: Data Transformation using Databricks

        Cleanse and transform raw data using Azure Databricks (Apache Spark).
        Apply business logic and data enrichment.
        Create bronze, silver, and gold layer architecture using Delta Lake for optimized querying and reliability.
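
To make the bronze, silver, and gold layering concrete, here is a minimal sketch in plain Python. It is a stand-in for the Spark and Delta Lake code in the notebooks, not a copy of it: lists of dicts play the role of DataFrames, and the sample rows, column names, and cleansing rules are all hypothetical. Deduplicating on `order_id` alone is a simplification chosen to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Bronze: records as they arrive, untyped and possibly dirty (hypothetical rows).
BRONZE_SALES = [
    {"order_id": "1", "order_date": "2017-01-05", "product": " bike ", "qty": "2"},
    {"order_id": "2", "order_date": "2017-01-05", "product": "Helmet", "qty": "1"},
    {"order_id": "2", "order_date": "2017-01-05", "product": "Helmet", "qty": "1"},  # duplicate
    {"order_id": "3", "order_date": "2017-02-11", "product": "Bike", "qty": None},   # missing qty
]


def to_silver(bronze_rows):
    """Cleanse: drop incomplete rows and duplicates, cast types, normalize text."""
    seen, silver = set(), []
    for row in bronze_rows:
        if row["qty"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "order_date": date.fromisoformat(row["order_date"]),
            "product": row["product"].strip().title(),
            "qty": int(row["qty"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate: total quantity per (year-month, product) for reporting."""
    totals = defaultdict(int)
    for row in silver_rows:
        key = (row["order_date"].strftime("%Y-%m"), row["product"])
        totals[key] += row["qty"]
    return dict(totals)


if __name__ == "__main__":
    print(to_gold(to_silver(BRONZE_SALES)))
```

Each layer reads only from the one before it, so a change to the cleansing rules in the silver step never requires re-ingesting the raw bronze data.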

🔷 Phase 3: Serving with Synapse

        Serve transformed datasets to business users through Azure Synapse Analytics.
        Enable SQL-based reporting and dashboarding using Power BI.
        Leverage Synapse serverless and dedicated pools for optimized performance and cost.
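
One common way to serve Delta data from a Synapse serverless SQL pool is a view over `OPENROWSET` with `FORMAT = 'DELTA'`. The sketch below generates such a statement from Python. The schema, view, storage account, container, and folder names are placeholders invented for illustration, and the repository's own Synapse scripts may be structured differently.

```python
def delta_view_sql(schema, view, storage_account, container, folder):
    """Build a CREATE VIEW statement that reads a Delta folder via OPENROWSET.

    Identifiers are interpolated without escaping, so this is only suitable
    for trusted, hard-coded configuration values.
    """
    url = f"https://{storage_account}.dfs.core.windows.net/{container}/{folder}/"
    return (
        f"CREATE VIEW {schema}.{view} AS\n"
        f"SELECT *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    FORMAT = 'DELTA'\n"
        f") AS src;"
    )


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    print(delta_view_sql("gold", "sales", "examplelake", "silver", "Sales"))
```

A view like this stores no data of its own; queries from Power BI or other SQL clients read the underlying Delta files at query time.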

📂 Project Structure

        📁 azure-end-to-end-project
        ├── ingestion
        │   └── data-factory-pipelines
        ├── transformation
        │   └── databricks-notebooks
        ├── serving
        │   └── synapse-scripts
        ├── Data
        │   └── sample datasets
        ├── Assets
        │   └── Azure assets used
        └── README.md

⚙️ Tech Stack

        Azure Data Factory
        Azure Databricks
        Azure Synapse Analytics
        Azure Data Lake Storage Gen2
        Power BI

🎯 Key Features

        Modular and scalable architecture
        Supports dynamic data sources and schema variations
        Follows medallion architecture for transformation
        End-to-end orchestration with monitoring and logging

🧑‍💻 Author

Pushpak Umale
Data Analyst
LinkedIn | Portfolio
