Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Updated Jan 21, 2020 - Scala
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau; GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau
Real Time Data Streaming Pipeline
Stream data directly from an API using Apache Beam to BigQuery.
Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights
This project implements a modern data engineering pipeline using Databricks, PySpark, dbt, and Delta Live Tables. It follows the Medallion Architecture, supports real-time data ingestion with Auto Loader, and models data with fact and dimension tables, including Slowly Changing Dimensions (SCD Type 2), all orchestrated in a scalable cloud environment.
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API
AI-powered data sanitizer with schema detection, dedupe, outlier detection, and LLM enrichment.
Docs-only case study – Compliance Reporting data platform on Azure for a Big-4 Audit & Consulting Firm (BFSI, healthcare-style datasets) using Streaming Pipeline (ETL) + Batch Pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, DQ, governance, and lineage.
Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).
Data Engineer Training Using Google Cloud Platform
Master's degree | Data Engineering | Final course projects | goit-de-fp