This repository contains the artifacts for the Cloud-Based YouTube Trending Analytics Pipeline on AWS, a serverless data and ML workflow that ingests daily trending-video metadata and comments, curates analytics-ready datasets, and produces next-day trending predictions. The system relies on managed AWS services to minimize operational overhead while keeping costs predictable.
- Daily ingestion: EventBridge triggers Lambda functions that call the YouTube Data API and land raw JSON into partitioned S3 buckets by region and ingest date.
- Curated ETL layers: AWS Glue jobs flatten, type-cast, deduplicate, and partition trending and comment data into Parquet, enabling efficient Athena queries and downstream feature computation.
- Sentiment + feature engineering: Comments are scored with AWS Comprehend; trending metrics are joined with sentiment aggregates to build labeled feature sets capturing engagement momentum and audience tone.
- Model training and predictions: Glue jobs retrain models on updated features and emit next-day view-growth estimates and stay-trending probabilities for analytics dashboards.
- Secure, modular infrastructure: A dedicated VPC, Secrets Manager–backed credentials, and clearly separated scripts/notebooks keep ingestion, ETL, ML, and presentation assets organized.
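
The data flow in the bullets above can be sketched in plain Python to make each transformation concrete. This is a minimal, hypothetical sketch, not the code in `Python/ETL/`: the `raw/trending` prefix, the Hive-style `region=`/`ingest_date=` key names, the dedup rule, the tone score, and the label definitions are all assumptions for illustration. The input dicts only mimic the shape of YouTube Data API `videos.list` items (whose `statistics` counts arrive as strings) and the `SentimentScore` field returned by Amazon Comprehend `DetectSentiment`.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date

# Hypothetical landing prefix; the repository's real bucket layout may differ.
RAW_PREFIX = "raw/trending"


def raw_object_key(region: str, ingest_date: date, batch_id: str) -> str:
    """Build a Hive-style S3 key so Glue/Athena can prune by region and date."""
    return (f"{RAW_PREFIX}/region={region}/"
            f"ingest_date={ingest_date.isoformat()}/{batch_id}.json")


def flatten_video(item: dict, region: str, ingest_date: date) -> dict:
    """Flatten one videos.list item and cast string counts to int."""
    stats = item.get("statistics", {})
    return {
        "video_id": item["id"],
        "title": item["snippet"]["title"],
        "region": region,
        "ingest_date": ingest_date,
        # Missing counts (e.g. hidden likes) default to 0 in this sketch.
        "views": int(stats.get("viewCount", 0)),
        "likes": int(stats.get("likeCount", 0)),
        "comments": int(stats.get("commentCount", 0)),
    }


def dedupe(rows: list[dict]) -> list[dict]:
    """Keep one row per (video_id, region, ingest_date), preferring the most views."""
    best: dict[tuple, dict] = {}
    for row in rows:
        key = (row["video_id"], row["region"], row["ingest_date"])
        if key not in best or row["views"] > best[key]["views"]:
            best[key] = row
    return list(best.values())


def sentiment_by_video(scored: list[dict]) -> dict[str, float]:
    """Average (Positive - Negative) per video over Comprehend-style scores."""
    totals: dict[str, list] = defaultdict(lambda: [0.0, 0])
    for comment in scored:
        score = comment["SentimentScore"]
        totals[comment["video_id"]][0] += score["Positive"] - score["Negative"]
        totals[comment["video_id"]][1] += 1
    return {vid: total / n for vid, (total, n) in totals.items()}


def label_next_day(today: list[dict], tomorrow: list[dict],
                   tone: dict[str, float]) -> list[dict]:
    """Join today's metrics with sentiment and attach next-day labels."""
    next_views = {(r["video_id"], r["region"]): r["views"] for r in tomorrow}
    features = []
    for r in today:
        nxt = next_views.get((r["video_id"], r["region"]))
        features.append({
            **r,
            "avg_tone": tone.get(r["video_id"], 0.0),
            # Assumed label: the video reappears on the next day's trending list.
            "stays_trending": nxt is not None,
            # Relative view growth; None when unknown or when today's views are 0.
            "view_growth": ((nxt - r["views"]) / r["views"]
                            if nxt is not None and r["views"] else None),
        })
    return features
```

In the actual pipeline, equivalent logic would presumably run inside Glue jobs over Spark DataFrames rather than Python lists; the list-based version only keeps each step (partitioned landing, flatten and type-cast, dedupe, sentiment join, labeling) easy to follow.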

Repository layout:

- `Python/ETL/`: Glue scripts for trending ingestion, comments processing, sentiment integration, and feature labeling used throughout the workflow.
- `Notebooks/`: Exploratory analysis, validation, and visualization notebooks.
- `Diagrams/`: Architecture and network diagrams referenced in the presentation materials.
- `Resources/`: Branding assets displayed in this README.
- `FinalReport.md`: Full project write-up covering scope, architecture, results, and cost considerations.
- `PresentationScript.md` and `Presentation` assets: Slide narrative outlining workflow stages and infrastructure.
- `Notes.md` and `TODO.md`: Working design notes, schemas, and milestone tracking.

Getting started:

- Review `FinalReport.md` for the full architecture description, operational flow, and future work ideas.
- Browse `Python/ETL/` to see Glue job implementations for trending ingestion, comments curation, and sentiment/feature engineering.
- Open `Notebooks/` for exploratory analyses and validation steps used during model development.
- Check `Diagrams/` alongside `PresentationScript.md` for visual references to the pipeline and network layout.
- Track outstanding tasks or design decisions in `TODO.md` and `Notes.md` when iterating on the pipeline.

