This project builds an automated data pipeline to extract, store, process, and visualize trending YouTube video data. It fetches daily trending video data using the YouTube Data API v3, stores it in a PostgreSQL database, processes it with Pandas, and visualizes insights using Streamlit. The pipeline is scheduled with cron jobs for automation, making it a lightweight, cost-free solution for tracking YouTube trends.
- Extract daily trending YouTube videos (titles, views, likes, categories, etc.).
- Store and manage data in a structured PostgreSQL database.
- Clean and transform data for analysis.
- Automate the pipeline for daily updates.
- Visualize trends via an interactive dashboard.
- Create a portfolio-worthy project with clear documentation.
- Python: Core scripting language.
- YouTube Data API v3: Data source for trending videos (free with Google Cloud account).
- PostgreSQL: Relational database for data storage.
- Pandas: Data cleaning and transformation.
- Streamlit: Interactive dashboard for visualization.
- Cron Jobs: Scheduling daily pipeline runs.
- Google Colab / Local Machine: Development environment.
```
youtube-trending-pipeline/
├── scripts/
│   ├── youtube_data_ingestion.py    # Fetches data from YouTube API
│   ├── postgres_setup.py            # Loads data into PostgreSQL
│   ├── data-cleaning-transform.py   # Cleans and enriches data
│   ├── dashboard.py                 # Streamlit dashboard
│   └── run_pipeline.sh              # Bash script for cron scheduling
├── dataset/                         # Generated dataset
├── requirements.txt                 # Python dependencies
└── README.md                        # Project documentation
```
- Python 3.8+
- Google Cloud account with YouTube Data API v3 enabled
- PostgreSQL installed locally or hosted (e.g., free tier on services like Heroku or Supabase)
- Git
- **Clone the Repository:**

  ```bash
  git clone https://github.com/your-username/youtube-trending-pipeline.git
  cd youtube-trending-pipeline
  ```

- **Install Dependencies:**

  ```bash
  pip install -r requirements.txt
  ```

  Note: Ensure `psycopg2` or `psycopg2-binary` is included in `requirements.txt` for PostgreSQL connectivity.
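For reference, a minimal `requirements.txt` for this stack might look like the following. The exact package set is an assumption (for example, the ingestion script may use the official Google API client or plain HTTP requests):

```text
pandas
streamlit
psycopg2-binary
google-api-python-client
python-dotenv
```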
- **Set Up YouTube API:**
  - Create a Google Cloud project in the Google Cloud Console.
  - Enable the YouTube Data API v3.
  - Generate an API key and store it securely in a `.env` file:

    ```
    YOUTUBE_API_KEY=your-api-key-here
    ```
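The scripts can then read the key from the environment at runtime. A minimal sketch (assuming the key has been exported, e.g. by `python-dotenv` loading the `.env` file; `get_api_key` is an illustrative helper, not a function in this repo):

```python
import os

def get_api_key() -> str:
    """Read the YouTube API key from the environment.

    Assumes YOUTUBE_API_KEY has been set, e.g. loaded from the
    .env file with python-dotenv before this is called.
    """
    key = os.environ.get("YOUTUBE_API_KEY")
    if not key:
        raise RuntimeError("YOUTUBE_API_KEY is not set")
    return key
```

Failing fast here keeps a missing key from surfacing later as a confusing HTTP 403 from the API.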
- **Set Up PostgreSQL:**
  - Install PostgreSQL locally or use a hosted service.
  - Create a database (e.g., `youtube_trending`):

    ```bash
    psql -U postgres -c "CREATE DATABASE youtube_trending;"
    ```

  - Configure database credentials in the `.env` file:

    ```
    DB_HOST=localhost
    DB_NAME=youtube_trending
    DB_USER=your-username
    DB_PASSWORD=your-password
    DB_PORT=5432
    ```

  - Run `postgres_setup.py` to create the tables (`videos`, `categories`, `fetch_log`) and load data.
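For a sense of what `postgres_setup.py` creates, the three tables could be defined roughly as below. The column names and types here are illustrative assumptions, not the script's actual schema:

```python
# Hypothetical DDL mirroring the three tables the pipeline uses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS videos (
    video_id      TEXT,
    title         TEXT,
    channel_title TEXT,
    category_id   INTEGER REFERENCES categories (category_id),
    view_count    BIGINT,
    like_count    BIGINT,
    fetched_at    DATE,
    PRIMARY KEY (video_id, fetched_at)  -- one row per video per day
);

CREATE TABLE IF NOT EXISTS fetch_log (
    fetch_id    SERIAL PRIMARY KEY,
    fetched_at  TIMESTAMP DEFAULT now(),
    rows_loaded INTEGER
);
"""

def create_tables(cursor) -> None:
    """Execute the DDL on an open database cursor (e.g. psycopg2)."""
    cursor.execute(SCHEMA)
```

The composite key on `videos` lets the same video appear on multiple days without duplicating rows within a day.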
- **Optional: Scheduling:**
  - Set up a cron job to run the pipeline daily. Open your crontab:

    ```bash
    crontab -e
    ```

  - Add the following line to run `run_pipeline.sh` daily at 1 AM:

    ```
    0 1 * * * /path/to/youtube-trending-pipeline/scripts/run_pipeline.sh
    ```
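A `run_pipeline.sh` along these lines would run the three stages in order (a hypothetical sketch, not necessarily the script shipped in `scripts/`):

```shell
#!/usr/bin/env bash
# run_pipeline.sh (sketch): run the pipeline stages sequentially,
# aborting on the first failure so cron logs a non-zero exit.
set -euo pipefail

# Resolve the repo root relative to this script's location.
cd "$(dirname "$0")/.."

python scripts/youtube_data_ingestion.py
python scripts/postgres_setup.py
python scripts/data-cleaning-transform.py
```

Using `set -euo pipefail` ensures a failed ingestion step does not silently load stale data downstream.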
- **Fetch Data:** Run the ingestion script to fetch trending videos:

  ```bash
  python scripts/youtube_data_ingestion.py
  ```

  Output: raw data saved in `dataset/`.
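Under the hood, fetching the trending chart amounts to a `videos.list` request with `chart=mostPopular`. A minimal sketch of building that request with the standard library (the real script may use `google-api-python-client` instead of raw URLs; `trending_url` is an illustrative helper):

```python
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/youtube/v3/videos"

def trending_url(api_key: str, region: str = "US", max_results: int = 50) -> str:
    """Build a videos.list request URL for the trending chart."""
    params = {
        "part": "snippet,statistics",   # titles/categories + views/likes
        "chart": "mostPopular",          # the trending chart
        "regionCode": region,
        "maxResults": max_results,       # API caps this at 50 per page
        "key": api_key,
    }
    return f"{BASE}?{urlencode(params)}"
```

Requesting `snippet` and `statistics` together covers the fields this pipeline stores (title, category, views, likes).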
- **Load Data:** Load the raw data into the PostgreSQL database:

  ```bash
  python scripts/postgres_setup.py
  ```
- **Transform Data:** Clean and enrich the data with Pandas:

  ```bash
  python scripts/data-cleaning-transform.py
  ```

  Output: processed data saved in `dataset/`.
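A typical cleaning pass for this kind of data coerces the API's string counts to numbers and derives the like/view ratio used by the dashboard. The column names below are assumptions for illustration, not necessarily what `data-cleaning-transform.py` uses:

```python
import pandas as pd

def enrich(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of a cleaning/enrichment step for trending-video rows."""
    out = df.copy()
    # The API returns counts as strings; coerce bad values to NaN.
    for col in ("view_count", "like_count"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=["view_count", "like_count"])
    out = out[out["view_count"] > 0]  # guard against divide-by-zero
    out["like_view_ratio"] = out["like_count"] / out["view_count"]
    return out
```

Dropping unparseable rows before the ratio calculation keeps a single malformed record from poisoning the aggregate charts.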
- **Visualize Data:** Launch the Streamlit dashboard:

  ```bash
  streamlit run scripts/dashboard.py
  ```

  View insights like top 10 trending videos, category trends, and view/like ratios.
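The "top 10 trending videos" view reduces to a sort over the view counts, which the dashboard can chart directly. A sketch of that aggregation (column names assumed, as above):

```python
import pandas as pd

def top_videos(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most-viewed rows for the dashboard's top-N chart."""
    return df.nlargest(n, "view_count")[["title", "view_count"]]
```

In `dashboard.py` the result could be passed straight to `st.bar_chart` or `st.dataframe`.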
- **Automate Pipeline:** Use `run_pipeline.sh` to run all scripts sequentially:

  ```bash
  bash scripts/run_pipeline.sh
  ```
- Database: The PostgreSQL database `youtube_trending` contains structured data in the `videos`, `categories`, and `fetch_log` tables.
- Visualizations: Dashboard charts show trends such as:
  - Top 10 videos by views.
  - Trending categories over time.
  - View/like ratio analysis.
- Dashboard: Interactive Streamlit app at `http://localhost:8501` (default).
Sample dashboard views: *Top 10 Trending Videos* and *Video and Age Distribution*.
- Add support for multiple regions using the `regionCode` API parameter.
- Implement error handling for API rate limits with exponential backoff.
- Enhance the dashboard with filters for date ranges or categories.
- Store historical trends for long-term analysis.
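The backoff idea above could be implemented as a small retry wrapper around each API call. A generic sketch (not part of the current codebase; `with_backoff` and its parameters are illustrative):

```python
import random
import time

def with_backoff(fetch, max_tries: int = 5, base_delay: float = 1.0):
    """Retry `fetch` on failure with exponential backoff plus jitter.

    `fetch` is any zero-argument callable, e.g. one YouTube API request.
    Delay doubles per attempt: base, 2*base, 4*base, ...
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter term spreads out retries so a daily cron run does not hammer the API at fixed intervals after a rate-limit response.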
Feel free to fork this repository, submit issues, or open pull requests with improvements. Please follow the existing coding style and include tests for new features.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, reach out via GitHub Issues.

