Portfolio of Data Analytics & Machine Learning projects: EDA, Segmentation, Sentiment Analysis, Fraud Detection.

Azoa126/data_analyticsportfolio

📊 Data Analytics & Machine Learning Portfolio

Welcome to my Data Analytics Portfolio, where I showcase hands-on projects across EDA, Customer Segmentation, Sentiment Analysis, and Fraud Detection.
These projects demonstrate my skills in Python, Machine Learning, and Data Visualization applied to real-world datasets.


🚀 Projects Overview

1️⃣ McDonald’s Nutrition Facts – Exploratory Data Analysis (EDA)

  • Goal: Understand nutrition patterns in menu items.
  • Techniques: Pandas profiling, histograms, scatter plots.
  • Key Skills: Data cleaning, descriptive statistics, visualization.
  • 📂 Code | 🖼️ Sample Plots in outputs/

2️⃣ Retail Customer Segmentation

  • Goal: Group customers by purchasing behavior.
  • Techniques: Feature Engineering, K-Means Clustering.
  • Visuals: Scatter plots of clusters.
  • Outcome: Identified distinct customer groups (e.g., High-value vs Budget shoppers).
  • 📂 Code | 🖼️ Cluster Maps in outputs/
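
As a rough illustration of the segmentation approach, here is a minimal K-Means sketch with scikit-learn. The two features (`annual_spend`, `orders_per_year`) and the synthetic customers are assumptions made up for this example; they are not taken from the project's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: [annual_spend, orders_per_year].
# Two synthetic groups stand in for "budget" and "high-value" shoppers.
rng = np.random.default_rng(0)
budget = rng.normal(loc=[200.0, 3.0], scale=[30.0, 1.0], size=(50, 2))
high_value = rng.normal(loc=[2000.0, 25.0], scale=[200.0, 3.0], size=(50, 2))
X = np.vstack([budget, high_value])

# Scale features so the large spend values do not dominate the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means with two clusters; n_init and random_state keep runs repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Summarize each cluster in the original units.
for k in range(2):
    members = X[labels == k]
    print(f"cluster {k}: n={len(members)}, mean spend={members[:, 0].mean():.0f}")
```

In practice the number of clusters is usually chosen with an elbow plot or silhouette score rather than fixed in advance.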

3️⃣ Data Cleaning Pipeline

  • Goal: Ensure data integrity before analysis.
  • Steps:
    • Handle missing values
    • Remove duplicates
    • Detect outliers (IQR method)
    • Standardize formats
  • 📂 Code
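
The four steps above can be sketched with pandas roughly as follows. The `clean` helper, the column names, and the sample rows are hypothetical, and filling gaps with the median is just one possible choice for handling missing values.

```python
import pandas as pd

def clean(df: pd.DataFrame, text_col: str, value_col: str) -> pd.DataFrame:
    """Hypothetical helper: standardize, de-duplicate, impute, flag outliers."""
    out = df.copy()

    # Standardize formats: trim whitespace and lowercase the text column.
    out[text_col] = out[text_col].str.strip().str.lower()

    # Remove duplicates (done after standardizing so "Paris" and " paris " match).
    out = out.drop_duplicates()

    # Handle missing values: fill numeric gaps with the column median.
    out[value_col] = out[value_col].fillna(out[value_col].median())

    # Detect outliers with the IQR rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1 = out[value_col].quantile(0.25)
    q3 = out[value_col].quantile(0.75)
    iqr = q3 - q1
    in_range = out[value_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out["is_outlier"] = ~in_range

    return out.reset_index(drop=True)

# Made-up sample data with a duplicate, a missing value, and an extreme value.
raw = pd.DataFrame({
    "city": ["Paris", " paris ", "Lyon", "Lyon", "Nice", "Nice", "Lille"],
    "amount": [10.0, 10.0, 12.0, None, 11.0, 500.0, 9.0],
})
cleaned = clean(raw, text_col="city", value_col="amount")
print(cleaned)
```

Outliers are flagged rather than dropped here, so the decision to remove or keep them stays with the analysis that follows.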

4️⃣ Sentiment Analysis (Twitter + Play Store Reviews)

  • Goal: Classify opinions as Positive, Neutral, or Negative.
  • Pipeline:
    • Text Cleaning
    • TF-IDF Vectorization → convert text to numbers
    • Logistic Regression → classification
    • Evaluation: Accuracy, Confusion Matrix, ROC-AUC
  • Extras: WordClouds for Positive/Negative terms
  • 📂 Code | 🖼️ WordClouds in outputs/
  • 🔗 Kaggle Notebook Demo
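
The TF-IDF and Logistic Regression stages of this pipeline can be sketched with scikit-learn as below. The six review texts are invented for illustration, and the sketch uses only two classes (positive and negative) to stay short; the project itself also covers a neutral class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; a real run would use the cleaned review text.
texts = [
    "love this app, works great",
    "great update, really love it",
    "amazing and fast, great job",
    "terrible app, keeps crashing",
    "worst update ever, terrible",
    "slow and buggy, hate it",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

# TF-IDF turns each text into a weighted term vector; Logistic Regression
# then learns a linear decision boundary over those vectors.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

new_reviews = ["love it, great app", "terrible and buggy, keeps crashing"]
predictions = model.predict(new_reviews)
for review, label in zip(new_reviews, predictions):
    print(f"{label:>8}: {review}")
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, so held-out text is transformed consistently at evaluation time.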

5️⃣ Credit Card Fraud Detection

  • Goal: Detect fraudulent transactions from highly imbalanced data.
  • Techniques:
    • Logistic Regression (baseline)
    • Random Forest (non-linear patterns)
    • Isolation Forest (anomaly detection)
    • SMOTE (oversampling)
  • Evaluation: ROC-AUC, Precision, Recall
  • 📂 Code | 🖼️ Confusion Matrix in outputs/
  • 🔗 Kaggle Notebook Demo
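
To show what the baseline and its evaluation might look like, here is a small sketch on synthetic imbalanced data. Note two assumptions: the data comes from scikit-learn's `make_classification` rather than the credit card dataset, and class imbalance is handled with `class_weight="balanced"` instead of SMOTE so the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transaction data: roughly 2% positive ("fraud") class.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    weights=[0.98, 0.02],
    class_sep=2.0,
    random_state=0,
)

# Stratify so the rare class appears in both splits at the same rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: Logistic Regression with class weights inversely proportional
# to class frequency (used here in place of SMOTE oversampling).
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class
pred = clf.predict(X_test)

auc = roc_auc_score(y_test, scores)
precision = precision_score(y_test, pred, zero_division=0)
recall = recall_score(y_test, pred, zero_division=0)
print(f"ROC-AUC={auc:.3f}  precision={precision:.3f}  recall={recall:.3f}")
```

Accuracy is left out on purpose: with so few positives, a model that predicts "not fraud" for everything would still score about 98%, which is why ROC-AUC, precision, and recall are the more informative metrics here.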

📦 Tech Stack

  • Languages: Python
  • Libraries:
    • pandas, numpy → data manipulation
    • matplotlib, seaborn → visualization
    • scikit-learn → ML algorithms (Logistic Regression, Random Forest, KMeans, etc.)
    • imblearn (SMOTE) → handle imbalanced data
    • wordcloud → text visualization

🖥️ Repository Structure

1️⃣ McDonald’s Nutrition Facts – Exploratory Data Analysis (EDA)

2️⃣ Retail Customer Segmentation

3️⃣ Data Cleaning Pipeline

4️⃣ Sentiment Analysis (Twitter + Play Store Reviews)

5️⃣ Credit Card Fraud Detection
