Welcome to my Data Analytics Portfolio, where I showcase hands-on projects in Exploratory Data Analysis (EDA), Customer Segmentation, Sentiment Analysis, and Fraud Detection.
These projects apply Python, machine learning, and data visualization to real-world datasets.
- Goal: Understand nutrition patterns in menu items.
- Techniques: Pandas profiling, histograms, scatter plots.
- Key Skills: Data cleaning, descriptive statistics, visualization.
- 📂 Code | 🖼️ Sample Plots in `outputs/`
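A minimal sketch of the EDA step, assuming a menu table with hypothetical `Calories`, `Total Fat`, and `Sugars` columns (made-up values, not the project's dataset). The helper `nutrition_eda` is also hypothetical. It returns descriptive statistics, the correlation matrix that a scatter plot shows visually, and the bin counts a histogram would draw:

```python
import numpy as np
import pandas as pd


def nutrition_eda(df: pd.DataFrame, bins: int = 5) -> dict:
    """Summarize numeric nutrition columns (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")   # skip text columns such as item names
    summary = numeric.describe()                   # count, mean, std, min, quartiles, max
    corr = numeric.corr()                          # pairwise Pearson correlations
    # Bar heights equivalent to plt.hist(numeric["Calories"], bins=bins).
    counts, edges = np.histogram(numeric["Calories"], bins=bins)
    return {"summary": summary, "corr": corr, "calorie_hist": (counts, edges)}


# Hypothetical menu data for illustration only.
menu = pd.DataFrame({
    "Item": ["Burger", "Salad", "Fries", "Shake", "Wrap"],
    "Calories": [540, 150, 380, 610, 420],
    "Total Fat": [28, 6, 19, 20, 17],
    "Sugars": [9, 4, 0, 85, 5],
})
result = nutrition_eda(menu)
print(result["summary"].loc["mean", "Calories"])  # → 420.0
```

Passing the same columns to matplotlib's `plt.hist` or `plt.scatter` (or seaborn's `histplot`/`scatterplot`) turns these numbers into the plots saved in `outputs/`.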
- Goal: Group customers by purchasing behavior.
- Techniques: Feature Engineering, K-Means Clustering.
- Visuals: Scatter plots of clusters.
- Outcome: Identified distinct customer groups (e.g., high-value vs. budget shoppers).
- 📂 Code | 🖼️ Cluster Maps in `outputs/`
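The feature-engineering plus K-Means flow can be sketched as follows, assuming a transaction log with hypothetical `customer_id` and `amount` columns. The per-customer features (total spend, order count, average order value) are an illustrative choice, not necessarily the project's:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_customers(transactions: pd.DataFrame, n_clusters: int = 2,
                      random_state: int = 0) -> pd.DataFrame:
    """Hypothetical helper: one row per customer plus a K-Means cluster label."""
    # Feature engineering: aggregate raw transactions into per-customer features.
    features = transactions.groupby("customer_id")["amount"].agg(
        total_spend="sum", n_orders="count", avg_order="mean"
    )
    # Scale features so no single one dominates the Euclidean distances K-Means uses.
    X = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    features["cluster"] = km.fit_predict(X)
    return features


# Made-up transactions: A-C place many large orders, D-F a few small ones.
tx = pd.DataFrame({
    "customer_id": ["A"] * 4 + ["B"] * 5 + ["C"] * 4 + ["D"] * 1 + ["E"] * 2 + ["F"] * 1,
    "amount": [120, 150, 130, 140, 160, 110, 125, 135, 145,
               150, 155, 140, 130, 15, 10, 12, 20],
})
segments = segment_customers(tx)
print(segments)
```

Cluster IDs are arbitrary integers; naming a cluster "high-value" or "budget" comes from inspecting its mean features afterwards.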
- Goal: Ensure data integrity before analysis.
- Steps:
  - Handle missing values
  - Remove duplicates
  - Detect outliers (IQR method)
  - Standardize formats
- 📂 Code
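A minimal pandas sketch of the cleaning steps, assuming one text column and one numeric column (the names `city` and `price` and the data are hypothetical). Two choices here are assumptions rather than the project's documented method: formats are standardized first so near-duplicates such as `" Paris"` and `"paris"` are caught, and missing numbers are filled with the median. Outliers use the conventional IQR rule, keeping values inside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]:

```python
import pandas as pd


def clean(df: pd.DataFrame, numeric_col: str, text_col: str) -> pd.DataFrame:
    """Hypothetical cleaning helper; returns a new DataFrame."""
    out = df.copy()
    # Standardize formats: trim whitespace and lowercase the text column.
    out[text_col] = out[text_col].str.strip().str.lower()
    # Handle missing values: fill numeric gaps with the column median.
    out[numeric_col] = out[numeric_col].fillna(out[numeric_col].median())
    # Remove exact duplicate rows.
    out = out.drop_duplicates()
    # Detect outliers with the IQR rule and drop them.
    q1, q3 = out[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = out[numeric_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[inside].reset_index(drop=True)


raw = pd.DataFrame({
    "city": [" Paris", "paris", "Lyon", "Nice", "Lyon", "Lille", "Nantes"],
    "price": [10.0, 10.0, None, 12.0, 11.0, 13.0, 500.0],
})
cleaned = clean(raw, numeric_col="price", text_col="city")
print(cleaned)
```

Here the missing Lyon price becomes the median (11.5), the second Paris row is dropped as a duplicate, and the 500.0 price falls outside the IQR fence.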
- Goal: Analyze opinions (Positive, Neutral, Negative).
- Pipeline:
  - Text Cleaning
  - TF-IDF Vectorization → convert text to numbers
  - Logistic Regression → classification
- Evaluation: Accuracy, Confusion Matrix, ROC-AUC
- Extras: WordClouds for Positive/Negative terms
- 📂 Code | 🖼️ WordClouds in `outputs/`
- 🔗 Kaggle Notebook Demo (replace with your link)
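The cleaning → TF-IDF → Logistic Regression pipeline and its evaluation can be sketched with scikit-learn, assuming a tiny made-up corpus and a simple regex cleaner (`clean_text` and `train_sentiment_model` are hypothetical helpers). With three classes, ROC-AUC is computed one-vs-rest. On data this small the metrics carry no meaning; the point is the wiring:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def train_sentiment_model(texts, labels, random_state=0):
    cleaned = [clean_text(t) for t in texts]
    # Stratify so every class appears in the test split (needed for one-vs-rest ROC-AUC).
    X_train, X_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=0.5, stratify=labels, random_state=random_state
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    proba = model.predict_proba(X_test)  # columns follow model.classes_
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=model.classes_),
        "roc_auc_ovr": roc_auc_score(y_test, proba, multi_class="ovr",
                                     labels=model.classes_),
    }
    return model, metrics


texts = [
    "I love this product, it is great!", "Absolutely fantastic service.",
    "Great quality and fast delivery.", "Wonderful experience, highly recommend.",
    "Terrible, it broke after one day.", "Awful customer support.",
    "I hate it, waste of money.", "Very bad quality, disappointed.",
    "It is okay, nothing special.", "Average product, does the job.",
    "Delivery was on time.", "It works as described.",
]
labels = ["Positive"] * 4 + ["Negative"] * 4 + ["Neutral"] * 4
model, metrics = train_sentiment_model(texts, labels)
print(metrics)
```

Word clouds are left out of this sketch; the `wordcloud` library handles that step separately.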
- Goal: Detect fraudulent transactions from highly imbalanced data.
- Techniques:
  - Logistic Regression (baseline)
  - Random Forest (non-linear patterns)
  - Isolation Forest (anomaly detection)
  - SMOTE (oversampling)
- Evaluation: ROC-AUC, Precision, Recall
- 📂 Code | 🖼️ Confusion Matrix in `outputs/`
- 🔗 Kaggle Notebook Demo (replace with your link)
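A sketch of the fraud-detection comparison on synthetic imbalanced data from `make_classification` (about 3% positives; real transaction data will behave differently). To keep it dependency-light, `smote_like_oversample` is a hypothetical from-scratch illustration of the interpolation SMOTE performs, not imblearn's implementation; in the project itself imblearn's `SMOTE().fit_resample(X, y)` fills this role. Oversampling is applied to the training split only, so the test set keeps its true imbalance. Isolation Forest is unsupervised, so it is fit without labels and scored by anomaly score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, minority=1, k=5, random_state=0):
    """Hypothetical SMOTE-style oversampler: interpolate between minority neighbours."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority]
    n_new = max(0, int((y != minority).sum() - len(X_min)))  # balance the classes
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment base -> neighbour
    X_new = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


X, y = make_classification(n_samples=2000, n_features=8, weights=[0.97],
                           flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
X_bal, y_bal = smote_like_oversample(X_train, y_train)

results = {}
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    proba = model.predict_proba(X_test)[:, 1]
    pred = (proba >= 0.5).astype(int)
    results[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
    }

# Isolation Forest: lower score_samples means more anomalous, so negate it.
iso = IsolationForest(contamination=0.03, random_state=0).fit(X_train)
results["isolation_forest"] = {"roc_auc": roc_auc_score(y_test, -iso.score_samples(X_test))}
print(results)
```

Precision and recall are reported at a 0.5 threshold; on imbalanced data it is common to tune that threshold instead, since ROC-AUC alone can look good while recall on the fraud class stays low.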
- Languages: Python
- Libraries:
  - `pandas`, `numpy` → data manipulation
  - `matplotlib`, `seaborn` → visualization
  - `scikit-learn` → ML algorithms (Logistic Regression, Random Forest, KMeans, etc.)
  - `imblearn` (SMOTE) → handling imbalanced data
  - `wordcloud` → text visualization