I'm a second-year IT student who loves turning raw data into meaningful stories. I'm currently diving deep into data analysis, visualisation, and ML: making numbers make sense.
- BE in Information Technology (2nd Year)
- Interested in Data Analysis, Visualisation & AI/ML
- Currently learning: PySpark, NLP, and data pipeline engineering
- Always curious, always building
Languages
Data & ML
Tools
End-to-end PySpark ETL pipeline on the Olist Brazilian E-Commerce dataset, covering 100K+ orders across multiple relational tables. It features a modular architecture, feature engineering for delivery performance, late-order flagging, and a master dataset written in Parquet format for optimized downstream reads.
PySpark ETL Parquet Big Data
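A minimal sketch of what the late-order flag could look like, written in plain Python with the standard library rather than PySpark so it runs anywhere. The column names `order_delivered_customer_date` and `order_estimated_delivery_date`, the timestamp format, the output fields `delay_days` and `is_late`, and the sample rows are all assumptions made for illustration; the actual pipeline may define lateness differently.

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the source data uses another format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_delivery_features(order):
    """Return a copy of `order` with hypothetical `delay_days` and `is_late` fields.

    `delay_days` is the number of calendar days between the estimated and the
    actual delivery date (positive means late). Orders missing either
    timestamp get None for both fields instead of a guessed value.
    """
    enriched = dict(order)
    delivered = order.get("order_delivered_customer_date")
    estimated = order.get("order_estimated_delivery_date")

    if not delivered or not estimated:
        enriched["delay_days"] = None
        enriched["is_late"] = None
        return enriched

    delivered_day = datetime.strptime(delivered, TIMESTAMP_FORMAT).date()
    estimated_day = datetime.strptime(estimated, TIMESTAMP_FORMAT).date()
    delay_days = (delivered_day - estimated_day).days

    enriched["delay_days"] = delay_days
    enriched["is_late"] = delay_days > 0
    return enriched


# Invented sample rows, not taken from the real dataset.
sample_orders = [
    {"order_id": "a1",
     "order_delivered_customer_date": "2018-01-12 10:00:00",
     "order_estimated_delivery_date": "2018-01-10 00:00:00"},
    {"order_id": "b2",
     "order_delivered_customer_date": "2018-01-08 15:30:00",
     "order_estimated_delivery_date": "2018-01-10 00:00:00"},
    {"order_id": "c3",
     "order_delivered_customer_date": None,
     "order_estimated_delivery_date": "2018-01-10 00:00:00"},
]

for row in map(add_delivery_features, sample_orders):
    print(row["order_id"], row["delay_days"], row["is_late"])
```

Keeping undelivered orders as `None` rather than `False` is a deliberate choice in this sketch: it avoids counting orders still in transit as on time.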
Retrieval-based AI chatbot that uses TF-IDF vectorization and cosine similarity to map user queries to a predefined knowledge base. Includes a hybrid engine for FAQs and small talk, confidence-threshold gating, and performance analytics visualized with Matplotlib & Seaborn.
Python NLP TF-IDF scikit-learn Matplotlib Seaborn
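The retrieval step with confidence gating can be sketched roughly as follows. This version hand-rolls TF-IDF and cosine similarity with the standard library instead of using scikit-learn, purely to stay dependency-free; the knowledge base entries, the `FALLBACK` message, the function names, and the `0.3` threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base of (question, answer) pairs.
KNOWLEDGE_BASE = [
    ("what are your opening hours", "We are open 9am to 5pm, Monday to Friday."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("where is my order", "You can track your order from the Orders page."),
]
FALLBACK = "Sorry, I am not sure about that one."


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every known term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(text, idf):
    """Build a sparse TF-IDF vector; terms outside the vocabulary are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(query, threshold=0.3):
    """Return (reply, score); fall back when the best match is below threshold."""
    idf = build_idf([question for question, _ in KNOWLEDGE_BASE])
    query_vec = vectorize(query, idf)
    scored = [(cosine(query_vec, vectorize(q, idf)), a) for q, a in KNOWLEDGE_BASE]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    if best_score < threshold:
        return FALLBACK, best_score
    return best_answer, best_score


print(answer("How can I reset my password?"))
print(answer("tell me a joke"))
```

The threshold trades coverage for precision: raising it makes the bot fall back more often, lowering it lets weaker matches through.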
Performed exploratory data analysis on 20,000+ global air quality records, identifying pollution trends across 50+ cities and visualizing seasonal PM2.5 patterns.
Python Pandas Matplotlib
OOP-based inventory system in Java utilizing inheritance and polymorphism for product categorization, file-based persistence via serialization, and automated daily stock monitoring & report generation.
Java OOP Serialization
