This repository contains Python and SQL projects relating to data cleaning and feature engineering.
Language(s): Python, SQL
Package(s): (Python) pandas, seaborn
Software: Jupyter Notebook
- Pandas 2021 Nutrition Data Cleaning: Cleans a 2021 nutrition dataset and engineers 11 new columns with Python's pandas library to prepare the data for future visualization. The cleaning process includes exploratory data analysis, removing columns, handling duplicates, recasting data types, and filling missing values. The feature engineering process creates columns that meet predefined goals using grouping, aggregation, merging, and other pandas operations.
- SQL 2021 Nutrition Data Cleaning: Will use MySQL queries to reproduce the data cleaning and feature engineering process from Pandas 2021 Nutrition Data Cleaning, producing the same final CSV files of transformed data.
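
The cleaning and feature engineering steps listed for the Pandas project can be illustrated with a minimal sketch. It is not the notebook's actual code: the toy data, the column names (`food`, `category`, `calories`, `source_url`), the helper functions `clean` and `add_category_mean`, and the choice of a per-category median for filling gaps are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; these columns are illustrative, not the real dataset's.
raw = pd.DataFrame({
    "food": ["apple", "apple", "bread", "milk", "cheese"],
    "category": ["fruit", "fruit", "grain", "dairy", "dairy"],
    "calories": ["52", "52", "265", "42", None],
    "source_url": ["a", "a", "b", "c", "d"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps named above to a copy of df."""
    # Remove a column that is not needed for analysis.
    out = df.drop(columns=["source_url"])
    # Drop exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Recast calories from strings to numbers; unparseable values become NaN.
    out["calories"] = pd.to_numeric(out["calories"], errors="coerce")
    # Fill missing calories with the median of the same category (an assumed rule).
    category_median = out.groupby("category")["calories"].transform("median")
    out["calories"] = out["calories"].fillna(category_median)
    return out


def add_category_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Engineer one feature via grouping, aggregation, and merging."""
    means = (
        df.groupby("category", as_index=False)["calories"]
        .mean()
        .rename(columns={"calories": "category_mean_calories"})
    )
    # A left merge keeps every cleaned row and attaches its category mean.
    return df.merge(means, on="category", how="left")


cleaned = clean(raw)
featured = add_category_mean(cleaned)
print(featured)
```

Writing the result out with `featured.to_csv(...)` would give a CSV file of transformed data of the kind the SQL project aims to reproduce.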