This project demonstrates how to clean and prepare raw data using SQL.
The focus is on identifying common data quality issues and resolving them using
industry-standard SQL techniques.
The cleaned data can be directly used for analytics, dashboards, and reporting.
- Handle missing and NULL values
- Remove duplicate records
- Standardize text and categorical data
- Fix incorrect data types
- Validate and clean invalid records
- Data Type: Structured layyed-off employee data
- Format: Relational table
- Common Issues Found:
- Missing values
- Duplicate rows
- Inconsistent text formats
- Incorrect data types
- Invalid age and salary values
- SQL
- MySQL
- SQL IDE (MySQL Workbench)
- Converted blank strings to NULL
- Removed records with critical missing data
- Used
ROW_NUMBER()withPARTITION BY - Retained only unique records
- Trimmed extra spaces
- Standardized text using
UPPER()andINITCAP()
- Converted string dates to DATE
- Converted age and salary to numeric values
- Removed invalid age values
- Removed negative or zero salary records
- SELECT, WHERE, UPDATE, DELETE
- CASE WHEN
- CTEs (WITH clause)
- Window Functions
- Data type casting
- String functions
- Cleaned and standardized dataset
- Improved data accuracy and consistency
- Ready for analysis and visualization
- Automate cleaning using stored procedures
- Add data validation checks
- Integrate with Power BI / Tableau
- Schedule cleaning jobs
Pushpak Umale
Data Analyst | SQL | Power BI | Business Analytics