This project contains a Python script (`titanic_cleaning.py`) for preprocessing the original Titanic dataset (`titanic_original.csv`). The goal is to clean the data and prepare it for further analysis or machine learning tasks.
- Input file: `titanic_original.csv`
- Output file: `titanic_cleaned.xlsx` (the cleaned version, saved in Excel format)

Drop Unnecessary Columns
- The following columns were removed because they contain many missing values or are not relevant for modeling: `cabin`, `boat`, `body`, `home.dest`.
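The column removal above can be sketched with pandas' `DataFrame.drop`. This is a minimal illustration, not the script's actual code: the function name `drop_unnecessary_columns` is hypothetical, the sample frame is made-up toy data, and passing `errors="ignore"` is an assumption that lets the step tolerate a file where some of these columns are already gone.

```python
import pandas as pd

# Columns this README lists as removed before modeling.
DROP_COLUMNS = ["cabin", "boat", "body", "home.dest"]


def drop_unnecessary_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns in DROP_COLUMNS.

    errors="ignore" skips any listed column that is absent instead of
    raising a KeyError (an assumption; the README does not say).
    """
    return df.drop(columns=DROP_COLUMNS, errors="ignore")


if __name__ == "__main__":
    # Toy data for illustration only, not rows from titanic_original.csv.
    sample = pd.DataFrame(
        {
            "name": ["A", "B"],
            "cabin": ["C1", None],
            "boat": ["2", None],
            "body": [None, 100.0],
            "home.dest": ["X", None],
        }
    )
    print(drop_unnecessary_columns(sample).columns.tolist())  # ['name']
```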

Handle Missing Values in Age
- Missing values in the `age` column were filled with the mean age.
- All age values were then rounded to the nearest integer.
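A minimal sketch of the two age steps, assuming the column is named `age` as in the original file; `fill_and_round_age` is a hypothetical name and the sample values are made up. The mean is computed before filling, so it reflects only the known ages.

```python
import numpy as np
import pandas as pd


def fill_and_round_age(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing ages with the mean age, then round every age to an integer."""
    out = df.copy()
    # Series.mean() skips NaN by default, so the mean uses only the known ages.
    mean_age = out["age"].mean()
    # round() produces whole numbers; astype(int) stores them as integers.
    out["age"] = out["age"].fillna(mean_age).round().astype(int)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"age": [22.0, np.nan, 38.0, 26.0]})
    # The mean of the known ages is 86 / 3 ≈ 28.67, which rounds to 29.
    print(fill_and_round_age(sample)["age"].tolist())  # [22, 29, 38, 26]
```

One detail worth knowing: pandas `round()` rounds exact halves to the nearest even number (0.5 becomes 0, 2.5 becomes 2), so "nearest integer" holds everywhere except at ties.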
 
Fix Missing Embarked Values
- Missing values in the `embarked` column were filled with `'S'` (Southampton), the most frequent port of embarkation.
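The embarked fix can be sketched as a mode fill. This is an illustration under assumptions: `fill_embarked` is a hypothetical name, and instead of only hard-coding `'S'` it computes the most frequent value by default, with an optional `value` argument for passing `'S'` explicitly as this README describes.

```python
from typing import Optional

import numpy as np
import pandas as pd


def fill_embarked(df: pd.DataFrame, value: Optional[str] = None) -> pd.DataFrame:
    """Fill missing embarkation ports.

    With value=None the most frequent port (the mode) is used; pass "S"
    to hard-code the value described in this README.
    """
    out = df.copy()
    if value is None:
        # mode() ignores NaN and returns a Series (sorted on ties); take the first entry.
        value = out["embarked"].mode().iloc[0]
    out["embarked"] = out["embarked"].fillna(value)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"embarked": ["S", "C", np.nan, "S", "Q"]})
    print(fill_embarked(sample)["embarked"].tolist())  # ['S', 'C', 'S', 'S', 'Q']
```

Computing the mode keeps the step correct if the input data changes, while hard-coding `'S'` makes the output independent of ties or unusual inputs.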

Correct Fare Values
- Zero or negative values in the `fare` column were replaced with the mean of the positive fares only.
- As a final safeguard, any remaining negative fare values were clamped to `0`.
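A sketch of the fare correction, assuming the column is named `fare`. The name `correct_fares` is hypothetical, and the handling of missing (NaN) fares is an assumption because the README does not mention them: NaN compares as False in `<= 0`, so this version leaves missing fares unchanged.

```python
import numpy as np
import pandas as pd


def correct_fares(df: pd.DataFrame) -> pd.DataFrame:
    """Replace zero or negative fares with the mean of the positive fares."""
    out = df.copy()
    # Mean over strictly positive fares only, so zeros and negatives do not drag it down.
    positive_mean = out.loc[out["fare"] > 0, "fare"].mean()
    # NaN compares False here, so missing fares are left as they are (assumption).
    out.loc[out["fare"] <= 0, "fare"] = positive_mean
    # Safeguard: clamp anything still negative to 0 (a no-op after the replacement above).
    out["fare"] = out["fare"].clip(lower=0)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"fare": [7.25, 0.0, -5.0, 71.28, np.nan]})
    # Rows 1 and 2 become (7.25 + 71.28) / 2; the NaN in row 4 stays NaN.
    print(correct_fares(sample)["fare"].tolist())
```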

Remove Duplicates
- Duplicate records were identified and removed from the dataset.
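A sketch of the duplicate removal using `DataFrame.drop_duplicates`. The README does not say which columns define a duplicate, so this assumes rows identical in every column (the pandas default) and keeps the first copy. The name `remove_duplicates` and the `reset_index` call are assumptions for illustration.

```python
from typing import Tuple

import pandas as pd


def remove_duplicates(df: pd.DataFrame) -> Tuple[pd.DataFrame, int]:
    """Drop rows identical in every column; return the result and the number removed."""
    # duplicated() marks every repeat after the first occurrence as True.
    n_dupes = int(df.duplicated().sum())
    # keep="first" (the default) retains the first copy; reset_index renumbers rows 0..n-1.
    out = df.drop_duplicates().reset_index(drop=True)
    return out, n_dupes


if __name__ == "__main__":
    sample = pd.DataFrame({"name": ["A", "B", "A"], "age": [30, 40, 30]})
    cleaned, removed = remove_duplicates(sample)
    print(removed, len(cleaned))  # 1 2
```

Returning the count makes it easy to log how many rows the step removed.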
 
 
Output
- Cleaned dataset saved as: `titanic_cleaned.xlsx`
- Format: Excel (written with the `openpyxl` engine)
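Writing the Excel file can be sketched with `DataFrame.to_excel` and `engine="openpyxl"`, which requires `openpyxl` to be installed. The function name `save_cleaned` and the choice of `index=False` are assumptions; the README does not say whether the row index is written. The demo round-trips through a temporary directory so it leaves no files behind.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def save_cleaned(df: pd.DataFrame, path: str = "titanic_cleaned.xlsx") -> Path:
    """Write df to an .xlsx file with the openpyxl engine and return the path."""
    out_path = Path(path)
    # index=False keeps pandas' integer row index out of the spreadsheet (assumption).
    df.to_excel(out_path, index=False, engine="openpyxl")
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_cleaned(pd.DataFrame({"age": [22, 38]}), os.path.join(tmp, "demo.xlsx"))
        # Read the file back to confirm the data survived the round trip.
        print(pd.read_excel(saved, engine="openpyxl")["age"].tolist())  # [22, 38]
```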

Install the dependencies with `pip install pandas openpyxl`. Make sure `titanic_original.csv` is in the same directory as `titanic_cleaning.py`, then run `python titanic_cleaning.py` from that directory. After execution, `titanic_cleaned.xlsx` is generated in the same directory.
- The script is designed as a lightweight, simple preprocessor for the Titanic data.
- It can be extended with further cleaning or feature-engineering steps.
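One way to keep a script like this easy to extend is to express each step as a function that takes and returns a DataFrame, then apply the steps in order. The sketch below is a hypothetical structure, not the script's actual layout: `run_pipeline` and `add_family_size` are illustrative names, and the feature step assumes the `sibsp` and `parch` columns of the original Titanic data.

```python
from typing import Callable, List

import pandas as pd

# A cleaning or feature step: DataFrame in, DataFrame out.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def run_pipeline(df: pd.DataFrame, steps: List[Step]) -> pd.DataFrame:
    """Apply each step in order and return the final DataFrame."""
    for step in steps:
        df = step(df)
    return df


def add_family_size(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature step: siblings/spouses + parents/children + the passenger."""
    out = df.copy()
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"sibsp": [1, 0, 1], "parch": [0, 2, 0]})
    steps = [lambda d: d.drop_duplicates(), add_family_size]
    print(run_pipeline(sample, steps)["family_size"].tolist())  # [2, 3]
```

Adding a new step then means writing one function and appending it to the list, without touching the existing steps.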