This project is a Streamlit-based exploratory data analysis (EDA) application designed for educational purposes in Data Science and Machine Learning courses.
The app provides a complete walkthrough from raw data inspection to correlation analysis, standardization, and Principal Component Analysis (PCA).
The application includes the following steps:
- Dataset Overview
  - Dataset shape (rows & columns)
  - Column names
- Raw Data Preview
  - Interactive row selection
  - Tabular visualization
- Descriptive Statistics
  - Full summary using `df.describe(include="all")`
  - Covers numeric and categorical attributes
- Data Type Inspection
  - Column-wise data type listing
- Missing Value Analysis
  - Missing value counts per column
- Numeric Feature Detection
  - Automatic detection of numeric attributes
- Correlation Analysis
  - Pearson correlation matrix
  - Heatmap visualization using Seaborn
- Standardization
  - Z-score normalization applied to numeric features
  - Separate standardized dataset preview
- Principal Component Analysis (PCA)
  - PCA applied on standardized numeric features
  - First two principal components (PC1 & PC2)
  - Explained variance ratio
  - 2D scatter plot visualization
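
The non-visual steps listed above can be sketched in plain pandas and NumPy. This is a minimal illustration, not the app's actual code: the function name `run_eda`, the result keys, and the sample data are hypothetical. It also makes a few assumptions that the list does not state: rows with missing numeric values are dropped before standardization, no numeric column is constant, and PCA is computed with an SVD in NumPy rather than with whichever library the app uses. The Streamlit widgets, the Seaborn heatmap, and the scatter plot are left out.

```python
import numpy as np
import pandas as pd


def run_eda(df: pd.DataFrame) -> dict:
    """Run the non-visual EDA steps on df and return the results in a dict."""
    results = {}

    # Dataset overview: shape (rows, columns) and column names.
    results["shape"] = df.shape
    results["columns"] = list(df.columns)

    # Descriptive statistics for numeric and categorical attributes.
    results["summary"] = df.describe(include="all")

    # Column-wise data types and missing-value counts.
    results["dtypes"] = df.dtypes
    results["missing"] = df.isnull().sum()

    # Numeric feature detection.
    numeric = df.select_dtypes(include="number")
    results["numeric_columns"] = list(numeric.columns)

    # Pearson correlation matrix of the numeric features.
    results["correlation"] = numeric.corr(method="pearson")

    # Z-score standardization using the population standard deviation.
    # Assumption: incomplete rows are dropped and no column is constant.
    complete = numeric.dropna()
    standardized = (complete - complete.mean()) / complete.std(ddof=0)
    results["standardized"] = standardized

    # PCA via SVD of the standardized (already centered) data matrix.
    # Assumption: at least two numeric columns and two complete rows.
    u, s, _ = np.linalg.svd(standardized.to_numpy(), full_matrices=False)
    scores = u[:, :2] * s[:2]  # projections onto the first two components
    results["pcs"] = pd.DataFrame(
        scores, columns=["PC1", "PC2"], index=standardized.index
    )
    # Share of total variance captured by PC1 and PC2.
    results["explained_variance_ratio"] = (s**2 / np.sum(s**2))[:2]

    return results


# Hypothetical sample data with one categorical column and one missing value.
sample = pd.DataFrame(
    {
        "height": [150.0, 160.0, 170.0, 180.0, 190.0],
        "weight": [50.0, 58.0, 69.0, 80.0, 95.0],
        "age": [23, 35, 31, 52, 46],
        "group": ["a", "b", "a", None, "b"],
    }
)
results = run_eda(sample)
print(results["shape"])
print(results["missing"])
print(results["explained_variance_ratio"])
```

Returning a dictionary keeps the computation separate from the display layer, so each entry could be handed to a table or plot component independently.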