A reproducible PyTorch MLP pipeline for the Kaggle Titanic dataset.
This project demonstrates:
- Modular feature engineering
- Clean training loop separation
- Ensemble inference
- Reproducible submission generation
Project structure:

- src/
  - model.py # MLP architecture definition
  - train.py # training loop (optimizer, loss, logging)
  - features.py # feature engineering (fit / transform pipeline)
- scripts/
  - make_submission.py # production entry point
- data/
  - raw/
    - train.csv
    - test.csv
- outputs/
  - submissions/
    - submission.csv
- requirements.txt
- README.md
Quickstart:

- Install dependencies: `pip install -r requirements.txt`
- Download the Titanic dataset from the Kaggle `titanic` competition.
- Place the files at `data/raw/train.csv` and `data/raw/test.csv`.
- From the project root, run `python scripts/make_submission.py`.
- The output is written to `outputs/submissions/submission.csv`.
Preprocessing:

- Preprocessing is fitted on training data only.
- All statistics (median, quantile bins, encoders) are stored as artifacts.
- The same artifacts are reused for validation and test transformation.

This ensures:

- No data leakage
- A stable feature space
- Reproducible inference
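The fit/transform discipline above can be sketched with a minimal imputer; the class name and API below are illustrative, not the actual `features.py` interface:

```python
import numpy as np

class MedianImputer:
    """Learns a statistic from training data only, then reuses it everywhere."""

    def fit(self, values):
        # The median is computed once, on training rows only.
        self.median_ = float(np.nanmedian(values))
        return self

    def transform(self, values):
        # The *training* median fills gaps in any split (train, val, or test).
        values = np.asarray(values, dtype=float)
        return np.where(np.isnan(values), self.median_, values)

# Fit on train, reuse the stored artifact on test -- no leakage.
train_age = np.array([22.0, 38.0, np.nan, 35.0])
imputer = MedianImputer().fit(train_age)
test_age = imputer.transform(np.array([np.nan, 54.0]))  # -> [35.0, 54.0]
```

Persisting `median_` (e.g. with `pickle` or `json`) is what makes the test-time transform reproducible across runs.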
- PyTorch MLP
- Configurable depth
- Dropout regularization
- Adam optimizer
- Ensemble across multiple random seeds
- Randomness is controlled via:
  - NumPy seed
  - PyTorch seed
  - Deterministic train / validation split
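A configurable-depth MLP with dropout, as listed above, might look like the sketch below; the `TitanicMLP` name and layer widths are assumptions, not the actual `model.py` code:

```python
import torch
import torch.nn as nn

class TitanicMLP(nn.Module):
    """MLP with configurable depth and dropout regularization."""

    def __init__(self, in_features, hidden=64, depth=2, dropout=0.2):
        super().__init__()
        layers, width = [], in_features
        for _ in range(depth):
            layers += [nn.Linear(width, hidden), nn.ReLU(), nn.Dropout(dropout)]
            width = hidden
        layers.append(nn.Linear(width, 1))  # single survival logit
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = TitanicMLP(in_features=8, depth=3)
logits = model(torch.randn(4, 8))  # batch of 4 passengers -> shape (4, 1)
```

Emitting a single logit (rather than two-class softmax) pairs naturally with `BCEWithLogitsLoss` during training.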
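The seed control listed above can be sketched as follows; the `set_seed` helper is hypothetical, not a function from this repository:

```python
import numpy as np
import torch

def set_seed(seed):
    # Pin both randomness sources the pipeline relies on.
    np.random.seed(seed)
    torch.manual_seed(seed)

# Deterministic train / validation split: the same seed always yields
# the same permutation, so the split is reproducible across runs.
set_seed(42)
idx = np.random.permutation(100)
cut = int(0.8 * len(idx))
train_idx, val_idx = idx[:cut], idx[cut:]
```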
- The submission script is intentionally clean (no plotting or experiment logic).
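Ensemble inference across seeds typically averages predicted probabilities before thresholding; a sketch under that assumption (the function name, the 0.5 threshold, and the stand-in models are illustrative):

```python
import torch
import torch.nn as nn

def ensemble_predict(models, x, threshold=0.5):
    """Average sigmoid probabilities across seed models, then threshold."""
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(x)) for m in models]).mean(dim=0)
    return (probs > threshold).int()

# Illustrative stand-ins for models trained under different seeds.
models = [nn.Sequential(nn.Linear(8, 1)) for _ in range(3)]
for m in models:
    m.eval()  # disable dropout-style stochasticity at inference time

preds = ensemble_predict(models, torch.randn(4, 8))  # shape (4, 1), values 0/1
```

Averaging probabilities (soft voting) is usually more stable than majority-voting hard labels when the ensemble is small.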