- Overview
- Dataset Description
- Project Structure
- Installation & Setup
- Augmentation and Preprocessing
- Model and Training Details
- Performance Metrics Across Epochs
- Training Metrics Visualization
- Inference
- Future Work
- Summary
This project focuses on classification of surface defects in steel manufacturing images as part of a semantic segmentation pipeline. Classification here acts as a preliminary screening mechanism to filter out images with no defects (label 0), thereby reducing computation required during segmentation inference.
We train a ResNet18 model to classify images into 5 classes:
- 0: No Defect
- 1 to 4: Corresponding to defect classes in the segmentation dataset
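The screening role described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `split_for_segmentation` and the dictionary of predicted labels are hypothetical, and it assumes the classifier has already produced one label per image.

```python
def split_for_segmentation(predicted_labels):
    """Split images into those worth segmenting and those to skip.

    predicted_labels: dict mapping image name -> predicted class (0 to 4).
    Class 0 means "no defect", so those images bypass segmentation.
    """
    to_segment = [name for name, label in predicted_labels.items() if label != 0]
    skipped = [name for name, label in predicted_labels.items() if label == 0]
    return to_segment, skipped


# Hypothetical classifier output for three images.
predictions = {"img_a.jpg": 0, "img_b.jpg": 3, "img_c.jpg": 0}
to_segment, skipped = split_for_segmentation(predictions)
```

Only the images in `to_segment` would then be passed to the segmentation model, which is where the compute saving comes from.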
The dataset is sourced from a steel surface defect detection competition:
Link: Severstal: Steel Defect Detection on Kaggle
- train_images/: Folder containing training images
- test_images/: Folder containing inference images
- train.csv: ImageId and defect class mapping
- Images not present in train.csv are considered non-defective and assigned label 0
- Multi-class classification is reduced to single-label by selecting the highest defect class per image
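The two labeling rules above can be sketched with the standard library alone. This is an illustration under assumptions: the column names `ImageId` and `ClassId` are assumed from the dataset description, and `build_labels` is a hypothetical helper, not necessarily what data.py does.

```python
import csv
import io


def build_labels(csv_text, image_names):
    """Map every image name to a single label in the range 0 to 4."""
    # Rule 1: images that never appear in the CSV keep the default label 0.
    labels = {name: 0 for name in image_names}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name, class_id = row["ImageId"], int(row["ClassId"])
        if name in labels:
            # Rule 2: keep the highest defect class seen for this image.
            labels[name] = max(labels[name], class_id)
    return labels


# Tiny made-up CSV: a.jpg has two defect classes, c.jpg is not listed at all.
sample_csv = "ImageId,ClassId\na.jpg,1\na.jpg,3\nb.jpg,2\n"
labels = build_labels(sample_csv, ["a.jpg", "b.jpg", "c.jpg"])
print(labels)  # {'a.jpg': 3, 'b.jpg': 2, 'c.jpg': 0}
```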
project/
│
├── train.py # Training entry point
├── inference.py # Placeholder for inference logic
├── data.py # Dataset loading & augmentation
├── evaluation.py # Evaluation functions and metrics
├── models/
│ └── resnet.py # ResNet18 architecture wrapper
├── utils/
│ └── helpers.py # Logging, visualization utilities
├── outputs/
│ ├── training_metrics.png
│ ├── conf_matrix.png
│ └── roc_curve.png
├── report.md # This report
└── requirements.txt # All dependencies
- Python 3.8+
- PyTorch >= 1.10
- CUDA GPU recommended for faster training
git clone https://github.com/iampratyusht/l0-iampratyusht.git
cd l0-iampratyusht
pip install -r requirements.txt

Download the dataset from Kaggle and place folders as:
project/
├── train_images/
├── test_images/
└── train.csv
Note: You do not need to explicitly create DataLoaders when using train.py; the script handles this internally. This section is only for debugging, testing augmentations, or exploring dataset behavior.
from data import get_dataloaders
train_loader, val_loader = get_dataloaders(
data_dir="./train_images",
label_file="./train.csv",
batch_size=16,
img_size=(224, 1568),
num_workers=4
)

This function will:
- Prepare image-level labels (including class 0 for defect-free images)
- Apply augmentations (RandomCrop, Flips, Blackout, etc.)
- Return PyTorch DataLoader objects for training and validation sets
During training, we use random crops of size 224x1568. For inference, full-resolution images are used.
- RandomCrop
- HorizontalFlip, VerticalFlip
- RandomBrightnessContrast (Albumentations)
- Defect Blackout: Known defect pixels are blacked out. If all are removed, label becomes 0.
This simulates natural defects and improves generalization on defect-free images.
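A minimal sketch of the Defect Blackout idea, using NumPy. Several details here are assumptions made for illustration: the function name `defect_blackout`, the per-class boolean masks, and the rule that each defect class is blacked out independently with probability `drop_prob` are not taken from the project's code.

```python
import numpy as np


def defect_blackout(image, masks, drop_prob=0.5, rng=None):
    """Black out known defect pixels and update the image-level label.

    image: (H, W) or (H, W, C) array.
    masks: dict mapping defect class id (1 to 4) -> boolean (H, W) mask.
    Returns the augmented copy and the new label (0 if no defect remains).
    """
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    remaining = []
    for class_id, mask in masks.items():
        if rng.random() < drop_prob:
            out[mask] = 0  # black out every pixel of this defect class
        else:
            remaining.append(class_id)
    # Highest surviving defect class, or 0 when all defects were removed.
    label = max(remaining) if remaining else 0
    return out, label


# Toy example: one class-3 defect, always blacked out (drop_prob=1.0).
image = np.full((4, 6), 200, dtype=np.uint8)
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True
blacked, label = defect_blackout(image, {3: mask}, drop_prob=1.0)
```

With `drop_prob=1.0` the defect region is zeroed and the label falls back to 0; with `drop_prob=0.0` the image and its original label are returned unchanged.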
We used ResNet18 as the classifier backbone, modified for multi-label classification.
- Batch Size: 16 (accumulated to 32)
- Epochs: 10
- Loss Function: BCEWithLogitsLoss
- Optimizer: SGD with momentum
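The settings above (batch size 16 accumulated to 32, BCEWithLogitsLoss, SGD with momentum) could be wired together roughly as below. This is a sketch, not the contents of train.py: a small linear layer stands in for the ResNet18 wrapper, the data is synthetic, and the momentum value 0.9 is an assumption since the report does not state it.

```python
import torch
from torch import nn

torch.manual_seed(0)

NUM_CLASSES = 5   # classes 0 to 4, as in the report
ACCUM_STEPS = 2   # two micro-batches of 16 give an effective batch of 32

# Stand-in model (assumption): a linear layer instead of ResNet18.
model = nn.Linear(8, NUM_CLASSES)
criterion = nn.BCEWithLogitsLoss()
# lr matches the training command; momentum=0.9 is an assumed value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Synthetic micro-batches: (features, float targets of shape (16, 5)).
batches = [
    (torch.randn(16, 8), torch.randint(0, 2, (16, NUM_CLASSES)).float())
    for _ in range(4)
]

optimizer_steps = 0
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(batches, start=1):
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches one batch of 32.
    (loss / ACCUM_STEPS).backward()
    if step % ACCUM_STEPS == 0:
        optimizer.step()       # update weights once per 32 samples
        optimizer.zero_grad()  # reset gradients for the next accumulation
        optimizer_steps += 1
```

Four micro-batches of 16 therefore produce two optimizer updates, each based on 32 samples.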
python train.py \
--model resnet18 \
--epochs 10 \
--batch-size 16 \
--lr 0.01 \
--data-dir ./train_images \
--label-file train.csv \
--save-dir ./outputs

| Epoch | Train Loss | Train F1 | Train mAP | Train Acc | Train AUC | Val Loss | Val F1 | Val mAP | Val Acc | Val AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.2770 | 0.3183 | 0.4294 | 0.6190 | 0.8161 | 0.2313 | 0.3339 | 0.5347 | 0.7080 | 0.8973 |
| 2 | 0.2181 | 0.4052 | 0.5369 | 0.7184 | 0.9005 | 0.2332 | 0.4358 | 0.5695 | 0.6810 | 0.8916 |
| 3 | 0.1879 | 0.4983 | 0.6315 | 0.7583 | 0.9293 | 0.1787 | 0.4662 | 0.6527 | 0.7876 | 0.9383 |
| 4 | 0.1667 | 0.5655 | 0.6921 | 0.7843 | 0.9471 | 0.1810 | 0.5887 | 0.7390 | 0.7566 | 0.9472 |
| 5 | 0.1514 | 0.6492 | 0.7419 | 0.8099 | 0.9567 | 0.1369 | 0.6416 | 0.8630 | 0.8329 | 0.9691 |
| 6 | 0.1440 | 0.6879 | 0.7732 | 0.8147 | 0.9608 | 0.1888 | 0.6289 | 0.8420 | 0.7677 | 0.9666 |
| 7 | 0.1377 | 0.7124 | 0.7904 | 0.8288 | 0.9665 | 0.1555 | 0.7275 | 0.8234 | 0.8202 | 0.9630 |
| 8 | 0.1290 | 0.7233 | 0.7971 | 0.8330 | 0.9695 | 0.1444 | 0.6952 | 0.8238 | 0.8353 | 0.9612 |
| 9 | 0.1214 | 0.7630 | 0.8316 | 0.8488 | 0.9737 | 0.1826 | 0.7178 | 0.8401 | 0.7979 | 0.9614 |
| 10 | 0.1098 | 0.7686 | 0.8426 | 0.8608 | 0.9782 | 0.1255 | 0.7934 | 0.8975 | 0.8632 | 0.9793 |
Displays training and validation loss, F1, mAP, and AUC across epochs.
Visualizes true vs. predicted labels for each class.
Class-wise and macro-average AUC curve visualization.
To be filled after inference.py implementation
This section will document:
- Loading the trained model checkpoint
- Preprocessing test images
- Batch-wise prediction with optional TTA (horizontal/vertical flips)
- Saving predicted probabilities
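Since inference.py is still a placeholder, the flip-based TTA step listed above can only be sketched. The version below assumes a hypothetical `predict_fn` that maps a batch of images with shape (N, H, W, C) to class probabilities with shape (N, num_classes); both the function name `predict_with_tta` and the dummy predictor are illustrative only.

```python
import numpy as np


def predict_with_tta(predict_fn, images, tta=("hflip", "vflip")):
    """Average class probabilities over the original and flipped views.

    images: array of shape (N, H, W, C).
    predict_fn: hypothetical callable returning (N, num_classes) probabilities.
    """
    views = [images]
    if "hflip" in tta:
        views.append(images[:, :, ::-1])  # reverse the width axis
    if "vflip" in tta:
        views.append(images[:, ::-1, :])  # reverse the height axis
    probs = [predict_fn(view) for view in views]
    # Image-level labels do not change under flips, so a plain mean suffices.
    return np.mean(probs, axis=0)


def dummy_predict(batch):
    # Stand-in model: "probability" is the mean intensity of the left half.
    left = batch[:, :, : batch.shape[2] // 2]
    return left.mean(axis=(1, 2, 3))[:, None]


# One 2x4 image whose right half is bright and left half is dark.
images = np.zeros((1, 2, 4, 1))
images[:, :, 2:] = 1.0
averaged = predict_with_tta(dummy_predict, images)
```

Here the original and vertically flipped views score 0 and the horizontally flipped view scores 1, so the averaged prediction is 1/3.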
Command:
python inference.py \
--weights model.pth \
--image-dir ./test_images \
--tta hflip,vflip \
--save-path ./outputs/inference_preds.csv

If time and compute resources allow, we plan to extend this work through:
- Transformer Models for better long-range feature modeling
- Self-Supervised Learning for pretraining on unlabeled industrial images
- Model Ensembling – Combine multiple architectures
This project demonstrates a ResNet18-based surface defect classifier trained with defect-aware augmentations. After 10 epochs it reaches a validation F1 of 0.7934, a validation accuracy of 0.8632, and a validation AUC of 0.9793. The classifier improves pipeline efficiency by filtering out defect-free images before segmentation.


