Representation Learning Trade-offs in Gastrointestinal Endoscopy Imaging (Kvasir v2)

A controlled experimental study comparing supervised transfer learning and contrastive self-supervised pretraining (SimCLR) for gastrointestinal endoscopy image classification.

This repository emphasizes quantitative evaluation, reproducibility, and critical analysis of representation learning strategies in medical computer vision.

Overview

This study evaluates representation learning paradigms on the Kvasir v2 dataset under identical preprocessing, data splits, and optimization settings to isolate architectural and training effects.

Benchmarked approaches:

Supervised transfer learning (ResNet50, EfficientNet-B0, ViT-B/16)
Contrastive self-supervised pretraining (SimCLR)
Exploratory feature-injected segmentation prototype (U-Net variant)

Contributions

Abdul Rafay Mohd

Designed and implemented supervised benchmarking pipeline
Implemented SimCLR pretraining and fine-tuning workflow
Developed stratified 64/16/20 split protocol
Built unified evaluation pipeline (Accuracy, Macro F1, ROC, PR)
Executed controlled cross-architecture comparisons

Ziya Mubeen Ahmed Mohammed

Experimental result analysis and interpretation
Comparative performance reporting
Documentation and presentation development

Dataset

Kvasir v2 — A Gastrointestinal Tract Endoscopy Dataset

Source: https://www.kaggle.com/datasets/plhalvorsen/kvasir-v2-a-gastrointestinal-tract-dataset
Total images: 8,000
Classes: 8 GI findings
Images per class: 1,000 (perfectly balanced)
Input resolution: 224×224
Normalization: ImageNet mean/std

Class Distribution

dyed-lifted-polyps (1000)
dyed-resection-margins (1000)
esophagitis (1000)
normal-cecum (1000)
normal-pylorus (1000)
normal-z-line (1000)
polyps (1000)
ulcerative-colitis (1000)

Data Split

A stratified split was used:

64% Training
16% Validation
20% Test

This ensures stable macro-F1 evaluation and prevents data leakage.

Experimental Setup

Supervised Transfer Learning

Pretrained ImageNet models were fully fine-tuned.

Input size: 224×224
Optimizer: Adam
Learning rate: 1e-4
Loss: CrossEntropyLoss
Batch size: 32
Metrics: Accuracy, Macro F1

Backbones evaluated:

ResNet50
EfficientNet-B0
ViT-B/16

Self-Supervised Learning (SimCLR)

Contrastive pretraining was implemented to evaluate representation quality without labels.

Encoder: ResNet18
Projection head: 128-dimensional MLP
Loss: NT-Xent
Temperature: 0.5
Pretraining epochs: 5
Fine-tuning epochs: 5

After pretraining, a linear classifier was trained and evaluated on the same held-out test set.

Results

Supervised vs Contrastive Learning

Test Performance

Model	Accuracy	F1 Score
EfficientNet-B0	0.8919	0.8917
ResNet50	0.8888	0.8879
ViT-B/16	0.7956	0.7680
SimCLR (Fine-tuned)	0.5672	0.5615

Key Insights

EfficientNet-B0 achieved the strongest performance.
ResNet50 performed comparably, reinforcing the value of convolutional inductive bias in moderate-scale medical datasets.
ViT underperformed relative to CNNs in this data regime.
Short-horizon SimCLR pretraining did not surpass supervised transfer learning.
Strong ImageNet initialization provided richer inductive bias than limited contrastive pretraining under dataset constraints.

These findings highlight the importance of pretraining scale, training duration, and model inductive bias in medical imaging workflows.

Segmentation Prototype (Exploratory)

A feature-injected U-Net variant using ResNet50 encodings was prototyped on Kvasir-SEG.

Mean IoU: 39.23%

This experiment explores cross-task representation reuse between classification and segmentation, but is not fully optimized.

Repository Structure

medical-representation-learning-kvasir/
│
├── notebooks/
│   └── kvasir_supervised_vs_simclr_pipeline.ipynb
│
├── results/
│   └── supervised_vs_simclr_comparison.png
│
├── requirements.txt
├── LICENSE
└── README.md

Installation

git clone https://github.com/Mohd-Abdul-Rafay/medical-representation-learning-kvasir.git
cd medical-representation-learning-kvasir
pip install -r requirements.txt

Reproducibility Notes

• Fixed preprocessing and stratified splits • Identical evaluation pipeline across models • All reported metrics are from held-out test data

Authors

Abdul Rafay Mohd
Ziya Mubeen Ahmed Mohammed

License

This project is licensed under the terms of the MIT License.

Citation

If this work is useful in your research, please cite:

@software{rafay2026medicalrepresentation,
  author  = {Abdul Rafay Mohd and Ziya Mubeen Ahmed Mohammed},
  title   = {Representation Learning Trade-offs in Gastrointestinal Endoscopy Imaging (Kvasir v2)},
  year    = {2026},
  url     = {https://github.com/Mohd-Abdul-Rafay/medical-representation-learning-kvasir}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Representation Learning Trade-offs in Gastrointestinal Endoscopy Imaging (Kvasir v2)

Overview

Contributions

Dataset

Class Distribution

Data Split

Experimental Setup

Supervised Transfer Learning

Self-Supervised Learning (SimCLR)

Results

Supervised vs Contrastive Learning

Test Performance

Key Insights

Segmentation Prototype (Exploratory)

Repository Structure

Repository Structure

Installation

Reproducibility Notes

Authors

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
notebooks		notebooks
results		results
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Representation Learning Trade-offs in Gastrointestinal Endoscopy Imaging (Kvasir v2)

Overview

Contributions

Dataset

Class Distribution

Data Split

Experimental Setup

Supervised Transfer Learning

Self-Supervised Learning (SimCLR)

Results

Supervised vs Contrastive Learning

Test Performance

Key Insights

Segmentation Prototype (Exploratory)

Repository Structure

Repository Structure

Installation

Reproducibility Notes

Authors

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages