Model Conversion, Optimization and Benchmarking: MobileNetV3 on Cats vs. Dogs

Overview

This project explores the process of converting a pre-trained MobileNetV3 model (trained on a cats vs. dogs dataset) into different formats and optimizing it through quantization, pruning, and distillation. It benchmarks each format and optimization in terms of model size, inference speed, and classification quality.

Objectives

Use a pre-trained model
Evaluate its size, quality, and speed on test data
Convert the model to the following formats:
- Keras (.keras, .h5)
- ONNX (.onnx)
- TensorFlow SavedModel (a directory containing saved_model.pb and a variables/ folder)
- TensorFlow Lite (.tflite)
- TensorFlow.js (model.json plus binary weight shards)
Benchmark every format (size, speed, accuracy, precision, recall, F1)
Quantize the TFLite model to INT8 and FLOAT16
Prune the model (zero out low-magnitude weights)
Distill the model to a smaller student network
Tabulate and compare results
Draw conclusions on practical applications

Workflow

1. Dataset Preparation

Dataset: Kaggle Cats vs. Dogs (custom split into train/val/test)
Preprocessing: Resized images to 224x224, preprocessed as per MobileNetV3 requirements.
Batching: Implemented tf.data pipeline with augmentation (outside model graph for TFJS compatibility).

2. Base Model

Architecture: MobileNetV3 (pre-trained, fine-tuned on dataset)
Baseline Metrics: Model size, accuracy, precision, recall, F1-score, inference time measured on test set.
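
The quality metrics listed above can be computed directly from predicted and true labels. The following is a minimal sketch in plain Python rather than the notebook's actual evaluation code; the label encoding (1 = dog, 0 = cat) and the function name `binary_metrics` are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (assumed encoding: 1 = dog, 0 = cat).
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(labels, preds)
```

Because the same function can be applied to the predictions of every exported format, the numbers stay comparable across runtimes.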

3. Conversion to Multiple Formats

Saved the original model as:

Keras (.keras, .h5)
TensorFlow SavedModel
ONNX
TensorFlow Lite (.tflite)
TensorFlow.js

4. Benchmarking per Format

For each format:

Measured file size
Evaluated classification quality on test data
Calculated average inference time per image
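
The size and timing measurements above can be sketched with the standard library alone. This is an illustrative helper, not the notebook's benchmarking code: `artifact_size_bytes`, `mean_latency_ms`, and the `warmup` and `repeats` parameters are hypothetical names, and `predict_fn` stands in for whatever single-image inference call a given runtime exposes. Directory handling matters because some formats (SavedModel, TensorFlow.js) are stored as several files.

```python
import os
import tempfile
import time


def artifact_size_bytes(path):
    """Size of a single model file, or the total size of a model directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def mean_latency_ms(predict_fn, samples, warmup=5, repeats=3):
    """Average wall-clock time per sample, in milliseconds."""
    if len(samples) == 0:
        raise ValueError("at least one sample is required")
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for sample in samples[:warmup]:
        predict_fn(sample)
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict_fn(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))


# Stand-in artifact: a temporary directory holding one 1024-byte file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "weights.bin"), "wb") as fh:
        fh.write(b"\0" * 1024)
    demo_size = artifact_size_bytes(tmp)

# Stand-in "model": summing a list takes the place of a real inference call.
demo_latency = mean_latency_ms(lambda x: sum(x), [[0.0] * 10] * 8)
```

Timing one image at a time measures latency rather than batched throughput; the two can rank formats differently, so it helps to state which one a table reports.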

5. Quantization

Produced two quantized TFLite variants:
- INT8 quantization
- FLOAT16 quantization
Benchmarked metrics post-quantization
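
To show where the accuracy loss from INT8 quantization comes from, here is a small sketch of affine quantization, where a real value is approximated as `scale * (q - zero_point)` with `q` stored as an 8-bit integer. It illustrates the general mapping only; it is not the TFLite converter, and the helper names and example weights are made up for the illustration.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Choose scale and zero point so that [min, max] maps onto [qmin, qmax]."""
    # The range is widened to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


# Example weights for illustration only.
weights = [-0.5, 0.0, 0.25, 1.0]
scale, zero_point = quant_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

# Rounding error per value is bounded by about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each value is stored in one byte instead of four, which is the source of the size reduction, while the rounding error bounded by roughly `scale / 2` is the source of the quality loss.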

6. Pruning

Applied magnitude-based pruning using TensorFlow Model Optimization Toolkit
Fine-tuned the model with pruning enabled, then stripped the pruning wrappers before export
Measured size/speed/accuracy post-pruning
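
The idea behind magnitude-based pruning can be shown on a flat list of weights. This is a one-shot sketch in plain Python, not the TensorFlow Model Optimization Toolkit, which raises sparsity gradually during training; `magnitude_prune` and the example layer are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Example layer weights for illustration only.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.1, -0.9]
pruned = magnitude_prune(layer, sparsity=0.5)
# -> [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.0, -0.9]
```

Zeroed weights still occupy space in a dense file, so the size benefit typically appears only after the exported model is compressed (for example with gzip) or stored in a sparse format.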

7. Distillation

Distilled a compact “student” model from the original “teacher” MobileNetV3
Trained the student on soft targets from teacher network at different temperatures
Compared speed/size/performance
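
The role of temperature in the soft targets can be sketched as follows. This follows the common formulation of distillation (hard-label cross-entropy plus a temperature-scaled KL term); the weighting `alpha`, the `T^2` factor, and the default temperature of 4.0 are assumptions here, since the exact loss used in the notebook is not stated above.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values give softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    # The hard-label term uses the student's ordinary (temperature 1) probabilities.
    hard_ce = -math.log(softmax(student_logits)[true_label])
    return alpha * hard_ce + (1.0 - alpha) * temperature ** 2 * kl


# Example logits for illustration only (index 0 = cat, index 1 = dog is assumed).
loss = distillation_loss([0.5, -0.5], [2.0, -1.0], true_label=0, temperature=4.0)
```

Raising the temperature flattens the teacher's distribution, so the student also learns how confident the teacher is in the wrong class instead of only the final decision.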

8. Tabulation

All metrics for each model/format/optimization are summarized in a comparison table.

Key Findings

Format Suitability: Converting to Keras, ONNX, TF SavedModel, TFLite, and TFJS formats enables deployment across cloud, edge, mobile, and web platforms.
Quantization: Reduces size and increases speed (especially INT8), but with some accuracy/F1 loss.
Pruning: Further compresses models with minimal loss in accuracy.
Distillation: Enables a much smaller and faster student network, maintaining reasonable performance compared to the original model.
ONNX & TFLite: ONNX offers interoperability with non-TensorFlow runtimes, and TFLite provides a lightweight interpreter; both allow accelerated inference without the full TensorFlow runtime.
Combination: Maximum optimization is achieved by combining pruning, quantization, and distillation for edge/IoT deployment.
Trade-offs: The final choice depends on hardware constraints, inference speed requirements, and the acceptable drop in model quality.

Usage

Dataset Preparation

Split images and prepare CSVs for train/val/test sets as per the notebook scripts.
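
A reproducible split and CSV export might look like the sketch below. It is not the notebook's script: the function names, the 80/10/10 proportions, the `filepath,label` column layout, and the `cat.N.jpg` / `dog.N.jpg` file naming are all assumptions made for illustration.

```python
import csv
import io
import random


def split_paths(paths, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle deterministically and split into train/val/test lists."""
    shuffled = sorted(paths)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "test": shuffled[:n_test],
        "val": shuffled[n_test:n_test + n_val],
        "train": shuffled[n_test + n_val:],
    }


def write_split_csv(fh, paths, label_of):
    """Write one `filepath,label` row per image to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["filepath", "label"])
    for path in paths:
        writer.writerow([path, label_of(path)])


def label_of(path):
    """Assumed naming scheme: files starting with 'dog' are class 1, others class 0."""
    return 1 if path.startswith("dog") else 0


# Hypothetical file names for illustration only.
files = [f"cat.{i}.jpg" for i in range(10)] + [f"dog.{i}.jpg" for i in range(10)]
splits = split_paths(files)

# An in-memory buffer keeps the example free of side effects; in practice,
# pass a file opened with open("val.csv", "w", newline="").
buffer = io.StringIO()
write_split_csv(buffer, splits["val"], label_of)
```

Fixing the seed keeps the test set identical across runs, which matters when metrics from different formats and optimizations are compared against each other.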

Training

Train or fine-tune MobileNetV3 using the provided code.