This project explores the process of converting a pre-trained MobileNetV3 model (trained on a cats vs. dogs dataset) into different formats and optimizing it through quantization, pruning, and distillation. It benchmarks each format and optimization in terms of model size, inference speed, and classification quality.
- Use a pre-trained model
- Evaluate its size, quality, and speed on test data
- Convert the model to the following formats:
  - Keras (`.keras`, `.h5`)
  - ONNX (`.onnx`)
  - TensorFlow SavedModel (`.pb`)
  - TensorFlow Lite (`.tflite`)
  - TensorFlowJS (`.js`)
- Benchmark every format (size, speed, accuracy, precision, recall, F1)
- Quantize to INT8 and FLOAT16 (`tflite`)
- Prune (weight thinning) the model
- Distill the model to a smaller student network
- Tabulate and compare results
- Draw conclusions on practical applications
- Dataset: Kaggle Cats vs. Dogs (custom split into train/val/test)
- Preprocessing: Resized images to 224x224, preprocessed as per MobileNetV3 requirements.
- Batching: Implemented tf.data pipeline with augmentation (outside model graph for TFJS compatibility).
- Architecture: MobileNetV3 (pre-trained, fine-tuned on dataset)
- Baseline Metrics: Model size, accuracy, precision, recall, F1-score, inference time measured on test set.
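
The quality metrics listed above can be computed from test labels and predictions as in the minimal sketch below. It assumes a binary task with one class treated as positive; the function name `binary_metrics` is illustrative and not taken from the project code, and the averaging scheme used for the results table is not stated in this README.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = dog, 0 = cat.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

Reporting all four values together matters here because accuracy alone cannot distinguish a working model from one that predicts a single class on a balanced test set.
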
Saved the original model as:
- Keras (`.keras`, `.h5`)
- TensorFlow SavedModel
- ONNX
- TensorFlow Lite (`.tflite`)
- TensorFlowJS
For each format:
- Measured file size
- Evaluated classification quality on test data
- Calculated average inference time per image
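
The size and timing measurements above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's benchmarking script: `size_mb` and `mean_latency` are hypothetical helper names, `predict_fn` stands for whatever callable runs one image through a given format, and sizes are reported in units of 1024 * 1024 bytes (the README does not say which convention the table uses).

```python
import os
import time


def size_mb(path):
    """Size of a file, or of a whole directory tree, in units of 1024*1024 bytes."""
    if os.path.isfile(path):
        total = os.path.getsize(path)
    else:
        # A SavedModel or TFJS export is a directory, so sum all files in it.
        total = sum(
            os.path.getsize(os.path.join(root, name))
            for root, _dirs, files in os.walk(path)
            for name in files
        )
    return total / (1024 * 1024)


def mean_latency(predict_fn, samples, warmup=3):
    """Average wall-clock seconds per sample, measured after warm-up calls."""
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    # Warm-up calls absorb one-time costs such as lazy initialization.
    for sample in samples[:warmup]:
        predict_fn(sample)
    start = time.perf_counter()
    for sample in samples:
        predict_fn(sample)
    return (time.perf_counter() - start) / len(samples)
```

Timing one image at a time and timing batches give different per-image numbers, so the same protocol should be used for every format being compared.
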
- Converted the TFLite model to:
  - INT8 quantization
  - FLOAT16 quantization
- Benchmarked metrics post-quantization
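
To make the INT8 step more concrete, the sketch below shows the affine mapping between floats and 8-bit integers that integer quantization relies on. It is plain Python arithmetic, not the TFLite converter API, and the function names are illustrative. In practice the converter derives the value ranges from calibration data.

```python
def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point that map [x_min, x_max] onto [qmin, qmax]."""
    # The range is widened to include 0 so that 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers, clamping to the representable range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [scale * (q - zero_point) for q in q_values]


scale, zero_point = quant_params(0.0, 255.0)
q = quantize([0.0, 127.6, 255.0, 300.0], scale, zero_point)
# q == [-128, 0, 127, 127]; the value 300.0 is clamped to the top of the range.
```

One practical consequence: if a quantized model is exported with integer inputs, test images have to be mapped with that model's input scale and zero point before inference, otherwise the predictions are not meaningful.
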
- Applied magnitude-based pruning using TensorFlow Model Optimization Toolkit
- Retrained and stripped pruning wrappers
- Measured size/speed/accuracy post-pruning
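
The idea behind magnitude-based pruning can be shown on a flat list of weights. This is a one-shot sketch with an illustrative function name; the TensorFlow Model Optimization Toolkit applies the same criterion per layer and gradually during training, which this example does not reproduce.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


pruned = magnitude_prune([0.5, -0.1, 0.05, -0.9, 0.2, 0.01], sparsity=0.5)
# pruned == [0.5, 0.0, 0.0, -0.9, 0.2, 0.0]
```

Zeroed weights are still stored as full-width numbers, so a pruned model file usually shrinks only once it is compressed (for example with gzip) or stored in a sparse format.
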
- Distilled a compact “student” model from the original “teacher” MobileNetV3
- Trained the student on soft targets from teacher network at different temperatures
- Compared speed/size/performance
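
Training on soft targets at a given temperature can be sketched as a per-example loss. The form below (hard-label cross-entropy blended with a temperature-scaled KL term, weighted by `alpha` and multiplied by T squared) is a common formulation and is an assumption here; this README does not state the exact loss, temperatures, or weighting the project used.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Ordinary cross-entropy against the hard label, at temperature 1.
    hard = -log_softmax(student_logits)[label]
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * kl


# Hypothetical two-class logits (cat, dog) for one image with true label 0.
loss = distillation_loss([1.0, 0.5], [3.0, -1.0], label=0, temperature=4.0)
```

A higher temperature flattens the teacher distribution, so the student also learns from the relative probabilities of the wrong class rather than only from the top prediction.
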
All metrics for each model/format/optimization are summarized in a comparison table.
- Format Suitability: Each format targets a different deployment surface: Keras and TF SavedModel for training and server or cloud serving, ONNX for cross-framework runtimes, TFLite for mobile and edge devices, and TFJS for the browser.
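- Quantization: Cuts model size sharply (3.33 MB for INT8 and 5.71 MB for FLOAT16, versus 12.09 MB for the unquantized TFLite model). In the results table, however, both quantized models score 0.5 on every quality metric and are slightly slower per image than the float model, so the speed and accuracy effects of quantization are not confirmed by this run.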
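- Pruning: Keeps accuracy close to the baseline (0.9908 vs. 0.9967 in the results table). The saved file size is essentially unchanged (12.10 MB), since zeroed weights reduce size only after compression.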
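- Distillation: Aims at a much smaller and faster student network that keeps reasonable quality. In the results table the student reaches 0.9117 accuracy, but its saved size (12.10 MB) and per-image time (0.0513 s) do not yet show a gain over the original model.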
- ONNX & TFLite: Offer great interoperability and inference acceleration outside TensorFlow environments.
- Combination: Combining pruning, quantization, and distillation is expected to give the strongest compression for edge/IoT deployment; the combined pipeline does not appear in the results table.
- Trade-offs: The final choice depends on hardware constraints, inference speed requirements, and the acceptable drop in model quality.
1. Split the images and prepare CSVs for the train/val/test sets as per the notebook scripts.
2. Train or fine-tune MobileNetV3 using the provided code.
3. Use the provided scripts to export models to the required formats.
4. Run the benchmarking scripts for each format and fill in the results table.
| Model/Format | Precision | Recall | Accuracy | F1-Score | Size (MB) | Inference Time (s/img) |
|---|---|---|---|---|---|---|
| Keras_original | 0.996667 | 0.996667 | 0.996667 | 0.996667 | 12.103346 | 0.008293 |
| Keras | 0.996667 | 0.996667 | 0.996667 | 0.996667 | 12.088532 | 0.007742 |
| TF Lite | 0.996667 | 0.996667 | 0.996667 | 0.996667 | 12.088532 | 0.007742 |
| ONNX | 0.996667 | 0.996667 | 0.996667 | 0.996667 | 12.345000 | 0.007307 |
| Save_pb | 0.996667 | 0.993333 | 0.995000 | 0.996667 | 11.399948 | 0.001448 |
| TensorFlowJS | NaN | NaN | NaN | NaN | 11.610000 | NaN |
| TFLite (int8) | 0.500000 | 0.500000 | 0.500000 | 0.500000 | 3.329346 | 0.009200 |
| TFLite (float16) | 0.500000 | 0.500000 | 0.500000 | 0.500000 | 5.714710 | 0.008900 |
| Distilled | 0.911667 | 0.911660 | 0.911660 | 0.911660 | 12.096642 | 0.051334 |
| Pruned | 0.999400 | 0.988200 | 0.990800 | 0.990800 | 12.096902 | 0.006943 |
MIT License
Thanks to TensorFlow, ONNX, and TensorFlow Model Optimization open-source communities for frameworks/libraries.
For any questions, please contact perinadaria19@gmail.com.