Object Recognition can solve many real-world problems around us but the current research in ML Domain happens to be focused on dataset that is standardised, clear and clean. In the Indian scenario context, there is a lot of uncertainty that we counter with because of non-standard practices that add more real challenge to understanding the scene, for better decision making. Example: Imagine a crowded two lane road in a metropolitan city. You will see lots of objects and its complex relationship in scene. All these interlinked relations makes it really hard to make decisions.
For better understanding of task, I have trained MaskRCNN-Model and created dataset from scratch using cvat tools. I am able to achieve 0.54 mAP.
- Make a 3 video in busy city of Bangalore, keeping mobile camera in hand over a bike. Got around: 15000 Image. After cleaning and clearing, It concludes to 6000 Images
- Load the Data in CVAT and Annotation it with auto-annotation model and track feature of tools. It takes around 1 hours to get 1000 Images
- Exported in label-me format
- 
Trained a Mask R-CNN Model for Object Detection and Segmentation: This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. Ref: Matterport MaskRCNN 
- 
Jupyter Notebook, Kaggle-training.ipyb: Model Trained and Inference in Kaggle GPU Notebook
- 
Config File 
class MathikereTrainConfig(Config):
    # define the name of the configuration
    NAME = "mathikere_cfg"
    # number of classes (background + no of class)
    NUM_CLASSES = 1 + 3
    
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    
    # number of training steps per epoch
    STEPS_PER_EPOCH = 13
Python 3.7.8, TensorFlow 2.0, and other common packages listed in requirements.txt.
- Clone this repository
- Install dependencies
pip3 install -r requirements.txt
- Run setup from the repository root directory
python3 setup.py install


