premraot/Dynamically-Scaling-Multichannel-Video-Inference

Dynamically Scaling Video Inference at the Edge using Kubernetes

Design Principle

This solution re-architects the traditional monolithic inference pipeline into a cloud-native model. With the OpenVINO library, the inference workload can be scaled vertically on heterogeneous hardware engines, while Kubernetes provides HPA (Horizontal Pod Autoscaler) for horizontal scaling according to inference metrics collected from the whole system. This flexible scalability helps the solution meet diverse edge-computing requirements, such as varying inference model sizes and input sources.

Please see the design diagram and architecture sections below for details.

(NOTE: This project is for demo purposes only; please do not use it in production.)

Cloud Native Design Diagram

Architecture

Architecture Diagram

  1. Camera Stream Service/File Stream Service

    The input source can be a camera or a video file. Multiple input sources can produce frames to different inference queues. For example, if 3 cameras are used for face detection at the same time, all frames from these 3 cameras are produced to the face frame queue.

  2. Frame Queue

    The frames are pushed into several frame queues according to inference type, such as face, people, car, or object. The frame queue service is based on Redis's RPUSH and LPOP commands.
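
    As a sketch of the queue semantics, the following in-memory stand-in mirrors Redis's RPUSH/LPOP behavior (FIFO per queue). In the actual service these operations go through a shared Redis instance via a Redis client; the queue name below is illustrative:

```python
from collections import defaultdict, deque

class FrameQueue:
    """In-memory stand-in for the Redis-backed frame queue."""

    def __init__(self):
        self._queues = defaultdict(deque)  # queue name -> frames

    def rpush(self, queue: str, frame: bytes) -> None:
        # Producer side: append the encoded frame at the tail, like Redis RPUSH.
        self._queues[queue].append(frame)

    def lpop(self, queue: str):
        # Consumer side: pop the oldest frame from the head, like Redis LPOP.
        # Returns None when the queue is empty, matching redis-py's behavior.
        q = self._queues[queue]
        return q.popleft() if q else None

# Two camera frames land in the "face" queue; consumers drain it in FIFO order.
fq = FrameQueue()
fq.rpush("face", b"frame-1")
fq.rpush("face", b"frame-2")
first = fq.lpop("face")  # b"frame-1"
```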

  3. Openvino Inference Engine Service

    It picks up individual frames from the frame queue and then runs inference on them. For each inference type (people/face/car/object), there is at least 1 replica, and the service can be horizontally pod-autoscaled (HPA) on Kubernetes according to collected metrics such as frame drop rate, inference FPS, or CPU usage.

    With different model inputs, the inference service can be used for any recognition or detection task. The following models are used in this solution for demo purposes:

    Note: This project does not provide the above models for download, but the container build script will download them when constructing the container image on your own.
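
    The consume-infer-publish loop of one replica can be sketched as follows. Here pop_frame, infer, and publish are placeholders for the Redis LPOP call, the OpenVINO inference call, and the broker publish step, none of which are shown in this README:

```python
def serve(pop_frame, infer, publish) -> int:
    """Drain the frame queue once, returning the number of frames processed.

    A real replica would loop forever, sleeping briefly when the queue is
    empty instead of returning; this sketch stops on an empty queue so it
    can run standalone.
    """
    processed = 0
    while (frame := pop_frame()) is not None:
        publish(infer(frame))
        processed += 1
    return processed

# Toy run: three queued "frames", an identity-style "model", results collected.
queue = [b"f1", b"f2", b"f3"]
results = []
n = serve(lambda: queue.pop(0) if queue else None,  # stand-in for LPOP
          lambda f: f.upper(),                      # stand-in for inference
          results.append)                           # stand-in for publish
```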

  4. Stream Broker Service

    The inference result is sent to the stream broker together with its IP/name information for further actions such as serverless functions or dashboards. The stream broker also uses Redis and, by default, is the same instance as the frame queue.
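
    The README does not specify the message format; purely as an illustration, a result payload carrying the stream name and source information might be serialized like this before being published to the broker:

```python
import json
import time

def make_result_message(stream_name: str, source_ip: str, detections) -> str:
    # Hypothetical payload shape: the README only states that results carry
    # IP/name information alongside the inference output.
    return json.dumps({
        "stream": stream_name,
        "source": source_ip,
        "ts": time.time(),
        "detections": detections,
    })

msg = make_result_message("face-cam0", "10.0.0.5",
                          [{"label": "face", "score": 0.98}])
```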

  5. Stream Websocket Server

    The HTML5 SPA (Single Page Application) can only pull streams via the WebSocket protocol, so this server subscribes to all result streams from the broker and sets up an individual WebSocket connection for each inference result stream.
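
    The subscribe-and-fan-out logic can be sketched as below; connection objects here are anything with a send() method, standing in for real WebSocket handles:

```python
from collections import defaultdict

class StreamRelay:
    """Relays broker messages for each stream to its attached viewers."""

    def __init__(self):
        self._clients = defaultdict(set)  # stream name -> connections

    def attach(self, stream: str, conn) -> None:
        # A dashboard opens ws://<gateway>/<stream_name>; register it here.
        self._clients[stream].add(conn)

    def on_broker_message(self, stream: str, payload: bytes) -> int:
        # Called for each message from the broker subscription for `stream`;
        # returns how many clients received it.
        for conn in self._clients[stream]:
            conn.send(payload)
        return len(self._clients[stream])

class FakeConn:
    """Minimal stand-in for a WebSocket connection."""
    def __init__(self):
        self.received = []
    def send(self, payload):
        self.received.append(payload)

relay = StreamRelay()
viewer = FakeConn()
relay.attach("face-cam0", viewer)
delivered = relay.on_broker_message("face-cam0", b"jpeg-bytes")
```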

  6. SPA Dashboard

    It is based on HTML5 and the Vue framework. The front end queries stream information from the gateway via the RESTful API http://<gateway address>/api/stream, then renders all streams by establishing a connection to the WebSocket server at ws://<gateway address>/<stream name>.

  7. Gateway Server

    Gateway provides a unified interface for the backend servers:

    • http://<gateway>: Dashboard SPA web server
    • http://<gateway>/api/: RESTful API server
    • ws://<gateway>/<stream_name>: Stream WebSocket server
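
    A minimal sketch of this prefix-based routing, with illustrative backend names (the project's real gateway configuration is not shown here):

```python
def route(path: str) -> str:
    # Dispatch a request path to a backend, mirroring the three routes above.
    # The returned service names are assumptions for illustration only.
    if path.startswith("/api/"):
        return "restful-api-server"
    if path == "/" or path.startswith("/static/"):
        return "dashboard-web-server"
    # Anything else is treated as a stream name: ws://<gateway>/<stream_name>.
    return "stream-websocket-server"
```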

Getting Started

Prerequisite

This project does not provide prebuilt container images, so you need your own Docker registry to build and publish images for testing. It is easy to get your own registry from https://hub.docker.com

Build container image

The build script helps to create all required container images and publish to your own docker registry as follows:

./container/build.sh -r <your own registry name>

NOTE: Please see the detailed options and arguments for build.sh via ./container/build.sh -h

Deploy & Test on kubernetes cluster

Note: This project has been tested on a minikube cluster with Kubernetes versions 1.15.0, 1.16.0, and 1.17.0.

  1. Generate the Kubernetes YAML file with your own registry name:
tools/gen-k8s-yaml.sh -f kubernetes/elastic-inference.yaml.template -r <your container registry>
  2. Deploy the core services:
kubectl apply -f kubernetes/elastic-inference.yaml -n <your preferred namespace>
  3. Test with the sample video file:
kubectl apply -f kubernetes/sample-infer/ -n <your preferred namespace>

Note: -n <your preferred namespace> is optional; the default namespace is adopted without -n.

After the above steps, the Kubernetes cluster will expose two services via NodePort:

  • <k8s cluster IP>:31003: Frame queue service that accepts frames from any external producer, such as IP cameras.
  • <k8s cluster IP>:31002: Dashboard SPA web for result preview.

You can also run INT8 and FP32 inference models at the same time.

  1. Test camera stream producing for inference:
tools/run-css.sh -v 0 -q <kubernetes cluster address> -p 31003
  • -v 0: use /dev/video0
  • -q <kubernetes cluster address>: the Kubernetes cluster external address
  • -p 31003: the port of the Redis-based frame queue service exposed above

Note: Please see the detailed options and arguments for the run-css.sh script via ./tools/run-css.sh -h.

Monitor Inference Metrics

After deploying on a Kubernetes cluster, you can monitor the following metrics:

  • Inference FPS from an individual inference engine: ei_infer_fps
  • Total inference FPS
  • Drop FPS: ei_drop_fps
  • Total drop FPS
  • Scale Ratio: the value used to drive horizontal scaling

Please see Inference Metrics for details.
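
As a purely hypothetical illustration of a scale-ratio style signal (the project's actual formula is documented in Inference Metrics, not in this README): the ratio of the offered frame rate to the served frame rate hints at how many replicas would be needed to stop dropping frames.

```python
def scale_ratio(total_infer_fps: float, total_drop_fps: float) -> float:
    # Hypothetical formula for illustration only. 1.0 means the current
    # replicas keep up with the incoming frames; 2.0 means half of the
    # frames are being dropped, suggesting roughly double the replicas.
    if total_infer_fps <= 0:
        return 1.0
    return (total_infer_fps + total_drop_fps) / total_infer_fps
```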
