This solution re-architects the traditional monolithic inference pipeline into a cloud-native model. With the OpenVINO library, the inference workload can be scaled vertically on heterogeneous hardware engines, while Kubernetes provides the HPA (Horizontal Pod Autoscaler) for horizontal scaling according to inference metrics collected from the whole system. This flexible scalability helps meet diverse requirements in edge computing, such as varying inference model sizes, varying input sources, etc.
Details of each component are described below.
(NOTE: This project is for demo purposes only; please do not use it in production.)
- Camera Stream Service/File Stream Service
The input source can be a camera or a video file. More than one input source can produce frames to the different inference queues. For example, if three cameras are used for face detection at the same time, the frames from all three cameras are produced to the face frame queue.
- Frame Queue
Frames are pushed into separate frame queues according to inference type, such as face, people, car, object, etc. The frame queue service is based on Redis's RPUSH and LPOP commands.
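The RPUSH/LPOP pattern gives one FIFO queue per inference type. A minimal sketch of those semantics, using an in-memory dict of deques as a stand-in for the real Redis server (the queue name `face` follows the example above; everything else is illustrative):

```python
from collections import defaultdict, deque

class FrameQueues:
    """In-memory stand-in for the Redis-backed frame queues (RPUSH/LPOP)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def rpush(self, name, frame):
        # Producers (camera/file stream services) append to the tail.
        self._queues[name].append(frame)

    def lpop(self, name):
        # Inference services pop from the head; None when the queue is empty.
        q = self._queues[name]
        return q.popleft() if q else None

# Three cameras all feeding the "face" queue, as in the example above.
queues = FrameQueues()
for cam_id in range(3):
    queues.rpush("face", {"camera": cam_id, "jpeg": b"..."})

first = queues.lpop("face")  # frames come back in arrival order
```

In the real service these two calls map directly onto the Redis `RPUSH` and `LPOP` commands against a shared Redis instance.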
- OpenVINO Inference Engine Service
It picks up individual frames from the frame queue and runs inference on them. For each inference type (people/face/car/object) there is at least one replica, and the service can be horizontally pod-scaled (HPA) on Kubernetes according to collected metrics such as frame-drop speed, inference FPS, or CPU usage.
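An HPA for one inference deployment could be sketched as the following Kubernetes object, scaling on CPU usage (the deployment name `face-infer`, thresholds, and replica bounds are illustrative and may differ from the project's own YAML template; scaling on custom metrics such as drop-frame speed additionally requires a metrics adapter):

```yaml
apiVersion: autoscaling/v2beta2   # available on the tested 1.15-1.17 clusters
kind: HorizontalPodAutoscaler
metadata:
  name: face-infer-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: face-infer              # illustrative deployment name
  minReplicas: 1                  # at least one replica per inference type
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```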
By loading different models, the inference service can be used for any recognition or detection task. The following models are used in this solution for demo purposes:
- people/body detection: SqueezeNetSSD-5Class
- face detection (INT8/FP32): face-detection-retail-0005
- car detection (INT8/FP32): person-vehicle-bike-detection-crossroad-0078
Note: This project does not provide the above models for download, but the container build script downloads them when you construct the container images on your own.
- Stream Broker Service
The inference result is sent to the stream broker together with its IP/name information for further actions such as serverless functions, dashboards, etc. The stream broker also uses Redis and by default is the same instance as the frame queue.
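Because every result carries its stream's IP/name identity, downstream consumers can tell streams apart on the shared broker. A sketch of what such a result envelope could look like (the field names are assumptions for illustration, not the project's actual wire format):

```python
import json

def make_result_message(stream_name, source_ip, infer_type, boxes):
    """Wrap an inference result with its stream identity for the broker."""
    return json.dumps({
        "stream": stream_name,   # e.g. "camera0-face" (hypothetical naming)
        "source_ip": source_ip,
        "type": infer_type,      # face / people / car / object
        "boxes": boxes,          # [[xmin, ymin, xmax, ymax, score], ...]
    })

msg = make_result_message("camera0-face", "10.0.0.5", "face",
                          [[0.1, 0.2, 0.4, 0.6, 0.93]])
decoded = json.loads(msg)
```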
- Stream Websocket Server
The HTML5 SPA (Single Page Application) can only pull streams via the WebSocket protocol, so this server subscribes to all result streams from the broker and sets up an individual WebSocket connection for each inference result stream.
- Dashboard SPA
It is based on HTML5 and the Vue framework. The front end queries stream information from the gateway via the RESTful API http://<gateway address>/api/stream, then renders all streams by establishing connections to the WebSocket server at ws://<gateway address>/<stream name>.
- Gateway
The gateway provides a unified interface for the backend servers:
  - http://<gateway>: Dashboard SPA web server
  - http://<gateway>/api/: RESTful API server
  - ws://<gateway>/<stream_name>: Stream WebSocket server
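The dispatch rule above can be summarized as a small routing function over scheme and path (purely illustrative; the actual gateway is a reverse proxy, not Python, and the backend names are hypothetical):

```python
def route(scheme, path):
    """Map an incoming request to a backend, mirroring the table above."""
    if scheme == "ws":
        return "stream-websocket-server"  # ws://<gateway>/<stream_name>
    if path.startswith("/api/"):
        return "restful-api-server"       # http://<gateway>/api/...
    return "dashboard-spa-web-server"     # http://<gateway>/...
```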
This project does not provide prebuilt container images, so you need your own Docker registry to build container images for testing and playing. It is easy to get your own registry at https://hub.docker.com.
The build script creates all required container images and publishes them to your own Docker registry as follows:
./container/build.sh -r <your own registry name>
NOTE: For detailed options and arguments of build.sh, run ./container/build.sh -h.
Note: This project has been tested on a minikube cluster with Kubernetes versions 1.15.0, 1.16.0, and 1.17.0.
- Generate the Kubernetes YAML file with your own registry name:
tools/gen-k8s-yaml.sh -f kubernetes/elastic-inference.yaml.template -r <your container registry>
- Deploy the core services:
kubectl apply -f kubernetes/elastic-inference.yaml -n <your prefer namespace>
Note: -n <your preferred namespace> is optional; without -n, the default namespace is used.
- Test with the sample video file:
kubectl apply -f kubernetes/sample-infer/ -n <your prefer namespace>
Note: -n <your preferred namespace> is optional; without -n, the default namespace is used.
After the above steps, the Kubernetes cluster exposes two services via NodePort:
- <k8s cluster IP>:31003: frame queue service, which accepts frames from any external producer such as IP cameras.
- <k8s cluster IP>:31002: dashboard SPA web for result preview.
You can also run the INT8 and FP32 inference models at the same time.

- Test producing a camera stream for inference:
tools/run-css.sh -v 0 -q <kubernetes cluster address> -p 31002
  - -v 0: use /dev/video0
  - -q <kubernetes cluster address>: Kubernetes cluster external address
  - -p 31002: by default, the Redis-based frame queue service listens on this port
Note: For detailed options and arguments of the run-css.sh script, run ./tools/run-css.sh -h.
After deployment on a Kubernetes cluster, you can monitor the following metrics:
- Inference FPS of an individual inference engine: ei_infer_fps
- Total inference FPS: the sum of ei_infer_fps over all inference engine replicas
- Drop FPS of an individual inference engine: ei_drop_fps
- Total drop FPS: the sum of ei_drop_fps over all inference engine replicas
- Scale ratio: the value used to drive horizontal scaling
For details, see Inference Metrics.
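Assuming the totals are plain sums of the per-replica gauges (the exact scale-ratio formula is defined by the project's metrics and not reproduced here), the aggregation can be sketched as:

```python
def total_fps(per_replica):
    """Sum a per-replica FPS gauge (e.g. ei_infer_fps or ei_drop_fps)."""
    return sum(per_replica.values())

# Illustrative sample values, keyed by hypothetical replica names.
ei_infer_fps = {"face-infer-0": 14.2, "face-infer-1": 13.8}
ei_drop_fps = {"face-infer-0": 1.0, "face-infer-1": 0.5}

total_infer = total_fps(ei_infer_fps)
total_drop = total_fps(ei_drop_fps)
```

A rising total drop FPS alongside a flat total inference FPS is the signal that the HPA should add replicas to drain the frame queue faster.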


