This solution re-architects the traditional monolithic inference pipeline into a cloud-native model. With the OpenVINO library, the inference workload can be scaled vertically on heterogeneous hardware engines, while Kubernetes provides the HPA (Horizontal Pod Autoscaler) for horizontal scaling according to inference metrics collected from the whole system. This flexible scalability helps meet diverse requirements in edge computing, such as varying inference model sizes, varying input sources, etc.
Details of each component are described below.
(NOTE: This project is for demo purposes only; please do not use it in production.)
- Camera Stream Service/File Stream Service
The input source can be a camera or a video file. More than one input source can produce frames to the different inference queues. For example, if three cameras are used for face detection at the same time, the frames from all three cameras are produced to the face frame queue.
- Frame Queue
Frames are pushed into separate frame queues according to inference type, such as face, people, car, object, etc. The frame queue service is based on Redis's RPUSH and LPOP commands.
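The RPUSH/LPOP pattern gives one FIFO queue per inference type. A minimal sketch of those semantics, using an in-memory dict of deques as a stand-in for the real Redis server (the queue name `face` follows the example above; everything else is illustrative):

```python
from collections import defaultdict, deque

class FrameQueues:
    """In-memory stand-in for the Redis-backed frame queues (RPUSH/LPOP)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def rpush(self, name, frame):
        # Producers (camera/file stream services) append to the tail.
        self._queues[name].append(frame)

    def lpop(self, name):
        # Inference services pop from the head; None when the queue is empty.
        q = self._queues[name]
        return q.popleft() if q else None

# Three cameras all feeding the "face" queue, as in the example above.
queues = FrameQueues()
for cam_id in range(3):
    queues.rpush("face", {"camera": cam_id, "jpeg": b"..."})

first = queues.lpop("face")  # frames come back in arrival order
```

In the real service these two calls map directly onto the Redis `RPUSH` and `LPOP` commands against a shared Redis instance.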
- OpenVINO Inference Engine Service
It picks up individual frames from the frame queue and runs inference on them. For each inference type (people/face/car/object) there is at least one replica, and the service can be horizontally pod-scaled (HPA) on Kubernetes according to collected metrics such as frame-drop speed, inference FPS, or CPU usage.
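An HPA for one inference deployment could be sketched as the following Kubernetes object, scaling on CPU usage (the deployment name `face-infer`, thresholds, and replica bounds are illustrative and may differ from the project's own YAML template; scaling on custom metrics such as drop-frame speed additionally requires a metrics adapter):

```yaml
apiVersion: autoscaling/v2beta2   # available on the tested 1.15-1.17 clusters
kind: HorizontalPodAutoscaler
metadata:
  name: face-infer-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: face-infer              # illustrative deployment name
  minReplicas: 1                  # at least one replica per inference type
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```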
By loading different models, the inference service can be used for any recognition or detection task. The following models are used in this solution for demo purposes:
- people/body detection: SqueezeNetSSD-5Class
- face detection (INT8/FP32): face-detection-retail-0005
- car detection (INT8/FP32): person-vehicle-bike-detection-crossroad-0078
Note: This project does not provide the above models for download, but the container build script downloads them when you construct the container images on your own.
- Stream Broker Service
The inference result is sent to the stream broker together with its IP/name information for further actions such as serverless functions, dashboards, etc. The stream broker also uses Redis and by default is the same instance as the frame queue.
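Because every result carries its stream's IP/name identity, downstream consumers can tell streams apart on the shared broker. A sketch of what such a result envelope could look like (the field names are assumptions for illustration, not the project's actual wire format):

```python
import json

def make_result_message(stream_name, source_ip, infer_type, boxes):
    """Wrap an inference result with its stream identity for the broker."""
    return json.dumps({
        "stream": stream_name,   # e.g. "camera0-face" (hypothetical naming)
        "source_ip": source_ip,
        "type": infer_type,      # face / people / car / object
        "boxes": boxes,          # [[xmin, ymin, xmax, ymax, score], ...]
    })

msg = make_result_message("camera0-face", "10.0.0.5", "face",
                          [[0.1, 0.2, 0.4, 0.6, 0.93]])
decoded = json.loads(msg)
```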
- Stream Websocket Server
The HTML5 SPA (Single Page Application) can only pull streams via the WebSocket protocol, so this server subscribes to all result streams from the broker and sets up an individual WebSocket connection for each inference result stream.
- Dashboard SPA
It is based on HTML5 and the Vue framework. The front end queries stream information from the gateway via the RESTful API http://<gateway address>/api/stream, then renders all streams by establishing connections to the WebSocket server at ws://<gateway address>/<stream name>.
- Gateway
The gateway provides a unified interface for the backend servers:
  - http://<gateway>: Dashboard SPA web server
  - http://<gateway>/api/: RESTful API server
  - ws://<gateway>/<stream_name>: Stream WebSocket server
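The dispatch rule above can be summarized as a small routing function over scheme and path (purely illustrative; the actual gateway is a reverse proxy, not Python, and the backend names are hypothetical):

```python
def route(scheme, path):
    """Map an incoming request to a backend, mirroring the table above."""
    if scheme == "ws":
        return "stream-websocket-server"  # ws://<gateway>/<stream_name>
    if path.startswith("/api/"):
        return "restful-api-server"       # http://<gateway>/api/...
    return "dashboard-spa-web-server"     # http://<gateway>/...
```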
This project does not provide prebuilt container images, so you need your own Docker registry to build container images for testing and playing. It is easy to get your own registry at https://hub.docker.com.
The build script creates all required container images and publishes them to your own Docker registry as follows:
./container/build.sh -r <your own registry name>
NOTE: For detailed options and arguments of build.sh, run ./container/build.sh -h.
Note: This project has been tested on a minikube cluster with Kubernetes versions 1.15.0, 1.16.0, and 1.17.0.
- Generate the Kubernetes YAML file with your own registry name:
tools/gen-k8s-yaml.sh -f kubernetes/elastic-inference.yaml.template -r <your container registry>
- Deploy the core services:
kubectl apply -f kubernetes/elastic-inference.yaml -n <your prefer namespace>
Note: -n <your preferred namespace> is optional; without -n, the default namespace is used.
- Test with the sample video file:
kubectl apply -f kubernetes/sample-infer/ -n <your prefer namespace>
Note: -n <your preferred namespace> is optional; without -n, the default namespace is used.
After the above steps, the Kubernetes cluster exposes two services via NodePort:
- <k8s cluster IP>:31003: frame queue service, which accepts frames from any external producer such as IP cameras.
- <k8s cluster IP>:31002: dashboard SPA web for result preview.
You can also run the INT8 and FP32 inference models at the same time.

- Test producing a camera stream for inference:
tools/run-css.sh -v 0 -q <kubernetes cluster address> -p 31002
  - -v 0: use /dev/video0
  - -q <kubernetes cluster address>: Kubernetes cluster external address
  - -p 31002: by default, the Redis-based frame queue service listens on this port
Note: For detailed options and arguments of the run-css.sh script, run ./tools/run-css.sh -h.
After deployment on a Kubernetes cluster, you can monitor the following metrics:
- Inference FPS of an individual inference engine: ei_infer_fps
- Total inference FPS: the sum of ei_infer_fps over all inference engine replicas
- Drop FPS of an individual inference engine: ei_drop_fps
- Total drop FPS: the sum of ei_drop_fps over all inference engine replicas
- Scale ratio: the value used to drive horizontal scaling
For details, see Inference Metrics.
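Assuming the totals are plain sums of the per-replica gauges (the exact scale-ratio formula is defined by the project's metrics and not reproduced here), the aggregation can be sketched as:

```python
def total_fps(per_replica):
    """Sum a per-replica FPS gauge (e.g. ei_infer_fps or ei_drop_fps)."""
    return sum(per_replica.values())

# Illustrative sample values, keyed by hypothetical replica names.
ei_infer_fps = {"face-infer-0": 14.2, "face-infer-1": 13.8}
ei_drop_fps = {"face-infer-0": 1.0, "face-infer-1": 0.5}

total_infer = total_fps(ei_infer_fps)
total_drop = total_fps(ei_drop_fps)
```

A rising total drop FPS alongside a flat total inference FPS is the signal that the HPA should add replicas to drain the frame queue faster.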


