VMware Private AI with Intel includes very interesting support for OpenVINO. To illustrate what OpenVINO enables, here is a brief summary, shared with her permission, of the thesis ACCELERATION OF FACE RECOGNITION ALGORITHM WITH NEURAL COMPUTE STICK 2 by my daughter Eva Micankova. Accelerated with the NCS2 on a Maxtang NX6412 NUC, the FaceNet model (accuracy 97.08%) achieved a frame rate of 15.35 FPS and a latency of 65.164 ms.

OpenVINO tools

Intel’s OpenVINO is an open-source toolkit for optimizing and deploying deep learning models. It delivers better performance for vision, audio, and language models from popular frameworks such as TensorFlow, Caffe, PyTorch, and others. OpenVINO optimizes deep learning pipelines through memory reuse, graph fusion, load balancing, and inference parallelism across CPUs, GPUs, VPUs, and other devices, as seen in the figure. Additional pre-processing and post-processing operations can be offloaded to or integrated into the accelerators to reduce end-to-end latency and improve throughput.

The popular algorithms FaceNet, SphereFace, and ArcFace differ in architecture and training procedure, but all aim to learn a vector representation of the face that is robust to changing conditions.


The FaceNet model was developed by Google’s research group in 2015. The model maps each face to a point in a Euclidean space, referred to as an embedding. Distances between embeddings serve as the measure of similarity and difference between faces, so images of the same person form a cluster of nearby points.
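The idea of comparing faces by distance between embeddings can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the three-dimensional vectors and the threshold of 1.0 are made-up values, and the function names are hypothetical. Real embeddings have many more dimensions and the threshold has to be tuned on validation data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Plain Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(emb_a, emb_b, threshold):
    """Two faces match when their normalized embeddings are close enough."""
    dist = euclidean_distance(l2_normalize(emb_a), l2_normalize(emb_b))
    return dist <= threshold


# Toy embeddings (assumed values, for illustration only).
anchor = [0.9, 0.1, 0.4]
same_face = [0.88, 0.12, 0.41]
other_face = [-0.2, 0.9, 0.1]

print(same_person(anchor, same_face, threshold=1.0))
print(same_person(anchor, other_face, threshold=1.0))
```

With these toy values the first pair is accepted as the same person and the second pair is rejected.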


The authors of SphereFace introduced the A-Softmax (Angular Softmax) loss function, derived from the softmax loss, in their work published in 2017. It was designed to learn discriminative facial features with a clear geometric interpretation, which no available face recognition algorithm offered until then.


ArcFace (Additive Angular Margin loss) is a loss function first introduced in 2018. It builds on SphereFace, which introduced the concept of an angular margin to improve class separability and thereby face recognition performance. However, the SphereFace loss function required a series of approximations, which led to unstable network training. In addition, the standard softmax loss dominated training, so the angular margin was not fully utilized. ArcFace addresses these shortcomings by adding the margin directly to the angle, which allows for better class separability and more stable training without the approximations used in SphereFace.
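To make the additive angular margin concrete, here is a small Python sketch of the loss for a single sample. It assumes the inputs are already the cosines between the normalized embedding and each normalized class weight; the function name, the scale of 64, and the margin of 0.5 are illustrative choices, not values taken from the thesis. The sketch also skips the special handling that full implementations apply when the angle plus the margin exceeds pi.

```python
import math


def arcface_loss(cosines, target, scale=64.0, margin=0.5):
    """Additive angular margin loss for one sample.

    cosines[j] is cos(theta_j) between the embedding and class j.
    The margin is added to the target angle before the softmax.
    """
    # Recover the target angle, clamping to guard against rounding errors.
    theta_t = math.acos(max(-1.0, min(1.0, cosines[target])))

    # Scale all logits; only the target class gets the angular margin.
    logits = [scale * c for c in cosines]
    logits[target] = scale * math.cos(theta_t + margin)

    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Toy cosines for three classes (assumed values); class 0 is the true one.
cosines = [0.8, 0.3, 0.1]
print(arcface_loss(cosines, target=0, margin=0.0))  # plain scaled softmax
print(arcface_loss(cosines, target=0, margin=0.5))  # with angular margin
```

The margin makes the loss larger for the same input, so the network has to pull embeddings of the same class closer together in angle before the loss becomes small.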


Vision Processing Unit (VPU) accelerators are chips created to accelerate image processing with computer vision and deep learning algorithms. The Intel Neural Compute Stick 2 (NCS2) is a powerful, affordable, and compact solution for accelerating neural networks. It is designed to run deep neural networks at high speed with low energy consumption and without losing accuracy, enabling real-time computer vision processing.


One Intel NCS2 was used to accelerate face detection, and a second Intel NCS2 was used for face recognition.

Figure A.4

Graph showing the accuracy of all validated models depending on the changing threshold. ArcFace achieved the highest accuracy of 0.84 at a threshold value of 0.79. SphereFace achieved the highest accuracy of 0.77 for a threshold of 0.74. FaceNet achieved the best accuracy of all the compared models, at 0.982 for a threshold value of 0.57.
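A curve like this is typically produced by sweeping the decision threshold over a set of labeled face pairs and measuring accuracy at each value. The sketch below shows the idea under stated assumptions: the scores are treated as similarities (higher means more likely the same person), and the six scores and labels are invented for illustration. The thesis may use a different score definition and evaluation protocol.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of pairs classified correctly at a given threshold.

    A pair is predicted as 'same person' when its score reaches the threshold.
    """
    correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(scores)


def best_threshold(scores, labels, candidates):
    """Return the first candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Toy validation pairs (assumed values): similarity score and ground truth.
scores = [0.91, 0.83, 0.62, 0.58, 0.40, 0.35]
labels = [True, True, True, False, False, False]

# Sweep thresholds from 0.00 to 1.00 in steps of 0.01.
candidates = [i / 100 for i in range(101)]
best = best_threshold(scores, labels, candidates)
print(best, accuracy_at(best, scores, labels))
```

Plotting `accuracy_at` for every candidate gives the accuracy-versus-threshold curve, and the peak of that curve is the threshold reported for each model.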

Figure 6.13

Graph comparing the achieved frame rate for each series.

  • 1st series – ArcFace model, one person
  • 2nd series – ArcFace model, two people
  • 3rd series – FaceNet model, one person
  • 4th series – FaceNet model, two people
  • 5th series – SphereFace model, one person
  • 6th series – SphereFace model, two people


The best accuracy was achieved by the FaceNet model, which reached 97.08%. The second part of the experiments focused on evaluating the speed of recognition on different platforms. The experiments answered several questions: what accuracy each model achieves, how much CPU, GPU, and NCS2 acceleration helps, which configuration is most suitable for each model, and which configuration achieves the highest frame rate and lowest latency among the compared models. The best frame rate and latency were achieved by the SphereFace model accelerated with the NCS2, at 17.15 FPS and a latency of 58.293 ms. The FaceNet model achieved a frame rate of 15.35 FPS and a latency of 65.164 ms. For a balance between accuracy and speed, the best system configuration is the FaceNet model accelerated with the NCS2.
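The reported frame rates and latencies are consistent with each other if frames are processed one at a time, so that the frame rate is simply the inverse of the per-frame latency. That sequential processing is an assumption here, not something stated in the summary, and the function name is illustrative. A quick check in Python:

```python
def fps_from_latency(latency_ms):
    """Frame rate for strictly sequential processing: 1000 ms / latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies reported for the NCS2-accelerated models.
for name, latency_ms in [("SphereFace", 58.293), ("FaceNet", 65.164)]:
    print(f"{name}: {fps_from_latency(latency_ms):.2f} FPS")
# prints:
# SphereFace: 17.15 FPS
# FaceNet: 15.35 FPS
```

Both computed values match the reported 17.15 FPS and 15.35 FPS.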


The thesis ACCELERATION OF FACE RECOGNITION ALGORITHM WITH NEURAL COMPUTE STICK 2 focuses on recognizing faces in images using neural networks and on accelerating that process. It provides an overview of previously used techniques and addresses the use of the currently dominant convolutional neural networks for this task. The work also covers acceleration mechanisms that can be used in this area. Based on this knowledge, a system built on the concept of edge computing was created. It can serve as a home security system connected to an IP camera and sends a notification when an unknown person is present in a guarded area.
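The notification logic of such a system can be sketched as a lookup against a gallery of enrolled people. Everything in this sketch is hypothetical: the function names, the two-dimensional embeddings, the threshold, and the `notify` callback are assumptions made for illustration, and the thesis system may be organized differently.

```python
import math


def distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, gallery, threshold):
    """Return the closest enrolled name, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, reference in gallery.items():
        dist = distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


def process_face(embedding, gallery, threshold, notify):
    """Identify a face and call notify() when the person is unknown."""
    name = identify(embedding, gallery, threshold)
    if name is None:
        notify("Unknown person detected in the guarded area")
    return name


# Toy gallery of enrolled household members (assumed values).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
messages = []  # stands in for a real notification channel

print(process_face([0.9, 0.1], gallery, 0.5, messages.append))
print(process_face([-1.0, -1.0], gallery, 0.5, messages.append))
print(messages)
```

In a deployment, the embedding would come from the recognition model running on the accelerator, and `notify` would be replaced by whatever channel the system uses to alert the owner.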

Author: Daniel Micanek

Senior Service Architect, SAP Platform Services Team at Tietoevry | SUSE SCA | vExpert ⭐⭐⭐⭐⭐ | vExpert NSX | VCIX-DCV/NV | VCAP-DCV/NV Design+Deploy | VCP-DCV/NV/CMA/TKO/DTM | NCIE-DP | OCP | Azure Solutions Architect | Certified Kubernetes Administrator (CKA)