The evolution of dedicated inference accelerators

By Mark Patrick, Mouser Electronics

Deep neural networks (DNNs) are already all around us: our personal assistants and smart speakers handle our questions, customer services are increasingly digitalised, and our email clients help compose replies based on message context. Then there are our cars, where cameras perform image recognition to read road signs and detect obstacles for advanced driver assistance systems (ADAS), and some models also respond to voice commands.

All these applications – and many more – use artificial intelligence (AI) techniques in the form of artificial neural networks (ANNs), structures loosely modelled on the way the human brain processes information.

Whatever the task or type of network, the process consists of two distinct phases: training, where the algorithm learns from example data, and deployment – or inference – where the trained network makes predictions on new data.
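The split between the two phases can be shown with a minimal sketch in TensorFlow, one of the frameworks mentioned later in this article. The model architecture, toy data and file name below are purely hypothetical and chosen for brevity.

```python
import numpy as np
import tensorflow as tf

# --- Training phase: usually run once, on lab or server hardware ---
# Hypothetical toy data: 1,000 samples with 20 features and binary labels.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Save the trained network for deployment (e.g. as a TensorFlow SavedModel
# that a converter such as OpenVINO's Model Optimizer can later process).
model.save("trained_model")

# --- Inference phase: runs wherever the model is deployed ---
x_new = np.random.rand(1, 20).astype("float32")
prediction = model.predict(x_new)   # the trained network predicts the result
print(prediction)
```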

The hardware at the heart of inference

Most of the computing resources required for training a neural network are found in the lab, where banks of high-performance devices can be used to speed up the training process. Once trained, however, the algorithm may be deployed almost anywhere, so the hardware platform on which the trained model will run must be chosen and developed with care.

Traditionally, high-performance CPUs, FPGAs and GPUs have been the devices of choice for training a neural network, but there is now a new class of inference-dedicated devices to consider. FPGAs and GPUs coexist with high-performance SoC CPUs and high-bandwidth memory on standard single-slot rack-mounted cards.

One recently launched neural network accelerator is the Intel Neural Compute Stick 2 (NCS2), a dedicated hardware-based inference accelerator optimised for computer vision applications. Capable of delivering up to 4 trillion operations per second (4 TOPS), it packs this computing power into a standard USB thumb-drive format.

The NCS2 is based around an Intel Movidius Myriad vision processing unit (VPU) containing 16 128-bit very long instruction word (VLIW) streaming hybrid architecture vector engine (SHAVE) cores. A separate low-power on-chip hardware block provides high-throughput neural network acceleration. Once a model has been thoroughly tested and is ready for full-scale inference deployment, the application can run on single-board computer modules that use the same Intel Movidius Myriad VPU, such as the Aaeon UP AI Core.

Supporting open-source software toolkits

In addition to hardware innovations, a wide range of open-source software is available that further speeds application deployment. An example is Intel's OpenVINO (open visual inference and neural network optimisation) toolkit, which supports the Intel NCS2 acceleration platform. Aimed at convolutional neural network (CNN)-based machine-vision designs, OpenVINO supports heterogeneous deployment across CPUs, FPGAs and GPUs using a common API.
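As a rough illustration of that common API, the sketch below uses OpenVINO's Inference Engine Python bindings (IECore). The model files, test image, input size and layout are hypothetical placeholders; only the device name needs to change to retarget the same network from a CPU or GPU to the NCS2.

```python
import cv2                                    # used only to load and resize a test image
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO Inference Engine Python API

ie = IECore()

# Load a network already converted to OpenVINO's intermediate representation
# (hypothetical file names produced by the Model Optimizer, described below).
net = ie.read_network(model="model.xml", weights="model.bin")
input_name = next(iter(net.input_info))

# The same network can be compiled for different targets simply by changing
# the device name: "CPU", "GPU", or "MYRIAD" for the Neural Compute Stick 2.
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Prepare one input frame (a 224x224 NCHW input is assumed for illustration).
frame = cv2.resize(cv2.imread("test.jpg"), (224, 224))
blob = np.expand_dims(frame.transpose(2, 0, 1), axis=0).astype("float32")

results = exec_net.infer(inputs={input_name: blob})
print(results)
```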

OpenVINO comprises a comprehensive library of functions, pre-optimised kernels and example pre-trained models, in addition to a model optimiser that converts models from popular neural network frameworks such as Caffe and TensorFlow into an intermediate representation (IR) that the inference engine can execute.
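Recent OpenVINO releases expose the Model Optimizer as a Python function (openvino.tools.mo.convert_model); older releases ship it only as the mo.py command-line script. Assuming the Python entry point and a hypothetical frozen TensorFlow graph, the conversion step might look like this:

```python
from openvino.tools.mo import convert_model   # Model Optimizer Python entry point
from openvino.runtime import serialize

# Convert a trained network (here a hypothetical frozen TensorFlow graph)
# into OpenVINO's intermediate representation (IR).
ov_model = convert_model(input_model="frozen_graph.pb")

# Write the IR to disk as the model.xml / model.bin pair loaded at inference time.
serialize(ov_model, "model.xml", "model.bin")
```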

Deploying neural networks is becoming much easier thanks to off-the-shelf AI-optimised hardware platforms, software frameworks and inference-focused tool chains.
