Giles Peckham, Regional Marketing Director at Xilinx
Adam Taylor CEng FIET, Embedded Systems Consultant
So far in this series, we have examined how we can implement image processing and machine learning applications both at the edge and within the cloud. When it comes to implementing machine learning applications, we need to train the neural network before we can use it. This training provides the weights and biases used by the inference engine to implement the neural network.
For image processing applications, the most common network type is the Convolutional Neural Network (CNN), as it is able to process two-dimensional inputs. Several CNN implementations have evolved over the years, from AlexNet to GoogLeNet, SSD and FCN. However, they are all formed from the same basic functional blocks, albeit with different parameterisations: Convolution, Rectified Linear Unit (ReLU), Max Pooling and Fully Connected. Training defines the parameters for the Convolution layers (filter weights) and the Fully Connected layers (weights and biases). The Max Pooling and ReLU stages require no weights or biases, although Max Pooling does require parameterisation to define the size of the filter kernel and the stride.
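To make the four functional blocks concrete, the following is a minimal sketch in plain Python, operating on nested lists rather than on a real framework. The function names, the 5x5 input, the 2x2 filter and the Fully Connected weights are all illustrative assumptions and are not taken from AlexNet, GoogLeNet or any other named network.

```python
# Illustrative CNN stages on nested lists. All values below are hypothetical.

def convolve2d(image, kernel, stride=1):
    """Valid (unpadded) 2D convolution in the CNN sense (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Rectified Linear Unit: negative values become zero. No parameters."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2, stride=2):
    """Max Pooling: no weights, but kernel size and stride must be chosen."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def fully_connected(inputs, weights, biases):
    """Fully Connected layer: one weighted sum plus bias per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

image = [
    [1.0, 2.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 1.0],
    [1.0, 0.0, 1.0, 2.0, 2.0],
    [2.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 1.0, 0.0, 1.0],
]
kernel = [[1.0, 0.0], [0.0, -1.0]]            # filter weights (set by training)
fc_weights = [[0.5, -0.5, 1.0, 0.0],
              [1.0, 1.0, -1.0, 0.5]]          # set by training
fc_biases = [0.1, -0.2]                       # set by training

conv_out = convolve2d(image, kernel)          # 5x5 input -> 4x4 feature map
activated = relu(conv_out)
pooled = max_pool(activated, size=2, stride=2)  # 4x4 -> 2x2
flat = [v for row in pooled for v in row]
scores = fully_connected(flat, fc_weights, fc_biases)
print(pooled, scores)
```

Only `kernel`, `fc_weights` and `fc_biases` would be produced by training in this sketch; the pooling size and stride are fixed design choices.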
GPU Centred Training
Training introduces an additional layer to the network which implements the loss function. This enables the training algorithm to determine whether or not the network correctly identified the input image from the image set. For an image processing application, we can obtain image sets from sources such as www.image-net.org. During training, both forward and backward propagation are used: the forward pass determines whether the image is correctly classified, and the backward pass updates the weights and biases based on the error signals and calculated gradients. To apply the image sets and calculate the network coefficients as quickly and efficiently as possible, large farms of Graphics Processing Units (GPUs) are used. GPU farms are used because the goal of training is to generate the weights and biases in the minimum time. Power efficiency, real-time response and determinism for each image frame are therefore not as critical as they are within the deployed application, which makes GPU farms the most efficient mechanism for determining the CNN weights and biases. The generation of these weights and biases does, however, need to take into account the number system which is to be used by the inference engine implementation.
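The forward pass, loss function and backward pass described above can be sketched for a single Fully Connected layer as follows. This is a minimal illustration only: it assumes a softmax cross-entropy loss and plain stochastic gradient descent, neither of which is specified in this article, and the two-class toy data set, learning rate and function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores to class probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, biases, x):
    """Forward propagation through one Fully Connected layer plus softmax."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

def train_step(weights, biases, x, label, lr):
    """One forward and backward pass; updates weights and biases in place."""
    probs = forward(weights, biases, x)
    loss = -math.log(probs[label])            # cross-entropy loss layer
    # Error signal: gradient of the loss with respect to each raw score.
    grad_scores = [p - (1.0 if k == label else 0.0)
                   for k, p in enumerate(probs)]
    for k, g in enumerate(grad_scores):
        for j, xj in enumerate(x):
            weights[k][j] -= lr * g * xj      # gradient descent update
        biases[k] -= lr * g
    return loss

# Hypothetical two-feature, two-class training set: (input, label).
samples = [([1.0, 0.0], 0), ([0.9, 0.2], 0),
           ([0.0, 1.0], 1), ([0.1, 0.8], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]

losses = []
for epoch in range(50):
    epoch_loss = 0.0
    for x, label in samples:
        epoch_loss += train_step(weights, biases, x, label, lr=0.5)
    losses.append(epoch_loss / len(samples))

predictions = [
    max(range(2), key=lambda k: forward(weights, biases, x)[k])
    for x, _ in samples
]
print(losses[0], losses[-1], predictions)
```

A real CNN repeats the same pattern across every layer and across far larger image sets, which is where the parallel throughput of a GPU farm pays off.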
Machine learning applications are increasingly using more efficient, reduced precision fixed-point number systems, such as the INT8 representation. The use of fixed-point, reduced precision number systems comes without a significant loss in accuracy when compared with a traditional 32-bit floating point (FP32) approach. As fixed-point mathematics is also considerably easier to implement than floating point, the move to INT8 provides more efficient, faster solutions in many implementations. To support the generation of weights and biases for reduced precision number systems, the automated Ristretto tool for Caffe enables the training of networks using fixed-point number systems. This removes the need for, and the potential performance impact of, converting floating point training data into fixed-point training data.
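As an illustration of what an INT8 fixed-point representation involves, the sketch below quantises a handful of floating point weights and activations to signed 8-bit integers, performs the multiply-accumulate in integer arithmetic, and compares the result with the floating point answer. It is not Ristretto's implementation: the helper names, the rule for choosing the number of fractional bits and the example values are assumptions made purely for this example.

```python
def choose_frac_bits(values, total_bits=8):
    """Pick the largest number of fractional bits that still fits the data."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (total_bits - 1) - 1
    frac_bits = total_bits - 1
    while frac_bits > 0 and round(max_abs * 2 ** frac_bits) > qmax:
        frac_bits -= 1
    return frac_bits

def quantize(values, frac_bits, total_bits=8):
    """Scale, round and saturate to the signed integer range."""
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    scale = 2 ** frac_bits
    return [max(qmin, min(qmax, round(v * scale))) for v in values]

def int_dot(q_a, q_b):
    """Integer multiply-accumulate; hardware would use a wider accumulator."""
    return sum(a * b for a, b in zip(q_a, q_b))

# Hypothetical FP32-style values for one neuron.
weights = [0.75, -0.5, 0.125, -0.9]
activations = [1.5, 0.25, -2.0, 0.5]

w_frac = choose_frac_bits(weights)            # fractional bits for weights
a_frac = choose_frac_bits(activations)        # fractional bits for activations
q_w = quantize(weights, w_frac)
q_a = quantize(activations, a_frac)

acc = int_dot(q_w, q_a)
# The accumulator carries w_frac + a_frac fractional bits.
fixed_result = acc / 2 ** (w_frac + a_frac)
float_result = sum(w * a for w, a in zip(weights, activations))

print(q_w, q_a, fixed_result, float_result)
```

With these example values the fixed-point result differs from the floating point result by less than 0.001, while every multiplication operates on 8-bit integers.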
While GPU farms offer a benefit when training neural networks, heterogeneous Systems on Chip such as the Zynq-7000 SoC and Zynq UltraScale+ MPSoC offer significant advantages when a neural network inference engine is to be deployed. The SoC architecture provides a more responsive and power efficient solution, thanks to the inherently parallel nature of the programmable logic and the ease with which fixed-point number systems can be implemented. The ability to develop such an implementation with industry standard frameworks like Caffe, using the reVISION stack, removes the need for developers to be HDL specialists.
For more information, please visit: http://www.xilinx.com/products/design-tools/embedded-vision-zone.html