By Mark Patrick, Mouser Electronics
A convolutional neural network (CNN) is a deep learning neural network designed to work with inputs structured in a grid format, such as a two-dimensional image. As with all artificial neural networks (ANNs), the concept is inspired by biology: specific cells within the visual cortex become active during shape detection. The scientists involved in this biological research developed a model that would become the basis of the CNN algorithm for image classification.
Detection and classification
Before any convolutional neural network is properly functional, it must be trained; that is, it must learn the items it needs to detect and classify. Training involves pairs of input and output information: for example, if the application is to identify different crop seed types, the input would be a high-resolution image of a seed, with its name as the output. This might appear to be a simple exercise, but in nature there are many small variations; seed surfaces differ in texture, colour and size. Adding a third output value, seed quality, brings further commercial value to the deep learning approach: the CNN can then predict not only the seed type but also its quality and condition.
Training, or learning, involves the neural network receiving input/output pairs for every seed type to be detected. This learning phase is computationally intensive: to increase the accuracy of seed and quality identification, the CNN needs to “see” a large number of samples (possibly hundreds) of each seed, of varying quality and in different lighting conditions. Not only does training require access to a lot of computing power, but photographing hundreds of images is extremely time-consuming and requires seed experts to annotate each image with its quality. For some applications, neural network developers can speed up this process with a publicly available database of images such as ImageNet, which contains over 14 million images covering everything from animals to geological features, people and vehicles.
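The input/output pairs described above can be sketched in code. This is an illustrative sketch only: the field names and class labels are hypothetical, and a toy array stands in for a high-resolution photograph.

```python
import numpy as np

def make_sample(image, seed_type, quality):
    """Bundle one input/output pair: the image plus its expert annotations."""
    return {"image": image, "seed_type": seed_type, "quality": quality}

# Toy "images" stand in for high-resolution seed photographs.
rng = np.random.default_rng(0)
dataset = [
    make_sample(rng.random((64, 64)), "wheat", "good"),
    make_sample(rng.random((64, 64)), "barley", "poor"),
]

# During training, the images are the inputs and the annotations the targets.
inputs = [s["image"] for s in dataset]
targets = [(s["seed_type"], s["quality"]) for s in dataset]
```

Note that each sample carries two output labels, matching the seed-type and seed-quality outputs discussed above.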
Figure 1: Convolutional neural network (CNN) architecture
Training a convolutional network
A CNN consists of several convolution and pooling layers followed by a final fully connected layer; see Figure 1. Each layer has multiple activation neuron inputs, the electronic equivalents of biological neurons. A weight scales each neuron's output, placing more or less emphasis on that neuron's contribution; setting these weights correctly is the essential outcome of training.
Within each convolution layer, filters are applied to small sections of the image to detect edges and features, such as curves and lines; the pixel values within each section are convolved with the filter to produce a single value. The pooling layers then summarise these values. Successive convolution and pooling layers increase the level of abstraction of the image, starting with edges, then shapes and finally whole objects. The final fully connected layer outputs what the network believes to be the correct result.
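The convolution and pooling operations can be sketched in a few lines of NumPy. In a real CNN the filter values are learned during training; here a hand-written edge filter is used purely for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each window convolves to one value."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Summarise each size x size block by its maximum value."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

# A 6x6 "image" with a vertical edge down the middle.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
edge_filter = np.array([[-1.0, 1.0]])  # responds where brightness increases

feature_map = max_pool_input = convolve2d(image, edge_filter)
pooled = max_pool(feature_map)  # the edge survives pooling, at lower resolution
```

The filter produces a strong response only where the image changes from dark to bright, and pooling keeps that response while shrinking the feature map, which is exactly the progressive abstraction described above.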
The training phase uses a technique called “backpropagation”: the CNN compares its output with the known correct answer, then propagates the resulting error backwards through the layers, adjusting the weight values mentioned earlier so as to reduce that error.
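The idea can be sketched at its smallest possible scale: a single neuron with one weight, trained by following the error gradient. Real CNNs apply the same principle across millions of weights and many layers.

```python
def train_neuron(xs, ys, lr=0.1, epochs=200):
    """Fit y = w * x by gradient descent on the squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x
            error = pred - y
            # Backpropagation step: the gradient of the squared error
            # (pred - y)**2 with respect to w is 2 * error * x.
            w -= lr * 2 * error * x
    return w

# The targets are twice the inputs, so the ideal weight is 2.0.
w = train_neuron([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each update nudges the weight in the direction that reduces the error, so after repeated passes over the training pairs the weight converges to the value that maps inputs to targets.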
Training the neural network is not a single-stage process. Once the initial training has taken place, the results need careful evaluation. Typically, this evaluation phase is undertaken several times during training, using a number of the sample images to determine how the network is performing; if required, changes are made to the algorithm's operation. The error rate in particular is an important metric of neural network performance, indicating how reliably and repeatably the network can identify the type and quality of a seed image it has not seen before.
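The error-rate metric itself is simple to compute: compare the network's predictions on held-out samples with the expert annotations. The seed labels below are hypothetical placeholders.

```python
def error_rate(predictions, ground_truth):
    """Fraction of held-out samples the network classified incorrectly."""
    wrong = sum(1 for p, t in zip(predictions, ground_truth) if p != t)
    return wrong / len(ground_truth)

predicted = ["wheat", "barley", "wheat", "oat"]
actual = ["wheat", "barley", "oat", "oat"]
rate = error_rate(predicted, actual)  # one of four samples is wrong
```

In practice this would be computed over a validation set the network never saw during training, so the metric reflects performance on genuinely new images.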
Figure 2: Training the convolutional neural network
The final step
Once the trained CNN has been thoroughly evaluated, it is ready for use. To aid this and the overall development process, there are several deep learning neural network frameworks available, including TensorFlow and Caffe. Each comprises a collection of software libraries, APIs, model examples and application tools to make the development task easier.
The frameworks are also helpful to determine the hardware resources and environment required for use in the production systems.
Inference is the term used to describe an applied, running neural network. Training will typically take place on high-performance workstations or datacentre servers, with standalone FPGA- or GPU-based equipment running the applications. Increasingly, the need to provide inference on small, battery-powered devices with limited compute resources has led to so-called “inference engines”: highly optimised, low-power systems designed for these specific inference tasks.
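One common optimisation behind such low-power inference is quantisation: converting the trained 32-bit floating-point weights into 8-bit integers, shrinking the model to a quarter of its size. The sketch below shows a simplified symmetric, per-tensor scheme for illustration; production toolchains use more sophisticated calibration.

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights onto int8 values using a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights, to check the precision lost."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# The int8 copy uses a quarter of the memory at a small cost in precision.
```

The accuracy of the quantised network is then re-evaluated to confirm the precision loss is acceptable before deployment.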
A host of industrial and commercial applications now rely on CNNs, from machine-vision systems detecting defective parts on a fast-moving production line, through car license-plate recognition for car park payment systems and toll roads, to facial recognition for company security purposes. But we expect to see many more applications very soon.