Aaron Behman, Director of Strategic Marketing, Embedded Vision, Xilinx, Inc.
Adam Taylor CEng FIET, Embedded Systems Consultant.
A growing number of smart systems in the automotive, medical, industrial and scientific spaces are dependent on high-quality image capture and processing, often at high speed and in full colour. The preceding article of this series discussed selection criteria for image sensors. This article examines key challenges and decisions encountered when developing the image-processing system.
Time-to-market pressure often determines which aspects of the system are developed in-house, as value-added activity, and which are purchased as commercial off-the-shelf (COTS) blocks or subcontracted for development. Focusing on value-added activities and leveraging IP modules at the hardware, software and FPGA levels are key to meeting time-to-market targets.
As far as the technical challenges are concerned, embedded vision systems are typically developed for applications where size, weight, power and cost – often called SWaP-C – are driving factors. One way to improve SWaP-C is through tighter system integration, particularly in the processing system.
Image-Processing Pipeline and Algorithms
Almost all embedded vision systems incorporate an image-processing pipeline that interfaces with the selected sensor and performs the operations required to produce an image suitable for either further processing or transmission over a network.
Within this image processing pipeline, various algorithms are applied to the received images depending upon the application being implemented. There are a number of commonly used algorithms for processes such as sharpening the image, improving contrast, or detecting features, objects or movement.
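As an illustration of one such algorithm, sharpening is commonly implemented by sliding a small kernel over the image. The following is a minimal sketch in plain Python, assuming an 8-bit greyscale image stored as a list of rows. The kernel values, the function names and the choice to leave border pixels untouched are assumptions made for this example, not details from any particular framework.

```python
# A common 3x3 sharpening kernel (an assumption for this sketch): it boosts
# the centre pixel and subtracts its four direct neighbours. The weights sum
# to 1, so flat regions of the image are left unchanged.
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_kernel_3x3(image, kernel):
    """Apply a 3x3 kernel to an 8-bit greyscale image (a list of rows).

    Border pixels are copied unchanged and results are clamped to 0..255.
    The kernel is applied without flipping, which is equivalent to a true
    convolution only for symmetric kernels such as the one above.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = max(0, min(255, acc))
    return out


def sharpen(image):
    """Sharpen a greyscale image with SHARPEN_KERNEL."""
    return apply_kernel_3x3(image, SHARPEN_KERNEL)
```

Because each output pixel depends only on a small, fixed neighbourhood of input pixels, this kind of operation is a natural candidate for parallel implementation in programmable logic or on a GPU.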
These algorithms should be developed within a framework that allows the shortest possible time to market and promotes re-use of proven IP, while reducing non-recurring and recurring engineering costs. A number of frameworks are worth considering.
* OpenVX – An open, royalty-free standard from the Khronos Group for cross-platform acceleration of computer vision and image-processing applications
* OpenCV – Open-source Computer Vision, which comprises a number of libraries aimed at real-time computer vision based on C / C++
* OpenCL – Open Computing Language, an open, royalty-free standard based on C for developing applications that execute in parallel on devices such as GPUs and FPGAs
* SDSoC – Xilinx design environment that allows developers to initially implement algorithms written in C / C++ in the ARM® processing system of a Zynq® or UltraScale+ MPSoC device, profile the code base to identify performance bottlenecks, and then use Xilinx High Level Synthesis (HLS) to translate those bottlenecks into hardware-enabled IP that runs in the programmable logic (PL) portion of the device.
Use of these frameworks coupled with HLS in an FPGA or All Programmable SoC design flow allows for efficient development of embedded vision applications, which can be quickly demonstrated with hardware in the loop.
Once the image has passed through the processing pipeline, how the data is output from the system is also important. At the highest level there are three broad choices. The first is to output the image to a display using a standard such as VGA, HDMI, SDI or DisplayPort. The second is to transmit the image, or information extracted from it, elsewhere, such as to the cloud, for further processing. The third is to store the images on non-volatile media to be accessed at a later date.
For the majority of these high-level choices at the completion of the imaging chain, it is important to consider the image format to be used. This presents the choice of encoding the image using an industry-standard compression algorithm such as H.264 (MPEG-4 Part 10 Advanced Video Coding) or H.265 (High Efficiency Video Coding). Implementations of these algorithms are often called codecs, and allow for more efficient utilisation of communication and network bandwidth or a reduction in the storage footprint, at the cost of some loss of fidelity. In applications where such a trade-off is not acceptable, the image can be transmitted or stored in its raw format or encoded in a lossless format.
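To give a feel for why compression matters, the raw data rate of an uncompressed stream can be estimated from resolution, pixel depth and frame rate. The sketch below is a back-of-envelope calculation only. The function names, the 1920 x 1080, 24-bit, 60 frames-per-second stream and the 100 Mbit/s link are hypothetical values chosen for illustration, and the result says nothing about what any particular codec achieves.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Bit rate of an uncompressed video stream, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def raw_storage_bytes(bitrate_bps, seconds):
    """Bytes needed to store `seconds` of uncompressed video."""
    return bitrate_bps * seconds // 8


def required_compression_ratio(bitrate_bps, link_bps):
    """Factor by which the stream must shrink to fit a link of `link_bps`."""
    return bitrate_bps / link_bps


# Hypothetical example values, not taken from the article.
rate = raw_bitrate_bps(1920, 1080, 24, 60)
print("raw bit rate (bit/s):", rate)
print("one hour of storage (bytes):", raw_storage_bytes(rate, 3600))
print("ratio needed for 100 Mbit/s:", required_compression_ratio(rate, 100_000_000))
```

Under these assumptions the raw stream is roughly 3 Gbit/s, or about 1.3 TB per hour of storage, and it would need to shrink by a factor of about 30 to fit the example link.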
Most codec implementations use a different colour space to that which is output by typical colour image sensors. The most commonly used colour spaces within embedded vision are:
* Red, Green, Blue – This contains the RGB information as output from the image sensor, and is commonly used as an output for simple interfaces like VGA
* YUV – This contains luma (Y) and chrominance (U and V), and is used for most codecs and some display standards. Commonly used formats are YUV 4:4:4 and YUV 4:2:2. With 4:4:4, each of the Y, U and V components of a pixel is represented by eight bits, making for a 24-bit pixel. With the 4:2:2 format, the U and V values are shared between two horizontally adjacent pixels, allowing for a more memory-efficient average of 16 bits per pixel.
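The relationship between the two colour spaces, and the saving offered by 4:2:2, can be sketched as follows. This example assumes the full-range 8-bit YCbCr form of YUV with ITU-R BT.601 coefficients (as used in JPEG/JFIF), averages the chroma of each pixel pair, and packs the result in Y0, U, Y1, V byte order. Those choices and the function names are assumptions for illustration; a real system would follow whatever its codec or display standard specifies.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range 8-bit YCbCr (BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp each component to the 8-bit range.
    return tuple(max(0, min(255, round(c))) for c in (y, cb, cr))


def pack_yuv422(rgb_row):
    """Pack a row of RGB pixels as 4:2:2 bytes in Y0, U, Y1, V order.

    Each pair of horizontally adjacent pixels keeps its own luma values
    but shares one averaged U (Cb) and one averaged V (Cr) value.
    """
    if len(rgb_row) % 2:
        raise ValueError("4:2:2 packing needs an even number of pixels")
    packed = bytearray()
    for i in range(0, len(rgb_row), 2):
        y0, cb0, cr0 = rgb_to_ycbcr(*rgb_row[i])
        y1, cb1, cr1 = rgb_to_ycbcr(*rgb_row[i + 1])
        cb = (cb0 + cb1 + 1) // 2  # rounded average of the pair's chroma
        cr = (cr0 + cr1 + 1) // 2
        packed += bytes((y0, cb, y1, cr))
    return bytes(packed)


row = [(255, 255, 255), (0, 0, 0)]  # one white and one black pixel
packed = pack_yuv422(row)
print(len(packed) * 8 // len(row), "bits per pixel in 4:2:2")
```

Two pixels occupy four bytes in this packing, compared with six bytes in 4:4:4, which is where the 16-bit versus 24-bit figures come from.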
One further decision that has a considerable impact on the image-processing chain and SWaP-C is the choice of where the majority of the image processing is to be implemented. This may be within the embedded vision system itself, which enables faster response times but also requires greater processing and memory resources, leading to higher power demand. This will be the most common approach for embedded applications like ADAS or machine vision.