Vision-Based System Design Part 8 – Accelerated Cloud Computing for High-Performance Image Recognition


Stack Streamlines FPGA Development
To fully leverage the capabilities provided by programmable logic, an ecosystem is needed that enables development using current industry-standard frameworks and libraries. The Xilinx Reconfigurable Acceleration Stack (RAS) answers this need by streamlining FPGA hardware creation, application development and integration. Hyperscale data centres can use these tools to jump-start development: several major operators are currently working with Xilinx to boost performance and service agility by introducing FPGA acceleration in their server farms, making extreme high-performance compute capacity available to customers as a web service.

Like the reVISION™ stack for embedded vision development, which was described in the previous article in this series, the RAS leverages High Level Synthesis (HLS) for efficient development of programmable logic in C, C++, OpenCL® and SystemC. This HLS capability is combined with support for industry-standard frameworks and libraries such as OpenCV, OpenVX, Caffe, FFmpeg and SQL, creating an ecosystem that can be extended in the future to support new frameworks and standards as they are introduced.
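HLS works by compiling ordinary C/C++ functions into hardware pipelines, with compiler directives (pragmas) steering how loops become logic. The sketch below illustrates the general style of kernel such tools accept; the function name, the pragma shown in the comment and the interface are illustrative assumptions, not code from the RAS itself.

```cpp
#include <cstddef>

// Hypothetical HLS-style kernel: plain C++ that an HLS compiler could turn
// into a hardware pipeline. The commented pragma shows how the loop would
// typically be directed to produce one result per clock cycle.
extern "C" void vadd(const int *a, const int *b, int *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // #pragma HLS PIPELINE II=1  (illustrative: pipeline the loop body)
        out[i] = a[i] + b[i];
    }
}
```

Because the source is standard C++, the same function can be compiled and unit-tested on a CPU before being synthesised, which is a large part of what makes the HLS flow productive.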

Also like the reVISION stack, the RAS is organised in three distinct layers to address hardware, application and provisioning challenges. The lowest layer of the stack, the platform layer, is concerned with the hardware platform comprising the selected FPGA or SoC upon which the remainder of the stack is to be implemented. The RAS includes a half-length, full-height, single-slot PCIe® development board and a reference design, created specifically to support machine learning and other computationally intensive applications such as video transcoding and data analytics.
The second level of the RAS is the application layer. This uses the Vivado® Design Suite and the SDAccel™ development environment, leveraging HLS to implement the application. SDAccel contains an architecturally optimising compiler for FPGA acceleration, which enables up to 25 times better performance per Watt than typical processing platforms built on conventional x86 server CPUs and/or Graphics Processing Units (GPUs). The environment is designed to deliver a CPU/GPU-like development and run-time experience: it simplifies application optimisation, provides CPU/GPU-like on-demand loadable compute units, maintains consistency through program transitions and application execution, and handles the sharing of FPGA accelerators across multiple applications.

For machine learning applications, DNN (deep neural network) and GEMM (general matrix multiplication) libraries are available on the Caffe framework, as shown in figure 1. Libraries for other deep-learning frameworks, such as TensorFlow, Torch and Theano, are expected to be added later. It is worth noting at this point that the scope of the RAS is not limited to machine vision or deep learning: as figure 1 shows, other libraries are included that support MPEG processing using FFmpeg, as well as data movers and compute kernels for data analytics on the SQL framework.
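GEMM is the dense matrix multiply at the heart of most neural-network layers, which is why it is worth offloading to programmable logic. For reference, the operation itself can be written naively as below; this is an illustrative sketch of what the library computes, not the RAS GEMM library's API.

```cpp
#include <cstddef>
#include <vector>

// Naive reference GEMM: C = A * B for row-major matrices.
// An FPGA GEMM library accelerates exactly this computation, typically by
// tiling the matrices and streaming the tiles through a hardware array.
using Matrix = std::vector<std::vector<float>>;

Matrix gemm(const Matrix &A, const Matrix &B) {
    const std::size_t m = A.size();     // rows of A
    const std::size_t k = A[0].size();  // cols of A == rows of B
    const std::size_t n = B[0].size();  // cols of B
    Matrix C(m, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)      // loop order keeps B row-contiguous
            for (std::size_t j = 0; j < n; ++j)
                C[i][j] += A[i][p] * B[p][j];
    return C;
}
```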
The third level of the RAS is the provisioning layer, and uses OpenStack to enable integration within the data centre. OpenStack is a free, open-source software platform that comprises multiple components for managing and controlling resources such as processing, storage and networking equipment from multiple vendors.

Performance Boost, with Power Savings
By using the RAS to streamline the creation of Cloud-class FPGA-based computing, a significant increase in compute capability can be achieved compared with processing on conventional CPUs. Image processing algorithms can be accelerated by as much as 40 times, while deep machine learning can run up to 11 times faster. In addition, hardware requirements are reduced, which lowers power consumption and thereby delivers a dramatic increase in performance per Watt. Moreover, the FPGA-based engine has the important advantage of being reconfigurable, so it can be quickly and repeatedly re-optimised as different types of algorithms are called for execution.

Automatic image analysis and object recognition applications can benefit from the increased performance and reduced power consumption offered by highly optimised, reconfigurable FPGA-based processing engines. Whether the application is to run on an embedded system or in the Cloud, using an acceleration stack enables developers to overcome design and integration challenges, reduce time to market and maximise overall performance. Both the reVISION stack for embedded development and the Reconfigurable Acceleration Stack for building Cloud-based FPGA compute engines assemble the necessary hardware and software resources and can adapt to support frameworks and standards as they are introduced.

For more information, please visit:


