By Yingxiu Chang, Ying Zhang, Liwei Liao, Jian Cao and Dunshan Yu, Peking University, Beijing, China
The development of neural network algorithms has gathered significant pace these days, as they become widely implemented in a variety of real-time image- and pattern-recognition applications.
Neural network accelerators are commonly built on field-programmable gate arrays (FPGAs), so we turned to these devices first when architecting our algorithm accelerator, but found some problems with existing designs. For a start, they offer few pipelines, and those pipelines sit partly idle during convolution, which results in low performance. They also use smaller processing elements (PEs) to fit different-sized convolutional kernels and avoid computational bottlenecks, but these further delay the convolutional operations and reduce system peak performance. In 2017, FPGAs were placed at the heart of a design for frequency-domain acceleration of a convolutional neural network (CNN) by implementing an Overlap and Add (OaA) convolver, which consists of fixed-size FFTs that require zero padding during the convolutional procedure.
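To make the OaA convolver's fixed-size-FFT constraint concrete, here is a minimal 1D sketch in Python/NumPy (our own illustration, not the cited 2017 design): the input is split into fixed-length blocks, each block is zero-padded up to the fixed FFT size, multiplied with the kernel's spectrum, and the overlapping tails of adjacent blocks are added back together.

```python
import numpy as np

def overlap_add_conv(signal, kernel, block=8):
    """1D Overlap-and-Add convolution using fixed-size FFTs.

    Every block is zero-padded to the same FFT length, which is the
    padding requirement the OaA convolver imposes in hardware.
    """
    m = len(kernel)
    fft_len = block + m - 1                  # one fixed FFT size for all blocks
    kernel_f = np.fft.rfft(kernel, fft_len)  # kernel spectrum, computed once
    n_blocks = (len(signal) + block - 1) // block
    out = np.zeros(n_blocks * block + m - 1)
    for start in range(0, len(signal), block):
        seg = signal[start:start + block]
        seg_f = np.fft.rfft(seg, fft_len)        # zero-pads seg to fft_len
        piece = np.fft.irfft(seg_f * kernel_f, fft_len)
        out[start:start + fft_len] += piece      # add the overlapping tails
    return out[:len(signal) + m - 1]             # trim to full-convolution length
```

The per-block result overlaps its neighbor by `m - 1` samples; summing those overlaps reproduces the full linear convolution, which is why zero padding inside each fixed FFT is unavoidable in this scheme.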
Now, we’ve devised a 2D Sliding Window of Convolutional Kernel (SWCK) IP softcore, and formed an Accelerator of a Bubbling Convolutional Layer (ABCL). ABCL contains n convolution kernels, each built from SWCKs stacked to match the depth of the input feature map.
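The per-position computation an SWCK-style unit performs can be sketched as a plain 2D sliding-window convolution; this is our own illustration of the operation, not the softcore's RTL, and follows the CNN convention of computing cross-correlation without flipping the kernel.

```python
import numpy as np

def sliding_window_conv2d(fmap, kernel):
    """Slide a k x k kernel over a 2D feature map ('valid', no padding).

    Each output element is the elementwise product-and-sum of the kernel
    with the window of the input it currently covers -- the dot product an
    SWCK-style unit produces at every window position.
    """
    H, W = fmap.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(fmap[i:i + k, j:j + k] * kernel)
    return out
```

Stacking one such window per input channel and summing across channels gives one kernel's output map; replicating that n times corresponds to ABCL's n convolution kernels.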