Publications

BibTeX format

@article{Liu:2018:10.1145/3242900,
author = {Liu, S and Fan, H and Niu, X and Ng, H-C and Chu, Y and Luk, W},
doi = {10.1145/3242900},
journal = {ACM Transactions on Reconfigurable Technology and Systems},
title = {Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA},
url = {http://dx.doi.org/10.1145/3242900},
volume = {11},
year = {2018}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Convolutional Neural Networks (CNNs) based algorithms have been successful in solving image recognition problems, showing very large accuracy improvement. In recent years, deconvolution layers are widely used as key components in the state-of-the-art CNNs for end-to-end training and models to support tasks such as image segmentation and super resolution. However, the deconvolution algorithms are computationally intensive which limits their applicability to real time applications. Particularly, there has been little research on the efficient implementations of deconvolution algorithms on FPGA platforms which have been widely used to accelerate CNN algorithms by practitioners and researchers due to their high performance and power efficiency. In this work, we propose and develop deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. Besides, memory sharing between the computation modules is proposed for the FPGA-based CNN accelerator as well as for other optimization techniques. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space in order to achieve optimal processing speed of the system and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate the low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on Xilinx Zynq ZC706 board and the deconvolution accelerator achieves a performance of 90.1 GOPS under 200MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time application of scene segmentation on Cityscapes Dataset is used to evaluate our CNN accelerator on Zynq ZC706 board, and the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per second for 512x512 image in
AU  - Liu,S
AU  - Fan,H
AU  - Niu,X
AU  - Ng,H-C
AU  - Chu,Y
AU  - Luk,W
DO  - 10.1145/3242900
PY  - 2018///
SN  - 1936-7406
TI  - Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA
T2  - ACM Transactions on Reconfigurable Technology and Systems
UR  - http://dx.doi.org/10.1145/3242900
UR  - http://hdl.handle.net/10044/1/62876
VL  - 11
ER  -

Professor Wayne Luk