Publications

BibTeX format

@article{Fan:2018:10.1109/TVLSI.2018.2797600,
author = {Fan, X and Wu, D and Cao, W and Luk, W and Wang, L},
doi = {10.1109/TVLSI.2018.2797600},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
pages = {1098--1111},
title = {Stream processing dual-track {CGRA} for object inference},
url = {http://dx.doi.org/10.1109/TVLSI.2018.2797600},
volume = {26},
year = {2018}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms is of growing interest in recent years. However, not many publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as an implementation prototype. To evaluate the performance, the SDT-CGRA is realized in Verilog HDL and implemented in Semiconductor Manufacturing International Corporation 55-nm process, with the footprint of 5.19 mm² at 450 MHz. Seven object inference algorithms, including convolutional neural network (CNN), k-means, principal component analysis (PCA), spatial pyramid matching (SPM), linear support vector machine (SVM), Softmax, and Joint Bayesian, are selected as benchmarks. The experimental results show that the SDT-CGRA can gain on average 343.8 times and 17.7 times higher energy efficiency for Softmax, PCA, and CNN, 621.0 times and 1261.8 times higher energy efficiency for k-means, SPM, linear-SVM, and Joint-Bayesian algorithms when compared with the Intel Xeon E5-2637 CPU and the Nvidia TitanX graphics processing unit. When compared with the state-of-the-art solutions of AlexNet on field-programmable gate array and CGRA, the proposed SDT-CGRA can achieve a 1.78 times increase in energy efficiency and a 13 times speedup, respectively.
AU  - Fan,X
AU  - Wu,D
AU  - Cao,W
AU  - Luk,W
AU  - Wang,L
DO  - 10.1109/TVLSI.2018.2797600
EP  - 1111
PY  - 2018///
SN  - 1063-8210
SP  - 1098
TI  - Stream processing dual-track CGRA for object inference
T2  - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
UR  - http://dx.doi.org/10.1109/TVLSI.2018.2797600
UR  - http://hdl.handle.net/10044/1/58712
VL  - 26
ER  -
