Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering

Contact

+44 (0)20 7594 8313
w.luk
Website

Location

434 Huxley Building
South Kensington Campus

Citation

BibTeX format

@article{Fan:2018:10.1109/TVLSI.2018.2797600,
author = {Fan, X and Wu, D and Cao, W and Luk, W and Wang, L},
doi = {10.1109/TVLSI.2018.2797600},
journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
pages = {1098--1111},
title = {Stream processing dual-track CGRA for object inference},
url = {http://dx.doi.org/10.1109/TVLSI.2018.2797600},
volume = {26},
year = {2018}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms is of growing interest in recent years. However, not many publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as an implementation prototype. To evaluate the performance, the SDT-CGRA is realized in Verilog HDL and implemented in Semiconductor Manufacturing International Corporation 55-nm process, with the footprint of 5.19 mm² at 450 MHz. Seven object inference algorithms, including convolutional neural network (CNN), k-means, principal component analysis (PCA), spatial pyramid matching (SPM), linear support vector machine (SVM), Softmax, and Joint Bayesian, are selected as benchmarks. The experimental results show that the SDT-CGRA can gain on average 343.8 times and 17.7 times higher energy efficiency for Softmax, PCA, and CNN, 621.0 times and 1261.8 times higher energy efficiency for k-means, SPM, linear-SVM, and Joint-Bayesian algorithms when compared with the Intel Xeon E5-2637 CPU and the Nvidia TitanX graphics processing unit. When compared with the state-of-the-art solutions of AlexNet on field-programmable gate array and CGRA, the proposed SDT-CGRA can achieve a 1.78 times increase in energy efficiency and a 13 times speedup, respectively.
AU  - Fan,X
AU  - Wu,D
AU  - Cao,W
AU  - Luk,W
AU  - Wang,L
DO  - 10.1109/TVLSI.2018.2797600
EP  - 1111
PY  - 2018///
SN  - 1063-8210
SP  - 1098
TI  - Stream processing dual-track CGRA for object inference
T2  - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
UR  - http://dx.doi.org/10.1109/TVLSI.2018.2797600
UR  - http://hdl.handle.net/10044/1/58712
VL  - 26
ER  - 