Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering
 
 
 

Contact

 

+44 (0)20 7594 8313, w.luk

 
 

Location

 

434 Huxley Building, South Kensington Campus



Publications

Citation

BibTeX format

@inproceedings{Zhao:2016:10.1109/ASAP.2016.7760779,
author = {Zhao, W and Fu, H and Luk, W and Yu, T and Wang, S and Feng, B and Ma, Y and Yang, G},
doi = {10.1109/ASAP.2016.7760779},
pages = {107--114},
publisher = {IEEE},
title = {F-CNN: An FPGA-based Framework for Training Convolutional Neural Networks},
url = {http://dx.doi.org/10.1109/ASAP.2016.7760779},
year = {2016}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB - This paper presents a novel reconfigurable framework for training Convolutional Neural Networks (CNNs). The proposed framework is based on reconfiguring a streaming datapath at runtime to cover the training cycle for the various layers in a CNN. The streaming datapath can support various parameterized modules which can be customized to produce implementations with different trade-offs in performance and resource usage. The modules follow the same input and output data layout, simplifying configuration scheduling. For different layers, instances of the modules contain different computation kernels in parallel, which can be customized with different layer configurations and data precision. The associated models on performance, resource and bandwidth can be used in deriving parameters for the datapath to guide the analysis of design trade-offs to meet application requirements or platform constraints. They enable estimation of the implementation specifications given different layer configurations, to maximize performance under the constraints on bandwidth and hardware resources. Experimental results indicate that the proposed module design targeting Maxeler technology can achieve a performance of 62.06 GFLOPS for 32-bit floating-point arithmetic, outperforming existing accelerators. Further evaluation based on training LeNet-5 shows that the proposed framework is about 4 times faster than the CPU implementation of Caffe and about 7.5 times more energy efficient than the GPU implementation of Caffe.
AU - Zhao,W
AU - Fu,H
AU - Luk,W
AU - Yu,T
AU - Wang,S
AU - Feng,B
AU - Ma,Y
AU - Yang,G
DO - 10.1109/ASAP.2016.7760779
EP - 114
PB - IEEE
PY - 2016///
SN - 2160-052X
SP - 107
TI - F-CNN: An FPGA-based Framework for Training Convolutional Neural Networks
UR - http://dx.doi.org/10.1109/ASAP.2016.7760779
UR - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000392195600014&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
UR - http://hdl.handle.net/10044/1/50990
ER -
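The abstract's mention of performance and bandwidth models that guide datapath parameter choices can be illustrated with a minimal roofline-style sketch. This is not the paper's actual model; the function name and all parameter values below are hypothetical, chosen only to show how a bandwidth constraint caps attainable throughput.

```python
# Illustrative sketch (hypothetical, not the F-CNN paper's model):
# a roofline-style estimate of attainable throughput for a streaming
# FPGA datapath, capped by either peak compute or memory bandwidth.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable performance is the lesser of the compute peak and
    bandwidth times arithmetic intensity (FLOPs per byte streamed)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Hypothetical layer: 100 GFLOPS peak, 38 GB/s bandwidth,
# 2 FLOPs performed per byte transferred.
print(attainable_gflops(100.0, 38.0, 2.0))  # bandwidth-bound: 76.0
print(attainable_gflops(100.0, 38.0, 4.0))  # compute-bound: 100.0
```

A designer would compare such estimates across layer configurations and data precisions to pick parameters that stay within the platform's bandwidth and resource budget, which is the kind of trade-off analysis the abstract describes.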