Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering
 
 
 

Contact

 

+44 (0)20 7594 8313
w.luk

 
 

Location

 

434 Huxley Building, South Kensington Campus


Citation

BibTeX format

@article{Gan:2015:10.1145/2629581,
author = {Gan, L and Fu, H and Luk, W and Yang, C and Xue, W and Huang, X and Zhang, Y and Yang, G},
doi = {10.1145/2629581},
journal = {ACM Transactions on Reconfigurable Technology and Systems},
title = {Solving the global atmospheric equations through heterogeneous reconfigurable platforms},
url = {http://dx.doi.org/10.1145/2629581},
volume = {8},
year = {2015}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and memory resources. This article aims to accelerate the solution of global shallow water equations (SWEs), which is one of the most essential equation sets describing atmospheric dynamics. We first design a hybrid methodology that employs both the host CPU cores and the field-programmable gate array (FPGA) accelerators to work in parallel. Through a careful adjustment of the computational domains, we achieve a balanced resource utilization and a further improvement of the overall performance. By decomposing the resource-demanding SWE kernel, we manage to map the double-precision algorithm into three FPGAs. Moreover, by using fixed-point and reduced-precision floating point arithmetic, we manage to build a fully pipelined mixed-precision design on a single FPGA, which can perform 428 floating-point and 235 fixed-point operations per cycle. The mixed-precision design with four FPGAs running together can achieve a speedup of 20 over a fully optimized design on a CPU rack with two eight-core processors and is 8 times faster than the fully optimized Kepler GPU design. As for power efficiency, the mixed-precision design with four FPGAs is 10 times more power efficient than a Tianhe-1A supercomputer node.
AU  - Gan, L
AU  - Fu, H
AU  - Luk, W
AU  - Yang, C
AU  - Xue, W
AU  - Huang, X
AU  - Zhang, Y
AU  - Yang, G
DO  - 10.1145/2629581
PY  - 2015///
SN  - 1936-7414
TI  - Solving the global atmospheric equations through heterogeneous reconfigurable platforms
T2  - ACM Transactions on Reconfigurable Technology and Systems
UR  - http://dx.doi.org/10.1145/2629581
UR  - http://hdl.handle.net/10044/1/23870
VL  - 8
ER  - 