Imperial College London

Professor Peter Y. K. Cheung

Faculty of Engineering, Dyson School of Design Engineering

Head of the Dyson School of Design Engineering
 
 
 

Contact

 

+44 (0)20 7594 6200
p.cheung

 
 

Assistant

 

Mrs Wiesia Hsissen +44 (0)20 7594 6261

 

Location

 

910B, Electrical Engineering, South Kensington Campus



 

Publications

Citation

BibTeX format

@inproceedings{Davis:2016:10.1007/978-3-319-30481-6_31,
author = {Davis, JJ and Cheung, PYK},
doi = {10.1007/978-3-319-30481-6_31},
pages = {361--368},
publisher = {Springer},
title = {Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators},
url = {http://dx.doi.org/10.1007/978-3-319-30481-6_31},
year = {2016}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - As the threat of fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While the use of common fault tolerance strategies frequently causes the incursion of significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance embodies a proven family of low-overhead error mitigation techniques able to be built upon to create self-verifying circuitry. In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementation of a novel checksum truncation technique, analysing its effects upon overheads and allowed error. Our findings include that bit-width reduction of ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of 32 x 32 matrices resulted in the reduction of incurred area overhead by 16.7% and recovery of 8.27% of timing model Fmax. These came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
AU  - Davis,JJ
AU  - Cheung,PYK
DO  - 10.1007/978-3-319-30481-6_31
EP  - 368
PB  - Springer
PY  - 2016///
SN  - 0302-9743
SP  - 361
TI  - Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators
UR  - http://dx.doi.org/10.1007/978-3-319-30481-6_31
UR  - https://link.springer.com/chapter/10.1007/978-3-319-30481-6_31
UR  - http://hdl.handle.net/10044/1/31158
ER  -