Imperial College London

Professor Paul Kelly

Faculty of Engineering, Department of Computing

Professor of Software Technology
 
 
 

Contact

 

+44 (0)20 7594 8332

p.kelly

 
 

Location

 

Level 3 (upstairs), William Penney Building, room 304, William Penney Laboratory, South Kensington Campus



 

Publications

Citation

BibTeX format

@inproceedings{Strout:2014:10.1109/IPDPS.2014.118,
author = {Strout, MM and Luporini, F and Krieger, CD and Bertolli, C and Bercea, GT and Olschanowsky, C and Ramanujam, J and Kelly, PHJ},
doi = {10.1109/IPDPS.2014.118},
pages = {1136--1145},
publisher = {IEEE Press},
title = {Generalizing Run-Time Tiling with the Loop Chain Abstraction},
url = {http://dx.doi.org/10.1109/IPDPS.2014.118},
year = {2014}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB - Many scientific applications are organized in a data parallel way: as sequences of parallel and/or reduction loops. This exposes parallelism well, but does not convert data reuse between loops into data locality. This paper focuses on this issue in parallel loops whose loop-to-loop dependence structure is data-dependent due to indirect references such as A[B[i]]. Such references are a common occurrence in sparse matrix computations, molecular dynamics simulations, and unstructured-mesh computational fluid dynamics (CFD). Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality. These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the matrix powers kernel, however the run-time routines for performing sparse tiling were hand coded per application. In this paper, we present a generalized full sparse tiling algorithm that uses the newly developed loop chain abstraction as input, improves inter-loop data locality, and creates a task graph to expose shared-memory parallelism at runtime. We evaluate the overhead and performance impact of the generalized full sparse tiling algorithm on two codes: a sparse Jacobi iterative solver and the Airfoil CFD benchmark.
AU - Strout,MM
AU - Luporini,F
AU - Krieger,CD
AU - Bertolli,C
AU - Bercea,GT
AU - Olschanowsky,C
AU - Ramanujam,J
AU - Kelly,PHJ
DO - 10.1109/IPDPS.2014.118
EP - 1145
PB - IEEE Press
PY - 2014///
SN - 1530-2075
SP - 1136
TI - Generalizing Run-Time Tiling with the Loop Chain Abstraction
UR - http://dx.doi.org/10.1109/IPDPS.2014.118
ER -