Imperial College London

Fabio Luporini

Faculty of Engineering, Department of Earth Science & Engineering

Contact

f.luporini12

Location

301 William Penney Laboratory, South Kensington Campus

Citation

BibTeX format

@inproceedings{Ekanayake:2023:10.1145/3605573.3605604,
author = {Ekanayake, SD and Reguly, IZ and Luporini, F and Mudalige, GR},
doi = {10.1145/3605573.3605604},
pages = {380--391},
title = {Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2},
url = {http://dx.doi.org/10.1145/3605573.3605604},
year = {2023}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB - In this paper, we investigate data movement-reducing and communication-avoiding optimizations and their practicable implementation for large-scale unstructured-mesh applications. Utilizing the high-level abstraction of the OP2 DSL for the unstructured-mesh class of codes, we reason about techniques for reduced communications across a consecutive sequence of loops (a loop-chain). The careful trade-off with increased redundant computation in place of data movement is analyzed for distributed-memory parallelization. A new communication-avoiding (CA) back-end for OP2 is designed, codifying these techniques such that they can be applied automatically to any OP2 application. The back-end is extended to operate on a cluster of GPUs, integrating GPU-to-GPU communication with CUDA, in combination with MPI. The new CA back-end is applied automatically to two non-trivial applications, including the OP2 version of Rolls-Royce's production CFD application, Hydra. Performance is investigated on both CPU and GPU clusters on representative problems of 8M and 24M node mesh sizes. Results demonstrate how, for select configurations, the new CA back-end provides runtime reductions of between 30% and 65% for the loop-chains in these applications for the mesh sizes on both an HPE Cray EX system and an NVIDIA V100 GPU cluster. We model and examine the determinants and characteristics of a given unstructured-mesh loop-chain that can lead to performance benefits with CA techniques, providing insights into the general feasibility and profitability of using the optimizations for this class of applications.
AU - Ekanayake,SD
AU - Reguly,IZ
AU - Luporini,F
AU - Mudalige,GR
DO - 10.1145/3605573.3605604
EP - 391
PY - 2023///
SP - 380
TI - Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2
UR - http://dx.doi.org/10.1145/3605573.3605604
ER -