Imperial College London

Professor Gerard Gorman

Faculty of Engineering, Department of Earth Science & Engineering

Professor of Computational Science and Engineering
 
 
 

Contact

 

+44 (0)20 7594 9985

g.gorman

 
 

Location

 

R4.92, Royal School of Mines, South Kensington Campus



Citation

BibTeX format

@unpublished{Rodrigues:2019,
author = {Rodrigues, VHM and Cavalcante, L and Pereira, MB and Luporini, F and Reguly, I and Gorman, G and Souza, SXD},
publisher = {arXiv},
title = {GPU support for automatic generation of finite-differences stencil Kernels},
url = {http://arxiv.org/abs/1912.00695v1},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - UNPB
AB - The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphical processing units (GPUs) are an attractive architectural target for stencil computations because of its high degree of data parallelism. However, the rapid architectural and technological progression makes it difficult for even the most proficient programmers to remain up-to-date with the technological advances at a micro-architectural level. In this work, we present an extension for an open source compiler designed to produce highly optimized finite difference kernels for use in inversion methods named Devito. We embed it with the Oxford Parallel Domain Specific Language (OP-DSL) in order to enable automatic code generation for GPU architectures from a high-level representation. We aim to enable users coding in a symbolic representation level to effortlessly get their implementations leveraged by the processing capacities of GPU architectures. The implemented backend is evaluated on a NVIDIA GTX Titan Z, and on a NVIDIA Tesla V100 in terms of operational intensity through the roof-line model for varying space-order discretization levels of 3D acoustic isotropic wave propagation stencil kernels with and without symbolic optimizations. It achieves approximately 63% of V100's peak performance and 24% of Titan Z's peak performance for stencil kernels over grids with 256 points. Our study reveals that improving memory usage should be the most efficient strategy for leveraging the performance of the implemented solution on the evaluated architectures.
AU - Rodrigues,VHM
AU - Cavalcante,L
AU - Pereira,MB
AU - Luporini,F
AU - Reguly,I
AU - Gorman,G
AU - Souza,SXD
PB - arXiv
PY - 2019///
TI - GPU support for automatic generation of finite-differences stencil Kernels
UR - http://arxiv.org/abs/1912.00695v1
UR - http://hdl.handle.net/10044/1/76683
ER -