Publications

Cueto C, Bates O, Strong G, Cudeiro J, Robins TC, Luporini F, Agudo OC, Gorman G, Guasch L, Tang M-X et al., 2023, A flexible software platform for high-performance ultrasound computed tomography Computer Methods and Programs in Biomedicine (vol 221, 106855, 2022), Computer Methods and Programs in Biomedicine, Vol: 240, ISSN: 0169-2607

Journal article

Louboutin M, Yin Z, Orozco R, Grady TJ, Siahkoohi A, Rizzuti G, Witte PA, Møyner O, Gorman GJ, Herrmann FJ et al., 2023, Learned multiphysics inversion with differentiable programming and machine learning, The Leading Edge, Vol: 42, Pages: 474-486, ISSN: 1938-3789

We present the Seismic Laboratory for Imaging and Modeling/Monitoring open-source software framework for computational geophysics and, more generally, inverse problems involving the wave equation (e.g., seismic and medical ultrasound), regularization with learned priors, and learned neural surrogates for multiphase flow simulations. By integrating multiple layers of abstraction, the software is designed to be both readable and scalable, allowing researchers to easily formulate problems in an abstract fashion while exploiting the latest developments in high-performance computing. The design principles and their benefits are illustrated and demonstrated by means of building a scalable prototype for permeability inversion from time-lapse crosswell seismic data, which, aside from coupling of wave physics and multiphase flow, involves machine learning.

Journal article

Cueto C, Bates O, Strong G, Cudeiro J, Luporini F, Calderón Agudo Ò, Gorman G, Guasch L, Tang M-X et al., 2022, Stride: a flexible software platform for high-performance ultrasound computed tomography, Computer Methods and Programs in Biomedicine, Vol: 221, ISSN: 0169-2607

BACKGROUND AND OBJECTIVE: Advanced ultrasound computed tomography techniques like full-waveform inversion are mathematically complex and orders of magnitude more computationally expensive than conventional ultrasound imaging methods. This computational and algorithmic complexity, and a lack of open-source libraries in this field, represent a barrier preventing the generalised adoption of these techniques, slowing the pace of research, and hindering reproducibility. Consequently, we have developed Stride, an open-source Python library for the solution of large-scale ultrasound tomography problems. METHODS: On one hand, Stride provides high-level interfaces and tools for expressing the types of optimisation problems encountered in medical ultrasound tomography. On the other, these high-level abstractions seamlessly integrate with high-performance wave-equation solvers and with scalable parallelisation routines. The wave-equation solvers are generated automatically using Devito, a domain-specific language, and the parallelisation routines are provided through the custom actor-based library Mosaic. RESULTS: We demonstrate the modelling accuracy achieved by our wave-equation solvers through a comparison (1) with analytical solutions for a homogeneous medium, and (2) with state-of-the-art modelling software applied to a high-contrast, complex skull section. Additionally, we show through a series of examples how Stride can handle realistic numerical and experimental tomographic problems, in 2D and 3D, and how it can scale robustly from a local multi-processing environment to a multi-node high-performance cluster. CONCLUSIONS: Stride enables researchers to rapidly and intuitively develop new imaging algorithms and to explore novel physics without sacrificing performance and scalability. This will lead to faster scientific progress in this field and will significantly ease clinical translation.

Journal article

Kukreja N, Huckelheim J, Louboutin M, Washbourne J, Kelly PHJ, Gorman GJ et al., 2022, Lossy checkpoint compression in full waveform inversion: a case study with ZFPv0.5.5 and the overthrust model, Geoscientific Model Development, Vol: 15, Pages: 3815-3829, ISSN: 1991-959X

This paper proposes a new method that combines checkpointing methods with error-controlled lossy compression for large-scale high-performance full-waveform inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.

Journal article
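
As an illustration of what "error-controlled" lossy compression means in the abstract above, the following minimal sketch quantizes floating-point values into integer bins so that the reconstruction error never exceeds a chosen absolute tolerance. It is not ZFP and not the paper's implementation; the helper names `quantize` and `dequantize`, the toy `wavefield` data, and the tolerance value are all assumptions made for this example.

```python
import math

def quantize(values, tol):
    """Map each float to an integer bin of width 2*tol (hypothetical helper)."""
    step = 2.0 * tol
    return [round(v / step) for v in values]

def dequantize(codes, tol):
    """Reconstruct approximate floats from the integer bin indices."""
    step = 2.0 * tol
    return [c * step for c in codes]

# Toy stand-in for one snapshot of a forward wavefield (assumed data).
wavefield = [math.sin(0.1 * i) for i in range(1000)]
tol = 1e-3

codes = quantize(wavefield, tol)
restored = dequantize(codes, tol)

# Largest pointwise reconstruction error; bounded by tol up to rounding.
max_err = max(abs(a - b) for a, b in zip(wavefield, restored))
# Number of distinct integer codes that an entropy coder would have to store.
distinct = len(set(codes))
```

Because the values lie in [-1, 1] and each bin is 0.002 wide, the snapshot is represented by a small set of integers, which is what makes the stream cheaper to store or move; a larger tolerance gives fewer bins and a larger error.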

Pyles C, Schalkwyk FV, Gorman GJ, Beg M, Stott L, Levy N, Gilad-Bachrach R et al., 2021, PyBryt: auto-assessment and auto-grading for computational thinking, Publisher: arXiv

We continuously interact with computerized systems to achieve goals and perform tasks in our personal and professional lives. Therefore, the ability to program such systems is a skill needed by everyone. Consequently, computational thinking skills are essential for everyone, which creates a challenge for the educational system to teach these skills at scale and allow students to practice these skills. To address this challenge, we present a novel approach to providing formative feedback to students on programming assignments. Our approach uses dynamic evaluation to trace intermediate results generated by students' code and compares them to the reference implementation provided by their teachers. We have implemented this method as a Python library and demonstrate its use to give students relevant feedback on their work while allowing teachers to challenge their students' computational thinking skills.

Working paper

Zhang Q, Iordanescu G, Tok WH, Brandsberg-Dahl S, Srinivasan HK, Chandra R, Kukreja N, Gorman G et al., 2021, Hyperwavve: a cloud-native solution for hyperscale seismic imaging on Azure, First International Meeting for Applied Geoscience & Energy, Publisher: Society of Exploration Geophysicists, ISSN: 1052-3812

As cloud computing becomes increasingly popular, we explore its potential for hyperscale seismic imaging workloads on Azure. We introduce our cloud-native fault-tolerant solution named Hyperwavve, which is based on advanced cloud technologies including Docker/Container, Kubernetes and Dask. We demonstrate a large-scale 3D FWI using 1000 VMs/nodes on Azure, where Hyperwavve uses distributed containerized processes to successfully invert for the full 3D (20 × 20 × 5 km³) overthrust velocity model. We further validate that Hyperwavve can distribute FWI work onto 6000 (or more) VMs/nodes concurrently. Last, we show that our Python-based FWI runs on both Azure CPUs and GPUs, including various architectures.

Conference paper

Bisbas G, Luporini F, Louboutin M, Nelson R, Gorman GJ, Kelly PHJ et al., 2021, Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources, 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Publisher: IEEE Computer Society, Pages: 497-506, ISSN: 1530-2075

Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, applying temporal blocking to practical applications' stencils remains challenging. These computations often consist of sparsely located operators not aligned with the computational grid ("off-the-grid"). Our work is motivated by modelling problems in which source injections result in wavefields that must then be measured at receivers by interpolation from the gridded wavefield. The resulting data dependencies make the adoption of temporal blocking much more challenging. We propose a methodology to inspect these data dependencies and reorder the computation, leading to performance gains in stencil codes where temporal blocking has not been applicable. We implement this novel scheme in the Devito domain-specific compiler toolchain. Devito implements a domain-specific language embedded in Python to generate optimized partial differential equation solvers using the finite-difference method from high-level symbolic problem definitions. We evaluate our scheme using isotropic acoustic, anisotropic acoustic, and isotropic elastic wave propagators of industrial significance. After auto-tuning, performance evaluation shows that this enables substantial performance improvement through temporal blocking over highly-optimized vectorized spatially-blocked code of up to 1.6x.

Conference paper

Caunt E, Nelson R, Luporini F, Gorman G et al., 2021, Generalised algorithm and implementation of topography within finite difference wave solvers, Pages: 1152-1156

Partial differential equation solvers based on the finite-difference method have for many years been a keystone of seismic processing and modelling applications, commonly found in practical migration and full-waveform inversion methods. Sharp density contrasts within the computational domain have potential to introduce numerical error: problematic when introducing topography to a seismic model. Including a simple step change in density to approximate an air layer compromises both stability and numerical accuracy, often requiring smoothing of the contrast, and inducing both first and second order errors in space. Topography can instead be implemented via an immersed boundary conforming to the surface. This is achieved by extrapolating the wavefield across the boundary to find solution values at necessary external nodes. As this process is confined to the pre-processing step, it has negligible effect on the computational cost of the simulation. Devitoboundary is a tool in its early stages of development, intended to complement Devito as a user-friendly means of including immersed boundaries in practical applications. 3D immersed boundaries can be constructed from irregularly sampled topography point clouds, via Delaunay triangulation coupled with a 1D extrapolation scheme. The result is a stable, error-free boundary which can be readily integrated with Devito models.

Conference paper

Kukreja N, Hückelheim J, Louboutin M, Washbourne J, Kelly PHJ, Gorman GJ et al., 2020, Lossy Checkpoint Compression in Full Waveform Inversion, Geoscientific Model Development, ISSN: 1991-959X

This paper proposes a new method that combines checkpointing methods with error-controlled lossy compression for large-scale high-performance Full-Waveform Inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the Exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.

Journal article

Kramer S, Wilson C, Davies R, Funke SW, Greaves T, Avdis A, Lange M, Candy A, Cotter CJ, Pain C, Percival J, Mouradian S, Bhutani G, Gorman G, Gibson A, Duvernay T, Guo X, Maddison JR, Rathgeber F, Farrell P, Weiland M, Robinson D, Ham DA, Goffin M, Piggott M, Gomes J, Dargaville S, Everett A, Jacobs CT, Cavendish AB et al., 2020, FluidityProject/fluidity: New test cases "Analytical solutions for mantle flow in cylindrical and spherical shells"

This release adds new test cases described in the GMD paper "Analytical solutions for mantle flow in cylindrical and spherical shells"

Software

Luporini F, Louboutin M, Lange M, Kukreja N, Bisbas G, Pandolfo V, Cavalcante L, Gorman G, Mickus V, Bruno M, Kazakas P, Dinneen C, Mojica O, von Conta GS, Greaves T, Freire de Souza J, Speglich JH, Allam Jr T, Witte P, Hester K, Rami L, Washbourne R et al., 2020, devitocodes/devito: v4.2.3

Synopsis: Performance optimizations in the symbolic layer and generated code for x86, GPU and MPI. Various minor correctness and performance bug fixes. Improvements to application developer API. Added new tutorial notebooks. Increased test coverage, particularly for MPI and GPUs.

Software

Louboutin M, Luporini F, Bisbas G, Herrmann F, Gorman G, Witte P et al., 2020, mloubout/SC20Paper: First release

SC20 in Atlanta submission

Software

Luporini F, Lange M, Louboutin M, Kukreja N, Hückelheim J, Yount C, Witte P, Kelly PHJ, Herrmann FJ, Gorman G et al., 2020, Architecture and performance of Devito, a system for automated stencil computation, ACM Transactions on Mathematical Software, Vol: 46, Pages: 1-24, ISSN: 0098-3500

Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly-optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process, from mathematical equations down to C++ code, is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expressions elimination, tiling and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the back-end of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of such performance optimizations is demonstrated using operators drawn from seismic imaging applications.

Journal article

Louboutin M, Luporini F, Witte P, Nelson R, Bisbas G, Thorbecke J, Herrmann FJ, Gorman G et al., 2020, Scaling through abstractions -- high-performance vectorial wave simulations for seismic inversion with Devito, Publisher: arXiv

[Devito] is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all supported; operators of essentially arbitrary complexity can be generated. To accommodate this flexibility whilst ensuring performance, data dependency analysis is utilized to schedule loops and detect computational properties such as parallelism. In this article, the generation and simulation of MPI-parallel propagators (along with their adjoints) for the pseudo-acoustic wave-equation in tilted transverse isotropic media and the elastic wave-equation are presented. Simulations are carried out on industry-scale synthetic models in an HPC Cloud system and reach a performance of 28 TFLOP/s, hence demonstrating Devito's suitability for production-grade seismic inversion problems.

Working paper

Luporini F, Louboutin M, Lange M, Kukreja N, Witte P, Huckelheim J, Yount C, Kelly PHJ, Herrmann FJ, Gorman GJ et al., 2020, Architecture and Performance of Devito, a System for Automated Stencil Computation, Publisher: Association for Computing Machinery

Working paper

Luporini F, Nelson R, Burgess T, St-Cyr A, Gorman G et al., 2020, Automated distributed-memory parallelism from symbolic specification in Devito

Automated distributed-memory parallelism (DMP) has been added to Devito, a rapidly evolving framework adopted by a dynamic, heterogeneous and fast-growing community. The key innovations are the abstractions provided to the user and the compiler-based implementation approach, which we consider invaluable for long-term sustainable software to replace (partly or fully) obsolete, impenetrable, hardly extendable and often inefficient legacy code. The auto-tuner, which determines, among other things, the best block shape for each tiled loop nest in an Operator, has already been tweaked to support DMP. Single-node multi-socket (one MPI process per socket) as well as multi-node experiments, both weak and strong scaling, are planned for the near future.

Conference paper

Luporini F, Louboutin M, Lange M, Kukreja N, Bisbas G, Pandolfo V, Cavalcante L, Gorman G, Mickus V, Kazakas P, von Conta GS, Greaves T, Bruno M, Freire de Souza J, Astic T, Rasal R, Picetti F, Mosser L, Amaral da Silva EG, McCormick D, Wolff C, Giannotta A et al., 2019, opesci/devito: Devito-4.0

Tensor algebra support (#873): VectorFunction and VectorTimeFunction; (2nd order) TensorFunction and TensorTimeFunction; full support for FD and related operations (derivatives, shortcuts, solve, ...); differential operators such as div, grad and curl. FD extensions: custom FD with user-supplied coefficients as Function (#964). Extended and more rigorous support for staggered grids (#873): true half-grid staggering (u(x + h_x/2)); automatic evaluation at half-nodes (averaging only); automatic staggered FD of any order.

Software

Rodrigues VHM, Cavalcante L, Pereira MB, Luporini F, Reguly I, Gorman G, Souza SXD et al., 2019, GPU support for automatic generation of finite-differences stencil Kernels, Publisher: arXiv

The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphical processing units (GPUs) are an attractive architectural target for stencil computations because of their high degree of data parallelism. However, the rapid architectural and technological progression makes it difficult for even the most proficient programmers to remain up-to-date with the technological advances at a micro-architectural level. In this work, we present an extension for an open source compiler designed to produce highly optimized finite difference kernels for use in inversion methods named Devito. We embed it with the Oxford Parallel Domain Specific Language (OP-DSL) in order to enable automatic code generation for GPU architectures from a high-level representation. We aim to enable users coding in a symbolic representation level to effortlessly get their implementations leveraged by the processing capacities of GPU architectures. The implemented backend is evaluated on an NVIDIA GTX Titan Z, and on an NVIDIA Tesla V100 in terms of operational intensity through the roof-line model for varying space-order discretization levels of 3D acoustic isotropic wave propagation stencil kernels with and without symbolic optimizations. It achieves approximately 63% of V100's peak performance and 24% of Titan Z's peak performance for stencil kernels over grids with 256 points. Our study reveals that improving memory usage should be the most efficient strategy for leveraging the performance of the implemented solution on the evaluated architectures.

Working paper

Witte PA, Louboutin M, Luporini F, Gorman GJ, Herrmann FJ et al., 2019, Compressive least-squares migration with on-the-fly Fourier transforms, Geophysics, Vol: 84, Pages: R655-R672, ISSN: 0016-8033

Least-squares reverse time migration is a powerful approach for true-amplitude seismic imaging of complex geologic structures, but the successful application of this method is currently hindered by its enormous computational cost, as well as its high memory requirements for computing the gradient of the objective function. We have tackled these problems by introducing an algorithm for low-cost sparsity-promoting least-squares migration using on-the-fly Fourier transforms. We formulate the least-squares migration objective function in the frequency domain (FD) and compute gradients for randomized subsets of shot records and frequencies, thus significantly reducing data movement and the overall number of wave-equation solves. By using on-the-fly Fourier transforms, we can compute an arbitrary number of monochromatic FD wavefields with a time-domain (TD) modeling code, instead of having to solve individual Helmholtz equations for each frequency, which becomes computationally infeasible when moving to high frequencies. Our numerical examples demonstrate that compressive imaging with on-the-fly Fourier transforms provides a fast and memory-efficient alternative to TD imaging with optimal checkpointing, whose memory requirements for a fixed background model and source wavelet are independent of the number of time steps. Instead, the memory and additional computational costs grow with the number of frequencies and determine the amount of subsampling artifacts and crosstalk. In contrast to optimal checkpointing, this offers the possibility to trade the memory and computational costs for image quality or a larger number of iterations and is advantageous in new computing environments such as the cloud, where computing is often cheaper than memory and data movement.

Journal article

Kukreja N, Hückelheim J, Louboutin M, Hovland P, Gorman G et al., 2019, Combining checkpointing and data compression to accelerate adjoint-based optimization problems, Euro-Par 2019: Parallel Processing 25th International Conference on Parallel and Distributed Computing, Publisher: Springer International Publishing, Pages: 87-100, ISSN: 0302-9743

Seismic inversion and imaging are adjoint-based optimization problems that process up to terabytes of data, regularly exceeding the memory capacity of available computers. Data compression is an effective strategy to reduce this memory requirement by a certain factor, particularly if some loss in accuracy is acceptable. A popular alternative is checkpointing, where data is stored at selected points in time, and values at other times are recomputed as needed from the last stored state. This allows arbitrarily large adjoint computations with limited memory, at the cost of additional recomputations. In this paper, we combine compression and checkpointing for the first time to compute a realistic seismic inversion. The combination of checkpointing and compression allows larger adjoint computations compared to using only compression, and reduces the recomputation overhead significantly compared to using only checkpointing.

Conference paper
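
The checkpointing idea described in the abstract above (store the state at selected times, recompute the rest from the last stored state) can be sketched in a few lines. This is a minimal illustration with evenly spaced checkpoints, not the optimal Revolve schedule and not the authors' code; the toy recurrence in `step` and all function names are assumptions made for the example.

```python
def step(state):
    """One forward time step of a toy recurrence (stand-in for a wave solver)."""
    return 0.5 * state + 1.0

def forward_with_checkpoints(x0, n_steps, interval):
    """Run forward, keeping the state only every `interval` steps."""
    checkpoints = {0: x0}
    state = x0
    for t in range(1, n_steps + 1):
        state = step(state)
        if t % interval == 0:
            checkpoints[t] = state
    return state, checkpoints

def state_at(t, checkpoints, interval):
    """Recompute the state at time t from the last stored checkpoint."""
    base = (t // interval) * interval
    state = checkpoints[base]
    for _ in range(t - base):
        state = step(state)
    return state

def reverse_sweep(n_steps, checkpoints, interval):
    """Visit forward states in reverse time order, as an adjoint pass would."""
    return [state_at(t, checkpoints, interval)
            for t in range(n_steps - 1, -1, -1)]

n_steps, interval = 20, 5
final, cps = forward_with_checkpoints(1.0, n_steps, interval)
states_reversed = reverse_sweep(n_steps, cps, interval)
```

Only the states at times 0, 5, 10, 15 and 20 are held in memory; every other state needed by the reverse sweep is rebuilt with at most `interval - 1` extra forward steps, which is the memory-for-recomputation trade the abstract refers to.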

Luporini F, Lange M, Louboutin M, Kukreja N, Pandolfo V, Bisbas G, rhodrin, Cavalcante L, tjb900, Gorman G, Kazakas P, Greaves T, SSHz, vmickus, Jan, gamdow, vkrGitHub, pp1336, Astic T, Rasal R, Picetti F, Mosser L, dugeo2, McCormick D, Wolff C et al., 2019, opesci/devito: Devito-3.5

Release notes. MPI support: Python-level: MPI-distributed NumPy arrays. C-level: code generation for sub-domains, staggered grids, operators with coupled PDEs. C-level: performance optimizations (e.g., computation-communication overlap). Lazy evaluation of derivatives. Revisited staggered grids API (now Dimension-based, previously mask-based). Re-engineered clustering (which means smarter loop fusion/fission). DSE: Improved aliases detection. DLE: OpenMP nested parallelism; hierarchical loop blocking. Auto-padding for Functions/TimeFunctions. Improved data dependency analysis. Smarter Operator auto-tuning. New tutorials: Operator application, MPI, new propagators, custom stencils, and more. Revisited benchmarking scripts. Revisited examples, new models and propagators (e.g., visco-elastic). Smarter continuous integration: now Travis sided by Azure Pipelines; dropped Jenkins. Misc bug fixes. Hundreds of tests added. More sophisticated platform auto-detection.

Software

Hückelheim J, Kukreja N, Narayanan SHK, Luporini F, Gorman G, Hovland P et al., 2019, Automatic differentiation for adjoint stencil loops, Publisher: arXiv

Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.

Working paper

Luporini F, Lange M, Jacobs CT, Gorman GJ, Ramanujam J, Kelly PHJ et al., 2019, Automated tiling of unstructured mesh computations with application to seismological modeling, ACM Transactions on Mathematical Software, Vol: 45, ISSN: 0098-3500

Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory accesses, such as A[map[i]], hence the name "sparse." One notable example of such loops arises in discontinuous-Galerkin finite element methods, because of the computation of numerical integrals over different domains (e.g., cells, facets). The major challenge with sparse tiling is implementation: not only is it cumbersome to understand and synthesize, but it is also onerous to maintain and generalize, as it requires a complete rewrite of the bulk of the numerical computation. In this article, we propose an approach to extend the applicability of sparse tiling based on raising the level of abstraction. Through a sequence of compiler passes, the mathematical specification of a problem is progressively lowered, and eventually sparse-tiled C for-loops are generated. Besides automation, we advance the state-of-the-art by introducing a revisited, more efficient sparse tiling algorithm; support for distributed-memory parallelism; a range of fine-grained optimizations for increased runtime performance; implementation in a publicly available library, SLOPE; and an in-depth study of the performance impact in Seigen, a real-world elastic wave equation solver for seismological problems, which shows speed-ups up to 1.28× on a platform consisting of 896 Intel Broadwell cores.

Journal article

Witte PA, Louboutin M, Kukreja N, Luporini F, Lange M, Gorman GJ, Herrmann FJ et al., 2019, A large-scale framework for symbolic implementations of seismic inversion algorithms in Julia, Geophysics, Vol: 84, Pages: F57-F71, ISSN: 0016-8033

Writing software packages for seismic inversion is a very challenging task because problems such as full-waveform inversion or least-squares imaging are algorithmically and computationally demanding due to the large number of unknown parameters and the fact that waves are propagated over many wavelengths. Therefore, software frameworks need to combine versatility and performance to provide geophysicists with the means and flexibility to implement complex algorithms that scale to exceedingly large 3D problems. Following these principles, we have developed the Julia Devito Inversion framework, an open-source software package in Julia for large-scale seismic modeling and inversion based on Devito, a domain-specific language compiler for automatic code generation. The framework consists of matrix-free linear operators for implementing seismic inversion algorithms that closely resemble the mathematical notation, a flexible resilient parallelization, and an interface to Devito for generating optimized stencil code to solve the underlying wave equations. In comparison with many manually optimized industry codes written in low-level languages, our software is built on the idea of independent layers of abstractions and user interfaces with symbolic operators. Through a series of numerical examples, we determined that this allows users to implement a series of increasingly complex algorithms for waveform inversion and imaging as simple Julia scripts that scale to large-scale 3D problems. This illustrates that software based on the paradigms of abstract user interfaces and automatic code generation makes it possible to manage the complexity of the algorithms and performance optimizations, thus providing a high-performance research and production framework.

Journal article

Louboutin M, Lange M, Luporini F, Kukreja N, Witte PA, Herrmann FJ, Velesko P, Gorman GJ et al., 2019, Devito (v3.1.0): An embedded domain-specific language for finite differences and geophysical exploration, Geoscientific Model Development, Vol: 12, Pages: 1165-1187, ISSN: 1991-959X

We introduce Devito, a new domain-specific language for implementing high-performance finite-difference partial differential equation solvers. The motivating application is exploration seismology for which methods such as full-waveform inversion and reverse-time migration are used to invert terabytes of seismic data to create images of the Earth's subsurface. Even using modern supercomputers, it can take weeks to process a single seismic survey and create a useful subsurface image. The computational cost is dominated by the numerical solution of wave equations and their corresponding adjoints. Therefore, a great deal of effort is invested in aggressively optimizing the performance of these wave-equation propagators for different computer architectures. Additionally, the actual set of partial differential equations being solved and their numerical discretization is under constant innovation as increasingly realistic representations of the physics are developed, further ratcheting up the cost of practical solvers. By embedding a domain-specific language within Python and making heavy use of SymPy, a symbolic mathematics library, we make it possible to develop finite-difference simulators quickly using a syntax that strongly resembles the mathematics. The Devito compiler reads this code and applies a wide range of analysis to generate highly optimized and parallel code. This approach can reduce the development time of a verified and optimized solver from months to days.

Journal article

Kukreja N, Shilova A, Beaumont O, Huckelheim J, Ferrier N, Hovland P, Gorman G et al., 2019, Training on the Edge: The why and the how, Publisher: arXiv

Edge computing is the natural progression from Cloud computing, where, instead of collecting all data and processing it centrally, like in a cloud computing environment, we distribute the computing power and try to do as much processing as possible, close to the source of the data. There are various reasons this model is being adopted quickly, including privacy, and reduced power and bandwidth requirements on the Edge nodes. While it is common to see inference being done on Edge nodes today, it is much less common to do training on the Edge. The reasons for this range from computational limitations, to it not being advantageous in reducing communications between the Edge nodes. In this paper, we explore some scenarios where it is advantageous to do training on the Edge, as well as the use of checkpointing strategies to save memory.

Working paper

Kukreja N, Luporini F, Lange M, Louboutin M, Pandolfo V, Cavalcante L, Gorman G, Kazakas P, Greaves T, Bisbas G, Astic T, Rasal R, McCormick D, Wolff C et al., 2018, opesci/devito: Devito-3.4

Release notes: preliminary support for MPI (no changes to user code required); support for staggered grids; improved compilation technology; improved Operator autotuning; more powerful DSL (e.g., take derivatives of entire expressions such as (u+v).dx); more efficient pickling; miscellaneous bug fixes; new modeling examples based on the elastic wave equation; new examples describing aspects of the compilation technology.

Software

Kukreja N, Hückelheim J, Gorman GJ, 2018, Backpropagation for long sequences: beyond memory constraints with constant overheads

Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to trade memory for added computations, which results in a sublinear growth of memory footprint or computation overhead. In this work, we present a library that uses asynchronous storing and prefetching to move data to and from slow and cheap storage. The library only stores and prefetches states as frequently as possible without delaying the computation, and uses the optimal Revolve backpropagation strategy for the computations in between. The memory footprint of the backpropagation can thus be reduced to any size (e.g. to fit into DRAM), while the computational overhead is constant in the sequence length, and only depends on the ratio between compute and transfer times on a given hardware. We show in experiments that by exploiting asynchronous data transfer, our strategy is always at least as fast, and usually faster than the previously studied "optimal" strategies.
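The trade of memory for recomputation described here can be sketched in a few lines of Python. The example below uses plain uniform checkpointing (keep every `stride`-th state and recompute the rest), which is simpler than both the Revolve schedule and the asynchronous storage scheme the abstract refers to. The recurrence `step`, its derivative, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def step(x: float) -> float:
    """Hypothetical forward recurrence x_{k+1} = f(x_k)."""
    return math.tanh(1.5 * x)


def step_derivative(x: float) -> float:
    """Derivative f'(x) of the recurrence above."""
    return 1.5 * (1.0 - math.tanh(1.5 * x) ** 2)


def gradient_full_storage(x0: float, nsteps: int) -> float:
    """Reference d x_n / d x_0: keeps every state, so memory grows with nsteps."""
    states = [x0]
    for _ in range(nsteps):
        states.append(step(states[-1]))
    grad = 1.0
    for k in reversed(range(nsteps)):
        grad *= step_derivative(states[k])
    return grad


def gradient_checkpointed(x0: float, nsteps: int, stride: int) -> float:
    """Same gradient, keeping only every stride-th state (stride >= 1)."""
    checkpoints: Dict[int, float] = {}
    x = x0
    for k in range(nsteps):
        if k % stride == 0:
            checkpoints[k] = x
        x = step(x)
    grad = 1.0
    # Walk the segments from last to first, as the reverse sweep requires.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + stride, nsteps)
        segment: List[float] = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1]))  # recomputed forward states
        for state in reversed(segment):
            grad *= step_derivative(state)
    return grad


print(gradient_full_storage(0.3, 20))
print(gradient_checkpointed(0.3, 20, stride=5))
```

With `stride` close to the square root of `nsteps`, this keeps on the order of the square root of `nsteps` states in memory at the cost of roughly one additional forward sweep.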

Working paper

Witte P, Louboutin M, Lensink K, Lange M, Kukreja N, Luporini F, Gorman G, Herrmann FJ et al., 2018, Full-waveform inversion, Part 3: Optimization, Leading Edge, Vol: 37, Pages: 142-145, ISSN: 1070-485X

This tutorial is the third part of a full-waveform inversion (FWI) tutorial series with a step-by-step walkthrough of setting up forward and adjoint wave equations and building a basic FWI inversion framework. For discretizing and solving wave equations, we use Devito (http://www.opesci.org/devito-public), a Python-based domain-specific language for automated generation of finite-difference code (Lange et al., 2016). The first two parts of this tutorial (Louboutin et al., 2017, 2018) demonstrated how to solve the acoustic wave equation for modeling seismic shot records and how to compute the gradient of the FWI objective function using the adjoint-state method. With these two key ingredients, we will now build an inversion framework that can be used to minimize the FWI least-squares objective function.
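The structure of such an inversion loop can be shown on a toy problem. The sketch below minimizes a least-squares misfit 0.5 * ||F m - d||^2 by fixed-step steepest descent, where the small matrix `F` is a hypothetical linear stand-in for wave-equation modeling. It does not use Devito and is not the tutorial's code; a real FWI problem is nonlinear, far larger, and obtains its gradient from an adjoint wave-equation solve.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical 3x2 "modeling" matrix standing in for a wave-equation solve.
F = [[2.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]


def forward(m: Vector) -> Vector:
    """Predicted data F m."""
    return [sum(row[j] * m[j] for j in range(len(m))) for row in F]


def adjoint(r: Vector) -> Vector:
    """Apply F^T to a data residual."""
    ncols = len(F[0])
    return [sum(F[i][j] * r[i] for i in range(len(F))) for j in range(ncols)]


def misfit_and_gradient(m: Vector, d: Vector) -> Tuple[float, Vector]:
    """Objective 0.5 * ||F m - d||^2 and its gradient F^T (F m - d)."""
    residual = [p - o for p, o in zip(forward(m), d)]
    value = 0.5 * sum(r * r for r in residual)
    return value, adjoint(residual)


def steepest_descent(d: Vector, m0: Vector, step: float, niter: int) -> Vector:
    """Fixed-step gradient descent on the least-squares misfit."""
    m = list(m0)
    for _ in range(niter):
        _, grad = misfit_and_gradient(m, d)
        m = [mi - step * gi for mi, gi in zip(m, grad)]
    return m


m_true = [1.0, -2.0]
observed = forward(m_true)            # synthetic "observed" data
estimate = steepest_descent(observed, [0.0, 0.0], step=0.2, niter=200)
print(estimate)                       # should be close to m_true
```

The fixed step of 0.2 is chosen to be stable for this particular `F`; practical inversions typically use a line search or a quasi-Newton method instead.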

Journal article

Louboutin M, Witte P, Lange M, Kukreja N, Luporini F, Gorman G, Herrmann FJ et al., 2018, Full-waveform inversion, Part 2: Adjoint modeling, Leading Edge, Vol: 37, Pages: 69-72, ISSN: 1070-485X

This is the second part of a three-part tutorial series on full-waveform inversion (FWI) in which we provide a step-by-step walkthrough of setting up forward and adjoint wave equation solvers and an optimization framework for inversion. In Part 1 (Louboutin et al., 2017), we showed how to use Devito (http://www.opesci.org/devito-public) to set up and solve acoustic wave equations with (impulsive) seismic sources and sample wavefields at the receiver locations to forward model shot records. Here in Part 2, we will discuss how to set up and solve adjoint wave equations with Devito and, from that, how we can calculate gradients and function values of the FWI objective function.

Journal article

Professor Gerard Gorman
