Publications
McRae ATT, Mitchell L, Bercea, et al., 2016, Automated Generation and Symbolic Manipulation of Tensor Product Finite Elements, SIAM Journal on Scientific Computing, Vol: 38, Pages: S25-S47, ISSN: 1064-8275
We describe and implement a symbolic algebra for scalar and vector-valued finite elements, enabling the computer generation of elements with tensor product structure on quadrilateral, hexahedral, and triangular prismatic cells. The algebra is implemented as an extension to the domain-specific language UFL, the Unified Form Language. This allows users to construct many finite element spaces beyond those supported by existing software packages. We have made corresponding extensions to FIAT, the FInite element Automatic Tabulator, to enable numerical tabulation of such spaces. This tabulation is consequently used during the automatic generation of low-level code that carries out local assembly operations, within the wider context of solving finite element problems posed over such function spaces. We have done this work within the code-generation pipeline of the software package Firedrake; we make use of the full Firedrake package to present numerical examples.
Mitchell L, Mueller EH, 2016, High level implementation of geometric multigrid solvers for finite element problems: Applications in atmospheric modelling, Journal of Computational Physics, Vol: 327, Pages: 1-18, ISSN: 1090-2716
The implementation of efficient multigrid preconditioners for elliptic partial differential equations (PDEs) is a challenge due to the complexity of the resulting algorithms and corresponding computer code. For sophisticated (mixed) finite element discretisations on unstructured grids an efficient implementation can be very time consuming and requires the programmer to have in-depth knowledge of the mathematical theory, parallel computing and optimisation techniques on manycore CPUs. In this paper we show how the development of bespoke multigrid preconditioners can be simplified significantly by using a framework which allows the expression of each component of the algorithm at the correct abstraction level. Our approach (1) allows the expression of the finite element problem in a language which is close to the mathematical formulation of the problem, (2) guarantees the automatic generation and efficient execution of parallel optimised low-level computer code and (3) is flexible enough to support different abstraction levels and give the programmer control over details of the preconditioner. We use the composable abstractions of the Firedrake/PyOP2 package to demonstrate the efficiency of this approach for the solution of strongly anisotropic PDEs in atmospheric modelling. The weak formulation of the PDE is expressed in Unified Form Language (UFL) and the lower PyOP2 abstraction layer allows the manual design of computational kernels for a bespoke geometric multigrid preconditioner. We compare the performance of this preconditioner to a single-level method and hypre's BoomerAMG algorithm. The Firedrake/PyOP2 code is inherently parallel and we present a detailed performance analysis for a single node (24 cores) on the ARCHER supercomputer. Our implementation utilises a significant fraction of the available memory bandwidth and shows very good weak scaling on up to 6,144 compute cores.
Rathgeber F, Mitchell L, 2016, firedrake-bench: firedrake bench optimality paper release
A repository of Firedrake benchmarks
Guo X, Lange M, Gorman G, et al., 2015, Developing a scalable hybrid MPI/OpenMP unstructured finite element model, Computers & Fluids, Vol: 110, Pages: 227-234, ISSN: 0045-7930
Lange M, Gorman G, Weiland M, et al., 2013, Benchmarking mixed-mode PETSc performance on high-performance architectures
The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of parallelism exposed in modern high-performance platforms. In order to realise the full potential of recent hardware advances, a mixed-mode between shared-memory programming techniques and inter-node message passing can be adopted which provides high-levels of parallelism with minimal overheads. For scientific applications this entails that not only the simulation code itself, but the whole software stack needs to evolve. In this paper, we evaluate the mixed-mode performance of PETSc, a widely used scientific library for the scalable solution of partial differential equations. We describe the addition of OpenMP threaded functionality to the library, focusing on sparse matrix-vector multiplication. We highlight key challenges in achieving good parallel performance, such as explicit communication overlap using task-based parallelism, and show how to further improve performance by explicitly load balancing threads within MPI processes. Using a set of matrices extracted from Fluidity, a CFD application code which uses the library as its linear solver engine, we then benchmark the parallel performance of mixed-mode PETSc across multiple nodes on several modern HPC architectures. We evaluate the parallel scalability on Uniform Memory Access (UMA) systems, such as the Fujitsu PRIMEHPC FX10 and IBM BlueGene/Q, as well as a Non-Uniform Memory Access (NUMA) Cray XE6 platform. A detailed comparison is performed which highlights the characteristics of each particular architecture, before demonstrating efficient strong scalability of sparse matrix-vector multiplication with significant speedups over the pure-MPI mode.
Markall GR, Rathgeber F, Mitchell L, et al., 2013, Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS, International Supercomputing Conference (ISC), Publisher: Springer, Pages: 279-289, ISSN: 0302-9743
We describe a toolchain that provides a fully automated compilation pathway from a finite element domain-specific language to low-level code for multicore and GPGPU platforms. We demonstrate that the generated code exceeds the performance of the best available alternatives, without requiring manual tuning or modification of the generated code. The toolchain can easily be integrated with existing finite element solvers, providing a means to add performance portable methods without having to rebuild an entire complex implementation from scratch.
Lange M, Gorman G, Weiland M, et al., 2013, Achieving efficient strong scaling with PETSc using hybrid MPI/OpenMP optimisations, Publisher: Springer Berlin Heidelberg, Pages: 97-108
Guo X, Gorman G, Lange M, et al., 2013, Exploring the Thread-level Parallelisms for the Next Generation Geophysical Fluid Modelling Framework Fluidity-ICOM, Procedia Engineering, Vol: 61, Pages: 251-257, ISSN: 1877-7058
Plank G, Neic A, Liebmann M, et al., 2012, Accelerating cardiac bidomain simulations using Graphics Processing Units, IEEE Transactions on Biomedical Engineering, Vol: 59, Pages: 2281-2290, ISSN: 0018-9294
Rathgeber F, Markall GR, Mitchell L, et al., 2012, PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes, High Performance Computing, Networking Storage and Analysis, SC Companion, Publisher: IEEE Computer Society, Pages: 1116-1123
Emerging many-core platforms are very difficult to program in a performance portable manner whilst achieving high efficiency on a diverse range of architectures. We present work in progress on PyOP2, a high-level embedded domain-specific language for mesh-based simulation codes that executes numerical kernels in parallel over unstructured meshes. Just-in-time kernel compilation and parallel scheduling are delayed until runtime, when problem-specific parameters are available. Using generative metaprogramming, performance portability is achieved, while details of the parallel implementation are abstracted from the programmer. PyOP2 kernels for finite element computations can be generated automatically from equations given in the domain-specific Unified Form Language. Interfacing to the multi-phase CFD code Fluidity through a very thin layer on top of PyOP2 yields a general purpose finite element solver with an input notation very close to mathematical formulae. Preliminary performance figures show speedups of up to 3.4x compared to Fluidity's built-in solvers when running in parallel.
Weiland M, Mitchell L, Gorman G, et al., 2012, Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors
Mitchell L, Sloan TM, Mewissen M, et al., 2012, Parallel classification and feature selection in microarray data using SPRINT, Concurrency and Computation: Practice and Experience
Piotrowski M, McGilvary G, Sloan T, et al., 2012, Exploiting Parallel R in the Cloud with SPRINT, Methods of Information in Medicine
Niederer S, Mitchell L, Smith N, et al., 2011, Simulating human cardiac physiology on clinical time-scales, Frontiers in Physiology, Vol: 2, Pages: 1-7, ISSN: 1664-042X
In this study, the feasibility of conducting in silico experiments in near-realtime with anatomically realistic, biophysically detailed models of human cardiac electrophysiology is demonstrated using a current national high-performance computing facility. The required performance is achieved by integrating and optimizing load balancing and parallel I/O, which lead to strongly scalable simulations up to 16,384 compute cores. This degree of parallelization enables computer simulations of human cardiac electrophysiology at 240 times slower than real time and activation times can be simulated in approximately 1 min. This unprecedented speed suffices requirements for introducing in silico experimentation into a clinical workflow.
Mitchell L, Sloan TM, Mewissen M, et al., 2011, A parallel random forest classifier for R, Pages: 1-6
Piotrowski M, Sloan TM, Mewissen M, et al., 2011, Optimisation and parallelisation of the partitioning around medoids function in R, Pages: 707-713
Mitchell L, Cates ME, 2010, Hawkes process as a model of social interactions: a view on video dynamics, Journal of Physics A: Mathematical and Theoretical, Vol: 43, Pages: 045101-045101
Mitchell L, Bishop M, Hötzl E, et al., 2010, Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age, AIP Conference Proceedings, Vol: 1281, Pages: 407-410
Mitchell L, 2009, Competition in an evolving stochastic market
Mitchell L, Ackland GJ, 2009, Boom and bust in a continuous time evolving economic model, European Physical Journal B, Vol: 70, Pages: 567-573
Mitchell L, Ackland GJ, 2007, Strategy bifurcation and spatial inhomogeneity in a simple model of competing sellers, Europhysics Letters, Vol: 79, Pages: 48003-48003