54 results found
Witherden F, Vincent P, On nodal point sets for Flux Reconstruction, Journal of Computational and Applied Mathematics, ISSN: 0377-0427
Nodal point sets, and associated collocation projections, play an important role in a range of high-order methods, including Flux Reconstruction (FR) schemes. Historically, efforts have focused on identifying nodal point sets that aim to minimise the L ∞ error of an associated interpolating polynomial. The present work combines a comprehensive review of known approximation theory results, with new results, and numerical experiments, to motivate that in fact point sets for FR should aim to minimise the L² error of an associated interpolating polynomial. New results include identification of a nodal point set that minimises the L² norm of an interpolating polynomial, and a proof of the equivalence between such an interpolating polynomial and an L² approximating polynomial with coefficients obtained using a Gauss–Legendre quadrature rule. Numerical experiments confirm that FR errors can be reduced by an order-ofmagnitude by switching from popular point sets such as Chebyshev, Chebyshev–Lobatto and Legendre–Lobatto to Legendre point sets.
Loppi NA, Witherden FD, Jameson A, et al., 2019, Locally adaptive pseudo-time stepping for high-order Flux Reconstruction, Journal of Computational Physics, Vol: 399, ISSN: 0021-9991
This paper proposes a novel locally adaptive pseudo-time stepping convergence acceleration technique for dual time stepping which is a common integration method for solving unsteady low-Mach preconditioned/incompressibleNavier-Stokes formulations. In contrast to standard local pseudo-time stepping techniques that are based on computing the local pseudo-time stepsdirectly from estimates of the local Courant-Friedrichs-Lewy limit, the proposed technique controls the local pseudo-time steps using local truncationerrors which are computed with embedded pair RK schemes. The approachhas three advantages. First, it does not require an expression for the characteristic element size, which are difficult to obtain reliably for curved mixedelement meshes. Second, it allows a finer level of locality for high-ordernodal discretisations, such as FR, since the local time-steps can vary between solution points and field variables. Third, it is well-suited to beingcombined with P-multigrid convergence acceleration. Results are presentedfor a laminar 2D cylinder test case at Re = 100. A speed-up factor of 4.16is achieved compared to global pseudo-time stepping with an RK4 scheme,while maintaining accuracy. When combined with P-multigrid convergenceacceleration a speed-up factor of over 15 is achieved. Detailed analysis ofthe results reveals that pseudo-time steps adapt to element size/shape, solution state, and solution point location within each element. Finally, resultsare presented for a turbulent 3D SD7003 airfoil test case at Re = 60, 000.Speed-ups of similar magnitude are observed, and the flow physics is foundto be in good agreement with previous studies.
Zhou X, Vincent P, Zhou X, et al., 2019, Optimization of 3-D Divergence-Free Flow Field Reconstruction Using 2-D Ultrasound Vector Flow Imaging, Ultrasound in Medicine and Biology, Vol: 45, Pages: 3042-3055, ISSN: 0301-5629
Abstract- 3D blood Vector Flow Imaging (VFI) is of great value for understanding and detecting cardiovascular diseases. Currently 3D Ultrasound (US) VFI requires 2D matrix probes which are expensive and suffer from sub-optimal image quality. Our recent study proposed an interpolation algorithm to obtain a divergence free reconstruction of the 3D flow field from 2D velocities obtained by High Frame Rate US Particle Imaging Velocimetry (HFR echo-PIV, also known as HFR UIV), using a 1D array transducer. This work aims to significantly improve the accuracy and reduce the time-to-solution of our previous approach thereby paving the way for clinical translation. More specifically, accuracy was improved by optimising the divergence free basis to reduce Runge-phenomena near domain boundaries, and time-to-solution was reduced by demonstrating that under certain conditions the resulting system could be solved using widely available and highly optimized Generalized Minimum Residual (GMRES) algorithms. To initially demonstrate the utility of the approach, coarse 2D sub-samplings of an analytical unsteady Womersely flow solution and a steady helical flow solution obtained using Computational Fluid Dynamics (CFD) were used to successfully reconstruct full flow solutions, with 0.82% and 4.8% average relative errors in the velocity field respectively. Subsequently, multi-plane 2D velocity fields were obtained through HFR UIV for a straight tube phantom and a carotid bifurcation phantom, from which full 3D flow fields were reconstructed. These were then compared with flow fields obtained via CFD in each of the two configurations, and average relative errors of 6.01% and 12.8% in the velocity field were obtained. These results reflect 15%-75% improvements in accuracy and 53~874 fold acceleration of reconstruction speeds for the four cases, when compared with the previous divergence free flow reconstruction method. In conclusion the proposed method provides an effective and fast metho
Iyer A, Witherden F, Chernyshenko S, et al., 2019, Identifying eigenmodes of averaged small-amplitude perturbations to turbulent channel flow, Journal of Fluid Mechanics, Vol: 875, Pages: 758-780, ISSN: 0022-1120
Eigenmodes of averaged small-amplitude perturbations to a turbulent channel flow — which is one of the most fundamental canonical flows — are identified for the first time via an extensive set of high-fidelity GPU-accelerated direct numerical simulations. While the system governing averaged small-amplitude perturbations to turbulent channel flow remains unknown, the fact such eigenmodes can be identified constitutes direct evidence that it is linear. Moreover, while the eigenvalue associated with the slowest-decaying anti-symmetric eigenmode mode is found to be real, the eigenvalue associated with the slowest-decaying symmetric eigenmode mode is found to be complex. This indicates that the unknown linear system governing the evolution of averaged small-amplitude perturbations cannot be self-adjoint, even for the case of a uni-directional flow. In addition to elucidating aspects of the flow physics, the findings provide guidance for development of new unsteady Reynolds-averaged Navier-Stokes turbulence models, and constitute a new and accessible benchmark problem for assessing the performance of existing models,which are used widely throughout industry.
Koch MK, Kelly PHJ, Vincent PE, 2019, Towards in-situ vortex identification for peta-scale CFD using contour trees, 8th IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), Publisher: Institute of Electrical and Electronics Engineers, Pages: 104-105
Turbulent flows exist in many fields of science and occur in a wide range of engineering applications. While in the past broad knowledge has been established regarding the statistical properties of turbulence at a range of Reynolds numbers, there is a lack of under-standing of the detailed structure of these flows. Since the physical processes involve a vast number of structures, extremely large data sets are required to fully resolve a flow field in both space and time. To make the analysis of such data sets possible, we propose a frame-work that uses state-of-the-art contour tree construction algorithms to identify, classify and track vortices in turbulent flow fields produced by large-scale high-fidelity massively-parallel computational fluid dynamics solvers such as PyFR. Since disk capacity and I/O have become a bottleneck for such large-scale simulations, the proposed framework will be applied in-situ, while relevant data is still in device memory.
Vermeire B, Loppi N, Vincent P, 2019, Optimal Runge-Kutta schemes for pseudo time-stepping with high-order unstructured methods, Journal of Computational Physics, Vol: 383, Pages: 55-71, ISSN: 0021-9991
In this study we generate optimal Runge-Kutta (RK) schemes forconverging the Artificial Compressibility Method (ACM) using dualtime-stepping with high-order unstructured spatial discretizations. Wepresent optimal RK schemes with between s = 2 and s = 7 stages forSpectral Difference (SD) and Discontinuous Galerkin (DG) discretizations obtained using the Flux Reconstruction (FR) approach withsolution polynomial degrees of k = 1 to k = 8. These schemes are optimal in the context of linear advection with predicted speedup factors inexcess of 1.80× relative to a classical RK4,4 scheme. Speedup factors ofbetween 1.89× and 2.11× are then observed for incompressible ImplicitLarge Eddy Simulation (ILES) of turbulent flow over an SD7003 airfoil.Finally, we demonstrate the utility of the schemes for incompressibleILES of a turbulent jet, achieving good agreement with experimental data. The results demonstrate that the optimized RK schemesare suitable for simulating turbulent flows and can achieve significantspeedup factors when converging the ACM using dual time-steppingwith high-order unstructured spatial discretizations.
Zhou X, Papadopoulou V, Leow CH, et al., 2019, 3-D flow reconstruction using divergence-free interpolation of multiple 2-D contrast-enhanced ultrasound particle imaging velocimetry measurements, Ultrasound in Medicine and Biology, Vol: 45, Pages: 795-810, ISSN: 0301-5629
Quantification of 3-D intravascular flow is valuable for studying arterial wall diseases but currently there is a lack of effective clinical tools for this purpose. Divergence-free interpolation (DFI) using radial basis function (RBF) is an emerging approach for full-field flow reconstruction using experimental sparse flow field samples. Previous DFI reconstructs full-field flow from scattered 3-D velocity input obtained using phase-contrast magnetic resonance imaging with low temporal resolution. In this study, a new DFI algorithm is proposed to reconstruct full-field flow from scattered 2-D in-plane velocity vectors obtained using ultrafast contrast-enhanced ultrasound (>1000 fps) and particle imaging velocimetry. The full 3-D flow field is represented by a sum of weighted divergence-free RBFs in space. Because the acquired velocity vectors are only in 2-D and hence the problem is ill-conditioned, a regularized solution of the RBF weighting is achieved through singular value decomposition (SVD) and the L-curve method. The effectiveness of the algorithm is determined via numerical experiments for Poiseuille flow and helical flow with added noise, and it is found that an accuracy as high as 95.6% can be achieved for Poiseuille flow (with 5% input noise). Experimental feasibility is also determined by reconstructing full-field 3-D flow from experimental 2-D ultrasound image velocimetry measurements in a carotid bifurcation phantom. The method is typically faster for a range of problems compared with computational fluid dynamics, and has been found to be effective for the three flow cases.
Zhou X, Zhou X, Leow CH, et al., 2018, 3D Flow Reconstruction and Wall Shear Stress Evaluation with 2D Ultrafast Ultrasound Particle Imaging Velocimetry, IEEE International Ultrasonics Symposium (IUS), Publisher: IEEE, ISSN: 1948-5719
Loppi N, Witherden F, Jameson A, et al., 2018, A high-order cross-platform incompressible Navier-Stokes solver via artificial compressibility with application to a turbulent jet, Computer Physics Communications, Vol: 233, Pages: 193-205, ISSN: 0010-4655
Modern hardware architectures such as GPUs and manycore processors are characterised by an abundance of compute capability relative to memory bandwidth. This makes them well-suited to solving temporally explicit and spatially compact discretisations of hyperbolic conservation laws. However, classical pressure-projection-based incompressible Navier–Stokes formulations do not fall into this category. One attractive formulation for solving incompressible problems on modern hardware is the method of artificial compressibility. When combined with explicit dual time stepping and a high-order Flux Reconstruction discretisation, the majority of operations can be cast as compute bound matrix–matrix multiplications that are well-suited for GPU acceleration and manycore processing. In this work, we develop a high-order cross-platform incompressible Navier–Stokes solver, via artificial compressibility and dual time stepping, in the PyFR framework. The solver runs on a range of computer architectures, from laptops to the largest supercomputers, via a platform-unified templating approach that can generate/compile CUDA, OpenCL and C/OpenMP code at runtime. The extensibility of the cross-platform templating framework defined within PyFR is clearly demonstrated, as is the utility of -multigrid for convergence acceleration. The platform independence of the solver is verified on Nvidia Tesla P100 GPUs and Intel Xeon Phi 7210 KNL manycore processors with a 3D Taylor–Green vortex test case. Additionally, the solver is applied to a 3D turbulent jet test case at , and strong scaling is reported up to 144 GPUs. The new software constitutes the first high-order accurate cross-platform implementation of an incompressible Navier–Stokes solver via artificial compressibility and -multigrid accelerated dual time stepping to be published in the literature. The technology has applications in a range of sectors, including the maritime and automotive industries. Moreove
Corbett RW, Grechy L, Iori F, et al., 2018, Heterogeneity in the non-planarity and arterial curvature of arteriovenous fistulae in vivo, Journal of Vascular Surgery, Vol: 68, Pages: 152s-163s, ISSN: 0741-5214
Objective: Native arteriovenous fistulae (AVF) for haemodialysis are susceptible to non-maturation. Adverse features of local blood flow have been implicated in the formation of peri-anastomotic neointimal hyperplasia which may underpin non-maturation. While computational fluid dynamic simulations of idealised models highlight the importance of geometry on fluid and vessel wall interactions, little is known in vivo about AVF geometry and its role in adverse clinical outcomes. This study set out to examine the three-dimensional geometry of native AVF and the geometric correlates of AVF failure.Methods: As part of an observational study between 2013 and 2016, patients underwent creation of an upper limb AVF according to current surgical best practice. Phase-contrast MRI was performed on the day of surgery to obtain luminal geometry along with ultrasound measurements of flow. MRI datasets were segmented and reconstructed for quantitative and qualitative analysis of local geometry. Clinical maturation was evaluated at six weeks. Results: 60 patients were successfully imaged on the day of surgery. Radiocephalic (n=17), brachiocephalic (n=40) and brachiobasilic (n=3) fistulae were all included in the study. Centrelines extracted from segmented vessel lumen exhibited significant heterogeneity in arterial non-planarity and curvature. Furthermore, these features are more marked in brachiocephalic as compared to radiocephalic fistulae. Across the cohort, the projected bifurcation angle was was 73° (±16°) mean (±sd). Geometry was preserved at two weeks in 20 patients who underwent repeat imaging. A greater degree of arterial non-planarity (log odds ratio (logOR) 0.95 per 0.1/vessel diameter (95% CI 0.22 to 1.90, P= .03)) along with a larger bifurcation angle (logOR 0.05 per degree (95% CI 0.01 to 0.09, P= .02)) are associated with a great rate of maturation, as is fistula location (upper vs lower arm) logOR -1.9 (95% CI -3.2 to 0.7, P = .002) . Con
Park JS, Witherden FD, Vincent PE, 2017, High-order accurate implicit Large Eddy Simulations of flow over a NACA0021 aerofoil in deep stall, AIAA Journal, ISSN: 1533-385X
In this study the GPU-accelerated solver PyFR is used to simu-late flow over a NACA0021 aerofoil in deep stall at a Reynolds numberof 270, 000 using the high-order Flux Reconstruction (FR) approach.Wall-resolved Implicit Large Eddy Simulations (ILES) are undertakenon unstructured hexahedral meshes at fourth- and fifth-order accuracyin space. It was found that either modal filtering, or anti-aliasing viaan approximate L2 projection, is required in order to stabilise simu-lations. Time-span averaged pressure coefficient distributions on theaerofoil, and associated lift and drag coefficients, are seen to converge towards experimental data as the simulation setup is made more realis-tic by increasing the aerofoil span. Indeed, the lift and drag coefficientsobtained by fifth-order ILES with anti-aliasing via an approximateL2projection agree better with experimental data than a wide range ofprevious studies. Stabilisation via modal filtering, however, is foundto reduce solution accuracy. Finally, performance of various PyFRsimulations is compared, and it is found that fifth-order simulationswith anti-aliasing via an L2 projection are the most efficient. Resultsindicate that high-order FR schemes with anti-aliasing via anL2 projection are a good candidate for underpinning accurate wall-resolved ILES of separated, turbulent flows over complex engineering geometries.
Grechy L, Iori F, Corbett R, et al., 2017, Suppressing Unsteady Flow in Arterio-Venous Fistulae, Physics of Fluids, Vol: 29, ISSN: 1070-6631
Arterio-Venous Fistulae (AVF) are regarded as the “gold standard” method of vascular access forpatients with end-stage renal disease who require haemodialysis. However, a large proportion ofAVF do not mature, and hence fail, as a result of various pathologies such as Intimal Hyperplasia(IH). Unphysiological flow patterns, including high-frequency flow unsteadiness, associated with theunnatural and often complex geometries of AVF are believed to be implicated in the developmentof IH. In the present study, we employ a Mesh Adaptive Direct Search optimisation framework,computational fluid dynamics simulations, and a new cost function to design a novel non-planar AVFconfiguration that can suppress high-frequency unsteady flow. A prototype device for holding an AVFin the optimal configuration is then fabricated, and proof-of-concept is demonstrated in a porcinemodel. Results constitute the first use of numerical optimisation to design a device for suppressingpotentially pathological high-frequency flow unsteadiness in AVF.
Vincent P, Grechy L, Corbett R, 2017, A device for maintaining vascular connections
There is provided a device for maintaining a vascular connection comprising a vein- supporting section and an artery-supporting section. The centreline of the vein-supporting section and the centreline of the artery-supporting section meet at an intersection point which defines the origin of a right-handed Cartesian coordinate system. The centreline of the artery-supporting section is arcuate and lies in the region y<=0 and has a tangent parallel to the x axis at the origin and wherein the artery-supporting section is configured to carry blood flow in a direction from negative x towards positive x. A tangent of the centreline of the vein-supporting section at the origin has direction [cos (Θ) sin (Φ), sin (Θ) sin (Φ), ± cos (Φ)], where Φ is in the range 225 to 270 degrees and Θ is in the range 200 to 300 degrees.
grechy L, iori F, corbett R, et al., 2017, The Effect of Arterial Curvature on Blood Flow in Arterio-Venous Fistulae: Realistic Geometries and Pulsatile Flow, Cardiovascular Engineering and Technology, Vol: 8, Pages: 313-329, ISSN: 1869-4098
Arterio-Venous Fistulae (AVF) are regarded as the “gold standard” method of vascular access for patients with End-Stage Renal Disease (ESRD) who require haemodialysis. However, up to 60% of AVF do not mature, and hence fail, as a result of Intimal Hyperplasia (IH). Unphysiological flow and oxygen transport patterns, associated with the unnatural and often complex geometries of AVF, are believed to be implicated in the development of IH. Previous studies have investigated the effect of arterial curvature on blood flow in AVF using idealized planar AVF configurations and non-pulsatile inflow conditions. The present study takes an important step forwards by extending this work to more realistic non-planar brachiocephalic AVF configurations with pulsatile inflow conditions. Results show that forming an AVF by connecting a vein onto the outer curvature of an arterial bend does not, necessarily, suppress unsteady flow in the artery. This finding is converse to results from a previous more idealized study. However, results also show that forming an AVF by connecting a vein onto the inner curvature of an arterial bend can suppress exposure to regions of low wall shear stress and hypoxia in the artery. This finding is in agreement with results from a previous more idealized study. Finally, results show that forming an AVF by connecting a vein onto the inner curvature of an arterial bend can significantly reduce exposure to high WSS in the vein. The results are important, as they demonstrate that in realistic scenarios arterial curvature can be leveraged to reduce exposure to excessively low/high levels of WSS and regions of hypoxia in AVF. This may in turn reduce rates of IH and hence AVF failure.
Park JS, Witherden FD, Vincent PE, 2017, High-Order Implicit Large-Eddy Simulations of Flow over a NACA0021 Aerofoil, AIAA JOURNAL, Vol: 55, Pages: 2186-2197, ISSN: 0001-1452
In this study the graphical-processing-unit-accelerated solver PyFR is used to simulate flow over a NACA0021 aerofoil in deep stall at a Reynolds number of 270,000 using the high-order flux reconstruction approach.Wall-resolved implicit large-eddy simulations are undertaken on unstructured hexahedral meshes at fourth- and fifth-order accuracy in space. It was found that either modal filtering or antialiasing via an approximate L2 projection is required in order to stabilize simulations. Time-span-averaged pressure coefficient distributions on the aerofoil and associated lift and drag coefficients are seen to converge toward experimental data as the simulation setup is made more realistic by increasing the aerofoil span. Indeed, the lift and drag coefficients obtained by fifth-order implicit large-eddy simulation with antialiasing via an approximate L2 projection agree better with experimental data than a wide range of previous studies. Stabilization via modal filtering, however, is found to reduce solution accuracy. Finally, performance of various PyFR simulations is compared, and it is found that fifth-order simulations with antialiasing via an L2 projection are the most efficient. Results indicate that high-order flux reconstruction schemes with antialiasing via an L2 projection are a good candidate for underpinning accurate wall- resolved implicit large-eddy simulation of separated, turbulent flows over complex engineering geometries.
Vermeire BC, Witherden, Vincent PE, 2017, On the utility of GPU accelerated high-order methods for unsteady flow simulations: a comparison with industry-standard tools, Journal of Computational Physics, Vol: 334, Pages: 497-521, ISSN: 0021-9991
First- and second-order accurate numerical methods, implemented forCPUs, underpin the majority of industrial CFD solvers. Whilst this technologyhas proven very successful at solving steady-state problems via aReynolds Averaged Navier-Stokes approach, its utility for undertaking scaleresolvingsimulations of unsteady flows is less clear. High-order methods forunstructured grids and GPU accelerators have been proposed as an enablingtechnology for unsteady scale-resolving simulations of flow over complexgeometries. In this study we systematically compare accuracy and cost ofthe high-order Flux Reconstruction solver PyFR running on GPUs and theindustry-standard solver STAR-CCM+ running on CPUs when applied to arange of unsteady flow problems. Specifically, we perform comparisons ofaccuracy and cost for isentropic vortex advection (EV), decay of the Taylor-Green vortex (TGV), turbulent flow over a circular cylinder, and turbulent flowover an SD7003 aerofoil. We consider two configurations of STAR-CCM+: asecond-order configuration, and a third-order configuration, where the latterwas recommended by CD-Adapco for more effective computation of unsteadyflow problems. Results from both PyFR and Star-CCM+ demonstrate thatthird-order schemes can be more accurate than second-order schemes for agiven cost e.g. going from second- to third-order, the PyFR simulations of theEV and TGV achieve 75x and 3x error reduction respectively for the same orreduced cost, and STAR-CCM+ simulations of the cylinder recovered wakestatistics significantly more accurately for only twice the cost. Moreover,advancing to higher-order schemes on GPUs with PyFR was found to offereven further accuracy vs. cost benefits relative to industry-standard tools.
Vermeire BC, Vincent PE, 2016, On the behaviour of fully-discrete flux reconstruction schemes, Computer Methods in Applied Mechanics and Engineering, Vol: 315, Pages: 1053-1079, ISSN: 0045-7825
In this study we employ von Neumann analyses to investigate the disper-sion, dissipation, group velocity, and error properties of several fully discreteflux reconstruction (FR) schemes. We consider three FR schemes pairedwith two explicit Runge-Kutta (ERK) schemes and two singly diagonallyimplicit RK (SDIRK) schemes. Key insights include the dependence ofhigh-wavenumber numerical dissipation, relied upon for implicit large eddysimulation (ILES), on the choice of temporal scheme and time-step size.Also, the wavespeed characteristics of fully-discrete schemes and the relativedominance of temporal and spatial errors as a function of wavenumber andtime-step size are investigated. Salient properties from the aforementionedtheoretical analysis are then demonstrated in practice using linear advectiontest cases. Finally, a Burgers turbulence test case is used to demonstrate theimportance of the temporal discretisation when using FR schemes for ILES.
Witherden FD, Vincent PE, Jameson A, 2016, High-Order Flux Reconstruction Schemes, Handbook of Numerical Analysis, Pages: 227-263
© 2016 Elsevier B.V. There is an increasing desire among industrial practitioners of computational fluid dynamics to undertake high-fidelity scale-resolving simulations of unsteady flows within the vicinity of complex geometries. Such simulations require numerical methods that can operate on unstructured meshes with low numerical dissipation. The flux reconstruction (FR) approach describes one such family of numerical methods, which includes a particular type of collocation-based nodal discontinuous Galerkin method, and spectral difference methods, as special cases. In this chapter we describe the current state-of-the-art surrounding research into FR methods. To begin, FR is described in one dimension for both advection and advection–diffusion problems. This is followed by a description of its extension to multidimensional tensor product and simplex elements. Stability and accuracy issues are then discussed, including an overview of energy-stability proofs, von Neumann analysis results, and stability characteristics when the flux function of the governing system is nonlinear. Finally, implementation aspects are outlined in the context of modern hardware platforms, and three example applications of FR are presented, demonstrating the potential utility of FR schemes for scale resolving simulation of unsteady flow problems.
Vincent PE, Witherden FD, Vermeire, et al., Towards green aviation with Python at petascale, International Conference for High Performance Computing, Networking, Storage and Analysis, Publisher: IEEE
Accurate simulation of unsteady turbulentflow is critical for improved design of greener aircraftthat are quieter and more fuel-efficient. We demonstrateapplication of PyFR, a Python based computational fluiddynamics solver, to petascale simulation of such flowproblems. Rationale behind algorithmic choices, whichoffer increased levels of accuracy and enable sustainedcomputation at up to 58% of peak DP-FLOP/s on unstruc-tured grids, will be discussed in the context of modernhardware. A range of software innovations will also bedetailed, including use of runtime code generation, whichenables PyFR to efficiently target multiple platforms,including heterogeneous systems, via a single implemen-tation. Finally, results will be presented from a full-scale simulation of flow over a low-pressure turbine bladecascade, along with weak/strong scaling statistics from thePiz Daint and Titan supercomputers, and performancedata demonstrating sustained computation at up to 13.7DP-PFLOP/s.
Vermeire BC, Vincent PE, 2016, On the Properties of Energy Stable Flux Reconstruction Schemes for Implicit Large Eddy Simulation, Journal of Computational Physics, Vol: 327, Pages: 368-388, ISSN: 0021-9991
We begin by investigating the stability, order of accuracy, and dispersionand dissipation characteristics of the extended range of energy stable fluxreconstruction (E-ESFR) schemes in the context of implicit large eddy simulation(ILES). We proceed to demonstrate that subsets of the E-ESFR schemesare more stable than collocation nodal discontinuous Galerkin methods recoveredwith the flux reconstruction approach (FRDG) for marginally-resolvedILES simulations of the Taylor-Green vortex. These schemes are shown tohave reduced dissipation and dispersion errors relative to FRDG schemes ofthe same polynomial degree and, simultaneously, have increased CourantFriedrichs-Lewy(CFL) limits. Finally, we simulate turbulent flow over anSD7003 aerofoil using two of the most stable E-ESFR schemes identifiedby the aforementioned Taylor-Green vortex experiments. Results demonstratethat subsets of E-ESFR schemes appear more stable than the commonlyused FRDG method, have increased CFL limits, and are suitable for ILES ofcomplex turbulent flows on unstructured grids.
Witherden FD, Park JS, Vincent PE, 2016, An Analysis of Solution Point Coordinates for Flux Reconstruction Schemes on Tetrahedral Elements, Journal of Scientific Computing, Vol: 69, Pages: 905-920, ISSN: 0885-7474
The flux reconstruction (FR) approach offers an efficient route to high-order accuracy on unstructured grids. In this work we study the effect of solution point placement on the stability and accuracy of FR schemes on tetrahedral grids. To accomplish this we generate a large number of solution point candidates that satisfy various criteria at polynomial orders ℘=3,4,5℘=3,4,5 . We then proceed to assess their properties by using them to solve the non-linear Euler equations on both structured and unstructured meshes. The results demonstrate that the location of the solution points is important in terms of both the stability and accuracy. Across a range of cases it is possible to outperform the solution points of Shunn and Ham for specific problems. However, there appears to be a degree of problem-dependence with regards to the optimal point set, and hence overall it is concluded that the Shunn and Ham points offer a good compromise in terms of practical utility.
Wozniak BD, Witherden FD, Russell FP, et al., 2016, GiMMiK - Generating Bespoke Matrix Multiplication Kernels for Accelerators: Application to High-Order Computational Fluid Dynamics, Computer Physics Communications, Vol: 202, Pages: 12-22, ISSN: 0010-4655
Matrix multiplication is a fundamental linear algebra routineubiquitous in all areas of science and engineering. Highly optimised BLAS libraries (cuBLAS and clBLAS on GPUs) are the most popular choices for an implementation of the General Matrix Multiply (GEMM) in software. In this paper we present GiMMiK - a generator of bespoke matrix multiplication kernels for the CUDA and OpenCL platforms. GiMMiK exploits a prior knowledge of the operator matrix to generate highly performant code. The performance of GiMMiK's kernels is particularly apparent in a block-bypanel type of matrix multiplication, where the block matrix is typically small (e.g. dimensions of 96 × 64). Such operations are characteristic to our motivating application in PyFR - an implementation of Flux Reconstruction schemes for high-order fluid flow simulations on mixed unstructured meshes. GiMMiK fully unrolls the matrix-vector product and embeds matrix entries directly in the code to benefit from the use of the constant cache and compiler optimisations. Further, it reduces the number of floating-point operations by removing multiplications by zeros. Together with the ability of our kernels to avoid the poorly optimised cleanup code, executed by library GEMM, we are able to outperform cuBLAS on two NVIDIA GPUs: GTX 780 Ti and Tesla K40c. We observe speedups of our kernels over cuBLAS GEMM of up to 9.98 and 63.30 times for a 294 × 1029 99% sparse PyFR matrix in double precision on the Tesla K40c and GTX 780 Ti correspondingly. In single precision, observed speedups reach 12.20 and 13.07 times for a 4 × 8 50% sparse PyFR matrix on the two aforementioned cards. Using GiMMiK as the matrix multiplication kernel provider allows us to achieve a speedup of up to 1.70 (2.19) for a simulation of an unsteady flow over a cylinder executed with PyFR in double (single) precision on the Tesla K40c. All results were generated with GiMMiK version 1.0.
Mengaldo G, De Grazia D, Vincent PE, et al., 2015, On the connections between discontinuous Galerkin and flux reconstruction schemes: extension to curvilinear meshes, Journal of Scientific Computing, Vol: 67, Pages: 1272-1292, ISSN: 1573-7691
This paper investigates the connections between many popular variants of the well-established discontinuous Galerkin method and the recently developed high-order flux reconstruction approach on irregular tensor-product grids. We explore these connections by analysing three nodal versions of tensor-product discontinuous Galerkin spectral element approximations and three types of flux reconstruction schemes for solving systems of conservation laws on irregular tensor-product meshes. We demonstrate that the existing connections established on regular grids are also valid on deformed and curved meshes for both linear and nonlinear problems, provided that the metric terms are accounted for appropriately. We also find that the aliasing issues arising from nonlinearities either due to a deformed/curved elements or due to the nonlinearity of the equations are equivalent and can be addressed using the same strategies both in the discontinuous Galerkin method and in the flux reconstruction approach. In particular, we show that the discontinuous Galerkin and the flux reconstruction approach are equivalent also when using higher-order quadrature rules that are commonly employed in the context of over- or consistent-integration-based dealiasing methods. The connections found in this work help to complete the picture regarding the relations between these two numerical approaches and show the possibility of using over- or consistent-integration in an equivalent manner for both the approaches.
leow, iori F, corbett R, et al., 2015, Microbubble void imaging – a non-invasive technique for flow visualisation and quantification of mixing in large vessels using plane wave ultrasound and controlled microbubble contrast agent destruction, Ultrasound in Medicine and Biology, Vol: 41, Pages: 2926-2937, ISSN: 0301-5629
Kahk JM, Villar-Garcia IJ, Grechy L, et al., 2015, A study of the pressure profiles near the first pumping aperture in a high pressure photoelectron spectrometer, Journal of Electron Spectroscopy and Related Phenomena, Vol: 205, Pages: 57-65, ISSN: 1873-2526
In a high-pressure photoelectron spectrometer, the sample is positioned close to a differential pumping aperture, behind which the pressure is several orders of magnitude lower than the pressure in the analysis chamber. To find the optimal sample position, where the path length of the photoelectrons through the high pressure region is minimized as far as possible without compromising knowledge of the actual pressure at the sample surface, an understanding of the pressure variations near the sample and the aperture is required. A computational fluid dynamics study has been carried out to examine the pressure profiles, and the results are compared against experimental spectra whose intensities are analyzed using the Beer–Lambert law. The resultant pressure profiles are broadly similar to the one previously derived from a simplistic molecular flow model, but indicate that as the pressure in the analysis chamber is raised, the region over which the pressure drop occurs becomes progressively narrower.
Vincent PE, Farrington, Witherden, et al., 2015, An extended range of stable-symmetric-conservative flux reconstruction correction functions, Computer Methods in Applied Mechanics and Engineering, Vol: 296, Pages: 248-272, ISSN: 0045-7825
The Flux Reconstruction (FR) approach offers an efficient route to achieving high-order accuracy on unstructured grids. Additionally, FR offers a flexible framework for defining a range of numerical schemes in terms of so-called FR correction functions. Recently, a one-parameter family of FR correction functions were identified that lead to stable schemes for 1D linear advection problems. In this study we develop a procedure for identifying an extended range of stable, symmetric, and conservative FR correction functions. The procedure is applied to identify ranges of such correction functions for various orders of accuracy. Numerical experiments are undertaken, and the results found to be in agreement with the theoretical findings.
Witherden, Vermeire, Vincent PE, 2015, Heterogeneous computing on mixed unstructured grids with PyFR, Computers & Fluids, Vol: 120, Pages: 173-186, ISSN: 0045-7930
PyFR is an open-source high-order accurate computational fluid dynamics solver for unstructured grids. In this paper we detail how PyFR has been extended to run on mixed element meshes, and a range of hardware platforms, including heterogeneous multi-node systems. Performance of our implementation is benchmarked using pure hexahedral and mixed prismatic-tetrahedral meshes of the void space around a circular cylinder. Specifically, for each mesh performance is assessed at various orders of accuracy on three different hardware platforms; an NVIDIA Tesla K40c GPU, an Intel Xeon E5-2697 v2 CPU, and an AMD FirePro W9100 GPU. Performance is then assessed on a heterogeneous multi-node system constructed from a mix of the aforementioned hardware. Results demonstrate that PyFR achieves performance portability across various hardware platforms. In particular, the ability of PyFR to target individual platforms with their ‘native’ language leads to significantly enhanced performance cf. targeting each platform with OpenCL alone. PyFR is also found to be performant on the heterogeneous multi-node system, achieving a significant fraction of the available FLOP/s. Finally, each mesh is used to undertake nominally fifth-order accurate long-time simulations of unsteady flow over a circular cylinder at a Reynolds number of 3900 using a cluster of NVIDIA K20c GPUs. Long-time dynamics of the wake are studied in detail, and results are found to be in excellent agreement with previous experimental/numerical data. All results were obtained with PyFR v0.2.2, which is freely available under a 3-Clause New Style BSD license (see www.pyfr.org).
Mengaldo G, De Grazia D, Moxey D, et al., Dealiasing techniques for high-order spectral element methods on regular and irregular grids, Journal of Computational Physics, Vol: 299, Pages: 56-81, ISSN: 0021-9991
Witherden FD, Vincent PE, 2015, On the identification of symmetric quadrature rules for finite element methods, COMPUTERS & MATHEMATICS WITH APPLICATIONS, Vol: 69, Pages: 1232-1241, ISSN: 0898-1221
Vincent PE, Witherden FD, Farrington AM, et al., 2015, PyFR: Next-generation high-order computational fluid dynamics on many-core hardware
© 2015, American Institute of Aeronautics and Astronautics Inc, AIAA. All rights reserved. High-order numerical methods for unstructured grids combine the superior accuracy of high-order spectral or finite difference methods with the geometric flexibility of low-order finite volume or finite element schemes. The Flux Reconstruction (FR) approach unifles various high-order schemes for unstructured grids within a single framework. Additionally, the FR approach exhibits a signiflcant degree of element locality, and is thus able to run efflciently on modern many-core hardware platforms, such as Graphical Processing Units (GPUs). The aforementioned properties of FR mean it offers a promising route to per-forming affordable, and hence industrially relevant, scale-resolving simulations of hitherto intractable unsteady flows within the vicinity of real-world engineering geometries. Here we present PyFR, an open-source Python based framework for solving advection-diffusion type problems using the FR approach. The framework is designed to solve a range of governing systems on mixed unstructured grids containing various element types. It is also designed to target a range of hardware platforms via use of a custom Mako-derived domain specific language. Specifically, the current release of PyFR is able to solve the compressible Euler and Navier-Stokes equations on grids of quadrilateral and triangular elements in two dimensions, and hexahedral, tetrahedral, prismatic, and pyramidal elements in three dimensions, targeting clusters of multi-core CPUs, NVIDIA GPUs (K20, K40 etc.), AMD GPUs (S10000, W9100 etc.), and heterogeneous mixtures thereof. Results will be presented for various benchmark and real-world' flow problems. PyFR is freely available under an open-source 3-Clause New-Style BSD license (www.pyfr.org).
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.