Imperial College London


Faculty of Engineering, Department of Computing

Professor of Software Technology



+44 (0)20 7594 8332, p.kelly




Level 3 (upstairs), William Penney Building, room 304, William Penney Laboratory, South Kensington Campus





I lead the Software Performance Optimisation group within the Department of Computing, which in turn is part of the Programming Languages and Systems research section.  I am co-Director of Imperial's Centre for Computational Methods in Science and Engineering, and also Director of Industrial Liaison for our Centre for Doctoral Training in High-performance Embedded and Distributed Systems (HiPEDS).

While I have worked in many areas of computer systems, the core of my current work is compiler technology. Much of my work aims to push the frontiers of compiler research by moving up the "food chain": exploiting properties and opportunities special to particular classes of application. This has led me to engage deeply with collaborators in finite element methods and computer vision.

For research news, please refer to my Departmental home page. My Google Scholar page has links to many of my papers.

Selected Publications

Journal Articles

Luporini F, Lange M, Louboutin M, et al., 2020, Architecture and performance of Devito, a system for automated stencil computation, ACM Transactions on Mathematical Software, Vol:46, ISSN:0098-3500, Pages:1-24

Mitchell L, Ham DA, McRae ATT, et al., 2017, Firedrake: automating the finite element method by composing abstractions, ACM Transactions on Mathematical Software, Vol:43, ISSN:1557-7295, Pages:1-27

Kelly PHJ, Reguly IZ, Mudalige GR, et al., 2015, Acceleration of a Full-scale Industrial CFD Application with OP2, IEEE Transactions on Parallel and Distributed Systems, Vol:27, ISSN:1558-2183, Pages:1265-1278

Luporini F, Varbanescu AL, Rathgeber F, et al., 2015, Cross-loop optimization of arithmetic intensity for finite element local assembly, ACM Transactions on Architecture and Code Optimization, Vol:11, ISSN:1544-3973

Kelly PHJ, Russell FP, Wilkinson KA, et al., 2014, Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code, Computer Physics Communications, ISSN:1879-2944, Pages:8-19

Collingbourne P, Cadar C, Kelly PHJ, 2014, Symbolic Crosschecking of Data-Parallel Floating-Point Code, IEEE Transactions on Software Engineering, Vol:40, ISSN:0098-5589, Pages:710-737

Mudalige GR, Giles MB, Thiyagalingam J, et al., 2013, Design and Initial Performance of a High-level Unstructured Mesh Framework on Heterogeneous Parallel Systems, Parallel Computing, Vol:n/a, ISSN:0167-8191

Russell FP, Kelly PHJ, 2013, Optimized Code Generation for Finite Element Local Assembly Using Symbolic Manipulation, ACM Transactions on Mathematical Software, ISSN:0098-3500

Markall GR, Slemmer A, Ham DA, et al., 2012, Finite element assembly strategies on multi- and many-core architectures, International Journal for Numerical Methods in Fluids

Court C, Kelly PHJ, 2011, Loop-Directed Mothballing: Power Gating Execution Units Using Runtime Loop Analysis, IEEE Micro, Vol:31, Pages:29-38

Giles MB, Mudalige GR, Sharif Z, et al., 2011, Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures, The Computer Journal, Vol:55

Russell FP, Mellor MR, Kelly PHJ, et al., 2011, DESOLA: An active linear algebra library using delayed evaluation and runtime code generation, Science of Computer Programming, Vol:76, ISSN:0167-6423, Pages:227-242

Giles MB, Mudalige GR, Sharif Z, et al., 2011, Performance analysis of the OP2 framework on many-core architectures, Performance Evaluation Review, Vol:38, ISSN:0163-5999, Pages:9-15

Cantwell CD, Sherwin SJ, Kirby RM, et al., 2011, From h to p Efficiently: Selecting the Optimal Spectral/hp Discretisation in Three Dimensions, Mathematical Modelling of Natural Phenomena, Vol:6, ISSN:0973-5348, Pages:84-96

Cantwell CD, Sherwin SJ, Kirby RM, et al., 2010, From h to p efficiently: Strategy selection for operator evaluation on hexahedral and tetrahedral elements, Computers and Fluids, Vol:43, Pages:23-28

Pearce DJ, Kelly PHJ, Hankin CL, 2007, Efficient field-sensitive pointer analysis of C, ACM Transactions on Programming Languages and Systems (TOPLAS), Vol:30


Books

Kelly PHJ, 1989, Functional Programming for Loosely-coupled Multiprocessors, Pitman/MIT Press


Conference Papers

Popovici T, Russell FP, Wilkinson KA, et al., Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL, IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Strout MM, Luporini F, Krieger CD, et al., 2014, Generalizing Run-Time Tiling with the Loop Chain Abstraction, 28th IEEE International Parallel & Distributed Processing Symposium, IEEE Press, Pages:1136-1145, ISSN:1530-2075

Chong N, Donaldson AF, Kelly PHJ, et al., 2013, Barrier Invariants: A Shared State Abstraction for the Analysis of Data-Dependent GPU Kernels, 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications (OOPSLA'13), ASSOC COMPUTING MACHINERY, Pages:605-621, ISSN:0362-1340

Kelly PH, Konstantinidis A, Ramanujam J, et al., 2013, Parametric GPU Code Generation for Affine Loop Programs, The 26th International Workshop on Languages and Compilers for Parallel Computing, Springer

Markall GR, Rathgeber F, Mitchell L, et al., 2013, Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS, International Supercomputing Conference (ISC), Springer, Pages:279-289, ISSN:0302-9743

Salas-Moreno RF, Newcombe RA, Strasdat H, et al., 2013, SLAM++: Simultaneous Localisation and Mapping at the Level of Objects, Computer Vision and Pattern Recognition, IEEE Press, Pages:1352-1359, ISSN:1063-6919

Bertolli C, Betts A, Mudalige GR, et al., 2013, Compiler optimizations for industrial unstructured mesh CFD applications on GPUs, International Workshop on Languages and Compilers for Parallel Computing (LCPC), Springer, Pages:112-126

Gorman GJ, Southern J, Farrell PE, et al., 2012, Hybrid OpenMP/MPI anisotropic mesh smoothing, International Conference on Computational Science (ICCS), ELSEVIER SCIENCE BV, Pages:1513-1522, ISSN:1877-0509

Collingbourne P, Cadar C, Kelly PHJ, 2011, Symbolic crosschecking of floating-point and SIMD code, ACM, New York, NY, USA, Pages:315-328

Cornwall JLT, Howes LW, Kelly PHJ, et al., 2009, High-performance SIMT code generation in an active visual effects library, ACM Computing Frontiers, ACM Press, Pages:175-184

Howes LW, Lokhmotov A, Donaldson AE, et al., 2009, Deriving Efficient Data Movement from Decoupled Access/Execute Specifications, 4th International Conference on High Performance Embedded Architectures and Compilers, SPRINGER-VERLAG BERLIN, Pages:168-+, ISSN:0302-9743

Beckmann, O., Houghton, A., Mellor, M., et al., 2003, Runtime code generation in C++ as a foundation for domain-specific optimisation, International seminar on domain-specific program generation, Dagstuhl, Germany, 2003, Springer-Verlag, Berlin, Pages:291-306

Yeung, K.C., Kelly, P.H.J., 2003, Optimising Java RMI programs by communication restructuring, ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, 2003, Springer-Verlag, Berlin, Pages:324-343

Kelly P, Beckmann O, 2000, A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs, Languages and Compilers for Parallel Computing, 12th International Workshop, LCPC'99, La Jolla/San Diego, CA, USA, August 4-6, 1999, Springer

Talbot, S.A.M., Kelly, P.H.J., 1998, Stable Performance for cc-NUMA using First Touch Page Placement and Reactive Proxies, HPCS'98, Kluwer

Jones, R.W.M., Kelly, P.H.J., 1997, Backwards-compatible bounds checking for arrays and pointers in C programs, Third International Workshop on Automated Debugging, Linkoping University Electronic Press

Murray, K., Stiemerling, T., Wilkinson, T., et al., 1994, Angel: Resource Unification in a 64-bit Micro-Kernel, Proceedings of 27th Hawaii International Conference on Systems Science

Darlington, D, Field, et al., 1993, Parallel Programming Using Skeleton Functions, PARLE'93: Parallel Architectures and Languages Europe, Springer LNCS, Pages:146-160
