Imperial College London


Faculty of Engineering, Department of Computing

Professor of Software Technology



+44 (0)20 7594 8332
p.kelly




Level 3 (upstairs), William Penney Building, room 304
Huxley Building
South Kensington Campus





I lead the Software Performance Optimisation group within the Department of Computing, which in turn is part of the Programming Languages and Systems research section.  I am co-Director of Imperial's Centre for Computational Methods in Science and Engineering, and also Director of Industrial Liaison for our Centre for Doctoral Training in High-performance Embedded and Distributed Systems (HiPEDS).

While I have worked in many areas of computer systems, the core of my current work is compiler technology.  Much of it aims to push the frontiers of compiler research by moving up the "food chain", exploiting properties and opportunities special to particular classes of application.  This has led me to engage deeply with collaborators in finite element methods and computer vision.

For research news please refer to my Departmental home page.  My Google Scholar page has links to many of my papers.

Selected Publications

Journal Articles

Rathgeber F, Ham DA, Mitchell L, et al., 2017, Firedrake: Automating the Finite Element Method by Composing Abstractions, ACM Transactions on Mathematical Software, Vol:43, ISSN:0098-3500

Reguly IZ, Mudalige GR, Bertolli C, et al., 2016, Acceleration of a Full-Scale Industrial CFD Application with OP2, IEEE Transactions on Parallel and Distributed Systems, Vol:27, ISSN:1045-9219, Pages:1265-1278

Luporini F, Varbanescu AL, Rathgeber F, et al., 2014, Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly, ACM Transactions on Architecture and Code Optimization, Vol:11, ISSN:1544-3566

Russell FP, Wilkinson KA, Kelly PHJ, et al., 2015, Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code, Computer Physics Communications, Vol:187, ISSN:0010-4655, Pages:8-19

Collingbourne P, Cadar C, Kelly PHJ, 2014, Symbolic Crosschecking of Data-Parallel Floating-Point Code, IEEE Transactions on Software Engineering, Vol:40, ISSN:0098-5589, Pages:710-737

Cantwell CD, Sherwin SJ, Kirby RM, et al., 2011, From h to p efficiently: Strategy selection for operator evaluation on hexahedral and tetrahedral elements, Computers & Fluids, Vol:43, ISSN:0045-7930, Pages:23-28

Mudalige GR, Giles MB, Thiyagalingam J, et al., 2013, Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems, Parallel Computing, Vol:39, ISSN:0167-8191, Pages:669-692

Giles MB, Mudalige GR, Sharif Z, et al., 2011, Performance analysis of the OP2 framework on many-core architectures, Performance Evaluation Review, Vol:38, ISSN:0163-5999, Pages:9-9

Russell FP, Kelly PHJ, 2013, Optimized Code Generation for Finite Element Local Assembly Using Symbolic Manipulation, ACM Transactions on Mathematical Software, Vol:39, ISSN:0098-3500

Markall GR, Slemmer A, Ham DA, et al., 2013, Finite element assembly strategies on multi-core and many-core architectures, International Journal for Numerical Methods in Fluids, Vol:71, ISSN:0271-2091, Pages:80-97

Russell FP, Mellor MR, Kelly PHJ, et al., 2011, DESOLA: An active linear algebra library using delayed evaluation and runtime code generation, Science of Computer Programming, Vol:76, ISSN:0167-6423, Pages:227-242

Cantwell CD, Sherwin SJ, Kirby RM, et al., 2011, From h to p Efficiently: Selecting the Optimal Spectral/hp Discretisation in Three Dimensions, Mathematical Modelling of Natural Phenomena, Vol:6, ISSN:0973-5348, Pages:84-96

Giles MB, Mudalige R, Sharif Z, et al., 2012, Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures, Computer Journal, Vol:55, ISSN:0010-4620, Pages:168-180


Pearce DJ, Kelly PHJ, Hankin C, 2008, Efficient field-sensitive pointer analysis of C, ACM Transactions on Programming Languages and Systems, Vol:30, ISSN:0164-0925


Books

Kelly PHJ, 1989, Functional Programming for Loosely-coupled Multiprocessors, Pitman/MIT Press


Conference Papers

Popovici DT, Russell FP, Wilkinson K, et al., 2015, Generating Optimized Fourier Interpolation Routines for Density Functional Theory using SPIRAL, 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, Pages:743-752, ISSN:1530-2075

Konstantinidis A, Kelly PHJ, Ramanujam J, et al., 2014, Parametric GPU Code Generation for Affine Loop Programs, 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC), Springer-Verlag Berlin, Pages:136-151, ISSN:0302-9743

Strout MM, Luporini F, Krieger CD, et al., 2014, Generalizing Run-time Tiling with the Loop Chain Abstraction, IEEE 28th International Parallel & Distributed Processing Symposium (IPDPS), IEEE, ISSN:1530-2075

Chong N, Donaldson AF, Kelly PHJ, et al., 2013, Barrier Invariants: A Shared State Abstraction for the Analysis of Data-Dependent GPU Kernels, 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA'13), Association for Computing Machinery, Pages:605-621, ISSN:0362-1340

Markall GR, Rathgeber F, Mitchell L, et al., 2013, Performance-portable finite element assembly using PyOP2 and FEniCS, Pages:279-289, ISSN:0302-9743

Salas-Moreno RF, Newcombe RA, Strasdat H, et al., 2013, SLAM++: Simultaneous Localisation and Mapping at the Level of Objects, 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Pages:1352-1359, ISSN:1063-6919

Bertolli C, Betts A, Loriant N, et al., 2013, Compiler optimizations for industrial unstructured mesh CFD applications on GPUs, Pages:112-126, ISSN:0302-9743

Gorman GJ, Southern J, Farrell PE, et al., 2012, Hybrid OpenMP/MPI anisotropic mesh smoothing, International Conference on Computational Science (ICCS), Elsevier Science BV, Pages:1513-1522, ISSN:1877-0509

Beckmann O, Houghton A, Mellor M, et al., 2003, Runtime code generation in C++ as a foundation for domain-specific optimisation, International Seminar on Domain-Specific Program Generation, Dagstuhl, Germany, 2003, Springer-Verlag, Berlin, Pages:291-306

Collingbourne P, Cadar C, Kelly PHJ, 2011, Symbolic Crosschecking of Floating-Point and SIMD Code, 6th ACM EuroSys Conference on Computer Systems (EuroSys 2011), Association for Computing Machinery, Pages:315-328

Cornwall JLT, Howes L, Kelly PHJ, et al., 2009, High-Performance SIMT Code Generation in an Active Visual Effects Library, 6th ACM International Conference on Computing Frontiers and Workshops, Association for Computing Machinery, Pages:175-184

Howes LW, Lokhmotov A, Donaldson AF, et al., 2009, Deriving Efficient Data Movement from Decoupled Access/Execute Specifications, 4th International Conference on High Performance Embedded Architectures and Compilers, Springer-Verlag Berlin, Pages:168-+, ISSN:0302-9743

Kelly P, Beckmann O, 2000, A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs, Languages and Compilers for Parallel Computing, 12th International Workshop, LCPC'99, La Jolla/San Diego, CA, USA, August 4-6, 1999, Springer

Yeung KC, Kelly PHJ, 2003, Optimising Java RMI programs by communication restructuring, ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, 2003, Springer-Verlag, Berlin, Pages:324-343

Murray K, Stiemerling T, Wilkinson T, et al., 1994, Angel: Resource Unification in a 64-bit Micro-Kernel, Proceedings of the 27th Hawaii International Conference on System Sciences

Talbot SAM, Kelly PHJ, 1998, Stable Performance for cc-NUMA using First Touch Page Placement and Reactive Proxies, HPCS'98, Kluwer

Darlington J, Field AJ, Harrison PG, et al., 1993, Parallel Programming Using Skeleton Functions, Springer, Pages:146-160

Jones RWM, Kelly PHJ, 1997, Backwards-compatible bounds checking for arrays and pointers in C programs, Third International Workshop on Automated Debugging, Linköping University Electronic Press
