Professor Paul Kelly

Faculty of Engineering, Department of Computing

Professor of Software Technology

Location

Level 3 (upstairs), William Penney Building, room 304, William Penney Laboratory, South Kensington Campus

Summary

I lead the Software Performance Optimisation group within the Department of Computing, which in turn is part of the Programming Languages and Systems research section. I am co-Director of Imperial's Centre for Computational Methods in Science and Engineering, and also Director of Industrial Liaison for our Centre for Doctoral Training in High-performance Embedded and Distributed Systems (HiPEDS).

While I have worked in many areas of computer systems, the core of my current work is compiler technology. Much of my work aims to push the frontiers of compiler research by moving up the "food chain": exploiting properties and opportunities special to particular classes of application. This has led me to engage deeply with collaborators in finite element methods and computer vision.

For research news please refer to my Departmental home page, at http://www.doc.ic.ac.uk/~phjk. My Google Scholar page has links to many of my papers.

Selected Publications

Journal Articles

Luporini F, Lange M, Louboutin M, Kukreja N, Hückelheim J, Yount C, Witte P, Kelly PHJ, Herrmann FJ, Gorman G, 2020, Architecture and performance of Devito, a system for automated stencil computation, ACM Transactions on Mathematical Software, Vol:46, ISSN:0098-3500, Pages:1-24

Mitchell L, Ham DA, McRae ATT, Rathgeber F, Lange M, Luporini F, Bercea G-T, Markall G, Kelly PHJ, 2017, Firedrake: automating the finite element method by composing abstractions, ACM Transactions on Mathematical Software, Vol:43, ISSN:1557-7295, Pages:1-27

Kelly PHJ, Reguly IZ, Mudalige GR, Bertolli C, Giles MB, Betts A, Radford D, 2015, Acceleration of a Full-scale Industrial CFD Application with OP2, IEEE Transactions on Parallel and Distributed Systems, Vol:27, ISSN:1558-2183, Pages:1265-1278

Luporini F, Varbanescu AL, Rathgeber F, Bercea D, Ramanujam J, Ham DA, Kelly PHJ, 2015, Cross-loop optimization of arithmetic intensity for finite element local assembly, ACM Transactions on Architecture and Code Optimization, Vol:11, ISSN:1544-3973

Kelly PHJ, Russell FP, Wilkinson KA, Skylaris CK, 2014, Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code, Computer Physics Communications, ISSN:1879-2944, Pages:8-19

Collingbourne P, Cadar C, Kelly PHJ, 2014, Symbolic Crosschecking of Data-Parallel Floating-Point Code, IEEE Transactions on Software Engineering, Vol:40, ISSN:0098-5589, Pages:710-737

Mudalige GR, Giles MB, Thiyagalingam J, Reguly I, Bertolli C, Kelly PHJ, Trefethen AE, 2013, Design and Initial Performance of a High-level Unstructured Mesh Framework on Heterogeneous Parallel Systems, Parallel Computing, ISSN:0167-8191

Russell FP, Kelly PHJ, 2013, Optimized Code Generation for Finite Element Local Assembly Using Symbolic Manipulation, ACM Transactions on Mathematical Software, ISSN:0098-3500

Markall GR, Slemmer A, Ham DA, Kelly PHJ, Cantwell CD, Sherwin SJ, 2012, Finite element assembly strategies on multi- and many-core architectures, International Journal for Numerical Methods in Fluids

Court C, Kelly PHJ, 2011, Loop-Directed Mothballing: Power Gating Execution Units Using Runtime Loop Analysis, IEEE Micro, Vol:31, Pages:29-38

Giles MB, Mudalige GR, Sharif Z, Markall GR, Kelly PHJ, 2011, Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures, The Computer Journal, Vol:55

Russell FP, Mellor MR, Kelly PHJ, Beckmann O, 2011, DESOLA: An active linear algebra library using delayed evaluation and runtime code generation, Science of Computer Programming, Vol:76, ISSN:0167-6423, Pages:227-242

Giles MB, Mudalige GR, Sharif Z, Markall G, Kelly PHJ, 2011, Performance analysis of the OP2 framework on many-core architectures, Performance Evaluation Review, Vol:38, ISSN:0163-5999, Pages:9-15

Cantwell CD, Sherwin SJ, Kirby RM, Kelly PHJ, 2011, From h to p Efficiently: Selecting the Optimal Spectral/hp Discretisation in Three Dimensions, Mathematical Modelling of Natural Phenomena, Vol:6, ISSN:0973-5348, Pages:84-96

Cantwell CD, Sherwin SJ, Kirby RM, Kelly PHJ, 2010, From h to p efficiently: Strategy selection for operator evaluation on hexahedral and tetrahedral elements, Computers and Fluids, Vol:43, Pages:23-28

Pearce DJ, Kelly PHJ, Hankin CL, 2007, Efficient field-sensitive pointer analysis of C, ACM Transactions on Programming Languages and Systems (TOPLAS), Vol:30

Books

Kelly, P H J, 1989, Functional Programming for Loosely-coupled Multiprocessors, Pitman/MIT Press

Conference

Popovici T, Russell FP, Wilkinson KA, Skylaris CK, Kelly PHJ, Franchetti F, Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL, IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Strout MM, Luporini F, Krieger CD, Bertolli C, Bercea GT, Olschanowsky C, Ramanujam J, Kelly PHJ, 2014, Generalizing Run-Time Tiling with the Loop Chain Abstraction, 28th IEEE International Parallel & Distributed Processing Symposium, IEEE Press, Pages:1136-1145, ISSN:1530-2075

Chong N, Donaldson AF, Kelly PHJ, Ketema J, Qadeer S, 2013, Barrier Invariants: A Shared State Abstraction for the Analysis of Data-Dependent GPU Kernels, 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA'13), Association for Computing Machinery, Pages:605-621, ISSN:0362-1340

Kelly PH, Konstantinidis A, Ramanujam J, Sadayappan P, 2013, Parametric GPU Code Generation for Affine Loop Programs, The 26th International Workshop on Languages and Compilers for Parallel Computing, Springer

Markall GR, Rathgeber F, Mitchell L, Loriant N, Bertolli C, Kelly PHJ, 2013, Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS, International Supercomputing Conference (ISC), Springer, Pages:279-289, ISSN:0302-9743

Salas-Moreno RF, Newcombe RA, Strasdat H, Kelly PHJ, Davison AJ, 2013, SLAM++: Simultaneous Localisation and Mapping at the Level of Objects, Computer Vision and Pattern Recognition, IEEE Press, Pages:1352-1359, ISSN:1063-6919

Bertolli C, Betts A, Mudalige GR, Loriant N, Ham D, Giles MB, Kelly PHJ, 2013, Compiler optimizations for industrial unstructured mesh CFD applications on GPUs, International Workshop on Languages and Compilers for Parallel Computing (LCPC), Springer, Pages:112-126

Gorman GJ, Southern J, Farrell PE, Piggott MD, Rokos G, Kelly PHJ, 2012, Hybrid OpenMP/MPI anisotropic mesh smoothing, International Conference on Computational Science (ICCS), Elsevier Science BV, Pages:1513-1522, ISSN:1877-0509

Collingbourne P, Cadar C, Kelly PHJ, 2011, Symbolic crosschecking of floating-point and SIMD code, ACM, New York, NY, USA, Pages:315-328

Cornwall JLT, Howes LW, Kelly PHJ, Parsonage P, Nicoletti B, 2009, High-performance SIMT code generation in an active visual effects library, ACM Computing Frontiers, ACM Press, Pages:175-184

Howes LW, Lokhmotov A, Donaldson AE, Kelly PHJ, 2009, Deriving Efficient Data Movement from Decoupled Access/Execute Specifications, 4th International Conference on High Performance Embedded Architectures and Compilers, Springer-Verlag Berlin, Pages:168-+, ISSN:0302-9743

Beckmann, O., Houghton, A., Mellor, M., Kelly, P.H.J., 2003, Runtime code generation in C++ as a foundation for domain-specific optimisation, International seminar on domain-specific program generation, Dagstuhl, Germany, 2003, Springer-Verlag, Berlin, Pages:291-306

Yeung, K.C., Kelly, P.H.J., 2003, Optimising Java RMI programs by communication restructuring, ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, 2003, Springer-Verlag, Berlin, Pages:324-343

Kelly P, Beckmann O, 2000, A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs, Languages and Compilers for Parallel Computing, 12th International Workshop, LCPC'99, La Jolla/San Diego, CA, USA, August 4-6, 1999, Springer

Talbot, S.A.M., Kelly, P.H.J., 1998, Stable Performance for cc-NUMA using First Touch Page Placement and Reactive Proxies, HPCS'98, Kluwer

Jones, R.W.M., Kelly, P.H.J., 1997, Backwards-compatible bounds checking for arrays and pointers in C programs, Third International Workshop on Automated Debugging, Linkoping University Electronic Press

Murray, K., Stiemerling, T., Wilkinson, T., Kelly, P.H.J., 1994, Angel: Resource Unification in a 64-bit Micro-Kernel, Proceedings of 27th Hawaii International Conference on Systems Science

Darlington, D, Field, A J, Harrison, P G, Kelly, P H J, Sharp, D W N, Wu, Q, While, R L, 1993, Parallel Programming Using Skeleton Functions, PARLE'93: Parallel Architectures and Languages Europe, Springer LNCS, Pages:146-160
