Transformer neural network models, based around attention and first proposed in the seminal 2017 paper "Attention Is All You Need" from Google researchers, are the key technical advance that has enabled the current explosion in AI. Cutting-edge AI models such as ChatGPT, Gemini and Llama are all based on this technology. However, while extremely powerful, these models have billions or trillions of parameters and cost tens of millions of dollars to train, as well as consuming astonishing amounts of power for both training and inference. This blocks their direct use in edge devices, real-time remote components and wearable technologies, and leads to huge environmental impact.
We are looking to develop techniques and tools to shrink these models while preserving high performance, so that they can be used in a far wider range of applications with much lower energy use.
We were among the first in the world to demonstrate such architectures on FPGA (field-programmable gate array) platforms in 2022 and remain world leaders in this area. Our work to date includes a recent contribution to NeurIPS (arXiv:2510.24784) and the Best Paper Award at the FPT conference (arXiv:2508.15468). We demonstrated the multi-head attention mechanism on an FPGA with latency of O(100 ns) and performed an initial exploration of the scaling behaviour of the design.
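For readers unfamiliar with the mechanism mentioned above, the sketch below shows standard scaled dot-product multi-head attention (as defined in "Attention Is All You Need") in plain NumPy. This is purely illustrative of the mathematics; it is not the group's FPGA implementation, and all variable names and dimensions are invented for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Standard scaled dot-product multi-head attention (self-attention)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project inputs to queries, keys and values, then split into heads:
    # shape (n_heads, seq_len, d_head).
    q = (x @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Attention weights per head: softmax(Q K^T / sqrt(d_head)).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, heads re-concatenated, then output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage with made-up sizes: 8 tokens, model width 64, 4 heads.
rng = np.random.default_rng(0)
d_model, seq_len, n_heads = 64, 8, 4
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(4))
y = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads)
print(y.shape)  # (8, 64)
```

On an FPGA, the matrix multiplications and softmax above are mapped to parallel fixed-point arithmetic units rather than executed sequentially, which is what makes O(100 ns) latencies achievable.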
We have also contributed to state-of-the-art optimisation tools and formed international partnerships, in particular with colleagues in the USA and at CERN through the FastML collaboration, and with Altera UK. This work was done in the context of particle physics, which provides an excellent testbed through its huge, highly structured and open datasets, as well as offering direct scientific impact.
We aim to recruit and train an excellent PhD student with a background in computing or physics to build on this initial work, focusing on challenges in particle physics. The research programme will explore an approach to producing high-performance, energy-efficient designs, together with the capability to automate such designs: first, studying how the approach can cover particle physics applications; and second, extending it to healthcare applications such as adaptive radiotherapy, building on our existing collaboration with the Institute of Cancer Research.
Contacts:
Professor Alex Tapper, Department of Physics (a.tapper@imperial.ac.uk)
Professor Wayne Luk, Department of Computing (w.luk@imperial.ac.uk)
Dr Hongxiang Fan, Department of Computing (hongxiang.fan@imperial.ac.uk)
Professor Gavin Davies, Department of Physics (g.j.davies@imperial.ac.uk)