Imperial College London


Faculty of Engineering, Department of Electrical and Electronic Engineering

Honorary Research Fellow



n.kapre Website




905, Electrical Engineering, South Kensington Campus





Focus Area: Application Acceleration

We can exploit spatial parallelism and the vast amount of on-chip memory bandwidth available on modern FPGAs to significantly accelerate important, computationally challenging problems. However, programming FPGAs continues to be a painful, arduous task. Can we develop a general-purpose framework and tools for implementing different kinds of parallel computation on FPGAs? Broadly speaking, how do we productively translate the parallelism inherent in silicon into application performance? We will consider representative applications from different domains, each characterized by distinct computational, communication and memory access patterns. This experimental approach will allow us to derive reusable design patterns that can influence the engineering of spatial programming frameworks.

SPICE Circuit Simulations 
Neural Simulations 
Parallel Sparse-Solve 
Parallel "Dataflow" Simulations 
Query Processing Acceleration 


Focus Area: Tools and Libraries

Programming frameworks support several high-level design patterns for efficiently capturing and managing descriptions of computation (e.g. object-oriented programming, domain-specific languages). Historically, performance improvements have been delivered by frequency scaling and architecture enhancements. Application performance scaled simply by riding the Moore's Law curve under the ISA model (e.g. Intel microprocessors). To continue performance scaling in the future, we must now shift some of this responsibility to software and develop automated tools that manage computation while minimizing programmer burden.

Auto-Tuning Tools 
High-Level Communication API 
Dataflow/VLIW Generators and Scalability 
FPGA-Accelerated FPGA CAD 

Focus Area: Languages and Frameworks

Modern computing systems are a complex organization of heterogeneous processing elements (e.g. CPUs, GPUs, FPGAs). This trend is likely to accelerate as we start to integrate multiple domain-specific compute accelerators for power efficiency. Additionally, we are expected to support different kinds of computational patterns efficiently in order to fully utilize the capacity of these systems. We must develop high-level frameworks, languages and compilers that will support these emerging systems and application requirements.

OpenCL Compiler for FPGAs 
SCORE Runtime and Dynamic Allocation 
LLVM Backends for Dataflow, VLIW and Streaming 
Dataflow Extensions to OpenMP/TBB 

Focus Area: Architecture Studies

What is the best way to organize silicon for an efficient implementation of computation? With power-efficient computation fast becoming the key design constraint, we must explore different alternatives, including specialized application accelerators, spatial implementations using FPGAs, power-efficient GPUs and communication-centric multi-cores. The design and engineering of communication systems will be key to reducing the power requirements of the compute substrates. Some initial vectors for exploring these ideas are listed below:

Domain-Specific CGRAs 
Power Measurement and Analysis 
Architecture Refinement 

I am broadly interested in understanding and exploiting the potential of spatial parallelism for implementing computation. In the rapidly developing field of computer engineering, we must be prepared to re-examine design assumptions to reflect the changing realities of the physical computing substrates. Spatial parallelism can play a key role in navigating this evolving landscape. Ultimately, I want to develop programming methodologies, mapping tools and system infrastructure that train the next generation of engineers who will design tomorrow's computing systems. Some questions to get us started down this road are shown below:

  • What is the raw potential for exploiting spatial parallelism in silicon? How does this change as we prepare for life beyond Moore's Law?
  • Can we integrate spatial architectures in mainstream processing systems? How do we program such closely coupled system organizations?
  • How will spatial parallelism influence conventional computing architectures? Can we make it easy to capture/exploit locality/communication structure on existing architectures?
  • Will there be a demand for application-specific co-processing in an energy-constrained architecture? How do we design such application accelerators using spatial parallelism?
  • How do we conceptualize, capture and reason about spatial parallelism? Can we define high-level interfaces and APIs for integrating this parallelism in modern programming frameworks?