Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering

Contact

 

+44 (0)20 7594 8313
w.luk

Location

 

434, Huxley Building, South Kensington Campus

Publications

619 results found

Coutinho JGF, Pell O, O'Neill E, Sanders P, McGlone J, Grigoras P, Luk W, Ragusa C et al., 2014, HARNESS project: Managing heterogeneous computing resources for a cloud platform, Pages: 324-329, ISSN: 0302-9743

Most cloud service offerings are based on homogeneous commodity resources, such as large numbers of inexpensive machines interconnected by off-the-shelf networking equipment and disk drives, to provide low-cost application hosting. However, cloud service providers have reached a limit in satisfying performance and cost requirements for important classes of applications, such as geo-exploration and real-time business analytics. The HARNESS project aims to fill this gap by developing architectural principles that enable next-generation cloud platforms to incorporate heterogeneous technologies such as reconfigurable Dataflow Engines (DFEs), programmable routers, and SSDs, and as a result to provide vastly increased performance, reduced energy consumption, and lower cost profiles. In this paper we focus on three challenges for supporting heterogeneous computing resources in the context of a cloud platform, namely: (1) cross-optimisation of heterogeneous computing resources, (2) resource virtualisation, and (3) programming heterogeneous platforms. © 2014 Springer International Publishing Switzerland.

Conference paper

Pnevmatikatos DN, Becker T, Brokalakis A, Gaydadjiev GN, Luk W, Papadimitriou K, Papaefstathiou I, Pau D, Pell O, Pilato C, Santambrogio MD, Sciuto D, Stroobandt D et al., 2014, Effective reconfigurable design: The FASTER approach, Pages: 318-323, ISSN: 0302-9743

While fine-grain reconfigurable devices have been available for years, they are mostly used in a fixed-functionality, "ASIC-replacement" manner. To exploit opportunities for flexible and adaptable run-time use of fine-grain reconfigurable resources (as currently implemented in dynamic partial reconfiguration), better tool support is needed. The FASTER project aims to provide a methodology and a tool-chain that will enable designers to efficiently implement a reconfigurable system on a platform combining software and reconfigurable resources. Starting from a high-level application description and a target platform, our tools analyse the application, evaluate reconfiguration options, and implement the designer's choices using the underlying vendor tools. In addition, FASTER addresses micro-reconfiguration, verification, and the run-time management of system resources. We use industrial applications to demonstrate the effectiveness of the proposed framework and identify new opportunities for reconfigurable technologies. © 2014 Springer International Publishing Switzerland.

Conference paper

Guo L, Thomas DB, Luk W, 2014, Automated Framework for General-Purpose Genetic Algorithms in FPGAs, 17th European Conference on Applications of Evolutionary Computation (EvoApplications), Publisher: Springer-Verlag Berlin, Pages: 714-725, ISSN: 0302-9743

Conference paper

Kurek M, Becker T, Chau TCP, Luk W et al., 2014, Automating Optimization of Reconfigurable Designs, 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 210-213

Conference paper

Chau TCP, Kurek M, Targett JS, Humphrey J, Skouroupathis G, Eele A, Maciejowski J, Cope B, Cobden K, Leong P, Cheung PYK, Luk W et al., 2014, SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications, 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 141-148

Conference paper

Ma Y, Liu J, Zhang C, Luk W et al., 2014, HW/SW Partitioning For Region-based Dynamic Partial Reconfigurable FPGAs, 32nd IEEE International Conference on Computer Design (ICCD), Publisher: IEEE, Pages: 470-476, ISSN: 1063-6404

Conference paper

Guo C, Luk W, 2014, Accelerating parameter estimation for multivariate self-exciting point processes, Pages: 181-184

Self-exciting point processes are stochastic processes capturing occurrence patterns of random events. They offer powerful tools to describe and predict temporal distributions of random events like stock trading and neurone spiking. A critical calculation in self-exciting point process models is parameter estimation, which fits a model to a data set. This calculation is computationally demanding when the number of data points is large and when the data dimension is high. This paper proposes the first reconfigurable computing solution to accelerate this calculation. We derive an acceleration strategy in a mathematical specification by eliminating complex data dependency, by cutting hardware resource requirements, and by parallelising arithmetic operations. In our experimental evaluation, an FPGA-based implementation of the proposed solution is up to 79 times faster than one CPU core, and 13 times faster than the same CPU with eight cores.
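The abstract does not give the model equations, so the following is only a hedged illustration of why likelihood-based parameter estimation gets expensive. It assumes the simplest textbook case, a univariate Hawkes (self-exciting) process with an exponential kernel and intensity lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i)), rather than the multivariate model the paper targets. The naive log-likelihood re-sums over all earlier events for every event (O(n^2)); the well-known recursive form keeps a running excitation term and needs O(n). This is a standard software technique, not the paper's FPGA acceleration strategy, and the function names are hypothetical.

```python
import math
from typing import Sequence


def hawkes_loglik_naive(times: Sequence[float], mu: float, alpha: float,
                        beta: float, T: float) -> float:
    """Log-likelihood of a univariate exponential-kernel Hawkes process on [0, T].

    O(n^2): each event's intensity re-sums the excitation from all earlier events.
    Assumes `times` is sorted and every event time lies in [0, T].
    """
    log_term = 0.0
    for i, ti in enumerate(times):
        excitation = sum(math.exp(-beta * (ti - tj)) for tj in times[:i])
        log_term += math.log(mu + alpha * excitation)
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - ti)) for ti in times)
    return log_term - compensator


def hawkes_loglik_recursive(times: Sequence[float], mu: float, alpha: float,
                            beta: float, T: float) -> float:
    """Same quantity in O(n), using A_i = exp(-beta*(t_i - t_{i-1})) * (1 + A_{i-1})."""
    log_term = 0.0
    a = 0.0  # A_i = sum over j < i of exp(-beta * (t_i - t_j)); A_1 = 0
    prev = None
    for ti in times:
        if prev is not None:
            a = math.exp(-beta * (ti - prev)) * (1.0 + a)
        log_term += math.log(mu + alpha * a)
        prev = ti
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - ti)) for ti in times)
    return log_term - compensator


if __name__ == "__main__":
    events = [0.5, 1.2, 1.3, 2.8, 4.0]
    print(hawkes_loglik_naive(events, 0.4, 0.6, 1.5, 5.0))
    print(hawkes_loglik_recursive(events, 0.4, 0.6, 1.5, 5.0))
```

Both functions should agree up to floating-point rounding. Estimation would typically wrap one of them in a numerical optimiser over (mu, alpha, beta), so the likelihood is evaluated many times; that repeated evaluation is the kind of workload the abstract describes as computationally demanding.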

Conference paper

Funie A-I, Salmon M, Luk W, 2014, A Hybrid Genetic-Programming Swarm-Optimisation Approach for Examining the Nature and Stability of High Frequency Trading Strategies, 13th International Conference on Machine Learning and Applications (ICMLA), Publisher: IEEE, Pages: 29-34

Conference paper

Yang J, Guo C, Luk W, Nahar T et al., 2014, Collaborative processing of Least-Square Monte Carlo for American Options, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 52-59

Conference paper

Bara A, Niu X, Luk W, 2014, A Dataflow System for Anomaly Detection and Analysis, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 276-279

Conference paper

Inggs G, Fleming S, Thomas D, Luk W et al., 2014, Is High Level Synthesis ready for business? A computational finance case study, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 12-19

Conference paper

Shao S, Guo C, Luk W, Weston S et al., 2014, Accelerating Transfer Entropy Computation, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 60-67

Conference paper

Spada F, Scolari A, Durelli GC, Cattaneo R, Santambrogio MD, Sciuto D, Pnevmatikatos DN, Gaydadjiev GN, Pell O, Brokalakis A, Luk W, Stroobandt D, Pau D et al., 2014, FPGA-based design using the FASTER toolchain: the case of STM Spear development board, 12th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Publisher: IEEE, Pages: 134-141, ISSN: 2158-9178

Conference paper

Fidjeland AK, Luk W, Muggleton SH, 2014, Customisable multi-processor acceleration of inductive logic programming, Latest Advances in Inductive Logic Programming, Pages: 123-141, ISBN: 9781783265084

© 2015 Imperial College Press. All rights reserved. Parallel approaches to Inductive Logic Programming (ILP) are adopted to address the computational complexity in the learning process. Existing parallel ILP implementations build on conventional general-purpose processors. This chapter describes a different approach, by exploiting user-customisable parallelism available in advanced reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs). Our customisable parallel architecture for ILP has three elements: a customisable logic programming processor, a multi-processor for parallel hypothesis evaluation, and an architecture generation framework for creating such multi-processors. Our approach offers a means of achieving high performance by producing parallel architectures adapted both to the problem domain and to specific problem instances. The coverage test in Progol 4.4 is performed up to 56 times faster using our multi-processor.

Book chapter

Lam YM, Luk W, 2014, A many-core based parallel tabu search, International Journal of Computers and Applications, Vol: 36, Pages: 15-22, ISSN: 1206-212X

A parallel tabu search based on many-core platforms is presented for solving combinatorial optimization problems. The computing capability of many-core platforms is fully utilized by exploiting parallelism at two levels: (1) the search level, launching a number of searches in parallel, and (2) the move level, exploring a number of solutions in parallel within each search. A dynamic thread allocation technique is proposed to schedule computing resources towards promising search directions. Moreover, a move squeezing technique is employed to better map the parallel algorithm onto a many-core platform and so enhance search speed. The proposed approach is evaluated using two classic optimization problems: the traveling salesman problem and the quadratic assignment problem. Experimental results show that the proposed techniques can improve search speed by up to 373.8% and solution quality by up to 7.9%. Compared with a CPU implementation, the many-core implementation can evaluate solutions up to 85.7 times faster and improve solution quality by up to 10.2%.

Journal article

Todman T, Stilkerich S, Luk W, 2014, Using Statistical Assertions to Guide Self-Adaptive Systems, International Journal of Reconfigurable Computing, Vol: 2014, Pages: 1-8, ISSN: 1687-7195

Journal article

Wang Y, Zhou X, Wang L, Yan J, Luk W, Peng C, Tong J et al., 2013, SPREAD: A Streaming-Based Partially Reconfigurable Architecture and Programming Model, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 21, Pages: 2179-2192, ISSN: 1063-8210

Journal article

Cardoso JMP, José JG, Nane R, Sima VM, Olivier B, Carvalho T, Nobre R, Diniz PC, Petrov Z, Bertels K, Gonçalves F, Van Someren H, Hübner M, Constantinides G, Luk W, Becker J, Krátký K, Bhattacharya S, Alves JC, Ferreira JC et al., 2013, The REFLECT design-flow, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 13-34, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. This chapter describes the design-flow approach developed in the REFLECT project as presented originally in [1]. Over the course of the project, this design-flow has evolved and has been extended into a fully operational toolchain. We begin by presenting an overview of the underlying aspect-oriented compilation flow followed by an extended description of the design-flow and its toolchain.

Book chapter

Gonçalves F, Petrov Z, José JG, Nane R, Sima VM, Cardoso JMP, Werner S, Bhattacharya S, Carvalho T, Nobre R, De Sá J, Teixeira J, Diniz PC, Bertels K, Constantinides G, Luk W, Becker J, Alves JC, Ferreira JC, Almeida GM et al., 2013, LARA experiments, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 135-179, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7].

Book chapter

Cardoso JMP, Carvalho T, Coutinho JGF, Nobre R, Nane R, Diniz PC, Petrov Z, Luk W, Bertels K et al., 2013, Controlling a complete hardware synthesis toolchain with LARA aspects, Microprocessors and Microsystems, Vol: 37, Pages: 1073-1089, ISSN: 0141-9331

Journal article

José JG, Cardoso JMP, Carvalho T, Bhattacharya S, Luk W, Constantinides G, Diniz PC, Petrov Z et al., 2013, Aspect-based source to source transformations, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 71-103, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance.

Book chapter

Eele A, Maciejowski J, Chau T, Luk W et al., 2013, Control of aircraft in the terminal manoeuvring area using parallelised sequential Monte Carlo

This paper reports on the use of a parallelised Model Predictive Control, Sequential Monte Carlo algorithm for solving the problem of conflict resolution and aircraft trajectory control in air traffic management specifically around the terminal manoeuvring area of an airport. The target problem is nonlinear, highly constrained, non-convex and uses a single decision-maker with multiple aircraft. The implementation includes a spatio-temporal wind model and rolling window simulations for realistic ongoing scenarios. The method is capable of handling arriving and departing aircraft simultaneously including some with very low fuel remaining. A novel flow field is proposed to smooth the approach trajectories for arriving aircraft and all trajectories are planned in three dimensions. Massive parallelisation of the algorithm allows solution speeds to approach those required for real-time use.

Conference paper

Lam YM, Tsoi KH, Luk W, 2013, Parallel neighbourhood search on many-core platforms, International Journal of Computational Science and Engineering, Vol: 8, Pages: 281-293, ISSN: 1742-7185

This paper presents a parallel-search, parallel-move approach to parallelising neighbourhood search algorithms on many-core platforms. In this approach, a large number of searches are run concurrently and coordinated periodically. In each iteration, every search generates and evaluates multiple moves in parallel. The proposed approach can fully utilise the computing capability of many-core platforms under various platform-specific constraints. A parallel simulated annealing algorithm for solving the travelling salesman problem is developed using the parallel-search, parallel-move scheme and implemented on an NVIDIA Tesla C2050 GPU platform. We evaluate the performance of our approach against a multi-threaded CPU implementation on a server containing two Intel Xeon X5650 CPUs (12 cores in total). Experimental results on 20 benchmark problems show that the GPU implementation achieves, on average, a 99 times speedup in solution-space exploration. In terms of effectiveness, the GPU implementation can find good solutions 39.5 times faster, or with a 21.7% improvement in solution quality given the same search time. Copyright © 2013 Inderscience Enterprises Ltd.

Journal article

Arram J, Tsoi KH, Luk W, Jiang P et al., 2013, Hardware acceleration of genetic sequence alignment, Pages: 13-24, ISSN: 0302-9743

Next-generation DNA sequencing machines have been improving at an exceptional rate, and the subsequent analysis of the generated sequence data has become a bottleneck in current systems. This paper explores the use of reconfigurable hardware to accelerate the short read mapping problem, where the positions of millions of short DNA sequences are located relative to a known reference sequence. The proposed design comprises an alignment processor based on a backtracking variation of the FM-index algorithm. The design represents a full solution to the short read mapping problem, capable of efficient exact and approximate alignment. We use reconfigurable hardware to accelerate the design and find that an implementation targeting the MaxWorkstation is considerably faster and more energy efficient than current CPU- and GPU-based software aligners. © 2013 Springer-Verlag.
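The abstract names a backtracking variation of the FM-index without further detail. As a hedged illustration of the underlying idea only, the sketch below builds a toy FM-index (naive suffix sorting, so short texts only) and shows exact backward search, plus a simple backtracking extension that tolerates up to k substitutions. It is a textbook-style software model, not the paper's hardware alignment processor; all function names are hypothetical, and insertions/deletions are not handled.

```python
from typing import Dict, List, Set, Tuple


def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Build a toy FM-index: suffix array, C table and Occ table.

    Uses naive suffix sorting, so it only suits short illustrative texts.
    Assumes '$' does not occur in `text` and sorts before every other character.
    """
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is '$' when i == 0
    alphabet = sorted(set(s))
    # C[c] = number of characters in s that are strictly smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k].
    occ: Dict[str, List[int]] = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    return sa, C, occ


def exact_search(pattern: str, sa: List[int], C: Dict[str, int],
                 occ: Dict[str, List[int]]) -> List[int]:
    """Backward search: narrow the suffix-array interval [lo, hi) one character at a time."""
    lo, hi = 0, len(sa)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def search_with_mismatches(pattern: str, sa: List[int], C: Dict[str, int],
                           occ: Dict[str, List[int]], k: int) -> List[int]:
    """Backtracking backward search allowing up to k substitutions (no indels)."""
    hits: Set[int] = set()

    def extend(i: int, lo: int, hi: int, mismatches: int) -> None:
        if lo >= hi:
            return  # no suffix matches this partial pattern
        if i < 0:
            hits.update(sa[lo:hi])  # whole pattern consumed
            return
        for c in C:
            if c == "$":
                continue
            cost = 0 if c == pattern[i] else 1
            if mismatches + cost <= k:
                extend(i - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], mismatches + cost)

    extend(len(pattern) - 1, 0, len(sa), 0)
    return sorted(hits)


if __name__ == "__main__":
    sa, C, occ = build_fm_index("banana")
    print(exact_search("ana", sa, C, occ))               # [1, 3]
    print(search_with_mismatches("anb", sa, C, occ, 1))  # [1, 3]
```

Each backtracking branch tries every alphabet character at the current position, so the work grows quickly with k and the alphabet size; for DNA the alphabet is only A, C, G and T, which keeps the branching manageable for small k.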

Conference paper

Pell O, Mencer O, Tsoi KH, Luk W et al., 2013, Maximum performance computing with dataflow engines, High-Performance Computing Using FPGAs, Pages: 747-774, ISBN: 9781461417903

© 2013 Springer Science+Business Media, LLC. All rights are reserved. Maximum Performance Computing (MPC) means striving to deliver the maximum possible performance within a space and/or power budget. The essence of the method is to start with a particular application and develop an appropriate computer by iterating between algorithm optimization and machine optimization, essentially, cross-optimizing across the layers of abstraction from mathematics to logic gates. An MPC system pairs fast scalar processors with dataflow engines which can be emulated on FPGAs. In this chapter we outline the general approach, and describe in detail example hardware architecture, programming model and tools. We also discuss additional issues that arise at the cluster level, and describe a detailed case study of applying MPC to Reverse Time Migration, a computational geophysics algorithm widely used in the oil industry.

Book chapter

Kwok K-W, Tsoi KH, Vitiello V, Clark J, Chow GCT, Luk W, Yang G-Z et al., 2013, Dimensionality Reduction in Controlling Articulated Snake Robot for Endoscopy Under Dynamic Active Constraints, IEEE Transactions on Robotics, Vol: 29, Pages: 15-31, ISSN: 1552-3098

Journal article

Spacey S, Luk W, Kuhn D, Kelly PHJ et al., 2013, Parallel partitioning for distributed systems using sequential assignment, Journal of Parallel and Distributed Computing, Vol: 73, Pages: 207-219

This paper introduces a method to combine the advantages of both task parallelism and fine-grained co-design specialisation to achieve faster execution times than either method alone on distributed heterogeneous architectures. The method uses a novel mixed integer linear programming formalisation to assign code sections from parallel tasks to share computational components with the optimal trade-off between acceleration from component specialism and serialisation delay. The paper provides results for software benchmarks partitioned using the method and formal implementations of previous alternatives to demonstrate both the practical tractability of the linear programming approach and the increase in program acceleration potential deliverable.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
