Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering
 
 
 

Contact

 

+44 (0)20 7594 8313 | w.luk | Website

 
 

Location

 

434 Huxley Building, South Kensington Campus



 

Publications


611 results found

Li Y, Zhang Y, Yang J, Luk W, Yang G, Zheng W et al., 2014, An Approach of Processor Core Customization for Stencil Computation, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 182-+, ISSN: 2160-0511

CONFERENCE PAPER

Denholm S, Inoue H, Takenaka T, Becker T, Luk W et al., 2014, Low Latency FPGA Acceleration of Market Data Feed Arbitration, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 36-40, ISSN: 2160-0511

CONFERENCE PAPER

Ng N, Yoshida N, Luk W, 2014, Scalable Session Programming for Heterogeneous High-Performance Systems, 11th International Conference on Software Engineering and Formal Methods (SEFM), Publisher: SPRINGER INT PUBLISHING AG, Pages: 82-98, ISSN: 0302-9743

CONFERENCE PAPER

Coutinho JGF, Pell O, O'Neill E, Sanders P, McGlone J, Grigoras P, Luk W, Ragusa C et al., 2014, HARNESS project: Managing heterogeneous computing resources for a cloud platform, Pages: 324-329, ISSN: 0302-9743

Most cloud service offerings are based on homogeneous commodity resources, such as large numbers of inexpensive machines interconnected by off-the-shelf networking equipment and disk drives, to provide low-cost application hosting. However, cloud service providers have reached a limit in satisfying performance and cost requirements for important classes of applications, such as geo-exploration and real-time business analytics. The HARNESS project aims to fill this gap by developing architectural principles that enable the next generation cloud platforms to incorporate heterogeneous technologies such as reconfigurable Dataflow Engines (DFEs), programmable routers, and SSDs, and provide as a result vastly increased performance, reduced energy consumption, and lower cost profiles. In this paper we focus on three challenges for supporting heterogeneous computing resources in the context of a cloud platform, namely: (1) cross-optimisation of heterogeneous computing resources, (2) resource virtualisation and (3) programming heterogeneous platforms. © 2014 Springer International Publishing Switzerland.

CONFERENCE PAPER

Pnevmatikatos DN, Becker T, Brokalakis A, Gaydadjiev GN, Luk W, Papadimitriou K, Papaefstathiou I, Pau D, Pell O, Pilato C, Santambrogio MD, Sciuto D, Stroobandt D et al., 2014, Effective reconfigurable design: The FASTER approach, Pages: 318-323, ISSN: 0302-9743

While fine-grain, reconfigurable devices have been available for years, they are mostly used in a fixed functionality, "asic-replacement" manner. To exploit opportunities for flexible and adaptable run-time exploitation of fine grain reconfigurable resources (as implemented currently in dynamic, partial reconfiguration), better tool support is needed. The FASTER project aims to provide a methodology and a tool-chain that will enable designers to efficiently implement a reconfigurable system on a platform combining software and reconfigurable resources. Starting from a high-level application description and a target platform, our tools analyse the application, evaluate reconfiguration options, and implement the designer choices on underlying vendor tools. In addition, FASTER addresses micro-reconfiguration, verification, and the run-time management of system resources. We use industrial applications to demonstrate the effectiveness of the proposed framework and identify new opportunities for reconfigurable technologies. © 2014 Springer International Publishing Switzerland.

CONFERENCE PAPER

Guo C, Luk W, 2014, Accelerating parameter estimation for multivariate self-exciting point processes, Pages: 181-184

Self-exciting point processes are stochastic processes capturing occurrence patterns of random events. They offer powerful tools to describe and predict temporal distributions of random events like stock trading and neurone spiking. A critical calculation in self-exciting point process models is parameter estimation, which fits a model to a data set. This calculation is computationally demanding when the number of data points is large and when the data dimension is high. This paper proposes the first reconfigurable computing solution to accelerate this calculation. We derive an acceleration strategy in a mathematical specification by eliminating complex data dependency, by cutting hardware resource requirement, and by parallelising arithmetic operations. In our experimental evaluation, an FPGA-based implementation of the proposed solution is up to 79 times faster than one CPU core, and 13 times faster than the same CPU with eight cores.

CONFERENCE PAPER

Lam YM, Luk W, 2014, A many-core based parallel tabu search, International Journal of Computers and Applications, Vol: 36, Pages: 15-22, ISSN: 1206-212X

A many-core platform based parallel tabu search is presented for solving combinatorial optimization problems. The computing capability of many-core platforms is fully utilized by exploiting parallelism at two different levels: (1) search level for launching a number of searches in parallel and (2) move level for parallel exploration of a number of solutions in each search. A dynamic thread allocation technique is proposed to schedule computing resources for promising search directions. Moreover, a move squeezing technique is employed for better mapping the parallel algorithm onto a many-core platform to enhance the search speed. The proposed approach is evaluated by using two classic optimization problems: the traveling salesman problem and the quadratic assignment problem. Experimental results show that the proposed techniques can improve the search speed up to 373.8% and enhance the solution quality up to 7.9%. Compared with a CPU implementation, many-core implementation can evaluate solutions up to 85.7 times faster and enhance solution quality up to 10.2%.

JOURNAL ARTICLE

Yang J, Lin B, Luk W, Nahar T et al., 2014, Particle filtering-based maximum likelihood estimation for financial parameter estimation

© 2014 Technical University of Munich (TUM). This paper presents a novel method for estimating parameters of financial models with jump diffusions. It is a Particle Filter based Maximum Likelihood Estimation process, which uses particle streams to enable efficient evaluation of constraints and weights. We also provide a CPU-FPGA collaborative design for parameter estimation of Stochastic Volatility with Correlated and Contemporaneous Jumps model as a case study. The result is evaluated by comparing with a CPU and a cloud computing platform. We show 14 times speed up for the FPGA design compared with the CPU, and similar speedup but better convergence compared with an alternative parallelisation scheme using Techila Middleware on a multi-CPU environment.

CONFERENCE PAPER

Todman T, Stilkerich S, Luk W, 2014, Using Statistical Assertions to Guide Self-Adaptive Systems, International Journal of Reconfigurable Computing, Vol: 2014, Pages: 1-8, ISSN: 1687-7195

JOURNAL ARTICLE

Inggs G, Thomas DB, Luk W, 2014, A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility, CoRR, Vol: abs/1408.4965

JOURNAL ARTICLE

Thomas DB, Luk W, 2013, Multiplierless Algorithm for Multivariate Gaussian Random Number Generation in FPGAs, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 21, Pages: 2193-2205, ISSN: 1063-8210

JOURNAL ARTICLE

Wang Y, Zhou X, Wang L, Yan J, Luk W, Peng C, Tong J et al., 2013, SPREAD: A Streaming-Based Partially Reconfigurable Architecture and Programming Model, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 21, Pages: 2179-2192, ISSN: 1063-8210

JOURNAL ARTICLE

José JG, Cardoso JMP, Carvalho T, Bhattacharya S, Luk W, Constantinides G, Diniz PC, Petrov Z et al., 2013, Aspect-based source to source transformations, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 71-103, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance.

BOOK CHAPTER

Cardoso JMP, José JG, Nane R, Sima VM, Olivier B, Carvalho T, Nobre R, Diniz PC, Petrov Z, Bertels K, Gonçalves F, Van Someren H, Hübner M, Constantinides G, Luk W, Becker J, Krátký K, Bhattacharya S, Alves JC, Ferreira JC et al., 2013, The REFLECT design-flow, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 13-34, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. This chapter describes the design-flow approach developed in the REFLECT project as presented originally in [1]. Over the course of the project, this design-flow has evolved and has been extended into a fully operational toolchain. We begin by presenting an overview of the underlying aspect-oriented compilation flow followed by an extended description of the design-flow and its toolchain.

BOOK CHAPTER

Cardoso JMP, Carvalho T, Coutinho JGF, Nobre R, Nane R, Diniz PC, Petrov Z, Luk W, Bertels K et al., 2013, Controlling a complete hardware synthesis toolchain with LARA aspects, MICROPROCESSORS AND MICROSYSTEMS, Vol: 37, Pages: 1073-1089, ISSN: 0141-9331

JOURNAL ARTICLE

Gonçalves F, Petrov Z, José JG, Nane R, Sima VM, Cardoso JMP, Werner S, Bhattacharya S, Carvalho T, Nobre R, De Sá J, Teixeira J, Diniz PC, Bertels K, Constantinides G, Luk W, Becker J, Alves JC, Ferreira JC, Almeida GM et al., 2013, LARA experiments, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 135-179, ISBN: 9781461448938

© Springer Science+Business Media New York 2013. All rights are reserved. This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7].

BOOK CHAPTER

Eele A, Maciejowski J, Chau T, Luk W et al., 2013, Control of aircraft in the terminal manoeuvring area using parallelised sequential Monte Carlo

This paper reports on the use of a parallelised Model Predictive Control, Sequential Monte Carlo algorithm for solving the problem of conflict resolution and aircraft trajectory control in air traffic management specifically around the terminal manoeuvring area of an airport. The target problem is nonlinear, highly constrained, non-convex and uses a single decision-maker with multiple aircraft. The implementation includes a spatio-temporal wind model and rolling window simulations for realistic ongoing scenarios. The method is capable of handling arriving and departing aircraft simultaneously including some with very low fuel remaining. A novel flow field is proposed to smooth the approach trajectories for arriving aircraft and all trajectories are planned in three dimensions. Massive parallelisation of the algorithm allows solution speeds to approach those required for real-time use.

CONFERENCE PAPER

Lam YM, Tsoi KH, Luk W, 2013, Parallel neighbourhood search on many-core platforms, International Journal of Computational Science and Engineering, Vol: 8, Pages: 281-293, ISSN: 1742-7185

This paper presents a parallel search parallel move approach to parallelise neighbourhood search algorithms on many-core platforms. In this approach, a large number of searches are run concurrently and coordinated periodically. Iteratively, each search generates and evaluates multiple moves in parallel. The proposed approach can fully utilise the computing capability of many-core platforms under various platform specific constraints. A parallel simulated annealing algorithm for solving the travelling salesman problem is developed using the parallel search parallel move scheme and implemented on an NVIDIA Tesla C2050 GPU platform. We evaluate the performance of our approach against a multi-threaded CPU implementation on a server containing two Intel Xeon X5650 CPUs (12 cores in total). The experimental results of 20 benchmark problems show that the GPU implementation achieves 99 times speedup on average in solution space exploration speed. In terms of effectiveness, the GPU implementation is capable of finding good solutions 39.5 times faster or with 21.7% solution quality improvement given the same searching time. Copyright © 2013 Inderscience Enterprises Ltd.

JOURNAL ARTICLE

Arram J, Tsoi KH, Luk W, Jiang P et al., 2013, Hardware acceleration of genetic sequence alignment, Pages: 13-24, ISSN: 0302-9743

Next generation DNA sequencing machines have been improving at an exceptional rate; the subsequent analysis of the generated sequenced data has become a bottleneck in current systems. This paper explores the use of reconfigurable hardware to accelerate the short read mapping problem, where the positions of millions of short DNA sequences are located relative to a known reference sequence. The proposed design comprises an alignment processor based on a backtracking variation of the FM-index algorithm. The design represents a full solution to the short read mapping problem, capable of efficient exact and approximate alignment. We use reconfigurable hardware to accelerate the design and find that an implementation targeting the MaxWorkstation is considerably faster and more energy-efficient than current CPU and GPU based software aligners. © 2013 Springer-Verlag.

CONFERENCE PAPER

Thomas DB, Luk W, 2013, The LUT-SR Family of Uniform Random Number Generators for FPGA Architectures, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 21, Pages: 761-770, ISSN: 1063-8210

JOURNAL ARTICLE

Pell O, Mencer O, Tsoi KH, Luk W et al., 2013, Maximum performance computing with dataflow engines, High-Performance Computing Using FPGAs, Pages: 747-774, ISBN: 9781461417903

© 2013 Springer Science+Business Media, LLC. All rights are reserved. Maximum Performance Computing (MPC) means striving to deliver the maximum possible performance within a space and/or power budget. The essence of the method is to start with a particular application and develop an appropriate computer by iterating between algorithm optimization and machine optimization, essentially, cross-optimizing across the layers of abstraction from mathematics to logic gates. An MPC system pairs fast scalar processors with dataflow engines which can be emulated on FPGAs. In this chapter we outline the general approach, and describe in detail example hardware architecture, programming model and tools. We also discuss additional issues that arise at the cluster level, and describe a detailed case study of applying MPC to Reverse Time Migration, a computational geophysics algorithm widely used in the oil industry.

BOOK CHAPTER

Spacey S, Luk W, Kuhn D, Kelly PHJ et al., 2013, Parallel partitioning for distributed systems using sequential assignment, JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol: 73, Pages: 207-219, ISSN: 0743-7315

JOURNAL ARTICLE

Kwok K-W, Tsoi KH, Vitiello V, Clark J, Chow GCT, Luk W, Yang G-Z et al., 2013, Dimensionality Reduction in Controlling Articulated Snake Robot for Endoscopy Under Dynamic Active Constraints, IEEE TRANSACTIONS ON ROBOTICS, Vol: 29, Pages: 15-31, ISSN: 1552-3098

JOURNAL ARTICLE

Chau TCP, Niu X, Eele A, Luk W, Cheung PYK, Maciejowski J et al., 2013, Heterogeneous Reconfigurable System for Adaptive Particle Filters in Real-Time Applications, 9th International Applied Reconfigurable Computing Symposium (ARC), Publisher: SPRINGER-VERLAG BERLIN, Pages: 1-12, ISSN: 0302-9743

CONFERENCE PAPER

Niu X, Coutinho JGF, Luk W, 2013, A SCALABLE DESIGN APPROACH FOR STENCIL COMPUTATION ON RECONFIGURABLE CLUSTERS, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

CONFERENCE PAPER

Guo C, Luk W, 2013, ACCELERATING MAXIMUM LIKELIHOOD ESTIMATION FOR HAWKES POINT PROCESSES, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

CONFERENCE PAPER

Liu Q, Ma Y, Wang Y, Luk W, Bian J et al., 2013, RALP: Reconvergence-Aware Layer Partitioning For 3D FPGAs, International Conference on Reconfigurable Computing and FPGAs (ReConFig), Publisher: IEEE, ISSN: 2325-6532

CONFERENCE PAPER

Gan L, Fu H, Luk W, Yang C, Xue W, Huang X, Zhang Y, Yang G et al., 2013, ACCELERATING SOLVERS FOR GLOBAL ATMOSPHERIC EQUATIONS THROUGH MIXED-PRECISION DATA FLOW ENGINE, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

CONFERENCE PAPER

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
