Publications

Coutinho JGF, Pell O, O'Neill E, Sanders P, McGlone J, Grigoras P, Luk W, Ragusa C et al., 2014, HARNESS project: Managing heterogeneous computing resources for a cloud platform, Pages: 324-329, ISSN: 0302-9743

Most cloud service offerings are based on homogeneous commodity resources, such as large numbers of inexpensive machines interconnected by off-the-shelf networking equipment and disk drives, to provide low-cost application hosting. However, cloud service providers have reached a limit in satisfying performance and cost requirements for important classes of applications, such as geo-exploration and real-time business analytics. The HARNESS project aims to fill this gap by developing architectural principles that enable the next generation cloud platforms to incorporate heterogeneous technologies such as reconfigurable Dataflow Engines (DFEs), programmable routers, and SSDs, and provide as a result vastly increased performance, reduced energy consumption, and lower cost profiles. In this paper we focus on three challenges for supporting heterogeneous computing resources in the context of a cloud platform, namely: (1) cross-optimisation of heterogeneous computing resources, (2) resource virtualisation and (3) programming heterogeneous platforms. © 2014 Springer International Publishing Switzerland.

Citations: 9

Conference paper

Chow G, Grigoras P, Burovskiy PA, Luk W et al., 2014, An efficient sparse conjugate gradient solver using a Beneš permutation network, 24th International Conference on Field Programmable Logic and Applications, Publisher: IEEE

The conjugate gradient (CG) method is one of the most widely used iterative methods for solving systems of linear equations. However, parallelizing CG for large sparse systems is difficult due to the inherent irregularity of its memory access pattern. We propose a novel processor architecture for the sparse conjugate gradient method. The architecture consists of multiple processing elements and memory banks, and can efficiently compute both sparse matrix-vector multiplication and other dense vector operations. A Beneš permutation network with an optimised control scheme is introduced to reduce memory bank conflicts without expensive logic. We describe a heuristic for offline scheduling, the effect of which is captured in a parametric model for estimating the performance of designs generated from our approach.

Conference paper

Fidjeland AK, Luk W, Muggleton SH, 2014, Customisable multi-processor acceleration of inductive logic programming, Latest Advances in Inductive Logic Programming, Pages: 123-141, ISBN: 9781783265084

Parallel approaches to Inductive Logic Programming (ILP) are adopted to address the computational complexity in the learning process. Existing parallel ILP implementations build on conventional general-purpose processors. This chapter describes a different approach, by exploiting user-customisable parallelism available in advanced reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs). Our customisable parallel architecture for ILP has three elements: a customisable logic programming processor, a multi-processor for parallel hypothesis evaluation, and an architecture generation framework for creating such multi-processors. Our approach offers a means of achieving high performance by producing parallel architectures adapted both to the problem domain and to specific problem instances. The coverage test in Progol 4.4 is performed up to 56 times faster using our multi-processor.

Citations: 5

Book chapter

Lam YM, Luk W, 2014, A many-core based parallel tabu search, International Journal of Computers and Applications, Vol: 36, Pages: 15-22, ISSN: 1206-212X

A parallel tabu search based on a many-core platform is presented for solving combinatorial optimization problems. The computing capability of many-core platforms is fully utilized by exploiting parallelism at two different levels: (1) search level for launching a number of searches in parallel and (2) move level for parallel exploration of a number of solutions in each search. A dynamic thread allocation technique is proposed to schedule computing resources for promising search directions. Moreover, a move squeezing technique is employed to better map the parallel algorithm onto a many-core platform and so enhance the search speed. The proposed approach is evaluated using two classic optimization problems: the traveling salesman problem and the quadratic assignment problem. Experimental results show that the proposed techniques can improve the search speed by up to 373.8% and enhance the solution quality by up to 7.9%. Compared with a CPU implementation, the many-core implementation can evaluate solutions up to 85.7 times faster and enhance solution quality by up to 10.2%.

Journal article

Grigoras P, Tottenham M, Niu X, Coutinho JGF, Luk W et al., 2014, Elastic Management of Reconfigurable Accelerators, 12th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Publisher: IEEE, Pages: 174-181, ISSN: 2158-9178

Citations: 8

Conference paper

Kurek M, Becker T, Chau TCP, Luk W et al., 2014, Automating Optimization of Reconfigurable Designs, 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 210-213

Citations: 9

Conference paper

Chau TCP, Kurek M, Targett JS, Humphrey J, Skouroupathis G, Eele A, Maciejowski J, Cope B, Cobden K, Leong P, Cheung PYK, Luk W et al., 2014, SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications, 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 141-148

Citations: 2

Conference paper

Ma Y, Liu J, Zhang C, Luk W et al., 2014, HW/SW Partitioning For Region-based Dynamic Partial Reconfigurable FPGAs, 32nd IEEE International Conference on Computer Design (ICCD), Publisher: IEEE, Pages: 470-476, ISSN: 1063-6404

Citations: 9

Conference paper

Funie A-I, Salmon M, Luk W, 2014, A Hybrid Genetic-Programming Swarm-Optimisation Approach for Examining the Nature and Stability of High Frequency Trading Strategies, 13th International Conference on Machine Learning and Applications (ICMLA), Publisher: IEEE, Pages: 29-34

Citations: 4

Conference paper

Yang J, Guo C, Luk W, Nahar T et al., 2014, Collaborative processing of Least-Square Monte Carlo for American Options, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 52-59

Citations: 1

Conference paper

Bara A, Niu X, Luk W, 2014, A Dataflow System for Anomaly Detection and Analysis, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 276-279

Citations: 1

Conference paper

Inggs G, Fleming S, Thomas D, Luk W et al., 2014, Is High Level Synthesis ready for business? A computational finance case study, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 12-19

Citations: 22

Conference paper

Shao S, Guo C, Luk W, Weston S et al., 2014, Accelerating Transfer Entropy Computation, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 60-67

Citations: 13

Conference paper

Guo C, Luk W, Weston S, 2014, Pipelined Reconfigurable Accelerator for Ordinal Pattern Encoding, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 194-201, ISSN: 2160-0511

Citations: 3

Conference paper

Spada F, Scolari A, Durelli GC, Cattaneo R, Santambrogio MD, Sciuto D, Pnevmatikatos DN, Gaydadjiev GN, Pell O, Brokalakis A, Luk W, Stroobandt D, Pau D et al., 2014, FPGA-based design using the FASTER toolchain: the case of STM Spear development board, 12th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Publisher: IEEE, Pages: 134-141, ISSN: 2158-9178

Citations: 1

Conference paper

Todman T, Stilkerich S, Luk W, 2014, Using Statistical Assertions to Guide Self-Adaptive Systems, International Journal of Reconfigurable Computing, Vol: 2014, Pages: 1-8, ISSN: 1687-7195

Journal article

Inggs G, Thomas DB, Luk W, 2014, A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility, CoRR, Vol: abs/1408.4965

Journal article

Li Y, Zhang Y, Yang J, Luk W, Yang G, Zheng W et al., 2014, An Approach of Processor Core Customization for Stencil Computation, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 182-+, ISSN: 2160-0511

Conference paper

Guo L, Thomas DB, Luk W, 2014, Automated Framework for General-Purpose Genetic Algorithms in FPGAs, 17th European Conference on Applications of Evolutionary Computation (EvoApplications), Publisher: Springer-Verlag Berlin, Pages: 714-725, ISSN: 0302-9743

Citations: 4

Conference paper

Guo C, Luk W, Vinkovskaya E, Cont R et al., 2013, Customisable pipelined engine for intensity evaluation in multivariate Hawkes point processes, ACM SIGARCH Computer Architecture News, Vol: 41, Pages: 59-64, ISSN: 0163-5964

Hawkes processes are point processes that can be used to build probabilistic models to capture occurrence patterns of random events. They are widely used in high-frequency trading, seismic analysis and neuroscience. A critical calculation in Hawkes process models is intensity evaluation. The intensity of a point process represents the instantaneous rate of occurrence of events, but it is computationally expensive and challenging to calculate efficiently in order to make predictions using Hawkes process models. To accelerate the computation, we analyse data dependency in the intensity evaluation routine, and present a strategy to enable multiple intensities to be computed with a single pass through the data. We then design and optimise a pipelined hardware engine based on our strategy. In our experiments, an FPGA-based implementation of the proposed engine is evaluated by four case studies. This implementation achieves up to 94 times speedup over an optimised CPU implementation with one core, and 12 times speedup over a CPU with eight cores.

Journal article

Guo L, Thomas DB, Luk W, 2013, Customisable architectures for the set covering problem, ACM SIGARCH Computer Architecture News, Vol: 41, Pages: 101-106, ISSN: 0163-5964

This paper proposes novel customisable streaming architectures for the NP-hard set covering problem. Our approach covers both exhaustive and genetic algorithms, supporting coarse-grain parallelism and deep pipelines while allowing trade-offs between performance and resource usage. Experiments targeting Maxeler systems show that our FPGA-based designs are more effective than the corresponding multicore software versions. The speed up of the exhaustive algorithm exceeds 250 times, and that of the genetic algorithm exceeds 60 times. Meanwhile, our implementations are more flexible than other FPGA solutions, allowing users to customise parameters at run time without recompilation.

Journal article

Wang Y, Zhou X, Wang L, Yan J, Luk W, Peng C, Tong J et al., 2013, SPREAD: A Streaming-Based Partially Reconfigurable Architecture and Programming Model, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 21, Pages: 2179-2192, ISSN: 1063-8210

Citations: 22

Journal article

Cardoso JMP, José JG, Nane R, Sima VM, Olivier B, Carvalho T, Nobre R, Diniz PC, Petrov Z, Bertels K, Gonçalves F, Van Someren H, Hübner M, Constantinides G, Luk W, Becker J, Krátký K, Bhattacharya S, Alves JC, Ferreira JC et al., 2013, The REFLECT design-flow, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 13-34, ISBN: 9781461448938

This chapter describes the design-flow approach developed in the REFLECT project as presented originally in [1]. Over the course of the project, this design-flow has evolved and has been extended into a fully operational toolchain. We begin by presenting an overview of the underlying aspect-oriented compilation flow followed by an extended description of the design-flow and its toolchain.

Book chapter

Gonçalves F, Petrov Z, José JG, Nane R, Sima VM, Cardoso JMP, Werner S, Bhattacharya S, Carvalho T, Nobre R, De Sá J, Teixeira J, Diniz PC, Bertels K, Constantinides G, Luk W, Becker J, Alves JC, Ferreira JC, Almeida GM et al., 2013, LARA experiments, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 135-179, ISBN: 9781461448938

This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7].

Book chapter

José JG, Cardoso JMP, Carvalho T, Bhattacharya S, Luk W, Constantinides G, Diniz PC, Petrov Z et al., 2013, Aspect-based source to source transformations, Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach, Pages: 71-103, ISBN: 9781461448938

Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance.

Book chapter

Cardoso JMP, Carvalho T, Coutinho JGF, Nobre R, Nane R, Diniz PC, Petrov Z, Luk W, Bertels K et al., 2013, Controlling a complete hardware synthesis toolchain with LARA aspects, Microprocessors and Microsystems, Vol: 37, Pages: 1073-1089, ISSN: 0141-9331

Citations: 14

Journal article

Eele A, Maciejowski J, Chau T, Luk W et al., 2013, Control of aircraft in the terminal manoeuvring area using parallelised sequential Monte Carlo

This paper reports on the use of a parallelised Model Predictive Control, Sequential Monte Carlo algorithm for solving the problem of conflict resolution and aircraft trajectory control in air traffic management specifically around the terminal manoeuvring area of an airport. The target problem is nonlinear, highly constrained, non-convex and uses a single decision-maker with multiple aircraft. The implementation includes a spatio-temporal wind model and rolling window simulations for realistic ongoing scenarios. The method is capable of handling arriving and departing aircraft simultaneously including some with very low fuel remaining. A novel flow field is proposed to smooth the approach trajectories for arriving aircraft and all trajectories are planned in three dimensions. Massive parallelisation of the algorithm allows solution speeds to approach those required for real-time use.

Citations: 2

Conference paper

Arram J, Tsoi KH, Luk W, Jiang P et al., 2013, Hardware acceleration of genetic sequence alignment, Pages: 13-24, ISSN: 0302-9743

Next generation DNA sequencing machines have been improving at an exceptional rate; the subsequent analysis of the generated sequenced data has become a bottleneck in current systems. This paper explores the use of reconfigurable hardware to accelerate the short read mapping problem, where the positions of millions of short DNA sequences are located relative to a known reference sequence. The proposed design comprises an alignment processor based on a backtracking variation of the FM-index algorithm. The design represents a full solution to the short read mapping problem, capable of efficient exact and approximate alignment. We use reconfigurable hardware to accelerate the design and find that an implementation targeting the MaxWorkstation performs considerably faster and is more energy efficient than current CPU and GPU based software aligners. © 2013 Springer-Verlag.

Citations: 25

Conference paper

Pell O, Mencer O, Tsoi KH, Luk W et al., 2013, Maximum performance computing with dataflow engines, High-Performance Computing Using FPGAs, Pages: 747-774, ISBN: 9781461417903

Maximum Performance Computing (MPC) means striving to deliver the maximum possible performance within a space and/or power budget. The essence of the method is to start with a particular application and develop an appropriate computer by iterating between algorithm optimization and machine optimization, essentially cross-optimizing across the layers of abstraction from mathematics to logic gates. An MPC system pairs fast scalar processors with dataflow engines which can be emulated on FPGAs. In this chapter we outline the general approach, and describe in detail an example hardware architecture, programming model and tools. We also discuss additional issues that arise at the cluster level, and describe a detailed case study of applying MPC to Reverse Time Migration, a computational geophysics algorithm widely used in the oil industry.

Citations: 41

Book chapter

Kwok K-W, Tsoi KH, Vitiello V, Clark J, Chow GCT, Luk W, Yang G-Z et al., 2013, Dimensionality Reduction in Controlling Articulated Snake Robot for Endoscopy Under Dynamic Active Constraints, IEEE Transactions on Robotics, Vol: 29, Pages: 15-31, ISSN: 1552-3098

Citations: 61

Journal article

Professor Wayne Luk
