Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering
 
 
 
Contact

 

+44 (0)20 7594 8313, w.luk

 
 
Location

 

434 Huxley Building, South Kensington Campus

Publications

619 results found

Ang SS, Constantinides GA, Luk W, Cheung PYK et al., 2008, Custom parallel caching schemes for hardware-accelerated image compression, Journal of Real-Time Image Processing, Vol: 3, Pages: 289-302

In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on field programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism on these platforms as well as the measure of flexibility afforded to designers. Typically, video data are stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses resulting from many compression algorithms depends on the input data (Jain in Proceedings of the IEEE, pp. 349–389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use, and subsequently reduce the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system which is able to capture data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation which employs only an SPM, performance improvements of up to 1.7× and 1.4× are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movements. In addition, reductions of on-chip memory resources by up to 3.2× are achievable using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-us

Journal article

Atasu K, Ozturan C, Dundar G, Mencer O, Luk W et al., 2008, CHIPS: Custom Hardware Instruction Processor Synthesis, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol: 27, Pages: 528-541

This paper describes an integer linear programming (ILP) based system called CHIPS that identifies custom instructions for critical code segments, given the available data bandwidth and transfer latencies between custom logic and a baseline processor with architecturally visible state registers. Our approach enables designers to optionally constrain the number of input and output operands for custom instructions. We describe a design flow to identify promising area, performance and code size tradeoffs. We study the effect of input/output constraints, register file ports, and compiler transformations such as if-conversion. Our experiments show that, in most cases, the solutions with the highest performance are identified when the input/output constraints are removed. However, input/output constraints help our algorithms identify frequently used code segments, reducing the overall area overhead. Results for 11 benchmarks covering cryptography and multimedia are shown, with speed-ups between 1.7 and 6.6 times, code size reductions between 6% and 72%, and area costs ranging between 12 and 256 adders for maximum speed-up. Our ILP based approach scales well: benchmarks with basic blocks consisting of more than 1000 instructions can be optimally solved, most of the time within a few seconds.

Journal article

Bower JA, Cho WN, Luk W, 2007, Unifying FPGA hardware development, Pages: 113-120

In current FPGA development environments, complex projects often end up in an ad-hoc tangle of programming systems; examples include Perl, Makefiles, and Verilog and/or VHDL. To combat this, we develop an approach to FPGA development in which a single specification is used to combine high- and low-level description of custom hardware, parameterisation of existing IP, and project build. In this paper we present an abstract overview of our unified approach and a prototype implementation called YAHDL, composed of a set of libraries written in the object-oriented software language Ruby. To explore YAHDL's effectiveness we apply it to an existing project, creating FPGA hardware designs for floating-point Monte Carlo simulations. With this case study we show it is possible to use YAHDL to simplify the generation of application-specific instances of our Monte Carlo architectures while achieving performance in the 200-300MHz range. © 2007 IEEE.

Conference paper

Luk W, 2007, Field-programmable technology: Today's and tomorrow's, Proceedings of the Topical Workshop on Electronics for Particle Physics, TWEPP 2007, Pages: 47-53

Good: Moore's Law; bad: productivity gap. Vision: unified design synthesis and analysis. Devices and design today: growing gap (amount of I/O and amount of logic); enhance optimality and re-use (I/O driven). Devices tomorrow: hybrid FPGA (multi-granularity fabric); 3D FPGA (customisable system-in-package). Design tomorrow: guided synthesis (optimised and portable design); data representation optimisation; upgradable and self-tuned design.

Journal article

Fahmy SA, Bouganis C, Cheung PYK, Luk W et al., 2007, Real-time hardware acceleration of the trace transform, Journal of Real-Time Image Processing, Vol: 2, Pages: 235-248, ISSN: 1861-8200

Journal article

Sedcole P, Cheung PYK, Constantinides GA, Luk W et al., 2007, Run-Time Integration of Reconfigurable Video Processing Systems, IEEE Trans VLSI Systems, Vol: 15, Pages: 1003-1016

Journal article

Ho C, Yu C, Leong P, Luk W, Wilton S et al., 2007, Domain-Specific FPGA: Architecture and Floating Point Applications, International Conference on Field Programmable Logic and Applications (FPL), Pages: 196-201

This paper presents a novel architecture for domain-specific FPGA devices. This architecture can be optimised for both speed and density by exploiting domain-specific information to produce efficient reconfigurable logic with multiple granularity. In the reconfigurable logic, general-purpose fine-grained units are used for implementing control logic and bit-oriented operations, while domain-specific coarse-grained units and heterogeneous blocks are used for implementing datapaths; the precise amount of each type of resources can be customised to suit specific application domains. Issues and challenges associated with the design flow and the architecture modelling are addressed. Examples of the proposed architecture for speeding up floating point applications are illustrated. Current results indicate that the proposed architecture can achieve 2.5 times improvement in speed and 18 times reduction in area on average, when compared with traditional FPGA devices on selected floating point benchmark circuits.

Conference paper

Cheung RCC, Lee D-U, Luk W, Villasenor JD et al., 2007, Hardware generation of arbitrary random number distributions from uniform distributions via the inversion method, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 15, Pages: 952-962, ISSN: 1063-8210

Journal article

Mak TST, Sedcole P, Cheung PYK, Luk W et al., 2007, Average interconnection delay estimation for on-FPGA communication links, Electronics Letters, Vol: 43, Pages: 918-919

A new method is presented and an analytical expression is derived for average interconnection delay estimation. This method is directly applicable to predicting the average delay for high-bandwidth communication links implemented on FPGAs. The theoretical results are compared with the measured data from the actual circuits and an average error of 4.6% is reported.

Journal article

Thomas DB, Luk W, 2007, Non-uniform random number generation through piecewise linear approximations, 16th International Conference on Field Programmable Logic and Applications, Publisher: INST ENGINEERING TECHNOLOGY-IET, Pages: 312-321, ISSN: 1751-8601

Conference paper

Thomas DB, Luk W, 2007, High quality uniform random number generation using LUT optimised state-transition matrices, 4th IEEE International Conference on Field Programmable Technology, Publisher: SPRINGER, Pages: 77-92, ISSN: 0922-5773

Conference paper

Becker T, Luk W, Cheung PYK, 2007, Enhancing relocatability of partial bitstreams for run-time reconfiguration, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 35-+

Conference paper

Cope B, Cheung PYK, Luk W, 2007, Bridging the gap between FPGAs and multi-processor architectures: A video processing perspective, 18th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 308-+, ISSN: 1063-6862

Conference paper

Thomas DB, Luk W, Leong PHW, Villasenor JD et al., 2007, Gaussian random number generators, ACM Computing Surveys, Vol: 39, ISSN: 0360-0300

Journal article

Atasu K, Dimond RG, Mencer O, Luk W, Ozturan C, Dundar G et al., 2007, Optimizing Instruction-set Extensible Processors under Data Bandwidth Constraints, Design, Automation and Test in Europe Conference and Exhibition (DATE), Pages: 588-593

Conference paper

Juvonen MPT, Coutinho JGF, Luk W, 2007, Hardware architectures for adaptive background modelling, 2007 3rd Southern Conference on Programmable Logic, Proceedings, Pages: 149-+

Journal article

Todman T, Luk W, 2007, Domain Specific Transformations for Hardware Ray Tracing, 30th WoTUG Technical Meeting 2007, Publisher: IOS PRESS, Pages: 479-+, ISSN: 1383-7575

Conference paper

De Bosschere K, Luk W, Martorell X, Navarro N, O'Boyle M, Pnevmatikatos D, Ramirez A, Sainrat P, Seznec A, Stenstrom P, Temam O et al., 2007, High-performance embedded architecture and compilation roadmap, Transactions on High-Performance Embedded Architectures and Compilers I, Vol: 4050, Pages: 5-29, ISSN: 0302-9743

Journal article

Osborne WG, Cheung RCC, Coutinho JGF, Luk W, Mencer O et al., 2007, Automatic accuracy-guaranteed bit-width optimization for fixed and floating-point systems, 17th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 617-620, ISSN: 1946-1488

Conference paper

Thomas DB, Luk W, 2007, Sampling from the Multivariate Gaussian distribution using reconfigurable hardware, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 3-+

Conference paper

Thomas DB, Luk W, 2007, A domain specific language for reconfigurable path-based Monte Carlo simulations, Annual International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 97-104

Conference paper

Todman T, Fu H, Mencer O, Luk W et al., 2007, Improving bounds for FPGA logic minimization, Annual International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 245-248

Conference paper

Osborne WG, Coutinho JGF, Cheung RCC, Luk W, Mencer O et al., 2007, Instrumented multi-stage word-length optimization, Annual International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 89-96

Conference paper

Thomas DB, Bower JA, Luk W, 2007, Automatic generation and optimisation of reconfigurable financial Monte-Carlo simulations, 18th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 168-173, ISSN: 2160-0511

Conference paper

Mak TST, Sedcole P, Cheung PYK, Luk W, Lam KP et al., 2007, A hybrid analog-digital routing network for NoC dynamic routing, 1st International Symposium on Networks-on-Chip, Publisher: IEEE COMPUTER SOC, Pages: 173-+

Conference paper

Thomas DB, Luk W, Stumpf M, 2007, Reconfigurable hardware acceleration of canonical graph labelling, 3rd International Workshop on Applied Reconfigurable Computing, Publisher: SPRINGER-VERLAG BERLIN, Pages: 302-+, ISSN: 0302-9743

Conference paper

Wilton SJE, Ho CH, Leong PHW, Luk W, Quinton B et al., 2007, A Synthesizable Datapath-Oriented Embedded FPGA Fabric, 15th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Publisher: ASSOC COMPUTING MACHINERY, Pages: 33-41

Conference paper

Sano K, Pell O, Luk W, Yamamoto S et al., 2007, FPGA-based streaming computation for lattice Boltzmann method, Annual International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 233-+

Conference paper

Mencer O, Fu H, Luk W, 2007, Optimizing Logarithmic Arithmetic on FPGAs, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Pages: 163-172

This paper proposes optimizations of the methods and parameters used in both mathematical approximation and hardware design for logarithmic number system (LNS) arithmetic. First, we introduce a general polynomial approximation approach with an adaptive divide-in-halves segmentation method for evaluation of LNS arithmetic functions. Second, we develop a library generator that automatically generates optimized LNS arithmetic units with a wide bit-width range from 21 to 64 bits, to support LNS application development and design exploration. The basic arithmetic units are tested on practical FPGA boards as well as software simulation. When compared with existing LNS designs, our generated units provide in most cases 6% to 37% reduction in area and 20% to 50% reduction in latency. The key challenge for LNS remains on the application level. We show the performance of LNS versus floating-point for realistic applications: digital sine/cosine waveform generator, matrix multiplication and radiative Monte Carlo simulation. Our infrastructure for fast prototyping LNS FPGA applications allows us to efficiently study LNS number representation and its tradeoffs in speed and size when compared with floating-point designs.

Conference paper

Ang S-S, Constantinides GA, Luk W, Cheung PYK et al., 2007, A Hybrid Memory Sub-system for Video Coding Applications

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
