Imperial College London

Professor Peter Y. K. Cheung

Faculty of Engineering, Dyson School of Design Engineering

Head of the Dyson School of Design Engineering

Contact

+44 (0)20 7594 6200, p.cheung

Assistant

Mrs Wiesia Hsissen, +44 (0)20 7594 6261

Location

910B, Electrical Engineering, South Kensington Campus


Publications


340 results found

Angelopoulou M, Bouganis CS, Cheung PYK, Constantinides GA et al., 2009, Robust Real-Time Super-Resolution on FPGA and an Application to Video Enhancement, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 2

The high-density image sensors of state-of-the-art imaging systems provide outputs with high spatial resolution, but require long exposure times. This limits their applicability because of motion blur. Recent technological advances have led to adaptive image sensors that can combine several pixels together in real time to form a larger pixel. Larger pixels require shorter exposure times and produce high-frame-rate samples with reduced motion blur. This work proposes combining an FPGA with an adaptive image sensor to produce an output of high resolution both in space and time. The FPGA is responsible for the spatial resolution enhancement of the high-frame-rate samples using super-resolution (SR) techniques in real time. To achieve this, the article proposes using the Iterative Back Projection (IBP) SR algorithm. The original IBP method is modified to account for the presence of noise, leading to an algorithm more robust to noise. An FPGA implementation of this algorithm is presented. The proposed architecture can serve as a general-purpose real-time resolution enhancement system, and its performance is evaluated under various noise levels.

Journal article
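The Iterative Back Projection idea described above can be illustrated with a minimal 1D sketch. It assumes a simplified imaging model in which each low-resolution sample is the mean of `factor` adjacent high-resolution samples and each frame has a known integer shift on the high-resolution grid; the helper names `downsample` and `ibp` are hypothetical. It implements plain IBP, not the noise-robust variant or the FPGA architecture proposed in the article.

```python
def downsample(hr, factor, shift):
    """Assumed imaging model: each low-resolution sample is the mean of
    `factor` consecutive high-resolution samples, starting at offset `shift`."""
    n_lr = (len(hr) - shift) // factor
    return [sum(hr[shift + i * factor: shift + (i + 1) * factor]) / factor
            for i in range(n_lr)]


def ibp(frames, shifts, factor, hr_len, iterations=200, step=1.0):
    """Plain Iterative Back Projection on 1D signals (no noise-robust modification)."""
    hr = [0.0] * hr_len
    for _ in range(iterations):
        correction = [0.0] * hr_len
        counts = [0] * hr_len
        for lr, s in zip(frames, shifts):
            simulated = downsample(hr, factor, s)  # project the current estimate
            for i, (observed, predicted) in enumerate(zip(lr, simulated)):
                error = observed - predicted
                # back-project the error onto every HR sample this LR sample covers
                for j in range(s + i * factor, s + (i + 1) * factor):
                    correction[j] += error
                    counts[j] += 1
        # average the corrections each HR sample received, then update
        hr = [h + step * c / n if n else h
              for h, c, n in zip(hr, correction, counts)]
    return hr


# Two half-resolution frames of a toy signal, offset by one HR sample.
truth = [1, 3, 2, 5, 4, 4, 0, 2]
frames = [downsample(truth, 2, 0), downsample(truth, 2, 1)]
estimate = ibp(frames, [0, 1], factor=2, hr_len=8, iterations=500)
```

Averaging the back-projected errors over the low-resolution samples that cover each high-resolution sample keeps the update stable; re-simulating the frames from `estimate` reproduces the observed frames once the iteration has converged.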

Fahmy SA, Cheung PYK, Luk W, 2009, High-throughput one-dimensional median and weighted median filters on FPGA, Computers & Digital Techniques, IET, Vol: 3, Pages: 384-394

Most effort in designing median filters has focused on two-dimensional filters with small window sizes, used for image processing. However, recent work on novel image processing algorithms, such as the trace transform, has highlighted the need for architectures that can compute the median and weighted median of large one-dimensional windows, to which the optimisations in the aforementioned architectures do not apply. A set of architectures for computing both the median and weighted median of large, flexibly sized windows through parallel cumulative histogram construction is presented. The architecture uses embedded memories to control the highly parallel bank of histogram nodes, and can implicitly determine window sizes for median and weighted median calculations. The architecture is shown to perform at 72 Msamples/s, and has been integrated within a trace transform architecture.

Journal article
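The cumulative-histogram principle can be sketched in software: the median of a window is the first value at which the running count reaches half the window size, and the weighted median is found the same way from accumulated weights. This sequential sketch assumes non-negative integer samples below `levels` (8-bit by default) and positive weights; the function names are hypothetical, and the linear scan over bins stands in for the parallel bank of histogram nodes used in hardware.

```python
from collections import deque


def median_from_histogram(hist, total):
    """Lower median: first value whose cumulative count reaches ceil(total / 2)."""
    target = (total + 1) // 2
    cumulative = 0
    for value, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            return value
    raise ValueError("histogram holds fewer samples than `total`")


def sliding_median(samples, window, levels=256):
    """Median of every full window, updating one histogram incrementally."""
    hist = [0] * levels
    buf = deque()
    out = []
    for x in samples:
        buf.append(x)
        hist[x] += 1                  # sample enters the window
        if len(buf) > window:
            hist[buf.popleft()] -= 1  # oldest sample leaves the window
        if len(buf) == window:
            out.append(median_from_histogram(hist, window))
    return out


def weighted_median(values, weights, levels=256):
    """First value whose cumulative weight reaches half of the total weight."""
    if not values:
        raise ValueError("empty input")
    hist = [0] * levels
    for v, w in zip(values, weights):
        hist[v] += w                  # accumulate weight per value bin
    half = sum(weights) / 2
    cumulative = 0
    for value, w in enumerate(hist):
        cumulative += w
        if cumulative >= half:
            return value


print(sliding_median([5, 1, 4, 2, 3, 9], 3))  # [4, 2, 3, 3]
print(weighted_median([1, 2, 3], [1, 1, 5]))  # 3
```

Because only the bin counts change as the window slides, the cost per output does not depend on sorting the window, which is what makes large windows practical.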

Wong JSJ, Sedcole P, Cheung PYK, 2009, Self-Measurement of Combinatorial Circuit Delays in FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 2, Pages: 1-22

This article proposes a Built-In Self-Test (BIST) method to accurately measure the combinatorial circuit delays on an FPGA. The flexibility of the on-chip clock generation capability found in modern FPGAs is employed to step through a range of frequencies until timing failure in the combinatorial circuit is detected. In this way, the delay of any combinatorial circuit can be determined with a timing resolution of the order of picoseconds. Parallel and optimized implementations of the method for self-characterization of the delay of all the LUTs on an FPGA are also proposed. The method was applied to Altera Cyclone II and III FPGAs. A complete self-characterization of LUTs on a Cyclone II was achieved in 2.5 seconds, using only 13 kbit of block RAM to store the results. More extensive tests were carried out on the Cyclone III and the delays of adder circuits and embedded multiplier blocks were successfully measured. This self-measurement method paves the way for matching timing requirements in designs to FPGAs as a means of combating the problem of process variations.

Journal article
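The frequency-stepping measurement can be modelled in a few lines. In this sketch the callback `passes_at` is a hypothetical stand-in for the on-chip check that the combinatorial circuit's captured output is correct at a given clock period, and the circuit is assumed to pass exactly when the period exceeds its delay. The sweep shortens the period in fixed steps until the first failure, so the step size sets the timing resolution.

```python
def sweep_for_delay(passes_at, start_period_ps, stop_period_ps, step_ps):
    """Step the clock period down (frequency up) until the circuit under test fails.

    Returns (first_failing_period, last_passing_period). If the circuit passes
    exactly when period > delay, the delay lies in [first_failing, last_passing).
    """
    last_pass = None
    period = start_period_ps
    while period >= stop_period_ps:
        if not passes_at(period):
            return period, last_pass
        last_pass = period
        period -= step_ps
    return None, last_pass  # no failure observed in the swept range


# Toy model (hypothetical): a path with a 1234 ps delay.
true_delay_ps = 1234
fail, ok = sweep_for_delay(lambda p: p > true_delay_ps, 2000, 500, 10)
print(fail, ok)  # 1230 1240
```

With a 10 ps step the delay is bracketed to within 10 ps; finer steps in the clock generator give finer resolution at the cost of more test iterations.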

Liu Q, Constantinides GA, Masselos K, Cheung PYK et al., 2009, Data-reuse exploration under an on-chip memory constraint for low-power FPGA-based systems, Computers & Digital Techniques, IET, Vol: 3, Pages: 235-246

Contemporary FPGA-based reconfigurable systems have been widely used to implement data-dominated applications. In these applications, data transfer and storage consume a large proportion of the system energy. Exploiting data reuse can introduce significant power savings, but also introduces the extra requirement for on-chip memory. To aid data-reuse design exploration early in the design cycle, the authors present an optimisation approach to achieve a power-optimal design satisfying an on-chip memory constraint in a targeted FPGA-based platform. The data-reuse exploration problem is mathematically formulated and shown to be equivalent to the multiple-choice knapsack problem. The solution to this problem for an application code corresponds to deciding which array references are buffered on-chip and where in the code the reused data of those array references are loaded into on-chip memory, in order to minimise power consumption for a fixed on-chip memory size. The authors also present an experimentally verified power model, capable of providing the relative power information between different data-reuse design options of an application, resulting in a fast and efficient design-space exploration. The experimental results demonstrate that the approach enables us to find the most power-efficient design for all the benchmark circuits tested.

Journal article
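The multiple-choice knapsack formulation can be illustrated with a small dynamic program: each array reference contributes a group of mutually exclusive data-reuse options, each with an on-chip memory cost and an estimated power, and exactly one option is chosen per group. The option values below are invented for illustration and the function name is hypothetical; the article's own power model and solution procedure are not reproduced here.

```python
import math


def mckp_min_power(groups, capacity):
    """Pick exactly one (memory_cost, power) option per group, minimising total
    power with total memory <= capacity. Returns (power, chosen indices) or None."""
    INF = math.inf
    dp = [0.0] + [INF] * capacity  # dp[m]: min power using exactly m memory units
    history = []
    for options in groups:
        new = [INF] * (capacity + 1)
        pick = [None] * (capacity + 1)
        for m in range(capacity + 1):
            if dp[m] == INF:
                continue
            for k, (cost, power) in enumerate(options):
                t = m + cost
                if t <= capacity and dp[m] + power < new[t]:
                    new[t] = dp[m] + power
                    pick[t] = (k, m)  # option k, memory used before this group
        dp = new
        history.append(pick)
    best_m = min(range(capacity + 1), key=lambda m: dp[m])
    if dp[best_m] == INF:
        return None  # no feasible combination fits in the memory budget
    chosen, m = [], best_m
    for pick in reversed(history):  # backtrack through the groups
        k, m = pick[m]
        chosen.append(k)
    return dp[best_m], chosen[::-1]


# Hypothetical options: no buffer, small buffer, large buffer for reference A;
# no buffer or a buffer for reference B.
groups = [[(0, 10.0), (4, 6.0), (8, 3.0)], [(0, 8.0), (2, 5.0)]]
print(mckp_min_power(groups, 6))   # (11.0, [1, 1])
print(mckp_min_power(groups, 10))  # (8.0, [2, 1])
```

Raising the memory budget from 6 to 10 units lets the large buffer for the first reference fit, which lowers the minimum total power.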

Liu Q, Constantinides GA, Masselos K, Cheung P et al., 2009, Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Vol: 28, Pages: 305-315

A nonlinear optimization framework is proposed in this paper to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation. Buffering frequently accessed data in on-chip memories can reduce off-chip memory accesses and open avenues for parallelization. However, the exploitation of both data reuse and parallelization is limited by the memory resources available on-chip. As a result, considering these two problems separately, e.g., first exploring data reuse and then exploring data-level parallelization, based on the data-reuse options determined in the first step, may not yield the performance-optimal designs for limited on-chip memory resources. We consider both problems at the same time, exposing the dependence between the two. We show that this combined problem can be formulated as a nonlinear program and further show that efficient solution techniques exist for this problem, based on recent advances in optimization of so-called geometric programming problems. The results from applying this framework to several real benchmarks implemented on a Xilinx device demonstrate that given different constraints on on-chip memory utilization, the corresponding performance-optimal designs are automatically determined by the framework. We have also implemented designs determined by a two-stage optimization method that first explores data reuse and then explores parallelization on the same platform, and by comparison, the performance-optimal designs proposed by our framework are faster than the designs determined by the two-stage method by up to 5.7 times.

Journal article

Becker T, Jamieson P, Luk W, Cheung PYK, Rissa T et al., 2009, Power Characterisation for the Fabric in Fine-Grain Reconfigurable Architectures, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 77-+

Conference paper

Becker T, Luk W, Cheung PYK, 2009, Parametric Design for Reconfigurable Software-Defined Radio, 5th International Workshop on Applied Reconfigurable Computing, Publisher: Springer-Verlag Berlin, Pages: 15-+, ISSN: 0302-9743

Conference paper

Jamieson P, Becker T, Luk W, Cheung PYK, Rissa T, Pitkaenen T et al., 2009, Benchmarking Reconfigurable Architectures in the Mobile Domain, 17th Annual IEEE Symposium on Field Programmable Custom Computing Machines, Publisher: IEEE Computer Society, Pages: 131-+

Conference paper

Potter PG, Luk W, Cheung P, 2009, Partition-based exploration for reconfigurable JPEG designs, Design, Automation and Test in Europe Conference and Exhibition, Publisher: IEEE, Pages: 886-889, ISSN: 1530-1591

Conference paper

Wang L, Mak T, Sedcole P, Cheung PYK et al., 2009, Throughput Maximization for Wave-Pipelined Interconnects Using Cascaded Buffers and Transistor Sizing, IEEE International Symposium on Circuits and Systems (ISCAS 2009), Publisher: IEEE, Pages: 1293-1296

Conference paper

Bouganis CS, Park SB, Constantinides GA, Cheung PYK et al., 2009, Synthesis and Optimization of 2D Filter Designs for Heterogeneous FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, ISSN: 1936-7406

Many image processing applications require fast convolution of an image with one or more 2D filters. Field-Programmable Gate Arrays (FPGAs) are often used to achieve this goal due to their fine-grain parallelism and reconfigurability. However, the heterogeneous nature of modern reconfigurable devices is not usually considered during design optimization. This article proposes an algorithm that explores the space of possible implementation architectures of 2D filters, targeting the minimization of the required area, by optimizing the usage of the different components in a heterogeneous device. This is achieved by exploiting the heterogeneous nature of modern reconfigurable devices using a Singular Value Decomposition based algorithm, which provides an efficient mapping of the filters' implementation requirements to the heterogeneous components of modern FPGAs. In the case of multiple 2D filters, the proposed algorithm also exploits any redundancy that exists within each filter and between different filters in the set, leading to designs with minimized area. Experiments with real filter sets from computer vision applications demonstrate an average reduction of up to 38% in the required area.

Journal article
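The idea of splitting a 2D filter into separable terms with the Singular Value Decomposition can be sketched as below. This pure-Python sketch computes the leading singular triplets by power iteration with deflation, so each term `sigma * outer(u, v)` can be implemented as a column filter `u` followed by a row filter `v`. It does not model the article's mapping of these terms onto heterogeneous FPGA resources, and the function name is hypothetical.

```python
import math
import random


def separable_terms(kernel, n_terms, iters=100, seed=0):
    """Approximate a 2D kernel as a sum of separable terms sigma * outer(u, v),
    largest first, via power iteration with deflation (a truncated SVD)."""
    rows, cols = len(kernel), len(kernel[0])
    residual = [[float(x) for x in r] for r in kernel]
    rng = random.Random(seed)
    terms = []
    for _ in range(n_terms):
        v = [rng.uniform(-1.0, 1.0) for _ in range(cols)]  # random start vector
        u, sigma = [0.0] * rows, 0.0
        for _ in range(iters):
            u = [sum(residual[i][j] * v[j] for j in range(cols)) for i in range(rows)]
            norm_u = math.sqrt(sum(x * x for x in u))
            if norm_u == 0.0:
                break  # residual is exactly zero: nothing left to extract
            u = [x / norm_u for x in u]
            v = [sum(residual[i][j] * u[i] for i in range(rows)) for j in range(cols)]
            sigma = math.sqrt(sum(x * x for x in v))
            if sigma == 0.0:
                break
            v = [x / sigma for x in v]
        terms.append((sigma, u, v))
        # deflate: subtract the separable term just found
        for i in range(rows):
            for j in range(cols):
                residual[i][j] -= sigma * u[i] * v[j]
    return terms


# A Sobel kernel is rank 1, so one separable term reconstructs it exactly.
sobel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
terms = separable_terms(sobel, 2)
print(round(terms[0][0], 6))  # 3.464102 (= sqrt(12))
```

Filters that are not exactly separable need several terms; truncating the sum after the largest singular values trades accuracy for fewer column and row filters.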

Liu Y, Bouganis CS, Cheung PYK, 2009, Hardware architectures for eigenvalue computation of real symmetric matrices, IET Proceeding on Computers & Digital Techniques, Vol: 3, Pages: 72-84

Computation of eigenvalues is essential in many applications in the fields of science and engineering. When an application requires high-throughput or real-time computation of eigenvalues, a hardware implementation of an eigenvalue computation block is often employed. This work focuses on the problem of eigenvalue computation for real symmetric matrices. For the general case of a symmetric matrix eigenvalue problem, the approximate Jacobi method is proposed, whereas for the special case of a 3×3 symmetric matrix, an algebraic method is introduced. The proposed methods are compared with various other approaches reported in the literature. Results obtained by mapping the above architectures onto a field programmable gate array device illustrate the advantages of the proposed methods over existing ones.

Journal article
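For reference, the classical cyclic Jacobi method for real symmetric matrices, which repeatedly applies plane rotations that zero one off-diagonal pair at a time, can be sketched in floating-point Python as below. This is the textbook algorithm, not the approximate Jacobi variant or the 3×3 algebraic method proposed in the article; the function name and the convergence tolerance are assumptions.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal energy is negligible: diagonal holds eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


print(jacobi_eigenvalues([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
```

Each rotation needs only a few multiply-adds on two rows and two columns, which is one reason Jacobi-type methods are attractive for parallel hardware.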

Sedcole NP, Stott EA, Cheung PYK, 2009, Compensating for variability in FPGAs by re-mapping and re-placement, Pages: 613-616

Conference paper

Smith AM, Constantinides GA, Cheung PYK, 2009, Area Estimation and Optimisation of FPGA Routing Fabrics

Conference paper

Liu Q, Constantinides GA, Masselos K, Cheung PYK et al., 2009, Compiling C-like Languages to FPGA Hardware: Some Novel Approaches Targeting Data Memory Organization, The Computer Journal, Pages: bxp020-bxp020

This paper describes our approaches to raising the level of abstraction at which hardware suitable for accelerating computationally intensive applications can be specified. Field-programmable gate arrays are being adopted as a computational platform by the high-performance computing community, but extracting maximum performance from these devices is challenging. Unlike other approaches, our focus is on data memory organization and input-output bandwidth considerations, which are typical stumbling blocks for existing hardware compilation schemes. We describe our approaches, which are based on formal optimization techniques, and present some results showing the advantage of exposing the interaction between data memory system design and parallelism extraction to the compiler.

Journal article

Kahoul A, Smith AM, Constantinides GA, 2009, Heterogeneous Architecture Evaluation: Analysis versus Parameter Sweep, Pages: 133-144

Conference paper

Clarke JA, Constantinides GA, Cheung PYK, 2009, Word-length selection for power minimization via non-linear optimization, ACM Transactions on Design Automation of Electronic Systems, Vol: 14

Journal article

Arifin S, Cheung PYK, 2008, Affective Level Video Segmentation by Utilizing the Pleasure-Arousal-Dominance Information, IEEE Transactions on Multimedia, Vol: 10, Pages: 1325-1341, ISSN: 1520-9210

Journal article

Turkington K, Constantinides GA, Masselos K, Cheung PYK et al., 2008, Outer Loop Pipelining for Application Specific Datapaths in FPGAs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 16, Pages: 1268-1280

Most hardware compilers apply loop pipelining to increase the parallelism achieved, but pipelining is restricted to the innermost level of a nested loop. In this work we extend and adapt an existing outer loop pipelining approach known as single dimension software pipelining to generate schedules for field-programmable gate-array (FPGA) hardware coprocessors. Each loop level in nine test loops is pipelined and the resulting schedules are implemented in VHDL and targeted to an Altera Stratix II FPGA. The results show that the fastest solution for all but one of the loops occurs when pipelining is applied one to three levels above the innermost loop. Across the nine test loops we achieve an acceleration over the innermost loop solution of up to seven times, with a mean speedup of 3.2 times. The results suggest that inclusion of outer loop pipelining in future hardware compilers may be worthwhile as it can allow significantly improved results to be achieved at the cost of a small increase in compile time.

Journal article

Becker T, Jamieson P, Luk W, Cheung PYK, Rissa T et al., 2008, Towards benchmarking energy efficiency of reconfigurable architectures, International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 691-694

Energy research in reconfigurable architectures often involves legacy benchmarks such as the MCNC benchmarks. These benchmarks, however, are not well-suited for assessing energy consumption of reconfigurable technology, since they lack realistic input stimuli. This paper reviews and categorises a range of computation system benchmarks, and shows that there are no comprehensive benchmarks targeting reconfigurable architectures that would stimulate energy or power research. We review existing energy research in the field which involves microbenchmarks, in-house designs, or legacy benchmark suites used to evaluate power optimisations.

Conference paper

Smith AM, Constantinides GA, Cheung PYK, 2008, Integrated Floorplanning, Module-Selection and Architecture Generation for Reconfigurable Devices, IEEE Transactions on VLSI Systems, Vol: 16, Pages: 733-744

Journal article

Sedcole P, Cheung PYK, 2008, Parametric Yield Modeling and Simulations of FPGA Circuits Considering Within-Die Delay Variations, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, ISSN: 1936-7406

Variations in the semiconductor fabrication process result in differences in parameters between transistors on the same die, a problem exacerbated by lithographic scaling. Field-Programmable Gate Arrays may be able to compensate for within-die delay variability by judicious use of reconfigurability. This article presents two strategies for compensating within-die stochastic delay variability by using reconfiguration: reconfiguring the entire FPGA, and relocating subcircuits within an FPGA. Analytical models for the theoretical bounds on the achievable gains are derived for both strategies and compared to models for worst-case design as well as statistical static timing analysis (SSTA). All models are validated by comparison to circuit-level Monte Carlo simulations. It is demonstrated that significant improvements in circuit yield and timing are possible using SSTA alone, and these improvements can be enhanced by employing reconfiguration-based techniques.

Journal article
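The benefit of relocation can be illustrated with a small Monte Carlo experiment. The sketch assumes a nominal path delay normalised to 1.0, independent Gaussian within-die variation on every path, and independent variation for each candidate placement on the same die; the function name and all parameter values are hypothetical, and the article's analytical models are not reproduced. Placement 0 plays the role of the fixed design, so choosing the fastest of several placements can never do worse than it.

```python
import random


def timing_yield(n_dies, n_paths, sigma, target, k_placements, seed=0):
    """Monte Carlo timing yield for a fixed placement versus the best of
    k candidate placements (relocation), under the stated assumptions."""
    rng = random.Random(seed)
    fixed_pass = relocated_pass = 0
    for _ in range(n_dies):
        # critical delay of the circuit under each candidate placement on this die
        delays = [max(1.0 + rng.gauss(0.0, sigma) for _ in range(n_paths))
                  for _ in range(k_placements)]
        fixed_pass += delays[0] <= target        # no reconfiguration
        relocated_pass += min(delays) <= target  # pick the fastest placement
    return fixed_pass / n_dies, relocated_pass / n_dies


fixed, relocated = timing_yield(n_dies=2000, n_paths=50, sigma=0.05,
                                target=1.12, k_placements=4)
print(f"fixed placement: {fixed:.3f}, best of 4 placements: {relocated:.3f}")
```

Because the critical delay is a maximum over many varying paths, its spread across placements is what relocation exploits; with fully correlated variation the candidates would be identical and relocation would gain nothing.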

Mak STS, Sedcole P, Cheung PYK, Luk W et al., 2008, Interconnection lengths and delays estimation for communication links in FPGAs, The 2008 International Workshop on System Level Interconnect Prediction, Publisher: ACM, Pages: 1-10

This paper presents a new stochastic model to predict interconnection lengths of communication links in FPGAs. Based on a stochastic inter-module routing model, the expected length and variance of interconnections have been rigorously derived and, thus, delay can be computed from the length estimate. The theoretical results are compared with experimental results of lengths and delays, which are obtained from implementations of link circuits in an FPGA. The stochastic model provides an accurate prediction of length with an average error of 6.3%. Results also show that the proposed model produces reliable predictions of delay and therefore the methodology can be applied to early stage planning and design optimization for communication links. Moreover, as a byproduct of this work, we also present in this paper an interesting phenomenon which we term "interconnection fringing". The fringing effect is attributed to the competition for routing resources in a communication link and will lengthen interconnections and, therefore, increase the delay.

Conference paper

Mak T, D'Alessandro C, Sedcole P, Cheung PYK, Yakovlev A, Luk W et al., 2008, Implementation of Wave-Pipelined Interconnects in FPGAs, Publisher: IEEE, Pages: 213-214

Global interconnection and communication at high clock frequencies are becoming more problematic in FPGAs. In this paper, we address this problem by presenting an interconnect wave-pipelining strategy, which uses the existing programmable interconnect fabric to provide high-throughput communication in FPGAs. Two design approaches for interconnect wave-pipelining, using simple clock phase shifting and asynchronous phase encoding, are presented in this paper. Experimental results from a Xilinx Virtex-5 FPGA device are also presented.

Conference paper

Clarke JA, Constantinides GA, Cheung PYK, 2008, Glitch-Aware Output Switching Activity from Word-Level Statistics, Proc. IEEE International Symposium on Circuits and Systems, Pages: 1792-1795

Conference paper

Angelopoulou ME, Cheung PYK, Masselos K, Andreopoulos Y et al., 2008, Implementation and comparison of the 5/3 lifting 2D discrete wavelet transform computation schedules on FPGAs, 5th IEEE International Conference on Field Programmable Technology, Publisher: Springer, Pages: 3-21, ISSN: 1939-8018

Conference paper

Cope BT, Cheung PYK, Luk W, 2008, Using Reconfigurable Logic to Optimise GPU Memory Accesses, Pages: 44-49

Conference paper

Angelopoulou M, Bouganis C, Cheung PYK, Constantinides GA et al., 2008, FPGA-based Real-time Super-Resolution on an Adaptive Image Sensor, Pages: 125-136

Conference paper

Mak T, D'Alessandro C, Sedcole P, Cheung PYK, Yakovlev A, Luk W et al., 2008, Global Interconnections in FPGAs: Modeling and Performance Analysis, ACM International Workshop on System Level Interconnect Prediction, Publisher: Association for Computing Machinery, Pages: 51-58

Conference paper

Mak T, Sedcole P, Cheung PYK, Luk W et al., 2008, Interconnection Lengths and Delays Estimation for Communication Links in FPGAs, ACM International Workshop on System Level Interconnect Prediction, Publisher: Association for Computing Machinery, Pages: 1-9

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
