Morris GW, Thomas DB, Luk W, 2009, FPGA accelerated low-latency market data feed processing, 17th Symposium on High-Performance Interconnects, Publisher: IEEE, Pages: 83-89
Lee D-U, Cheung RCC, Luk W, et al., 2009, Hierarchical Segmentation for Hardware Function Evaluation, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 17, Pages: 103-116, ISSN: 1063-8210
Lamoureux J, Field T, Luk W, 2009, Accelerating a Virtual Ecology Model with FPGAs, 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 67-74, ISSN: 2160-0511
Ho CH, Luk W, Szefer JM, et al., 2009, Tuning Instruction Customisation for Reconfigurable System-on-Chip, IEEE International SOC Conference, Publisher: IEEE, Pages: 61-+, ISSN: 2164-1676
Koester M, Luk W, Hagemeyer J, et al., 2009, Design Optimizations to Improve Placeability of Partial Reconfiguration Modules, Design, Automation and Test in Europe Conference and Exhibition, Publisher: IEEE, Pages: 976-+, ISSN: 1530-1591
Potter PG, Luk W, Cheung P, 2009, Partition-based exploration for reconfigurable JPEG designs, Design, Automation and Test in Europe Conference and Exhibition, Publisher: IEEE, Pages: 886-889, ISSN: 1530-1591
Das J, Wilton SJE, Leong P, et al., 2009, Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth, 19th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 205-+, ISSN: 1946-1488
Susanto KW, Todman T, Coutinho JG, et al., 2009, Design Validation by Symbolic Simulation and Equivalence Checking: A Case Study in Memory Optimization for Image Manipulation, 35th Conference on Current Trends in Theory and Practice of Computer Science, Publisher: Springer-Verlag Berlin, Pages: 509-520, ISSN: 0302-9743
Lam YM, Coutinho JGF, Luk W, et al., 2009, Optimising Multi-Loop Programs for Heterogeneous Computing Systems, 2009 5th Southern Conference on Programmable Logic, Proceedings, Pages: 129-+
Ho CH, Yu CW, Leong PHW, et al., 2009, Floating-point FPGA: architecture and modeling, IEEE Transactions on VLSI Systems, Vol: 17, Pages: 1709-1718
Terry L, Roitch V, Tufail S, et al., 2009, Harnessing Human Computation Cycles for the FPGA Placement Problem, Publisher: CSREA Press, Pages: 188-194
Wildie M, Luk W, Schultz SR, et al., 2009, Reconfigurable acceleration of neural models with gap junctions, Sydney, Australia, Pages: 439-442
Jamieson P, Luk W, Constantinides GA, et al., 2009, An Energy and Power Consumption Analysis of FPGA Routing Architectures, Pages: 324-327
Liu Q, Todman, Luk W, et al., 2009, Optimising Designs by Combining Model-based and Pattern-based Transformations
Liu Q, Todman, Coutinho G, et al., 2009, Automatic optimisation of map-reduce designs by geometric programming, Pages: 215-222
Pell O, Luk W, 2008, Instance-Specific Design, Pages: 455-474
This chapter covers instance-specific design, an optimization technique involving effective exploitation of information specific to an instance of a generic design description. It introduces different types of instance-specific designs with examples and describes partial evaluation, a systematic method for producing instance-specific designs that can be automated. It covers the application of partial evaluation to hardware design in general and to field-programmable gate arrays (FPGAs) in particular. FPGAs are an effective way to implement designs in computationally intensive datapath-oriented applications such as cryptography, digital signal processing (DSP), and network processing. The main alternative implementation technologies in these application areas are general-purpose processors, digital signal processors, and application-specific integrated circuits (ASICs). Instance-specific design offers the opportunity to exploit the reconfigurable nature of FPGAs to improve performance by tailoring circuits to particular problem instances. It can be broadly categorized into three techniques that include constant folding, which can be applied when some inputs are static; function adaptation, which alters the function of circuitry to produce a certain quality of result; and architecture adaptation, in which the circuit architecture is adapted without affecting its functional behavior. The level of automation that can be applied varies among these approaches. Constant folding can often be carried out automatically using partial evaluation techniques. Function adaptation can be performed by varying bit widths and arithmetic methods in parameterized IP cores. Tools such as Quartz (for low-level design) or tool for stream architectures can produce highly parameterized circuit cores where design parameters can be traded off against each other to achieve the desired requirements in area, speed, and power consumption. Architecture adaptation, such as adding processing units to
Luk W, Mencer O, Savaria Y, 2008, Guest editorial: 20 years of ASAP, Journal of Signal Processing Systems for Signal Image and Video Technology, Vol: 53, Pages: 1-2, ISSN: 1939-8018
Becker T, Jamieson P, Luk W, et al., 2008, Towards benchmarking energy efficiency of reconfigurable architectures, International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 691-694
Energy research in reconfigurable architectures often involves legacy benchmarks such as the MCNC benchmarks. These benchmarks, however, are not well-suited for assessing energy consumption of reconfigurable technology, since they lack realistic input stimuli. This paper reviews and categorises a range of computation system benchmarks, and shows that there are no comprehensive benchmarks targeting reconfigurable architectures that would stimulate energy or power research. We review existing energy research in the field which involves microbenchmarks, in-house designs, or legacy benchmark suites used to evaluate power optimisations.
Echeverría P, Thomas DB, López-Vallejo M, et al., 2008, An FPGA run-time parameterisable log-normal random number generator, Pages: 221-232, ISSN: 0302-9743
Monte Carlo financial simulation relies on the generation of random variables with different probability distribution functions. These simulations, particularly the random number generator (RNG) cores, are computationally intensive and are ideal candidates for hardware acceleration. In this work we present an FPGA based Log-normal RNG ideally suited for financial Monte Carlo simulations, as it is run-time parameterisable and compatible with variance reduction techniques. Our architecture achieves a throughput of one sample per cycle with a 227.6 MHz clock on a Xilinx Virtex-4 FPGA. © 2008 Springer-Verlag Berlin Heidelberg.
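As a software illustration of what such a generator computes, a log-normal sample can be formed as exp(mu + sigma * z) with z drawn from a standard Gaussian, where mu and sigma are supplied per call (the "run-time parameterisable" aspect). The sketch below is not the paper's hardware architecture. The function names are hypothetical, and antithetic variates are used only as one assumed example of a variance reduction technique that such a generator could support.

```python
import math
import random


def lognormal_sample(rng, mu, sigma):
    """Return one log-normal sample exp(mu + sigma * z), z ~ N(0, 1).

    mu and sigma are passed per call, so they can change between samples.
    """
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * z)


def lognormal_antithetic_pair(rng, mu, sigma):
    """Return two samples built from z and -z (antithetic variates).

    Assumption: antithetic variates stand in for the variance reduction
    techniques mentioned in the abstract; the paper may use others.
    """
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * z), math.exp(mu - sigma * z)


if __name__ == "__main__":
    rng = random.Random(2008)
    # Parameters may differ from one call to the next.
    print(lognormal_sample(rng, 0.0, 0.25))
    print(lognormal_antithetic_pair(rng, 0.1, 0.5))
```

Because the two antithetic samples use z and -z, their product is always exp(2 * mu), which gives a simple sanity check on the pairing.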
Thomas DB, Luk W, 2008, Multivariate Gaussian Random Number Generation Targeting Reconfigurable Hardware, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, Pages: 12:1-12:??, ISSN: 1936-7406
The multivariate Gaussian distribution is often used to model correlations between stochastic time-series, and can be used to explore the effect of these correlations across N time-series in Monte-Carlo simulations. However, generating random correlated vectors is an O(N^2) process, and quickly becomes a computational bottleneck in software simulations. This article presents an efficient method for generating vectors in parallel hardware, using N parallel pipelined components to generate a new vector every N cycles. This method maps well to the embedded block RAMs and multipliers in contemporary FPGAs, particularly as extensive testing shows that the limited bit-width arithmetic does not reduce the statistical quality of the generated vectors. An implementation of the architecture in the Virtex-4 architecture achieves a 500MHz clock-rate, and can support vector lengths up to 512 in the largest devices. The combination of a high clock-rate and parallelism provides a significant performance advantage over conventional processors, with an xc4vsx55 device at 500MHz providing a 200 times speedup over an Opteron 2.6GHz using an AMD optimised BLAS package. In a case study in Delta-Gamma Value-at-Risk, an RC2000 accelerator card using an xc4vsx55 at 400MHz is 26 times faster than a quad Opteron 2.6GHz SMP.
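The O(N^2) cost mentioned above can be seen in a plain software sketch: a correlated vector is commonly formed as x = mean + L z, where L is a lower-triangular factor of the covariance matrix and z holds N independent standard Gaussian samples. The code below assumes a Cholesky factorisation and uses hypothetical function names; the abstract does not state which decomposition the article uses. Each output element is one multiply-accumulate row, which is the kind of work that N parallel components could share.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov.

    Assumes cov is symmetric positive definite.
    """
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def correlated_vector(rng, L, mean):
    """Return one vector x = mean + L z with z ~ N(0, I).

    The triangular matrix-vector product costs O(N^2) multiply-adds.
    """
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(n)]


if __name__ == "__main__":
    cov = [[4.0, 2.0], [2.0, 3.0]]
    L = cholesky(cov)  # factorise once, reuse for every sample
    rng = random.Random(2008)
    print(correlated_vector(rng, L, [0.0, 0.0]))
```

The factorisation is done once per covariance matrix, so the per-sample cost is only the triangular product.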
Todman T, Atasu K, Mencer O, et al., 2008, Optimal Implementation of Combinational Logic on Lookup Tables, The Fourth Conference on Ph.D. Research in Microelectronics and Electronics (PRIME'08), Istanbul, Turkey.
We present a methodology for optimally implementing combinational logic equations on networks of look-up tables. Our work effectively extends optimality to span logic minimization and technology mapping. We restrict ourselves to 4-input look-up tables (LUTs) and enumerate all possible circuits up to a certain area or latency. Since simple-minded enumeration would take a long time, we develop levels of abstraction (steps) and we formulate the key step of enumeration as an Integer Linear Programming (ILP) problem. We show results on a set of ISCAS benchmarks.
Lee D-U, Cheung RCC, Luk W, et al., 2008, Hardware implementation trade-offs of polynomial approximations and interpolations, IEEE Transactions on Computers, Vol: 57, Pages: 686-701, ISSN: 0018-9340
Mak T, D'Alessandro C, Sedcole P, et al., 2008, Implementation of Wave-Pipelined Interconnects in FPGAs, Publisher: IEEE, Pages: 213-214
Global interconnection and communication at high clock frequencies are becoming more problematic in FPGAs. In this paper, we address this problem by presenting an interconnect wave-pipelining strategy, which utilizes the existing programmable interconnect fabrics to provide high-throughput communication in FPGAs. Two design approaches for interconnect wave-pipelining, using simple clock phase shifting and asynchronous phase encoding, are presented in this paper. Experimental results from a Xilinx Virtex-5 FPGA device are also presented.
Wilton S, Ho C, Quinton B, et al., 2008, A Synthesizable Datapath-Oriented Embedded FPGA Fabric for Silicon Debug Applications, ACM Transactions on Reconfigurable Technology and Systems, Vol: 1, Pages: 1-25
We present an architecture for a synthesizable datapath-oriented FPGA core which can be used to provide post-fabrication flexibility to an SoC. Our architecture is optimized for bus-based operations and employs a directional routing architecture, which allows it to be synthesized using standard ASIC design tools and flows. The primary motivation for this architecture is to provide an efficient mechanism to support on-chip debugging. The fabric can also be used to implement other datapath-oriented circuits such as those needed in signal processing and computation-intensive applications. We evaluate our architecture using a set of benchmark circuits and compare it to previous fabrics in terms of area, speed, and power consumption.
Cope BT, Cheung PYK, Luk W, 2008, Using Reconfigurable Logic to Optimise GPU Memory Accesses, Pages: 44-49
Lam YM, Coutinho JGF, Luk W, et al., 2008, Integrated hardware/software codesign for heterogeneous computing systems, 2008 4th Southern Conference on Programmable Logic, Proceedings, Pages: 217-+
Echeverria P, Thomas DB, Lopez-Vallejo M, et al., 2008, An FPGA run-time parameterisable Log-normal random number generator, 4th International Workshop on Applied Reconfigurable Computing, Publisher: Springer-Verlag Berlin, Pages: 221-+, ISSN: 0302-9743
Koester M, Luk W, Brown G, 2008, A Hardware Compilation Flow for Instance-Specific VLIW Cores, 18th International Conference on Field Programmable and Logic Applications, Publisher: IEEE, Pages: 618-+, ISSN: 1946-1488
Mak T, D'Alessandro C, Sedcole P, et al., 2008, Global Interconnections in FPGAs: Modeling and Performance Analysis, ACM International Workshop on System Level Interconnect Prediction, Publisher: Association for Computing Machinery, Pages: 51-58
Atasu K, Mencer O, Luk W, et al., 2008, Fast Custom Instruction Identification by Convex Subgraph Enumeration, 19th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)