Tse AHT, Thomas DB, Luk W, 2009, Accelerating Quadrature Methods for Option Valuation, 17th Annual IEEE Symposium on Field Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 29-36
Tsoi KH, Rueckert D, Ho CH, et al., 2009, Reconfigurable Acceleration of 3D Image Registration, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 95-100
Fidjeland AK, Roesch EB, Shanahan MP, et al., 2009, NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs, 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, Publisher: IEEE, Pages: 137-144, ISSN: 2160-0511
Spacey SA, Luk W, Kelly PHJ, et al., 2009, Rapid Design Space Visualisation through Hardware/Software Partitioning, 5th Southern Conference on Programmable Logic, Publisher: IEEE, Pages: 159-164
Lam YM, Coutinho JGF, Luk W, et al., 2009, Optimising Multi-Loop Programs for Heterogeneous Computing Systems, 5th Southern Conference on Programmable Logic, Pages: 129-+
Todman T, Fu H, Tsoi B, et al., 2009, Smart Enumeration: A Systematic Approach to Exhaustive Search, 18th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Publisher: SPRINGER-VERLAG BERLIN, Pages: 429-438, ISSN: 0302-9743
Susanto KW, Todman T, Coutinho JG, et al., 2009, Design Validation by Symbolic Simulation and Equivalence Checking: A Case Study in Memory Optimization for Image Manipulation, 35th Conference on Current Trends in Theory and Practice of Computer Science, Publisher: SPRINGER-VERLAG BERLIN, Pages: 509-520, ISSN: 0302-9743
Ho CH, Yu CW, Leong PHW, et al., 2009, Floating-point FPGA: architecture and modeling, IEEE Transactions on VLSI Systems, Vol: 17, Pages: 1709-1718
Wildie M, Luk W, Schultz SR, et al., 2009, Reconfigurable acceleration of neural models with gap junctions, Sydney, Australia, Pages: 439-442
Liu Q, Todman T, Luk W, et al., 2009, Optimising Designs by Combining Model-based and Pattern-based Transformations
Jamieson P, Luk W, Constantinides GA, et al., 2009, An Energy and Power Consumption Analysis of FPGA Routing Architectures, Pages: 324-327
Liu Q, Todman T, Coutinho G, et al., 2009, Automatic optimisation of map-reduce designs by geometric programming, Pages: 215-222
Terry L, Roitch V, Tufail S, et al., 2009, Harnessing Human Computation Cycles for the FPGA Placement Problem, Publisher: CSREA Press, Pages: 188-194
Pell O, Luk W, 2008, Instance-Specific Design, Pages: 455-474
This chapter covers instance-specific design, an optimization technique involving effective exploitation of information specific to an instance of a generic design description. It introduces different types of instance-specific designs with examples and describes partial evaluation, a systematic method for producing instance-specific designs that can be automated. It covers the application of partial evaluation to hardware design in general and to field-programmable gate arrays (FPGAs) in particular. FPGAs are an effective way to implement designs in computationally intensive datapath-oriented applications such as cryptography, digital signal processing (DSP), and network processing. The main alternative implementation technologies in these application areas are general-purpose processors, digital signal processors, and application-specific integrated circuits (ASICs). Instance-specific design offers the opportunity to exploit the reconfigurable nature of FPGAs to improve performance by tailoring circuits to particular problem instances. It can be broadly categorized into three techniques that include constant folding, which can be applied when some inputs are static; function adaptation, which alters the function of circuitry to produce a certain quality of result; and architecture adaptation, in which the circuit architecture is adapted without affecting its functional behavior. The level of automation that can be applied varies among these approaches. Constant folding can often be carried out automatically using partial evaluation techniques. Function adaptation can be performed by varying bit widths and arithmetic methods in parameterized IP cores. Tools such as Quartz (for low-level design) or tool for stream architectures can produce highly parameterized circuit cores where design parameters can be traded off against each other to achieve the desired requirements in area, speed, and power consumption. Architecture adaptation, such as adding processing units to
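As a rough software analogy for the constant folding described in this abstract, the sketch below partially evaluates a multiplier whose coefficient is static. The function name `specialise_multiplier` and the shift-and-add formulation are assumptions made for illustration; they are not taken from the chapter or from any tool it mentions.

```python
def specialise_multiplier(coefficient: int):
    """Partially evaluate x * coefficient for a static, non-negative coefficient.

    Illustrative only: a software stand-in for folding a constant into a circuit.
    """
    if coefficient < 0:
        raise ValueError("this sketch handles non-negative coefficients only")

    # The set bits of the coefficient are known at specialisation time,
    # so only those shift terms need to exist in the specialised version.
    shifts = [bit for bit in range(coefficient.bit_length())
              if (coefficient >> bit) & 1]

    def multiply(x: int) -> int:
        # Zero bits of the coefficient have been folded away entirely.
        return sum(x << s for s in shifts)

    return multiply, shifts


# Specialise for the constant 10 (binary 1010): two shift terms remain.
times_ten, terms = specialise_multiplier(10)
```

In hardware terms, each remaining shift is free wiring and each addition is an adder, so a sparse constant yields a much smaller circuit than a general multiplier.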
Koester M, Luk W, Brown G, 2008, A hardware compilation flow for instance-specific VLIW cores, Pages: 619-622
Hardware compilers for high-level programming languages are important tools to reduce the design productivity gap in hardware development. In this paper a hardware compilation approach is described, which is able to generate a hardware description based on a specification in a high-level programming language such as ANSI C. No modification of the program specification is required, allowing it to be suitable for a hardware and a software implementation at the same time. The parallelism is extracted by using VLIW optimization techniques. The generated hardware implementation is an instance-specific VLIW core, which is defined by its high-level program specification. To demonstrate the principle of the design flow, a prototype is presented which uses the VEX compiler as the front-end and the Handel-C tool chain as the back-end. The resulting instance-specific VLIW cores of several test functions are compared to equivalent software implementations. © 2008 IEEE.
Luk W, Mencer O, Savaria Y, 2008, Guest editorial: 20 years of ASAP, Journal of Signal Processing Systems for Signal Image and Video Technology, Vol: 53, Pages: 1-2, ISSN: 1939-8018
Becker T, Jamieson P, Luk W, et al., 2008, Towards benchmarking energy efficiency of reconfigurable architectures, International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 691-694
Energy research in reconfigurable architectures often involves legacy benchmarks such as the MCNC benchmarks. These benchmarks, however, are not well-suited for assessing energy consumption of reconfigurable technology, since they lack realistic input stimuli. This paper reviews and categorises a range of computation system benchmarks, and shows that there are no comprehensive benchmarks targeting reconfigurable architectures that would stimulate energy or power research. We review existing energy research in the field which involves microbenchmarks, in-house designs, or legacy benchmark suites used to evaluate power optimisations.
Echeverría P, Thomas DB, López-Vallejo M, et al., 2008, An FPGA run-time parameterisable log-normal random number generator, Pages: 221-232, ISSN: 0302-9743
Monte Carlo financial simulation relies on the generation of random variables with different probability distribution functions. These simulations, particularly the random number generator (RNG) cores, are computationally intensive and are ideal candidates for hardware acceleration. In this work we present an FPGA based Log-normal RNG ideally suited for financial Monte Carlo simulations, as it is run-time parameterisable and compatible with variance reduction techniques. Our architecture achieves a throughput of one sample per cycle with a 227.6 MHz clock on a Xilinx Virtex-4 FPGA. © 2008 Springer-Verlag Berlin Heidelberg.
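For readers unfamiliar with the terms in this abstract, the following is a minimal software sketch of log-normal sampling with parameters supplied per call, paired with antithetic variates as one example of a variance reduction technique. It uses the textbook transformation exp(mu + sigma * z) and is not the paper's FPGA architecture; the function name `lognormal_pair` is hypothetical.

```python
import math
import random


def lognormal_pair(rng: random.Random, mu: float, sigma: float):
    """Return an antithetic pair of log-normal samples.

    mu and sigma are passed on every call, mirroring the idea of
    run-time parameterisation (an illustrative assumption).
    """
    z = rng.gauss(0.0, 1.0)            # standard Gaussian variate
    sample = math.exp(mu + sigma * z)  # log-normal sample
    mirror = math.exp(mu - sigma * z)  # antithetic partner built from -z
    return sample, mirror


rng = random.Random(2008)
a, b = lognormal_pair(rng, mu=0.5, sigma=0.2)
```

Because the pair is built from z and -z, the product of the two samples always equals exp(2 * mu), which gives a quick sanity check on an implementation.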
Todman T, Atasu K, Mencer O, et al., 2008, Optimal Implementation of Combinational Logic on Lookup Tables, The Fourth Conference on Ph.D. Research in Microelectronics and Electronics (PRIME'08), Istanbul, Turkey.
We present a methodology for optimally implementing combinational logic equations on networks of look-up tables. Our work effectively extends optimality to span logic minimization and technology mapping. We restrict ourselves to 4-input look-up tables (LUTs) and enumerate all possible circuits up to a certain area or latency. Since simple-minded enumeration would take a long time, we develop levels of abstraction (steps) and we formulate the key step of enumeration as an Integer Linear Programming (ILP) problem. We show results on a set of ISCAS benchmarks.
Thomas DB, Luk W, 2008, Multivariate Gaussian Random Number Generation Targeting Reconfigurable Hardware, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, Pages: 12:1-12:??, ISSN: 1936-7406
The multivariate Gaussian distribution is often used to model correlations between stochastic time-series, and can be used to explore the effect of these correlations across N time-series in Monte-Carlo simulations. However, generating random correlated vectors is an O(N^2) process, and quickly becomes a computational bottleneck in software simulations. This article presents an efficient method for generating vectors in parallel hardware, using N parallel pipelined components to generate a new vector every N cycles. This method maps well to the embedded block RAMs and multipliers in contemporary FPGAs, particularly as extensive testing shows that the limited bit-width arithmetic does not reduce the statistical quality of the generated vectors. An implementation of the architecture in the Virtex-4 architecture achieves a 500MHz clock-rate, and can support vector lengths up to 512 in the largest devices. The combination of a high clock-rate and parallelism provides a significant performance advantage over conventional processors, with an xc4vsx55 device at 500MHz providing a 200 times speedup over an Opteron 2.6GHz using an AMD optimised BLAS package. In a case study in Delta-Gamma Value-at-Risk, an RC2000 accelerator card using an xc4vsx55 at 400MHz is 26 times faster than a quad Opteron 2.6GHz SMP.
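The quadratic cost mentioned in this abstract can be seen in a plain software sketch: a correlated vector is obtained by multiplying a matrix factor of the covariance by a vector of independent Gaussians. The use of a Cholesky factor here is an assumption (the abstract does not name the decomposition), and the helper names `cholesky` and `correlated_vector` are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - acc)
            else:
                lower[i][j] = (cov[i][j] - acc) / lower[j][j]
    return lower


def correlated_vector(lower, rng):
    """Generate one correlated sample x = L * z, with z independent N(0, 1)."""
    n = len(lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Row i needs i + 1 multiply-accumulates, so the whole product is O(N^2).
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


cov = [[4.0, 2.0], [2.0, 3.0]]
factor = cholesky(cov)
x = correlated_vector(factor, random.Random(1))
```

Each output element is an independent dot product, which is the kind of structure that lends itself to one multiply-accumulate unit per element running in parallel.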
Lee D-U, Cheung RCC, Luk W, et al., 2008, Hardware implementation trade-offs of polynomial approximations and interpolations, IEEE Transactions on Computers, Vol: 57, Pages: 686-701, ISSN: 0018-9340
Mak T, D'Alessandro C, Sedcole P, et al., 2008, Implementation of Wave-Pipelined Interconnects in FPGAs, Publisher: IEEE, Pages: 213-214
Global interconnection and communication at high clock frequencies are becoming more problematic in FPGAs. In this paper, we address this problem by presenting an interconnect wave-pipelining strategy, which utilizes the existing programmable interconnect fabrics to provide high-throughput communication in FPGAs. Two design approaches for interconnect wave-pipelining, using simple clock phase shifting and asynchronous phase encoding, are presented. Experimental results from a Xilinx Virtex-5 FPGA device are also presented.
Wilton S, Ho C, Quinton B, et al., 2008, A Synthesizable Datapath-Oriented Embedded FPGA Fabric for Silicon Debug Applications, ACM Transactions on Reconfigurable Technology and Systems, Vol: 1, Pages: 1-25
We present an architecture for a synthesizable datapath-oriented FPGA core which can be used to provide post-fabrication flexibility to an SoC. Our architecture is optimized for bus-based operations and employs a directional routing architecture, which allows it to be synthesized using standard ASIC design tools and flows. The primary motivation for this architecture is to provide an efficient mechanism to support on-chip debugging. The fabric can also be used to implement other datapath-oriented circuits such as those needed in signal processing and computation-intensive applications. We evaluate our architecture using a set of benchmark circuits and compare it to previous fabrics in terms of area, speed, and power consumption.
Cope BT, Cheung PYK, Luk W, 2008, Using Reconfigurable Logic to Optimise GPU Memory Accesses, Pages: 44-49
Osborne WG, Coutinho JGF, Luk W, et al., 2008, Power-Aware and Branch-Aware Word-Length Optimization, 16th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 129-138
Mak T, Sedcole P, Cheung PYK, et al., 2008, Interconnection Lengths and Delays Estimation for Communication Links in FPGAs, ACM International Workshop on System Level Interconnect Prediction, Publisher: ASSOC COMPUTING MACHINERY, Pages: 1-9
Thomas DB, Luk W, 2008, Credit Risk Modelling using Hardware Accelerated Monte-Carlo Simulation, 16th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Publisher: IEEE COMPUTER SOC, Pages: 229-238
Thomas DB, Luk W, 2008, Sampling from the Exponential Distribution Using Independent Bernoulli Variates, 18th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 239-244, ISSN: 1946-1488
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.