Rabozzi M, Cattaneo R, Becker T, et al., 2015, Relocation-aware Floorplanning for Partially-Reconfigurable FPGA-based Systems, 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Publisher: IEEE, Pages: 97-104
Guo L, Funie AI, Xie Z, et al., 2015, A general-purpose framework for FPGA-accelerated genetic algorithms, INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, Vol: 7, Pages: 361-375, ISSN: 1758-0366
Ciobanu CB, Varbanescu AL, Pnevmatikatos D, et al., 2015, EXTRA: Towards an Efficient Open Platform for Reconfigurable High Performance Computing, 18th IEEE International Conference on Computational Science and Engineering (CSE), Publisher: IEEE, Pages: 339-342
Todman T, Stilkerich S, Luk W, 2015, In-circuit temporal monitors for runtime verification of reconfigurable designs, 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), Publisher: IEEE COMPUTER SOC, ISSN: 0738-100X
Lee K-H, Guo Z, Chow GCT, et al., 2015, GPU-based Proximity Query Processing on Unstructured Triangular Mesh Model, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 4405-4411, ISSN: 1050-4729
Zhang C, Ma Y, Luk W, 2015, HW/SW Partitioning Algorithm Targeting MPSOC With Dynamic Partial Reconfigurable Fabric, 14th International Conference on Computer Aided Design and Computer Graphics, Publisher: IEEE, Pages: 240-241
Luk W, Constantinides GA, 2015, Preface, ISBN: 9781783266968
Luk W, 2015, Analysing reconfigurable computing systems, Transforming Reconfigurable Systems: A Festschrift Celebrating the 60th Birthday of Professor Peter Cheung, Pages: 101-116, ISBN: 9781783266968
© 2015 by Imperial College Press. All rights reserved. The distinguishing feature of a reconfigurable computing system is that the function and the interconnection of its processing elements can be changed, in some cases during run-time. However, reconfigurability is a double-edged sword: it only produces attractive results if used judiciously, since there are various overheads associated with exploiting reconfigurability in computing systems. This chapter introduces a simple approach for analysing the performance, resource usage and energy consumption of reconfigurable computing systems, and explains how it can be used in analysing some recent advances in design techniques for various applications that produce run-time reconfigurable implementations. Directions for future development of this approach are also explored.
Luk W, Constantinides GA, 2015, Transforming reconfigurable systems: A festschrift celebrating the 60th birthday of Professor Peter Cheung, ISBN: 9781783266968
© 2015 by Imperial College Press. All rights reserved. Over the last three decades, Professor Peter Cheung has made significant contributions to a variety of areas, such as analogue and digital computer-aided design tools, high-level synthesis and hardware/software codesign, low-power and high-performance circuit architectures for signal and image processing, and mixed-signal integrated-circuit design. However, the area that has attracted his greatest attention is reconfigurable systems and their design, and his work has contributed to the transformation of this important and exciting discipline. This festschrift contains a unique collection of technical papers based on presentations at a workshop at Imperial College London in May 2013 celebrating Professor Cheung's 60th birthday. Renowned researchers who have been inspired and motivated by his outstanding research in the area of reconfigurable systems are brought together from across the globe to offer their latest research in reconfigurable systems. Professor Cheung has devoted much of his professional career to Imperial College London, and has served with distinction as the Head of Department of Electrical and Electronic Engineering for several years. His outstanding capability and his loyalty to Imperial College and the Department of Electrical and Electronic Engineering are legendary. Professor Cheung has made tremendous strides in ensuring excellence in both research and teaching, and in establishing sound governance and strong financial endowment; but above all, he has made his department a wonderful place in which to work and study.
Inggs G, Thomas DB, Luk W, 2015, An Efficient, Automatic Approach to High Performance Heterogeneous Computing., CoRR, Vol: abs/1505.04417
Chau TCP, Niu X, Eele A, et al., 2014, Mapping Adaptive Particle Filters to Heterogeneous Reconfigurable Systems, ACM Transactions on Reconfigurable Technology and Systems, Vol: 7, ISSN: 1936-7414
This article presents an approach for mapping real-time applications based on particle filters (PFs) to heterogeneous reconfigurable systems, which typically consist of multiple FPGAs and CPUs. A method is proposed to adapt the number of particles dynamically and to utilise runtime reconfigurability of FPGAs for reduced power and energy consumption. A data compression scheme is employed to reduce communication overhead between FPGAs and CPUs. A mobile robot localisation and tracking application is developed to illustrate our approach. Experimental results show that the proposed adaptive PF can reduce up to 99% of computation time. Using runtime reconfiguration, we achieve a 25% to 34% reduction in idle power. A 1U system with four FPGAs is up to 169 times faster than a single-core CPU and 41 times faster than a 1U CPU server with 12 cores. It is also estimated to be 3 times faster than a system with four GPUs.
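The adaptation idea above, varying the particle count at runtime, can be sketched in software. The sketch below is a minimal illustration only: it uses the effective sample size to choose the next particle count, which is an assumed stand-in for the paper's actual adaptation scheme, and all function names are hypothetical.

```python
import math
import random

def adaptive_pf_step(particles, weights, motion, likelihood, obs,
                     n_min=50, n_max=500):
    """One particle-filter step with a simple adaptive particle count.

    The adaptation rule (scale the next population with weight
    degeneracy, measured by effective sample size) is an illustrative
    stand-in, not the scheme from the paper.
    """
    # Propagate particles and reweight them against the new observation.
    particles = [motion(p) for p in particles]
    weights = [w * likelihood(p, obs) for p, w in zip(particles, weights)]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]

    # Effective sample size: near len(particles) when weights are even,
    # near 1 when a few particles dominate.
    ess = 1.0 / sum(w * w for w in weights)

    # Use fewer particles when the ESS is healthy, more when it is low.
    n_next = int(min(n_max, max(n_min,
                                n_max * (1.0 - ess / len(particles)))))

    # Systematic resampling to exactly n_next particles.
    step = 1.0 / n_next
    u = random.random() * step
    i, c = 0, weights[0]
    resampled = []
    for _ in range(n_next):
        while u > c and i < len(weights) - 1:
            i += 1
            c += weights[i]
        resampled.append(particles[i])
        u += step
    return resampled, [1.0 / n_next] * n_next
```

On hardware, the reduced particle count translates into idle processing elements that runtime reconfiguration can power down, which is where the paper's idle-power savings come from.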
Guo C, Luk W, 2014, Pipelined HAC Estimation Engines for Multivariate Time Series, JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, Vol: 77, Pages: 117-129, ISSN: 1939-8018
Yang J, Lin B, Luk W, et al., 2014, Particle filtering-based maximum likelihood estimation for financial parameter estimation, 24th International Conference on Field Programmable Logic and Applications, (FPL) 2014, Publisher: IEEE
This paper presents a novel method for estimating parameters of financial models with jump diffusions. It is a Particle Filter based Maximum Likelihood Estimation process, which uses particle streams to enable efficient evaluation of constraints and weights. We also provide a CPU-FPGA collaborative design for parameter estimation of the Stochastic Volatility with Correlated and Contemporaneous Jumps model as a case study. The result is evaluated by comparison with a CPU and a cloud computing platform. We show a 14 times speedup for the FPGA design compared with the CPU, and a similar speedup but better convergence compared with an alternative parallelisation scheme using Techila Middleware in a multi-CPU environment.
Burovskiy PA, Girdlestone S, Davies C, et al., 2014, Dataflow acceleration of Krylov subspace sparse banded problems, 24th International Conference on Field Programmable Logic and Applications (FPL), 2014, Publisher: IEEE
Most of the efforts in the FPGA community related to sparse linear algebra focus on increasing the degree of internal parallelism in matrix-vector multiply kernels. We propose a parametrisable dataflow architecture presenting an alternative and complementary approach to support acceleration of banded sparse linear algebra problems which benefit from building a Krylov subspace. We use the banded structure of a matrix A to overlap the computations Ax, A^2x, ..., A^kx by building a pipeline of matrix-vector multiplication processing elements (PEs), each performing A^i x. Due to on-chip data locality, the FLOPS rate sustainable by such a pipeline scales linearly with k. Our approach enables a trade-off between the number k of overlapped matrix power actions and the level of parallelism in a PE. We illustrate our approach for Google PageRank computation by power iteration for large banded single precision sparse matrices. Our design scales up to 32 sequential PEs with floating point accumulation and 80 PEs with fixed point accumulation on a Stratix V D8 FPGA. With 80 single-pipe fixed point PEs clocked at 160 MHz, our design sustains 12.7 GFLOPS.
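The chained matrix-power computation Ax, A^2x, ..., A^kx described above can be mimicked sequentially in software: each stage consumes the previous stage's output, just as each PE in the pipeline feeds the next. The sketch below is a software analogue under assumed conventions (diagonal-wise storage of the band, names of our own choosing), not the dataflow implementation.

```python
def banded_matvec(diags, offsets, x):
    """y = A @ x for a banded matrix stored by diagonals.

    diags[k] holds the diagonal at offset offsets[k] (0 = main diagonal,
    positive = super-, negative = sub-diagonal), padded to length n;
    entry diags[k][i] multiplies x[i + offsets[k]] in row i.
    """
    n = len(x)
    y = [0.0] * n
    for d, off in zip(diags, offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += d[i] * x[j]
    return y

def krylov_powers(diags, offsets, x, k):
    """Return [x, Ax, A^2 x, ..., A^k x].

    Each stage feeds the next, mirroring the chain of matvec PEs in
    the paper; here the stages run one after another rather than as an
    overlapped pipeline.
    """
    out = [x]
    for _ in range(k):
        out.append(banded_matvec(diags, offsets, out[-1]))
    return out
```

Because each PE only needs the band of A and the previous vector, the hardware pipeline keeps all intermediate vectors on chip, which is why its sustained FLOPS rate scales linearly with k.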
Hung E, Todman T, Luk W, 2014, Transparent Insertion of Latency-Oblivious Logic onto FPGAs, 24th International Conference on Field Programmable Logic and Applications (FPL), 2014, Publisher: IEEE
We present an approach for inserting latency-oblivious functionality into pre-existing FPGA circuits transparently. To ensure transparency — that such modifications do not affect the design’s maximum clock frequency — we insert any additional logic post place-and-route, using only the spare resources that were not consumed by the pre-existing circuit. The typical challenge with adding new functionality into existing circuits incrementally is that spare FPGA resources to host this functionality must be located close to the input signals that it requires, in order to minimise the impact of routing delays. In congested designs, however, such co-location is often not possible. We overcome this challenge by using flow techniques to pipeline and route signals from where they originate, potentially in a region of high resource congestion, into a region of low congestion capable of hosting new circuitry, at the expense of latency. We demonstrate and evaluate our approach by augmenting realistic designs with self-monitoring circuitry, which is not sensitive to latency. We report results on circuits operating over 200MHz and show that our insertions have no impact on timing, are 2–4 times faster than compile-time insertion, and incur only a small power overhead.
Liu Q, Mak T, Zhang T, et al., 2014, Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 23, Pages: 1402-1414, ISSN: 1557-9999
Through energy harvesting systems, new energy sources become available for many advanced applications based on environmentally embedded systems. However, the harvested power, such as solar energy, varies significantly under different ambient conditions, which in turn affects the energy conversion efficiency. In this paper, we propose an approach for designing power-adaptive computing systems to maximize energy utilization under a variable solar power supply. Using the geometric programming technique, the proposed approach can generate a customized parallel computing structure effectively. Then, based on the prediction of solar energy in future time slots by a multilayer perceptron neural network, a convex model-based adaptation strategy is used to modulate the power behavior of the real-time computing system. The developed power-adaptive computing system is implemented in hardware and evaluated with a solar harvesting system simulation framework for five applications. The results show that the developed power-adaptive systems track the variable power supply better: the harvested solar energy utilization efficiency is 2.46 times better than that of conventional static designs and rule-based adaptation approaches. Taken together, the proposed design approach for self-powered embedded computing systems makes better use of ambient energy sources.
Shan Y, Hao Y, Wang W, et al., 2014, Hardware Acceleration for an Accurate Stereo Vision System Using Mini-Census Adaptive Support Region, ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, Vol: 13, ISSN: 1539-9087
Guo C, Luk W, Vinkovskaya E, et al., 2014, Customisable pipelined engine for intensity evaluation in multivariate hawkes point processes, ACM SIGARCH Computer Architecture News, Vol: 41, Pages: 59-64, ISSN: 0163-5964
Guo L, Thomas DB, Luk W, 2014, Customisable architectures for the set covering problem, ACM SIGARCH Computer Architecture News, Vol: 41, Pages: 101-106, ISSN: 0163-5964
Niu X, Jin Q, Luk W, et al., 2014, A Self-Aware Tuning and Self-Aware Evaluation Method for Finite-Difference Applications in Reconfigurable Systems, ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, Vol: 7, ISSN: 1936-7406
Chau TCP, Targett JS, Wijeyasinghe M, et al., 2014, Accelerating Sequential Monte Carlo Method for Real-time Air Traffic Management, Publisher: ACM, Pages: 35-40, ISSN: 0163-5964
Le Masle A, Luk W, 2014, Mapping Loop Structures onto Parametrized Hardware Pipelines, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 22, Pages: 631-640, ISSN: 1063-8210
Guo C, Luk W, Weston S, 2014, Pipelined Reconfigurable Accelerator for Ordinal Pattern Encoding, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 194-201, ISSN: 2160-0511
Li Y, Zhang Y, Yang J, et al., 2014, An Approach of Processor Core Customization for Stencil Computation, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 182-+, ISSN: 2160-0511
Grigoras P, Tottenham M, Niu X, et al., 2014, Elastic Management of Reconfigurable Accelerators, 12th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Publisher: IEEE, Pages: 174-181, ISSN: 2158-9178
Zhao W, Fu H, Yang G, et al., 2014, Patra: Parallel tree-reweighted message passing architecture
© 2014 Technical University of Munich (TUM). Maximum a posteriori probability inference algorithms for Markov Random Fields are widely used in many applications, such as computer vision and machine learning. Sequential tree-reweighted message passing (TRW-S) is an inference algorithm which shows good quality in finding optimal solutions. However, the performance of TRW-S in software cannot meet the requirements of many real-time applications, due to the sequential scheme and the high memory, bandwidth and computational costs. This paper proposes Patra, a novel parallel tree-reweighted message passing architecture, which involves a fully pipelined design targeting FPGA technology. We build a hybrid CPU/FPGA system to test the performance of Patra for stereo matching. Experimental results show that Patra is about 100 times faster than a software implementation of TRW-S, and 12 times faster than a GPU-based message passing algorithm. Compared with an existing design using four FPGAs, we achieve a 2 times speedup on a single FPGA. Moreover, Patra can work at video rate in many cases, such as 167 frames/sec for a standard stereo matching test case, which makes it promising for many real-time applications.
Gan L, Fu H, Yang C, et al., 2014, A highly-efficient and green data flow engine for solving Euler atmospheric equations
© 2014 Technical University of Munich (TUM). Atmospheric modeling is an essential issue in the study of climate change. However, due to the complicated algorithmic and communication models, scientists and researchers are facing tough challenges in finding efficient solutions to solve the atmospheric equations. In this paper, we accelerate a solver for the three-dimensional Euler atmospheric equations through reconfigurable data flow engines. We first propose a hybrid design that achieves efficient resource allocation and data reuse. Furthermore, through algorithmic offsetting, fast memory table, and customizable-precision arithmetic, we map a complex Euler kernel into a single FPGA chip, which can perform 956 floating point operations per cycle. In a 1U-chassis, our CPU-DFE unit with 8 FPGA chips is 18.5 times faster and 8.3 times more power efficient than a multicore system based on two 12-core Intel E5-2697 (Ivy Bridge) CPUs, and is 6.2 times faster and 5.2 times more power efficient than a hybrid unit equipped with two 12-core Intel E5-2697 (Ivy Bridge) CPUs and three Intel Xeon Phi 5120d (MIC) cards.
Guo L, Thomas DBJ, Guo C, et al., 2014, Automated framework for FPGA-based parallel genetic algorithms, 24th International Conference on Field Programmable Logic and Applications, Publisher: IEEE
Parallel genetic algorithms (pGAs) are a variant of genetic algorithms which can promise substantial gains in both efficiency of execution and quality of results. pGAs have attracted researchers to implement them in FPGAs, but the implementation typically requires significant human effort. To simplify the implementation process and make hardware pGA designs accessible to potential non-expert users, this paper proposes a general-purpose framework, which takes in a high-level description of the optimisation target and automatically generates pGA designs for FPGAs. Our pGA system exploits the two levels of parallelism found in GA instances and genetic operations, allowing users to tailor the architecture to resource constraints at compile-time. The framework also enables users to tune a subset of parameters at run-time without time-consuming recompilation. Our pGA design is more flexible than previous ones, and has an average speedup of 26 times compared to multi-core counterparts over five combinatorial and numerical optimisation problems. When compared with a GPU, it also shows a 6.8 times speedup over a combinatorial application.
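For readers unfamiliar with the genetic operations the framework parallelises (selection, crossover, mutation), a minimal bitstring GA is sketched below. This is a generic textbook GA, not the framework's interface; every parameter name is illustrative.

```python
import random

def simple_ga(fitness, bits=16, pop_size=32, generations=100,
              p_cross=0.8, p_mut=0.02, seed=0):
    """Minimal generational GA over bitstrings.

    This is the class of algorithm the framework maps to hardware; the
    per-individual loop body below is what a pGA runs in parallel across
    GA instances, and the genetic operators themselves form the second
    level of parallelism.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        nxt = scored[:2]                      # elitism: keep the two best
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover.
            if rng.random() < p_cross:
                cut = rng.randrange(1, bits)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Independent bit-flip mutation.
            child = [g ^ (rng.random() < p_mut) for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Usage: maximise the number of ones (the classic OneMax problem).
best = simple_ga(sum)
```

In the hardware framework, a compile-time choice would fix how many such populations and operator pipelines run concurrently, while rates like the mutation probability are the sort of parameter that can be tuned at run-time without recompilation.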
Chow G, Grigoras P, Burovskiy PA, et al., 2014, An efficient sparse conjugate gradient solver using a Beneš permutation network, 24th International Conference on Field Programmable Logic and Applications, Publisher: IEEE
The conjugate gradient (CG) method is one of the most widely used iterative methods for solving systems of linear equations. However, parallelizing CG for large sparse systems is difficult due to the inherent irregularity in the memory access pattern. We propose a novel processor architecture for the sparse conjugate gradient method. The architecture consists of multiple processing elements and memory banks, and is able to compute efficiently both sparse matrix-vector multiplication and other dense vector operations. A Beneš permutation network with an optimised control scheme is introduced to reduce memory bank conflicts without expensive logic. We describe a heuristic for offline scheduling, the effect of which is captured in a parametric model for estimating the performance of designs generated from our approach.
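CG itself needs only a matrix-vector product plus a handful of dense vector operations, which is why an architecture split between an SpMV engine and dense-vector units fits it well. Below is a textbook CG sketch against a matvec callback, the same abstraction a hardware SpMV engine exposes; the paper's specific contributions (the Beneš network, bank-conflict scheduling) live inside that callback and are not modelled here.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only a
    matrix-vector product callback. Plain textbook CG in pure Python."""
    x = [0.0] * len(b)
    r = list(b)                    # residual r = b - A x, with x = 0
    p = list(r)                    # initial search direction
    rs = sum(ri * ri for ri in r)  # squared residual norm
    for _ in range(max_iter):
        if rs < tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is the residual made conjugate to the old one.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Every iteration performs exactly one matvec; on a sparse matrix that single operation dominates the runtime, so reducing memory bank conflicts inside it is where the proposed permutation network pays off.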
Denholm S, Inoue H, Takenaka T, et al., 2014, Low Latency FPGA Acceleration of Market Data Feed Arbitration, IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 36-40, ISSN: 2160-0511
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.