Wang S, Niu X, Ma N, et al., 2016, A Scalable Dataflow Accelerator for Real Time Onboard Hyperspectral Image Classification, 12th International Symposium on Applied Reconfigurable Computing, Publisher: SPRINGER INTERNATIONAL PUBLISHING AG, Pages: 105-116, ISSN: 0302-9743
Niu X, Ng N, Yuki T, et al., 2016, EURECA Compilation: Automatic Optimisation of Cycle-Reconfigurable Circuits, 26th International Conference on Field-Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488
Dueben PD, Russell FP, Niu X, et al., 2015, On the use of programmable hardware and reduced numerical precision in earth-system modeling, JOURNAL OF ADVANCES IN MODELING EARTH SYSTEMS, Vol: 7, Pages: 1393-1408, ISSN: 1942-2466
Liu Q, Mak T, Zhang T, et al., 2015, Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 23, Pages: 1402-1414, ISSN: 1063-8210
Pnevmatikatos D, Papadimitriou K, Becker T, et al., 2015, FASTER: Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration, MICROPROCESSORS AND MICROSYSTEMS, Vol: 39, Pages: 321-338, ISSN: 0141-9331
Niu X, Chau TCP, Jin Q, et al., 2015, Automating Elimination of Idle Functions by Runtime Reconfiguration, ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, Vol: 8, ISSN: 1936-7406
Gan L, Fu H, Luk W, et al., 2015, Solving the global atmospheric equations through heterogeneous reconfigurable platforms, ACM Transactions on Reconfigurable Technology and Systems, Vol: 8, ISSN: 1936-7414
One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and memory resources. This article aims to accelerate the solution of global shallow water equations (SWEs), which is one of the most essential equation sets describing atmospheric dynamics. We first design a hybrid methodology that employs both the host CPU cores and the field-programmable gate array (FPGA) accelerators to work in parallel. Through a careful adjustment of the computational domains, we achieve a balanced resource utilization and a further improvement of the overall performance. By decomposing the resource-demanding SWE kernel, we manage to map the double-precision algorithm into three FPGAs. Moreover, by using fixed-point and reduced-precision floating point arithmetic, we manage to build a fully pipelined mixed-precision design on a single FPGA, which can perform 428 floating-point and 235 fixed-point operations per cycle. The mixed-precision design with four FPGAs running together can achieve a speedup of 20 over a fully optimized design on a CPU rack with two eight-core processors and is 8 times faster than the fully optimized Kepler GPU design. As for power efficiency, the mixed-precision design with four FPGAs is 10 times more power efficient than a Tianhe-1A supercomputer node.
Shao S, Guo L, Guo C, et al., Recursive pipelined genetic propagation for bilevel optimisation, FPL
The on-chip timing behaviour of synchronous circuits can be quantified at run-time by adding shadow registers, which allow designers to sample the most critical paths of a circuit at a different point in time than the user register would normally. In order to sample these paths precisely, the path skew between the user and the shadow register must be tightly controlled and consistent across all paths that are shadowed. Unlike a custom IC, FPGAs contain prefabricated resources from which composing an arbitrary routing delay is not trivial. This paper presents a method for inserting shadow registers with a minimum skew bound, whilst also reducing the maximum skew. To preserve circuit timing, we apply this to FPGA circuits post place-and-route, using only the spare resources left behind. We find that our techniques can achieve an average STA reported delay bound of ±200ps on a Xilinx device despite incomplete timing information, and achieve <1ps accuracy against our own delay model.
Arram J, Luk W, Jiang P, 2015, Ramethy: Reconfigurable acceleration of bisulfite sequence alignment, Pages: 250-259
This paper proposes a novel reconfigurable architecture for accelerating DNA sequence alignment. This architecture is applied to bisulfite sequence alignment, a stage in recently developed bioinformatics pipelines for cancer and non-invasive prenatal diagnosis. Alignment is currently the bottleneck in such pipelines, accounting for over 50% of the total analysis time. Our design, Ramethy (Reconfigurable Acceleration of METHYlation data analysis), performs alignment of short reads with up to two mismatches. Ramethy is based on the FM-index, which we optimise to reduce the number of search steps and improve approximate matching performance. We implement Ramethy on a 1U Maxeler MPC-X1000 dataflow node consisting of 8 Altera Stratix-V FPGAs. Measured results show a 14.9 times speedup compared to soap2 running with 16 threads on dual Intel Xeon E5-2650 CPUs, and 3.8 times speedup compared to soap3-dp running on an NVIDIA GTX 580 GPU. Upper-bound performance estimates for the MPC-X1000 indicate a maximum speedup of 88.4 times and 22.6 times compared to soap2 and soap3-dp respectively. In addition to runtime, Ramethy consumes over an order of magnitude lower energy while having accuracy identical to soap2 and soap3-dp, making it a strong candidate for integration into bioinformatics pipelines.
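For readers unfamiliar with the FM-index mentioned in this abstract, the following is a minimal software sketch of textbook exact-match backward search. It is not the Ramethy design: it does not handle mismatches, and it does not include the optimisations the abstract refers to. The function names `bwt` and `fm_count` are hypothetical, and the sketch assumes the text does not contain the `$` sentinel character.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; for illustration only)."""
    text += "$"  # sentinel, assumed absent from the input and smaller than any symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(text: str, pattern: str) -> int:
    """Count exact occurrences of pattern in text using FM-index backward search."""
    last = bwt(text)
    first = sorted(last)
    # C[c]: number of symbols in the text that sort strictly before c
    C = {c: first.index(c) for c in set(last)}
    lo, hi = 0, len(last)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        # Occ(c, i) is computed naively here as last[:i].count(c)
        lo = C[c] + last[:lo].count(c)
        hi = C[c] + last[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo
```

Each pattern symbol costs one interval update, which is why reducing the number of search steps matters for throughput; a practical index would replace the naive `count` calls with precomputed occurrence tables.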
Niu X, Luk W, Wang Y, 2015, EURECA: On-chip configuration generation for effective dynamic data access, Pages: 74-83
© Copyright ACM. This paper describes Effective Utilities for Run-timE Configuration Adaptation (EURECA), a novel memory architecture for supporting effective dynamic data access in reconfigurable devices. EURECA exploits on-chip configuration generation to reconfigure active connections in such devices cycle by cycle. When integrated into a baseline architecture based on the Virtex-6 SX475T, the EURECA memory architecture introduces small area, delay and power overhead. Three benchmark applications are developed with the proposed architecture targeting social networking (Memcached), scientific computing (sparse matrix-vector multiplication), and in-memory database (large-scale sorting). Compared with conventional static designs, up to 14.9 times reduction in area, 2.2 times reduction in critical-path delay, and 32.1 times reduction in area-delay product are achieved.
Bsoul AAM, Wilton SJE, Tsoi KH, et al., 2015, An FPGA Architecture and CAD Flow Supporting Dynamically Controlled Power Gating, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 24, Pages: 178-191, ISSN: 1063-8210
Leakage power is an important component of the total power consumption in field-programmable gate arrays (FPGAs) built using 90-nm and smaller technology nodes. Power gating was shown to be effective at reducing the leakage power. Previous techniques focus on turning OFF unused FPGA resources at configuration time; the benefit of this approach depends on resource utilization. In this paper, we present an FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time. This could lead to significant overall energy savings for applications having modules with long idle times. We also present a CAD flow that can be used to map applications to the proposed architecture. We study the area and power tradeoffs by varying the different FPGA architecture parameters and power gating granularity. The proposed CAD flow is used to map a set of benchmark circuits that have multiple power-gated modules to the proposed architecture. Power savings of up to 83% are achievable for these circuits. Finally, we study a control system of a robot that is used in endoscopy. Using the proposed architecture combined with clock gating results in up to 19% energy savings in this application.
Denholm S, Inoue H, Takenaka T, et al., 2015, Network-level FPGA acceleration of low latency market data feed arbitration, IEICE Transactions on Information and Systems, Vol: E98D, Pages: 288-297, ISSN: 0916-8532
Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. For low latency messages, the corresponding latencies are 6ns and 5.25ns. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
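The idea of A/B line arbitration described in this abstract can be illustrated with a small software sketch. It assumes that every message carries a monotonically increasing sequence number and that the two feeds arrive as one merged stream in arrival order; both are assumptions made for illustration, as is the function name `arbitrate_low_latency`. The sketch models only a simple low-latency policy (forward the first copy seen, never wait for gaps) and is not the paper's architecture or its windowing methods.

```python
from typing import Iterable, List, Tuple

# (sequence number, feed line "A" or "B", payload); a hypothetical message format
Message = Tuple[int, str, bytes]


def arbitrate_low_latency(arrivals: Iterable[Message]) -> List[Message]:
    """Forward the first copy of each new sequence number and drop later copies.

    Messages with a sequence number below the next expected one are treated
    as duplicates (or late arrivals) and discarded.
    """
    next_seq = 0
    forwarded: List[Message] = []
    for seq, line, payload in arrivals:
        if seq >= next_seq:
            forwarded.append((seq, line, payload))
            next_seq = seq + 1
    return forwarded


# Message 3 is lost on feed A but recovered from feed B.
arrivals = [
    (1, "A", b"m1"), (1, "B", b"m1"),
    (2, "B", b"m2"), (2, "A", b"m2"),
    (3, "B", b"m3"),
    (4, "A", b"m4"), (4, "B", b"m4"),
]
result = arbitrate_low_latency(arrivals)
```

A reliability-oriented mode would instead hold back messages that arrive after a gap for some window, giving the other feed a chance to supply the missing sequence numbers before forwarding.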
Todman T, Stilkerich S, Luk W, 2015, In-circuit temporal monitors for runtime verification of reconfigurable designs, 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), Publisher: IEEE COMPUTER SOC, ISSN: 0738-100X
Grigoras P, Burovskiy P, Hung E, et al., 2015, Accelerating SpMV on FPGAs by Compressing Nonzero Values, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 64-67
Luk W, Constantinides GA, 2015, Transforming reconfigurable systems: A festschrift celebrating the 60th birthday of Professor Peter Cheung, ISBN: 9781783266968
© 2015 by Imperial College Press. All rights reserved. Over the last three decades, Professor Peter Cheung has made significant contributions to a variety of areas, such as analogue and digital computer-aided design tools, high-level synthesis and hardware/software codesign, low-power and high-performance circuit architectures for signal and image processing, and mixed-signal integrated-circuit design. However, the area that has attracted his greatest attention is reconfigurable systems and their design, and his work has contributed to the transformation of this important and exciting discipline. This festschrift contains a unique collection of technical papers based on presentations at a workshop at Imperial College London in May 2013 celebrating Professor Cheung's 60th birthday. Renowned researchers who have been inspired and motivated by his outstanding research in the area of reconfigurable systems are brought together from across the globe to offer their latest research in reconfigurable systems. Professor Cheung has devoted much of his professional career to Imperial College London, and has served with distinction as the Head of Department of Electrical and Electronic Engineering for several years. His outstanding capability and his loyalty to Imperial College and the Department of Electrical and Electronic Engineering are legendary. Professor Cheung has made tremendous strides in ensuring excellence in both research and teaching, and in establishing sound governance and strong financial endowment; but above all, he has made his department a wonderful place in which to work and study.
Luk W, Constantinides GA, 2015, Preface, ISBN: 9781783266968
Burovskiy P, Grigoras P, Sherwin S, et al., 2015, Efficient Assembly for High Order Unstructured FEM Meshes, 25th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, ISSN: 1946-1488
Funie AI, Grigoras P, Burovskiy P, et al., 2015, Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies, 26th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 210-217, ISSN: 1063-6862
Lee K-H, Guo Z, Chow GCT, et al., 2015, GPU-based Proximity Query Processing on Unstructured Triangular Mesh Model, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 4405-4411, ISSN: 1050-4729
Russell FP, Duben PD, Niu X, et al., 2015, Architectures and precision analysis for modelling atmospheric variables with chaotic behaviour, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 171-178
Shao S, Guo L, Guo C, et al., 2015, Recursive Pipelined Genetic Propagation for Bilevel Optimisation, 25th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, ISSN: 1946-1488
Guo L, Guo C, Thomas DB, et al., 2015, Pipelined Genetic Propagation, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 103-110
Leong PHW, Amano H, Anderson J, et al., 2015, Significant Papers from the First 25 Years of the FPL Conference, 25th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, ISSN: 1946-1488
Rabozzi M, Cattaneo R, Becker T, et al., 2015, Relocation-aware Floorplanning for Partially-Reconfigurable FPGA-based Systems, 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Publisher: IEEE, Pages: 97-104
Guo L, Funie AI, Xie Z, et al., 2015, A general-purpose framework for FPGA-accelerated genetic algorithms, INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, Vol: 7, Pages: 361-375, ISSN: 1758-0366
Arram J, Pflanzer M, Kaplan T, et al., 2015, FPGA Acceleration of Reference-Based Compression for Genomic Data, International Conference on Field Programmable Technology (FPT), Publisher: IEEE, Pages: 9-16
Targett S, Niu X, Russell F, et al., 2015, Lower Precision for Higher Accuracy: Precision and Resolution Exploration for Shallow Water Equations, International Conference on Field Programmable Technology (FPT), Publisher: IEEE, Pages: 208-211
Ciobanu CB, Varbanescu AL, Pnevmatikatos D, et al., 2015, EXTRA: Towards an Efficient Open Platform for Reconfigurable High Performance Computing, IEEE 18th International Conference on Computational Science and Engineering (CSE), Publisher: IEEE, Pages: 339-342
Zhang C, Ma Y, Luk W, 2015, HW/SW Partitioning Algorithm Targeting MPSOC With Dynamic Partial Reconfigurable Fabric, 14th International Conference on Computer Aided design and Computer Graphics, Publisher: IEEE, Pages: 240-241