Spacey SA, Wiesemann W, Kuhn D, et al., 2012, Robust Software Partitioning with Multiple Instantiation, INFORMS JOURNAL ON COMPUTING, Vol: 24, Pages: 500-515, ISSN: 1091-9856
Coutinho JGF, Carvalho T, Durand S, et al., 2012, Experiments with the LARA aspect-oriented approach, Pages: 27-30
This demonstration presents a novel design-flow and aspect-oriented language called LARA, which is currently used to guide the mapping of high-level C application codes to heterogeneous high-performance embedded systems. In particular, LARA is capable of capturing complex strategies and schemes involving hardware/software partitioning, code specialization, source code transformations and code instrumentation. A key element of LARA, and a distinguishing feature from existing approaches, is its ability to support the specification of non-functional requirements and user knowledge in a non-invasive way in the exploration of suitable implementations. The design-flow incorporates several tools, such as a LARA frontend, a hardware/software partitioning tool, an aspect weaver, a cost estimator, and a source-level transformation engine. All these components can be coordinated as part of an elaborate application mapping strategy using LARA. In this demonstration, we illustrate how non-functional cross-cutting concerns such as runtime monitoring and performance are codified and described in LARA and how the weaving process affects selected applications. Furthermore, we also explain how third-party tools, such as gprof, can be incorporated into the design-flow and aspect description, for instance, to affect the hardware/software partitioning process. We demonstrate how LARA can be used to extract run-time information, such as range values of variables, and can control code transformations and compiler optimizations addressing customized implementations of the corresponding computations on FPGAs. © 2012 ACM.
Cardoso JMP, Carvalho T, Coutinho JGF, et al., 2012, LARA: An aspect-oriented programming language for embedded systems, Pages: 179-190
The development of applications for high-performance embedded systems is typically a long and error-prone process. In addition to the required functions, developers must consider various and often conflicting non-functional application requirements such as performance and energy efficiency. The complexity of this process is exacerbated by the multitude of target architectures and the associated retargetable mapping tools. This paper introduces an Aspect-Oriented Programming (AOP) approach that conveys domain knowledge and non-functional requirements to optimizers and mapping tools. We describe a novel AOP language, LARA, which allows the specification of compilation strategies to enable efficient generation of software code and hardware cores for alternative target architectures. We illustrate the use of LARA for code instrumentation and analysis, and for guiding the application of compiler and hardware synthesis optimizations. An important LARA feature is its capability to deal with different join points, action models, and attributes, and to generate an aspect intermediate representation. We present examples of our aspect-oriented hardware/software design flow for mapping real-life application codes to embedded platforms based on Field Programmable Gate Array (FPGA) technology. © 2012 ACM.
Tse AHT, Thomas D, Luk W, 2012, Design Exploration of Quadrature Methods in Option Pricing, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol: 20, Pages: 818-826, ISSN: 1063-8210
Liu Q, Luk W, 2012, Heterogeneous systems for energy efficient scientific computing, Pages: 64-75, ISSN: 0302-9743
This paper introduces a novel approach for exploring heterogeneous computing engines which include GPUs and FPGAs as accelerators. Our goal is to systematically automate finding solutions for such engines that maximize energy efficiency while meeting requirements in throughput and in resource constraints. The proposed approach, based on a linear programming model, enables optimization of system throughput and energy efficiency, and analysis of energy efficiency sensitivity and power consumption issues. It can be used in evaluating current and future computing hardware and interfaces to identify appropriate combinations. A heterogeneous system containing a CPU, a GPU and an FPGA with a PCI Express interface is studied based on the High Performance Linpack application. Results indicate that such a heterogeneous computing system is able to provide energy-efficient solutions to scientific computing with various performance demands. The improvement of system energy efficiency is more sensitive to some of the system components than to others; for example, in the studied system, concurrently improving the energy efficiency of the interface and the GPU by 10 times could lead to an improvement of over 10 times in the system energy efficiency. © 2012 Springer-Verlag.
Tse AHT, Chow GCT, Jin Q, et al., 2012, Optimising performance of quadrature methods with reduced precision, Pages: 251-263, ISSN: 0302-9743
This paper presents a generic precision optimisation methodology for quadrature computation targeting reconfigurable hardware to maximise performance at a given error tolerance level. The proposed methodology optimises performance by considering integration grid density versus mantissa size of floating-point operators. The optimisation provides the number of integration points and the mantissa size that maximise throughput while meeting a given error tolerance requirement. Three case studies show that the proposed reduced precision designs on a Virtex-6 SX475T FPGA are up to 6 times faster than comparable FPGA designs with double precision arithmetic. They are up to 15.1 times faster and 234.9 times more energy efficient than an i7-870 quad-core CPU, and are 1.2 times faster and 42.2 times more energy efficient than a Tesla C2070 GPU. © 2012 Springer-Verlag.
Efficient communication between nodes is critical for achieving high performance in a computer cluster. Based on a dedicated inter-accelerator network, we enhance this communication with advanced networking functions, such as broadcasting and priority routing. This work enables decoupling user applications from physical network implementations, improving overall communication efficiency and modularity. A performance model is introduced taking into account application and platform specific parameters. Experiments are performed for various network configurations and application patterns. The results show up to a 55% reduction of communication time when employing our approach. © 2012 Springer-Verlag.
Jin Q, Dong D, Tse AHT, et al., 2012, Multi-level customisation framework for curve based Monte Carlo financial simulations, Pages: 187-201, ISSN: 0302-9743
One of the main challenges when accelerating financial applications using reconfigurable hardware is the management of design complexity. This paper proposes a multi-level customisation framework for automatic generation of complex yet highly efficient curve based financial Monte Carlo simulators on reconfigurable hardware. By identifying multiple levels of functional specialisations and the optimal data format for the Monte Carlo simulation, we allow different levels of programmability in our framework to retain good performance and support multiple applications. Designs targeting a Virtex-6 SX475T FPGA generated by our framework are about 40 times faster than single-core software implementations on an i7-870 quad-core CPU at 2.93 GHz; they are over 10 times faster and 20 times more energy efficient than 4-core implementations on the same i7-870 quad-core CPU, and are over three times more energy efficient and 36% faster than a highly optimised implementation on an NVIDIA Tesla C2070 GPU at 1.15 GHz. In addition, our framework is platform independent and can be extended to support CPU and GPU applications. © 2012 Springer-Verlag.
Liu Q, Todman T, Luk W, et al., 2012, Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms, JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, Vol: 67, Pages: 65-78, ISSN: 1939-8018
Chau TCP, Luk W, Cheung PYK, 2012, Roberts, ACM SIGARCH Computer Architecture News, Vol: 40, Pages: 10-10, ISSN: 0163-5964
Tsoi KH, Becker T, Luk W, 2012, Modelling reconfigurable systems in event driven simulation, Pages: 34-34, ISSN: 0163-5964
Yiu KFC, Lu Y, Ho CH, et al., 2012, Reconfigurable FPGA-based switching path frequency-domain echo canceller with applications to voice control device, DIGITAL SIGNAL PROCESSING, Vol: 22, Pages: 376-390, ISSN: 1051-2004
© Springer Science+Business Media B.V. 2012. In the last decade, automotive audio has been gaining great attention from the scientific and industrial communities. In this context, a new approach to test and develop advanced audio algorithms for a heterogeneous embedded platform has been proposed within the European hArtes project. A real audio laboratory installed in a real car (hArtes CarLab) has been developed employing professional audio equipment. The algorithms can be tested and validated on a PC exploiting each application as a plug-in of the real time NU-Tech framework. Then a set of tools (hArtes Toolchain) can be used to generate code for the embedded platform starting from the plug-in implementation. An overview of the whole system is presented here, taking into consideration a complete set of audio algorithms developed for the advanced car infotainment system (ACIS), which is composed of three main applications concerning the in-car listening and communication experience. Starting from a high level description of the algorithms, several implementations on different levels of hardware abstraction are presented, along with empirical results on both the design process undergone and the performance results achieved.
Heinrich G, Logemann F, Hahn V, et al., 2012, Audio array processing for telepresence, Hardware/Software Co-design for Heterogeneous Multi-core Platforms: The hArtes Toolchain, Pages: 125-153, ISBN: 9789400714052
© Springer Science+Business Media B.V. 2012. This chapter presents embedded implementations of two audio array processing algorithms for a telepresence application as usage examples of the hArtes tool-chain and platform. The first algorithm, multi-channel wide-band beamforming, may be used to record an acoustic field in a room with an array of microphones; the second one, wave-field synthesis, may be used to render an acoustic field with an array of loudspeakers. While these algorithms have parallelisms and kernel functions typical for their algorithm class, they are chosen to be simple in structure, which makes it easier to follow implementation considerations. Starting from an overview of the application and structure of the algorithms in question, several implementations on different levels of hardware abstraction are presented, along with empirical results on both the design process supported and the processing performance achieved.
Santambrogio MD, Pnevmatikatos D, Papadimitriou K, et al., 2012, Smart Technologies for Effective Reconfiguration: The FASTER approach, 2012 7TH INTERNATIONAL WORKSHOP ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC)
Cardoso JMP, Teixeira J, Alves JC, et al., 2012, Specifying Compiler Strategies for FPGA-based Systems, 20th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 192-199
Shan Y, Wang Z, Wang W, et al., 2012, FPGA based Memory Efficient High Resolution Stereo Vision System for Video Tolling, 11th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 29-32
Papadimitriou K, Pilato C, Pnevmatikatos D, et al., 2012, Novel Design Methods and a Tool Flow for Unleashing Dynamic Reconfiguration, 15th IEEE International Conference on Computational Science and Engineering (CSE) / 10th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC), Publisher: IEEE, Pages: 391-398, ISSN: 1949-0828
Niu X, Tsoi KH, Luk W, 2012, Self-Adaptive Heterogeneous Cluster with Wireless Network, 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Publisher: IEEE, Pages: 306-311, ISSN: 2164-7062
Sato Y, Inoguchi Y, Luk W, et al., 2012, Evaluating Reconfigurable Dataflow Computing Using the Himeno Benchmark, International Conference on Reconfigurable Computing and FPGAs (ReConFig), Publisher: IEEE, ISSN: 2325-6532
Todman T, Boehm P, Luk W, 2012, Verification of streaming hardware and software codesigns, 11th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 147-150
Guo C, Fu H, Luk W, 2012, A Fully-Pipelined Expectation-Maximization Engine for Gaussian Mixture Models, 11th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 182-189
Chow GCT, Luk W, Leong PHW, 2012, A Mixed Precision Methodology for Mathematical Optimisation, 20th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 33-36
Todman T, Luk W, 2012, Reconfigurable design automation by high-level exploration, 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 185-188, ISSN: 1063-6862
Kurek M, Luk W, 2012, Parametric Reconfigurable Designs with Machine Learning Optimizer, 11th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 109-112
Wang Y, Yan J, Zhou X, et al., 2012, A Partially Reconfigurable Architecture Supporting Hardware Threads, 11th International Conference on Field-Programmable Technology (FPT), Publisher: IEEE, Pages: 269-276
Coutinho JGF, Bhattacharya S, Luk W, et al., 2012, Resource-Efficient Designs using an Aspect-Oriented Approach, 15th IEEE International Conference on Computational Science and Engineering (CSE) / 10th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC), Publisher: IEEE, Pages: 399-406, ISSN: 1949-0828
Betkaoui B, Wang Y, Thomas DB, et al., 2012, A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration, 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Publisher: IEEE, Pages: 8-15, ISSN: 1063-6862
Chow GCT, Tse AHT, Jin Q, et al., 2012, A Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems, 20th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), Publisher: ASSOC COMPUTING MACHINERY, Pages: 57-66