Publications

Duarte RP, Bouganis C-S, 2016, Variation-Aware Optimisation for Reconfigurable Cyber-Physical Systems, 7th IFIP WG 5.5/SOCOLNET Advanced Doctoral Conference on Computing, Electrical and Industrial Systems, DoCEIS 2016, Publisher: Springer, Pages: 237-252

Cyber-Physical Systems are present in many industries such as aerospace, automotive, health-care and transportation, and over time they have become critical and require high levels of resiliency and fault tolerance. Often they are implemented on reconfigurable logic because of IP design reuse, high performance, and low cost. Nevertheless, continuous technology shrinking and the increasing demand for systems that operate under different power profiles with high performance have led to implementations operating below the maximum performance offered by a particular technology. Design tools are conservative in estimating the maximum performance that a design can achieve when placed on a device, because they account for any variability in the fabrication process of the device. This work takes a new view on the performance improvement of circuit designs by pushing them into the error-prone regime, as defined by the synthesis tools, and by investigating methodologies that reduce the impact of timing errors at the output of the system. Two novel error reduction techniques are proposed to address this problem. One is based on reduced-precision redundancy and the other on an error optimisation framework that uses information from a prior characterisation of the device. Both methods achieve graceful degradation in performance as variation increases.

Conference paper

Dasu A, Bouganis C, Gorgon M, Bonato V et al., 2016, Preface, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol: 9625, Pages: V-VI, ISSN: 0302-9743

Journal article

Duarte RP, Bouganis C-S, 2015, ARC 2014 over-clocking KLT designs on FPGAs under process, voltage, and temperature variation, ACM Transactions on Reconfigurable Technology and Systems, Vol: 9, ISSN: 1936-7414

The Karhunen-Loeve Transformation (KLT) is a widely used algorithm in signal processing that is often implemented with high-throughput requirements. This work presents a novel methodology to optimise KLT designs on FPGAs that outperform typical design methodologies, through a prior characterisation of the arithmetic units in the datapath of the circuit under various operating conditions. Limited by the ever-increasing process variation, the delay models available in synthesis tools are no longer suitable for extreme performance optimisation of designs, and as they are generic, they need to consider the worst-case performance for a given fabrication process. Hence, they heavily penalise the maximum achievable performance of a design by leaving a safety margin. This work presents a novel unified optimisation framework which incorporates a prior characterisation of the embedded multipliers on the target FPGA device under process, voltage, and temperature variation. The proposed framework allows a design space exploration leading to designs without any latency overheads that achieve high throughput while producing fewer errors than typical methodologies operating at the same throughput. Experimental results demonstrate that the proposed methodology outperforms the typical implementation in three real-life design strategies: high performance, low power, and temperature variation; and it produced circuit designs that performed up to 18 dB better when over-clocked.

Journal article

Scicluna N, Bouganis C-S, 2015, ARC 2014: a multidimensional FPGA-based parallel DBSCAN architecture, ACM Transactions on Reconfigurable Technology and Systems, Vol: 9, ISSN: 1936-7414

Clustering large numbers of data points is a very computationally demanding task that often needs to be accelerated in order to be useful in practical applications. This work focuses on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which is one of the state-of-the-art clustering algorithms, and targets its acceleration using an FPGA device. The article presents an optimized, scalable, and parameterizable architecture that takes advantage of the internal memory structure of modern FPGAs in order to deliver a high-performance clustering system. Post-synthesis simulation results show that the developed system can obtain mean speedups of 31× in real-world tests and 202× in synthetic tests when compared to state-of-the-art software counterparts running on a quad-core 3.4GHz Intel i7-2600k. Additionally, this implementation is also capable of clustering data with any number of dimensions without impacting the performance.

Journal article

Mingas G, Bouganis C-S, 2015, Population-Based MCMC on Multi-Core CPUs, GPUs and FPGAs, IEEE Transactions on Computers, Vol: 65, Pages: 1283-1296, ISSN: 0018-9340

Markov Chain Monte Carlo (MCMC) is a method to draw samples from a given probability distribution. Its frequent use for solving probabilistic inference problems, where large-scale data are repeatedly processed, means that MCMC runtimes can be unacceptably large. This paper focuses on population-based MCMC, a popular family of computationally intensive MCMC samplers; we propose novel, highly optimized accelerators on three parallel hardware platforms (multi-core CPUs, GPUs and FPGAs), in order to address the performance limitations of sequential software implementations. For each platform, we jointly exploit the nature of the underlying hardware and the special characteristics of population-based MCMC. We focus particularly on the use of custom arithmetic precision, introducing two novel methods which employ custom precision in the largest part of the algorithm in order to reduce runtime, without causing sampling errors. We apply these methods to all platforms. The FPGA accelerators are up to 114x faster than multi-core CPUs and up to 53x faster than GPUs when doing inference on mixture models.

Journal article

Kyrkou C, Bouganis C-S, Theocharides T, Polycarpou MM et al., 2015, Embedded Hardware-Efficient Real-Time Classification With Cascade Support Vector Machines, IEEE Transactions on Neural Networks and Learning Systems, Vol: 27, Pages: 99-112, ISSN: 2162-2388

Cascade support vector machines (SVMs) are optimized to efficiently handle problems where the majority of the data belong to one of the two classes, such as image object classification, and hence can provide speedups over monolithic (single) SVM classifiers. However, SVM classification is a computationally demanding task and existing hardware architectures for SVMs only consider monolithic classifiers. This paper proposes the acceleration of cascade SVMs through a hybrid processing hardware architecture optimized for the cascade SVM classification flow, accompanied by a method to reduce the required hardware resources for its implementation, and a method to improve the classification speed utilizing cascade information to further discard data samples. The proposed SVM cascade architecture is implemented on a Spartan-6 field-programmable gate array (FPGA) platform and evaluated for object detection on 800 × 600 (Super Video Graphics Array) resolution images. The proposed architecture, boosted by a neural network that processes cascade information, achieves a real-time processing rate of 40 frames/s for the benchmark face detection application. Furthermore, the hardware-reduction method results in the utilization of 25% fewer FPGA custom-logic resources and a 20% peak power reduction compared with a baseline implementation.

Journal article

Bin Rabieah M, Bouganis C-S, 2015, FPGA Based Nonlinear Support Vector Machine Training Using an Ensemble Learning, 25th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, ISSN: 1946-1488

Citations: 1

Conference paper

Venieris SI, Mingas G, Bouganis C-S, 2015, Towards Heterogeneous Solvers for Large-Scale Linear Systems, 25th International Conference on Field Programmable Logic and Applications, Publisher: IEEE, ISSN: 1946-1488

Conference paper

Jin Y, Bouganis C-S, 2015, Robust Multi-Image Based Blind Face Hallucination, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 5252-5260, ISSN: 1063-6919

Citations: 17

Conference paper

Liu S, Mingas G, Bouganis C-S, 2015, An Exact MCMC Accelerator Under Custom Precision Regimes, International Conference on Field Programmable Technology (FPT), Publisher: IEEE, Pages: 120-127

Citations: 6

Conference paper

Angelopoulou ME, Bouganis C-S, 2014, Vision-Based Egomotion Estimation on FPGA for Unmanned Aerial Vehicle Navigation, IEEE Transactions on Circuits and Systems for Video Technology, Vol: 24, Pages: 1070-1083, ISSN: 1051-8215

Citations: 14

Journal article

Sourdis I, Strydis C, Armato A, Bouganis CS, Falsafi B, Gaydadjiev GN, Isaza S, Malek A, Mariani R, Pagliarini S, Pnevmatikatos DN, Pradhan DK, Rauwerda G, Seepers RM, Shafik RA, Smaragdos G, Theodoropoulos D, Tzilis S, Vavouras M et al., 2014, DeSyRe: On-demand adaptive and reconfigurable fault-tolerant SoCs, Pages: 312-317, ISSN: 0302-9743

The DeSyRe project builds on-demand adaptive, reliable Systems-on-Chips. In response to the current semiconductor technology trends that make chips less reliable, DeSyRe describes a new generation of by-design reliable systems, at a reduced power and performance cost. This is achieved through the following main contributions. DeSyRe defines a fault-tolerant system architecture built out of unreliable components, rather than aiming at totally fault-free and hence more costly chips. In addition, DeSyRe systems are on-demand adaptive to various types and densities of faults, as well as to other system constraints and application requirements. For leveraging on-demand adaptation/customization and reliability at reduced cost, a new dynamically reconfigurable substrate is designed and combined with runtime system software support. The above define a generic and repeatable design framework, which is applied to two medical SoCs with high reliability constraints and diverse performance and power requirements. One of the main goals of the DeSyRe project is to increase the availability of SoC components in the presence of permanent faults, caused at manufacturing time or due to device aging. A mix of coarse- and fine-grain reconfigurable hardware substrate is designed to isolate and bypass faulty component parts. The flexibility provided by the DeSyRe reconfigurable substrate is exploited at runtime by system optimization heuristics, which decide to modify component configuration when a permanent fault is detected, providing graceful degradation. © 2014 Springer International Publishing Switzerland.

Conference paper

Scicluna N, Bouganis CS, 2014, FPGA-based parallel DBSCAN architecture, Pages: 1-12, ISSN: 0302-9743

Clustering of a large number of data points is a computationally demanding task that often needs to be accelerated in order to be useful in practice. The focus of this work is on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which is one of the state-of-the-art clustering algorithms, targeting its acceleration using an FPGA device. The paper presents a novel, optimised and scalable architecture that takes advantage of the internal memory structure of modern FPGAs in order to deliver a high-performance clustering system. Results show that the developed system can obtain average speed-ups of 32x in real-world tests and 202x in synthetic tests when compared to state-of-the-art software counterparts. © 2014 Springer International Publishing Switzerland.

Citations: 5

Conference paper

Duarte RP, Bouganis CS, 2014, A Unified framework for over-clocking linear projections on FPGAs under PVT variation, Pages: 49-60, ISSN: 0302-9743

Linear Projection is a widely used algorithm often implemented with high-throughput requirements. This work presents a novel methodology to optimise Linear Projection designs that outperform typical design methodologies through a prior characterisation of the arithmetic units in the data path of the circuit under various operating conditions. Limited by the ever-increasing process variation, the delay models available in synthesis tools are no longer suitable for performance optimisation of designs, as they are generic and only take into account the worst-case variation for a given fabrication process. Hence, they heavily penalise the optimisation strategy of a design by leaving a gap in performance. This work presents a novel unified optimisation framework which incorporates a prior characterisation of the embedded multipliers on the target device under PVT variation. The proposed framework creates designs that achieve high throughput while producing fewer errors than typical methodologies. The results of a case study reveal that the proposed methodology outperforms the typical implementation in three real-life design strategies: high performance, low power and temperature variation. The proposed methodology produced Linear Projection designs that were able to perform up to 18 dB better than the reference methodology. © 2014 Springer International Publishing Switzerland.

Citations: 5

Conference paper

Liu J, Bouganis C, Cheung PYK, 2014, Kernel-based Adaptive Image Sampling, 9th International Conference on Computer Vision Theory and Applications (VISAPP), Publisher: IEEE, Pages: 25-32

Citations: 2

Conference paper

Duarte RP, Bouganis C-S, 2014, Zero-Latency Datapath Error Correction Framework for Over-Clocking DSP Applications on FPGAs, 2014 International Conference on Reconfigurable Computing and FPGAs, Publisher: IEEE, ISSN: 2325-6532

Conference paper

Duarte RP, Bouganis C-S, 2014, Over-Clocking of Linear Projection Designs Through Device Specific Optimisations, 28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), Publisher: IEEE, Pages: 189-198

Conference paper

Liu S, Mingas G, Bouganis C-S, 2014, Parallel Resampling for Particle Filters on FPGAs, International Conference on Field Programmable Technology, Publisher: IEEE, Pages: 191-198

Citations: 15

Conference paper

Liu J, Bouganis C, Cheung PYK, 2014, Image Progressive Acquisition for Hardware Systems, Design, Automation and Test in Europe Conference and Exhibition (DATE), Publisher: IEEE, ISSN: 1530-1591

Conference paper

Cheng C, Bouganis C-S, 2014, Memory Optimisation for Hardware Induction of Axis-parallel Decision Tree, 2014 International Conference on Reconfigurable Computing and FPGAs, Publisher: IEEE, ISSN: 2325-6532

Conference paper

Duarte RP, Bouganis C-S, 2014, Pushing the performance boundary of linear projection designs through device specific optimisations (abstract only), Publisher: ACM, Pages: 245-245

Conference paper

Sourdis I, Strydis C, Armato A, Bouganis CS, Falsafi B, Gaydadjiev GN, Isaza S, Malek A, Mariani R, Pnevmatikatos D, Pradhan DK, Rauwerda G, Seepers RM, Shafik RA, Sunesen K, Theodoropoulos D, Tzilis S, Vavouras M et al., 2013, DeSyRe: On-demand system reliability, Microprocessors and Microsystems, Vol: 37, Pages: 981-1001, ISSN: 0141-9331

Citations: 10

Journal article

Powell A, Savvas-Bouganis C, Cheung PYK, 2013, High-level power and performance estimation of FPGA-based soft processors and its application to design space exploration, Journal of Systems Architecture, Vol: 59, Pages: 1144-1156, ISSN: 1383-7621

Citations: 10

Journal article

Kyrkou C, Theocharides T, Bouganis CS, 2013, A hardware-efficient architecture for embedded real-time cascaded support vector machines classification, Pages: 341-342

This work presents an optimized architecture for cascaded SVM processing, along with a hardware reduction method for the implementation of the additional stages in the cascade, leading to significant improvements. The architecture was implemented on a Virtex 5 FPGA platform and evaluated using face detection as the target application on 640x480 resolution images. Additionally, it was compared against implementations of the same cascade processing architecture without the reduction method, and against a single parallel SVM classifier. The proposed architecture achieves an average performance of 70 frames per second, demonstrating a speed-up of 5x over the single parallel SVM classifier. Furthermore, the hardware reduction method results in the utilization of 43% fewer hardware resources, with only a 0.7% reduction in classification accuracy. © 2013 Authors.

Citations: 5

Conference paper

Sourdis I, Bouganis CS, Pericas M, 2013, Guest editorial: Workshop on Reconfigurable Computing, Journal of Systems Architecture, Vol: 59, Pages: 77-77, ISSN: 1383-7621

Journal article

Kyrkou C, Theocharides T, Bouganis C-S, 2013, An Embedded Hardware-Efficient Architecture for Real-Time Cascade Support Vector Machine Classification, 13th International Conference on Embedded Computer Systems - Architectures, Modeling and Simulation (IC-SAMOS), Publisher: IEEE, Pages: 129-136

Citations: 10

Conference paper

Kyrkou C, Bouganis C-S, Theocharides T, 2013, FPGA-based acceleration of cascaded support vector machines for embedded applications (abstract only), Publisher: ACM, Pages: 267-267

Conference paper

Jin Y, Bouganis C, 2013, Face Hallucination Revisited: A Joint Framework, 20th IEEE International Conference on Image Processing (ICIP), Publisher: IEEE, Pages: 981-985, ISSN: 1522-4880

Citations: 2

Conference paper

Liu J, Bouganis C, Cheung PK, 2013, Domain-specific Progressive Sampling of Face Images, 1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), Publisher: IEEE, Pages: 1021-1024, ISSN: 2376-4066

Conference paper

Professor Christos-Savvas Bouganis
