Publications

Cheng C, Bouganis C-S, 2013, Accelerating Random Forest Training Process Using FPGA, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Citations: 3

Conference paper

Mingas G, Rahman F, Bouganis C-S, 2013, On Optimizing the Arithmetic Precision of MCMC Algorithms, 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 181-188

Citations: 5

Conference paper

Duarte RP, Bouganis CS, 2012, High-level linear projection circuit design optimization framework for FPGAs under over-clocking, Pages: 723-726

Frequently, high-level algorithm parameter selection and its mapping into hardware are considered to be independent processes, often leading to suboptimal solutions. When DSP applications with real-time constraints are targeted, it is often desirable for the resulting hardware system to be clocked at as high a frequency as possible. Even though the trend in modern devices is to provide a fabric that can support higher frequencies, its variability makes design tools pessimistic in their maximum clock frequency estimates. The proposed framework optimizes and mitigates the probabilistic behaviour of digital circuits by trying to expose the impact of fabric variability to high-level algorithmic specifications. FPGAs are well positioned to tackle this problem because they can be reconfigured, allowing an off-line characterization of the specific device before implementing the complete optimized circuit on the same device. Circuits generated by the proposed framework outperform typical implementations by minimizing area and errors and maximizing their operating clock frequency. An example of a linear projection circuit, over-clocked by 232%, shows savings of up to 39% in hardware resources for the same target PSNR compared with a traditional implementation. © 2012 IEEE.

Citations: 6

Conference paper

Powell A, Bouganis CS, Cheung PYK, 2012, Early performance estimation of image compression methods on soft processors, Pages: 587-590

This paper presents a power and execution time estimation framework for an FPGA-based soft processor when considering the implementation of image compression techniques. Using the proposed framework, a quick power consumption and execution time estimate can be obtained early in the design phase, allowing system designers to estimate these performance metrics without needing to implement the algorithm or generate all possible soft processor architectures. This estimate is performed using both high-level algorithm parameters and soft processor architecture parameters. For system designers this can result in fast design space exploration. The model can predict the execution time of an algorithm with an average of 139% less relative error than predictions using only architecture parameters with the same framework. © 2012 IEEE.

Citations: 3

Conference paper

Sourdis I, Strydis C, Bouganis CS, Falsafi B, Gaydadjiev GN, Malek A, Mariani R, Pnevmatikatos D, Pradhan DK, Rauwerda G, Sunesen K, Tzilis S et al., 2012, The DeSyRe project: On-demand system reliability, Pages: 335-342

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-/fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe will deliver a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. © 2012 IEEE.

Citations: 1

Conference paper

Papadonikolakis M, Bouganis C, 2012, Novel Cascade FPGA Accelerator for Support Vector Machines Classification, IEEE Transactions on Neural Networks and Learning Systems, Vol: 23, Pages: 1042-1052

Journal article

Mingas G, Bouganis CS, 2012, Parallel tempering MCMC acceleration using reconfigurable hardware, Pages: 227-238, ISSN: 0302-9743

Markov Chain Monte Carlo (MCMC) is a family of algorithms which is used to draw samples from arbitrary probability distributions in order to estimate - otherwise intractable - integrals. When the distribution is complex, simple MCMC becomes inefficient and advanced variations are employed. This paper proposes a novel FPGA architecture to accelerate Parallel Tempering, a computationally expensive, popular MCMC method, which is designed to sample from multimodal distributions. The proposed architecture can be used to sample from any distribution. Moreover, the work demonstrates that MCMC is robust to reductions in the arithmetic precision used to evaluate the sampling distribution and this robustness is exploited to improve the FPGA's performance. A 1072x speedup compared to software and a 3.84x speedup compared to a GPGPU implementation are achieved when performing Bayesian inference for a mixture model without any compromise on the quality of results, opening the way for the handling of previously intractable problems. © 2012 Springer-Verlag.

Citations: 13

Conference paper

Mingas G, Bouganis C-S, 2012, A Custom Precision Based Architecture for Accelerating Parallel Tempering MCMC on FPGAs Without Introducing Sampling Error, 20th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 153-156

Citations: 11

Conference paper

Cheng C, Bouganis CS, 2011, An FPGA-based object detector with dynamic workload balancing

In recent years, object detection has been more frequently integrated with other vision processing functions, serving to acquire regions of interest, and is widely adopted in portable devices such as digital cameras capable of automatically focusing on faces. In applications targeting those devices, limitations in both hardware resources and power supply mean that efficient utilization of hardware resources is important. In this paper a novel hardware architecture for the Viola and Jones object detector is proposed. The novel feature of the architecture is a mechanism for dynamic workload balancing, which adaptively redistributes the workload among available processing units, thus achieving highly efficient utilization of hardware resources. The obtained results demonstrate that the proposed system can achieve high utilisation of the dedicated resources, leading to a high performance-to-resource ratio. © 2011 IEEE.

Citations: 2

Conference paper

Tian X, Bouganis CS, 2011, A run-time adaptive FPGA architecture for Monte Carlo simulations, Pages: 116-122

Field Programmable Gate Arrays (FPGAs) are now considered to be one of the preferred computing platforms for high performance computing applications, such as Monte Carlo simulations, due to their large computational power and low power consumption. Unlike other state-of-the-art computing platforms, such as General Purpose Processors (GPPs) and General Purpose Graphics Processing Units (GPGPUs), FPGAs can moreover exploit the applications' requirements with respect to the employed number representation scheme, with the potential to lead to considerable area savings and throughput increases. This work proposes a novel FPGA-based architecture for Monte Carlo simulations that monitors and configures the number representation of the system during run-time in order to accommodate the dynamics of the system under investigation, resulting in a considerable boost in the overall performance of the system compared to a conventional system. In order to evaluate the efficacy of the proposed architecture, the GARCH model from the financial industry is considered as a case study. The results demonstrate that an average improvement of ∼1.35x in throughput per resource unit is achieved compared to a conventional parallel arithmetic implementation. © 2011 IEEE.

Citations: 8

Conference paper

Angelopoulou M, Bouganis C-S, Cheung PYK, 2011, Blur Identification with Assumption Validation for Sensor-based Video Reconstruction and its Implementation on FPGA, IET Computers & Digital Techniques

Journal article

Angelopoulou ME, Bouganis C-S, 2011, Feature Selection with Geometric Constraints for Vision-Based Unmanned Aerial Vehicle Navigation, 18th IEEE International Conference on Image Processing (ICIP), Publisher: IEEE, ISSN: 1522-4880

Citations: 1

Conference paper

Jones DH, Powell A, Bouganis CS, Cheung PYK et al., 2010, GPU versus FPGA for high productivity computing, Pages: 119-124

Heterogeneous or co-processor architectures are becoming an important component of high productivity computing systems (HPCS). In this work the performance of a GPU based HPCS is compared with the performance of a commercially available FPGA based HPC. Contrary to previous approaches that focussed on specific examples, a broader analysis is performed by considering processes at an architectural level. A set of benchmarks is employed that use different process architectures in order to exploit the benefits of each technology. These include the asynchronous pipelines common to "map" tasks, a partially synchronous tree common to "reduce" tasks and a fully synchronous, fully connected mesh. We show that the GPU is more productive than the FPGA architecture for most of the benchmarks and conclude that FPGA-based HPCS is being marginalised by GPUs. © 2010 IEEE.

Citations: 46

Conference paper

Saiprasert C, Bouganis C-S, Constantinides GA, 2010, Mapping Multiple Multivariate Gaussian Random Number Generators on an FPGA, Proc. Field-Programmable Logic 2010

Conference paper

Papadonikolakis M, Bouganis CS, 2010, A novel FPGA-based SVM classifier, Pages: 283-286

Support Vector Machines (SVMs) are a powerful supervised learning tool, providing state-of-the-art accuracy at a cost of high computational complexity. The SVM classification suffers from linear dependencies on the number of the Support Vectors and the problem's dimensionality. In this work, we propose a scalable FPGA architecture for the acceleration of SVM classification, which exploits the device heterogeneity and the dynamic range diversities among the dataset attributes. Furthermore, this work introduces the first FPGA-oriented cascade SVM classifier scheme, which intensifies the custom-arithmetic properties of the heterogeneous architecture and boosts the classification performance even more. The implementation results demonstrate the efficiency of the heterogeneous architecture, presenting a speed-up factor of 2-3 orders of magnitude, compared to the CPU implementation, while outperforming other proposed FPGA and GPU approaches by more than 7 times. © 2010 IEEE.

Citations: 43

Conference paper

Saiprasert C, Bouganis C, Constantinides GA, 2010, An Optimized Hardware Architecture of a Multivariate Gaussian Random Number Generator, ACM Transactions on Reconfigurable Technology and Systems, Vol: 4, ISSN: 1936-7414

Monte Carlo simulation is one of the most widely used techniques for computationally intensive simulations in mathematical analysis and modeling. A multivariate Gaussian random number generator is one of the main building blocks of such a system. Field Programmable Gate Arrays (FPGAs) are gaining increased popularity as an alternative means to the traditional general purpose processors targeting the acceleration of the computationally expensive random number generator block. This article presents a novel approach for mapping a multivariate Gaussian random number generator onto an FPGA by optimizing the computational path in terms of hardware resource usage subject to an acceptable error in the approximation of the distribution of interest. The proposed approach is based on the eigenvalue decomposition algorithm which leads to a design with different precision requirements in the computational paths. An analysis on the impact of the error due to truncation/rounding operation along the computational path is performed and an analytical expression of the error inserted into the system is presented. Based on the error analysis, three algorithms that optimize the resource utilization and at the same time minimize the error in the output of the system are presented and compared. Experimental results reveal that the hardware resource usage on an FPGA as well as the error in the approximation of the distribution of interest are significantly reduced by the use of the optimization techniques introduced in the proposed approach.

Journal article

Bouganis C, Pournara I, Cheung PYK, 2010, Exploration of Heterogeneous FPGAs for Mapping Linear Projection Designs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol: 18

Journal article

Jones DH, Powell A, Bouganis C-S, Cheung PYK et al., 2010, A Salient Region Detector for GPU Using a Cellular Automata Architecture, 17th International Conference on Neural Information Processing, Publisher: Springer-Verlag Berlin, Pages: 501-508, ISSN: 0302-9743

Citations: 1

Conference paper

Papadonikolakis M, Bouganis C-S, 2010, A Heterogeneous FPGA Architecture for Support Vector Machine Training, 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE Computer Society, Pages: 211-214

Citations: 15

Conference paper

Saiprasert C, Bouganis C, Constantinides GA, 2010, Design of a Financial Application Driven Multivariate Gaussian Random Number Generator for an FPGA, Pages: 182-193

Conference paper

Bailey D, Bouganis C, 2009, Implementation of a Foveal Vision Mapping

Conference paper

Angelopoulou M, Bouganis CS, Cheung PYK, Constantinides GA et al., 2009, Robust Real-Time Super-Resolution on FPGA and an Application to Video Enhancement, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 2

The high density image sensors of state-of-the-art imaging systems provide outputs with high spatial resolution, but require long exposure times. This limits their applicability, due to the motion blur effect. Recent technological advances have led to adaptive image sensors that can combine several pixels together in real time to form a larger pixel. Larger pixels require shorter exposure times and produce high-frame-rate samples with reduced motion blur. This work proposes combining an FPGA with an adaptive image sensor to produce an output of high resolution both in space and time. The FPGA is responsible for the spatial resolution enhancement of the high-frame-rate samples using super-resolution (SR) techniques in real time. To achieve this, the article proposes utilizing the Iterative Back Projection (IBP) SR algorithm. The original IBP method is modified to account for the presence of noise, leading to an algorithm more robust to noise. An FPGA implementation of this algorithm is presented. The proposed architecture can serve as a general purpose real-time resolution enhancement system, and its performance is evaluated under various noise levels.

Journal article

Angelopoulou M, Bouganis C, Cheung PYK, 2009, A sensor-based approach to linear blur identification for real-time video enhancement

Conference paper

Liu Y, Bouganis CS, Cheung PYK, 2009, Hardware architectures for eigenvalue computation of real symmetric matrices, IET Proceeding on Computers & Digital Techniques, Vol: 3, Pages: 72-84

Computation of eigenvalues is essential in many applications in the fields of science and engineering. When the application of interest requires the computation of eigenvalues with high throughput or real-time performance, a hardware implementation of an eigenvalue computation block is often employed. This work focuses on the problem of eigenvalue computation for real symmetric matrices. For the general case of a symmetric matrix eigenvalue problem, the approximate Jacobi method is proposed, whereas for the special case of a 3×3 symmetric matrix, an algebraic-based method is introduced. The proposed methods are compared with various other approaches reported in the literature. Results obtained by mapping the above architectures on a field programmable gate array device illustrate the advantages of the proposed methods over the existing ones.

Journal article

Bouganis CS, Park SB, Constantinides GA, Cheung PYK et al., 2009, Synthesis and Optimization of 2D Filter Designs for Heterogeneous FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol: 1, ISSN: 1936-7406

Many image processing applications require fast convolution of an image with one or more 2D filters. Field-Programmable Gate Arrays (FPGAs) are often used to achieve this goal due to their fine grain parallelism and reconfigurability. However, the heterogeneous nature of modern reconfigurable devices is not usually considered during design optimization. This article proposes an algorithm that explores the space of possible implementation architectures of 2D filters, targeting the minimization of the required area, by optimizing the usage of the different components in a heterogeneous device. This is achieved by exploring the heterogeneous nature of modern reconfigurable devices using a Singular Value Decomposition based algorithm, which provides an efficient mapping of filter's implementation requirements to the heterogeneous components of modern FPGAs. In the case of multiple 2D filters, the proposed algorithm also exploits any redundancy that exists within each filter and between different filters in the set, leading to designs with minimized area. Experiments with real filter sets from computer vision applications demonstrate an average of up to 38% reduction in the required area.

Journal article

Papadonikolakis M, Bouganis C, Constantinides GA, 2009, Performance Comparison of GPU and FPGA Architectures for the SVM Training Problem, Pages: 388-391

Conference paper

Bailey DG, Bouganis C-S, 2009, Tracking Performance of a Foveated Vision System, 4th International Conference on Autonomous Robots and Agents, Publisher: IEEE, Pages: 675-+

Conference paper

Saiprasert C, Bouganis C, Constantinides GA, 2009, Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator, Pages: 231-242

Conference paper

Bailey D, Bouganis C, 2009, Vision sensor with an active digital fovea, Recent Advances in Sensing Technology: LNEE, Vol: 49, Pages: 91-111

Journal article

Woods R, Compton K, Bouganis C, Diniz PC et al., 2008, Lecture Notes in Computer Science: Preface, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol: 4943 LNCS, ISSN: 0302-9743

Journal article

Professor Christos-Savvas Bouganis
