Publications

Davis JJ, Hung E, Levine JM, Stott EA, Cheung PYK, Constantinides GA et al., 2018, KAPow: High-accuracy, Low-overhead Online Per-module Power Estimation for FPGA Designs, ACM Transactions on Reconfigurable Technology and Systems, Vol: 11, Pages: 2:1-2:22, ISSN: 1936-7406

In an FPGA system-on-chip design, it is often insufficient to merely assess the power consumption of the entire circuit by compile-time estimation or runtime power measurement. Instead, to make better decisions, one must understand the power consumed by each module in the system. In this work, we combine measurements of register-level switching activity and system-level power to build an adaptive online model that produces live breakdowns of power consumption within the design. Online model refinement avoids time-consuming characterisation while also allowing the model to track long-term operating condition changes. Central to our method is an automated flow that selects signals predicted to be indicative of high power consumption, instrumenting them for monitoring. We named this technique KAPow, for 'K'ounting Activity for Power estimation, which we show to be accurate and to have low overheads across a range of representative benchmarks. We also propose a strategy allowing for the identification and subsequent elimination of counters found to be of low significance at runtime, reducing algorithmic complexity without sacrificing significant accuracy. Finally, we demonstrate an application example in which a module-level power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by up to 7%.

Journal article

Davis J, Levine J, Stott E, Hung E, Cheung P, Constantinides GA et al., 2017, STRIPE: Signal Selection for Runtime Power Estimation, International Conference on Field-programmable Logic and Applications (FPL) 2017, Publisher: IEEE

Knowledge of power consumption at a subsystem level can facilitate adaptive energy-saving techniques such as power gating, runtime task mapping and dynamic voltage and/or frequency scaling. While we have the ability to attribute power to an arbitrary hardware system's modules in real time, the selection of the particular signals to monitor for the purpose of power estimation within any given module has yet to be treated as a primary concern. In this paper, we show how the automatic analysis of circuit structure and behaviour inferred through vectored simulation can be used to produce high-quality rankings of signals' importance, with the resulting selections able to achieve lower power estimation error than those of prior work coupled with decreases in area, power and modelling complexity. In particular, by monitoring just eight signals per module (~0.3% of the total) across the 15 modules we examined, we demonstrate how to achieve runtime module-level estimation errors 1.5-6.9x lower than when reliant on the signal selections made in accordance with a more straightforward, previously published metric.

Conference paper

Davis JJ, Levine JM, Stott EA, Hung E, Cheung PYK, Constantinides GA et al., 2017, KOCL: Power Self-awareness for Arbitrary FPGA-SoC-accelerated OpenCL Applications, IEEE Design and Test, Vol: 34, Pages: 36-45, ISSN: 2168-2356

Given the need for developers to rapidly produce complex, high-performance and energy-efficient hardware systems, methods facilitating their intelligent runtime management are of ever-increasing importance. For energy optimization, such control decisions require knowledge of power usage at subsystem granularity. This information must be made accessible to developers now accustomed to creating systems from high-level descriptions, such as those written in OpenCL. To address these challenges, we introduce KOCL, a tool allowing OpenCL developers targeting FPGA-SoC devices to query live kernel-level power consumption using function calls embedded in their host code. KOCL is open-source, available online at https://github.com/PRiME-project/KOCL. To maximize accessibility, its use necessitates zero exposure to hardware.

Journal article

Hung E, Davis JJ, Levine JM, Stott EA, Cheung PYK, Constantinides GA et al., 2016, KAPow: A System Identification Approach to Online Per-Module Power Estimation in FPGA Designs, IEEE Symposium on Field-programmable Custom Computing Machines (FCCM) 2016, Publisher: IEEE, Pages: 56-63

In a modern FPGA system-on-chip design, it is often insufficient to simply assess the total power consumption of the entire circuit by design-time estimation or runtime power rail measurement. Instead, to make better runtime decisions, it is desirable to understand the power consumed by each individual module in the system. In this work, we combine board-level power measurements with register-level activity counting to build an online model that produces a breakdown of power consumption within the design. Online model refinement avoids the need for a time-consuming characterisation stage and also allows the model to track long-term changes to operating conditions. Our flow is named KAPow, a (loose) acronym for 'K'ounting Activity for Power estimation, which we show to be accurate, with per-module power estimates as close as ±5mW to true measurements, and to have low overheads. We also demonstrate an application example in which a per-module power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by over 8%.

Conference paper

Davis JJ, Hung E, Levine J, Stott E, Cheung PYK, Constantinides GA et al., 2016, Knowledge is Power: Module-level Sensing for Runtime Optimisation, ACM/SIGDA International Symposium on Field-programmable Gate Arrays (FPGA) 2016, Publisher: ACM, Pages: 276-276

We propose the compile-time instrumentation of coexisting modules (IP blocks, accelerators, etc.) implemented in FPGAs. The efficient mapping of tasks to execution units can then be achieved, for power and/or timing performance, by tracking dynamic power consumption and/or timing slack online at module-level granularity. Our proposed instrumentation is transparent, thereby not affecting circuit functionality. Power and timing overheads have proven to be small and tend to be outweighed by the exposed runtime benefits.

Conference paper

Yang S, Shafik R, Merrett G, Stott E, Levine J, Davis JJ, Al-Hashimi B et al., 2015, Adaptive Energy Minimization of Embedded Heterogeneous Systems using Regression-based Learning, International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Publisher: IEEE, Pages: 103-110

Modern embedded systems consist of heterogeneous computing resources with diverse energy and performance trade-offs. This is because these resources exercise the application tasks differently, generating varying workloads and energy consumption. As a result, minimizing energy consumption in these systems is challenging as continuous adaptation between application task mapping (i.e. allocating tasks among the computing resources) and dynamic voltage/frequency scaling (DVFS) is required. Existing approaches have limitations due to lack of such adaptation with practical validation (Table I). This paper addresses this limitation and proposes a novel adaptive energy minimization approach for embedded heterogeneous systems. Fundamental to this approach is a runtime model, generated through regression-based learning of energy/performance trade-offs between different computing resources in the system. Using this model, an application task is suitably mapped on a computing resource during runtime, ensuring minimum energy consumption for a given application performance requirement. Such mapping is also coupled with a DVFS control to adapt to performance and workload variations. The proposed approach is designed, engineered and validated on a Zynq-ZC702 platform, consisting of CPU, DSP and FPGA cores. Using several image processing applications as case studies, it was demonstrated that our proposed approach can achieve significant energy savings (>70%) when compared to the existing approaches.

Conference paper

Hung E, Levine J, Stott E, Constantinides G, Luk W et al., 2015, Delay-Bounded Routing for Shadow Registers, 23rd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Publisher: Association for Computing Machinery, Pages: 56-65

The on-chip timing behaviour of synchronous circuits can be quantified at run-time by adding shadow registers, which allow designers to sample the most critical paths of a circuit at a different point in time than the user register would normally. In order to sample these paths precisely, the path skew between the user and the shadow register must be tightly controlled and consistent across all paths that are shadowed. Unlike a custom IC, FPGAs contain prefabricated resources from which composing an arbitrary routing delay is not trivial. This paper presents a method for inserting shadow registers with a minimum skew bound, whilst also reducing the maximum skew. To preserve circuit timing, we apply this to FPGA circuits post place-and-route, using only the spare resources left behind. We find that our techniques can achieve an average STA reported delay bound of ± 200ps on a Xilinx device despite incomplete timing information, and achieve <1ps accuracy against our own delay model.

Conference paper

Shi K, Boland D, Stott E, Bayliss S, Constantinides GA et al., 2014, Datapath Synthesis for Overclocking: Online Arithmetic for Latency-Accuracy Trade-offs, 51st ACM/EDAC/IEEE Design Automation Conference (DAC), Publisher: IEEE, ISSN: 0738-100X

Citations: 6

Conference paper

Levine JM, Stott EA, Cheung PYK, 2014, Dynamic voltage & frequency scaling with online slack measurement, Publisher: ACM, Pages: 65-74

Conference paper

Stott E, Guan Z, Levine JM, Wong JSJ, Cheung PYK et al., 2013, Variation and Reliability in FPGAs, IEEE Design & Test, Vol: 30, Pages: 50-59, ISSN: 2168-2356

Citations: 6

Journal article

Levine JM, Stott E, Constantinides GA, Cheung PYK et al., 2013, SMI: Slack Measurement Insertion for Online Timing Monitoring in FPGAs, 23rd International Conference on Field Programmable Logic and Applications (FPL), Publisher: IEEE, ISSN: 1946-1488

Citations: 1

Conference paper

Levine JM, Stott E, Constantinides GA, Cheung PYK et al., 2012, Online Measurement of Timing in Circuits: for Health Monitoring and Dynamic Voltage & Frequency Scaling, 20th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Publisher: IEEE, Pages: 109-116

Citations: 21

Conference paper

Stott EA, Cheung PYK, 2011, Improving FPGA Reliability with Wear-Levelling, Pages: 323-328

Conference paper

Stott EA, Sedcole NP, Cheung PYK, 2010, Fault tolerance and reliability in field-programmable gate arrays, Pages: 196-210

Conference paper

Stott EA, Wong JS, Sedcole NP, Cheung PYK et al., 2010, Degradation in FPGAs: measurement and modelling, International symposium on field programmable gate arrays, Pages: 229-238

Conference paper

Stott EA, Wong JS, Cheung PYK, 2010, Degradation Analysis and Mitigation in FPGAs, Pages: 428-433

Conference paper

Stott EA, Sedcole P, Cheung PYK, 2009, Modelling degradation in FPGA lookup tables, International Conference on Field-Programmable Technology, Pages: 443-446

Conference paper

Stott E, Sedcole P, Cheung PYK, 2008, Fault tolerant methods for reliability in FPGAs, International Conference on Field Programmable Logic and Applications, Publisher: IEEE, Pages: 415-420

Conference paper

Dr Edward Stott
