Imperial College London

Professor William Knottenbelt

Faculty of Engineering, Department of Computing

Professor of Applied Quantitative Analysis

Contact

 

+44 (0)20 7594 8331 · w.knottenbelt · Website

 
 

Location

 

E363, ACE Extension, South Kensington Campus



Publications


246 results found

Chen X, Knottenbelt WJ, 2015, A performance tree-based monitoring platform for clouds, Pages: 97-98

Cloud-based software systems are expected to deliver reliable performance under dynamic workload while efficiently managing resources. Conventional monitoring frameworks provide limited support for flexible and intuitive performance queries. In this paper, we present a prototype monitoring and control platform for clouds that is a better fit to the characteristics of cloud computing (e.g. extensible, user-defined, scalable). Service Level Objectives (SLOs) are expressed graphically as Performance Trees, while violated SLOs trigger mitigating control actions.
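To illustrate the idea in this abstract, here is a minimal, hypothetical rendering of an SLO as a small performance tree: a percentile node feeding a comparison node. The class names, node types and evaluation scheme are invented for illustration and are not the platform's actual API.

```python
class Percentile:                     # value node: a percentile of observed samples
    def __init__(self, samples, p):
        self.samples, self.p = sorted(samples), p
    def eval(self):
        return self.samples[int(self.p / 100 * (len(self.samples) - 1))]

class LessThan:                       # operation node: child metric must stay under a bound
    def __init__(self, child, bound):
        self.child, self.bound = child, bound
    def eval(self):
        return self.child.eval() < self.bound

# "95th percentile of response time must be below 200 ms"
response_ms = [120, 150, 90, 300, 180, 170]
slo = LessThan(Percentile(response_ms, p=95), bound=200)
violated = not slo.eval()             # a violation would trigger a control action
```

Evaluating the tree bottom-up yields a boolean SLO verdict, which is the hook for the mitigating control actions the abstract mentions.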

Conference paper

Bradley J, Knottenbelt W, Thomas N, 2015, Electronic notes in theoretical computer science: Preface, Electronic Notes in Theoretical Computer Science, Vol: 310, Pages: 1-3, ISSN: 1571-0661

Journal article

2015, 8th International Conference on Performance Evaluation Methodologies and Tools, VALUETOOLS 2014, Bratislava, Slovakia, December 9-11, 2014, Publisher: ICST

Conference paper

2015, Proceedings of the Seventh International Workshop on the Practical Application of Stochastic Modelling, PASM 2014, Newcastle-upon-Tyne, UK, May 2014, Publisher: Elsevier

Conference paper

Kelly J, Batra N, Parson O, Dutta H, Knottenbelt W, Rogers A, Singh A, Srivastava M et al., 2014, NILMTK v0.2: A non-intrusive load monitoring toolkit for large scale data sets, BuildSys 2014 - Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, Pages: 182-183

In this demonstration, we present an open source toolkit for evaluating non-intrusive load monitoring research; a field which aims to disaggregate a household's total electricity consumption into individual appliances. The toolkit contains: a number of importers for existing public data sets, a set of preprocessing and statistics functions, a benchmark disaggregation algorithm and a set of metrics to evaluate the performance of such algorithms. Specifically, this release of the toolkit has been designed to enable the use of large data sets by only loading individual chunks of the whole data set into memory at once for processing, before combining the results of each chunk.
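The chunk-at-a-time design described above can be sketched in a few lines. This is not the NILMTK API; the function names, chunk size and sample data are invented to show how chunk results combine without ever holding the whole series in memory.

```python
def chunks(series, chunk_size):
    """Yield successive slices of the series, one chunk in memory at a time."""
    for start in range(0, len(series), chunk_size):
        yield series[start:start + chunk_size]

def total_energy(series, chunk_size=4, sample_period_s=1.0):
    """Sum energy (power * time) chunk by chunk; chunk results combine additively."""
    return sum(sum(chunk) * sample_period_s
               for chunk in chunks(series, chunk_size))

power = [100.0, 120.0, 80.0, 90.0, 110.0, 95.0]   # watts, sampled at 1 Hz
energy_j = total_energy(power)                     # joules, independent of chunking
```

The key property is that the statistic is computed per chunk and then merged, so the answer does not depend on the chosen chunk size.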

Conference paper

Kelly J, Knottenbelt W, 2014, Metadata for energy disaggregation, 2014 IEEE 38th International Computer Software and Applications Conference Workshops (COMPSACW), Publisher: IEEE

Energy disaggregation is the process of estimating the energy consumed by individual electrical appliances given only a time series of the whole-home power demand. Energy disaggregation researchers require datasets of the power demand from individual appliances and the whole-home power demand. Multiple such datasets have been released over the last few years but provide metadata in a disparate array of formats including CSV files and plain-text README files. At best, the lack of a standard metadata schema makes it unnecessarily time-consuming to write software to process multiple datasets and, at worse, the lack of a standard means that crucial information is simply absent from some datasets. We propose a metadata schema for representing appliances, meters, buildings, datasets, prior knowledge about appliances and appliance models. The schema is relational and provides a simple but powerful inheritance mechanism.
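The inheritance mechanism the abstract describes can be illustrated with a toy schema. The field names and appliance types below are invented for illustration, not the schema's actual vocabulary: a child appliance type inherits its parent's fields and may override them.

```python
# Hypothetical appliance-type hierarchy; "parent" links give single inheritance.
APPLIANCE_TYPES = {
    "appliance":    {"on_power_threshold_w": 5},
    "fridge":       {"parent": "appliance", "usual_power_w": 100},
    "smart fridge": {"parent": "fridge", "has_display": True},
}

def resolve(name):
    """Merge a type with its ancestors; child fields override parent fields."""
    entry = APPLIANCE_TYPES[name]
    merged = resolve(entry["parent"]) if "parent" in entry else {}
    merged.update({k: v for k, v in entry.items() if k != "parent"})
    return merged

fridge = resolve("smart fridge")   # carries fields from all three levels
```

Inheritance keeps dataset metadata small: common prior knowledge lives once in the parent type, and each dataset only records what differs.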

Conference paper

Danila R, Nika M, Wilding T, Knottenbelt WJ et al., 2014, Uncertainty in on-the-fly epidemic fitting, EPEW 2014, Publisher: Springer International Publishing, Pages: 135-148, ISSN: 0302-9743

The modern world features a plethora of social, technological and biological epidemic phenomena. These epidemics now spread at unprecedented rates thanks to advances in industrialisation, transport and telecommunications. Effective real-time decision making and management of modern epidemic outbreaks depends on two factors: the ability to determine epidemic parameters as the epidemic unfolds, and the ability to characterise rigorously the uncertainties inherent in these parameters. This paper presents a generic maximum-likelihood-based methodology for online epidemic fitting of SIR models from a single trace which yields confidence intervals on parameter values. The method is fully automated and avoids the laborious manual efforts traditionally deployed in the modelling of biological epidemics. We present case studies based on both synthetic and real data.
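A much-simplified sketch of the fitting idea: generate a noiseless synthetic infection trace from an SIR model and recover the infection rate from that single trace. The paper uses maximum likelihood with confidence intervals; a least-squares grid search over beta stands in here, and all parameter values are illustrative.

```python
def sir_trace(beta, gamma, s0, i0, steps, dt=0.1):
    """Euler-integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I; return I(t)."""
    s, i, trace = s0, i0, []
    for _ in range(steps):
        new_inf = beta * s * i * dt
        recovered = gamma * i * dt
        s, i = s - new_inf, i + new_inf - recovered
        trace.append(i)
    return trace

observed = sir_trace(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, steps=200)

def fit_beta(observed, gamma=0.1):
    """Grid-search beta minimising squared error against the observed trace."""
    candidates = [b / 100 for b in range(10, 61)]
    sse = lambda b: sum((x - y) ** 2 for x, y in
                        zip(sir_trace(b, gamma, 0.99, 0.01, len(observed)), observed))
    return min(candidates, key=sse)

beta_hat = fit_beta(observed)   # recovers beta = 0.3 on this noiseless trace
```

On real, noisy traces the likelihood surface replaces the squared error, which is what yields the confidence intervals the abstract refers to.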

Conference paper

Gandini A, Gribaudo M, Knottenbelt WJ, Osman R, Piazzolla P et al., 2014, Performance evaluation of NoSQL databases, EPEW 2014, Publisher: Springer International Publishing, Pages: 16-29, ISSN: 0302-9743

NoSQL databases have emerged as a backend to support Big Data applications. NoSQL databases are characterized by horizontal scalability, schema-free data models, and easy cloud deployment. To avoid overprovisioning, it is essential to be able to identify the correct number of nodes required for a specific system before deployment. This paper benchmarks and compares three of the most common NoSQL databases: Cassandra, MongoDB and HBase. We deploy them on the Amazon EC2 cloud platform using different types of virtual machines and cluster sizes to study the effect of different configurations. We then compare the behavior of these systems to high-level queueing network models. Our results show that the models are able to capture the main performance characteristics of the studied databases and form the basis for a capacity planning tool for service providers and service users.
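The capacity-planning use the abstract describes can be illustrated with a high-level queueing sketch. The paper builds queueing network models of the measured databases; the stand-in below is a plain M/M/c (Erlang-C) model, and the arrival rate, per-node service rate and response time target are invented numbers.

```python
import math

def mmc_response_time(lam, mu, c):
    """Mean response time of an M/M/c queue via Erlang-C; requires lam < c*mu."""
    a = lam / mu
    tail = a ** c / math.factorial(c)
    erlang_sum = sum(a ** k / math.factorial(k) for k in range(c))
    p_wait = tail / ((1 - a / c) * erlang_sum + tail)
    return p_wait / (c * mu - lam) + 1 / mu

def min_nodes(lam, mu, target):
    """Smallest cluster size whose modelled mean response time meets the target."""
    c = int(lam // mu) + 1                     # smallest c with lam < c * mu
    while mmc_response_time(lam, mu, c) > target:
        c += 1
    return c

# 900 req/s offered load, 100 req/s per node, 15 ms mean response time target
nodes = min_nodes(lam=900.0, mu=100.0, target=0.015)
```

Sweeping the node count through such a model before deployment is exactly the overprovisioning check the abstract motivates.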

Conference paper

Nika M, Fiems D, De Turck K, Knottenbelt WJ et al., 2014, Modelling interacting epidemics in overlapping populations, 21st International Conference, ASMTA 2014, Publisher: SPRINGER-VERLAG BERLIN, Pages: 33-45, ISSN: 0302-9743

Epidemic modelling is fundamental to our understanding of biological, social and technological spreading phenomena. As conceptual frameworks for epidemiology advance, it is important they are able to elucidate empirically-observed dynamic feedback phenomena involving interactions amongst pathogenic agents in the form of syndemic and counter-syndemic effects. In this paper we model the dynamics of two types of epidemics with syndemic and counter-syndemic interaction effects in multiple possibly-overlapping populations. We derive a Markov model whose fluid limit reduces to a set of coupled SIR-type ODEs. Its numerical solution reveals some interesting multimodal behaviours, as shown in our case studies.
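A toy version of the coupled SIR-type ODEs can make the syndemic/counter-syndemic distinction concrete. The coupling form and all parameter names below are my own invention, not the paper's model: each epidemic's infection rate is scaled by (1 + k * prevalence of the other), so positive k is syndemic and negative k counter-syndemic.

```python
def two_epidemics(beta, gamma, k12, k21, steps=2000, dt=0.01):
    """Euler-integrate two coupled SIR epidemics; k12 scales epidemic 1's
    infection rate by (1 + k12 * i2), and symmetrically for k21."""
    s1 = s2 = 0.99
    i1 = i2 = 0.01
    r1 = r2 = 0.0
    for _ in range(steps):
        f1 = beta * (1 + k12 * i2) * s1 * i1   # epidemic 2 boosts or damps epidemic 1
        f2 = beta * (1 + k21 * i1) * s2 * i2
        s1, i1, r1 = s1 - f1 * dt, i1 + (f1 - gamma * i1) * dt, r1 + gamma * i1 * dt
        s2, i2, r2 = s2 - f2 * dt, i2 + (f2 - gamma * i2) * dt, r2 + gamma * i2 * dt
    return r1, r2                              # final epidemic sizes

syndemic, _ = two_epidemics(beta=0.5, gamma=0.2, k12=2.0, k21=2.0)
neutral, _  = two_epidemics(beta=0.5, gamma=0.2, k12=0.0, k21=0.0)
counter, _  = two_epidemics(beta=0.5, gamma=0.2, k12=-0.9, k21=-0.9)
```

With this coupling, the syndemic run ends larger and the counter-syndemic run smaller than the uncoupled baseline, mirroring the interaction effects the abstract studies.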

Conference paper

Batra N, Kelly J, Parson O, Dutta H, Knottenbelt WJ, Rogers A, Singh A, Srivastava MB et al., 2014, NILMTK: an open source toolkit for non-intrusive load monitoring, e-Energy: Future energy systems, Publisher: ACM, Pages: 265-276

Non-intrusive load monitoring, or energy disaggregation, aims to separate household energy consumption data collected from a single point of measurement into appliance-level consumption data. In recent years, the field has rapidly expanded due to increased interest as national deployments of smart meters have begun in many countries. However, empirically comparing disaggregation algorithms is currently virtually impossible. This is due to the different data sets used, the lack of reference implementations of these algorithms and the variety of accuracy metrics employed. To address this challenge, we present the Non-intrusive Load Monitoring Toolkit (NILMTK); an open source toolkit designed specifically to enable the comparison of energy disaggregation algorithms in a reproducible manner. This work is the first research to compare multiple disaggregation approaches across multiple publicly available data sets. Our toolkit includes parsers for a range of existing data sets, a collection of preprocessing algorithms, a set of statistics for describing data sets, two reference benchmark disaggregation algorithms and a suite of accuracy metrics. We demonstrate the range of reproducible analyses which are made possible by our toolkit, including the analysis of six publicly available data sets and the evaluation of both benchmark disaggregation algorithms across such data sets.
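The like-for-like comparison the toolkit enables can be shown with a self-contained toy: one benchmark disaggregator (a combinatorial-optimisation-style algorithm, which NILMTK includes as a reference) and one accuracy metric, applied to shared ground-truth data. The appliance names, power levels and data are invented; this is not the NILMTK API.

```python
from itertools import combinations

APPLIANCES = {"kettle": 2000.0, "fridge": 100.0, "tv": 60.0}

def combinatorial_optimisation(mains_watts):
    """For each mains reading, pick the subset of known appliance powers
    whose sum best matches it, and report those appliances as 'on'."""
    names = list(APPLIANCES)
    subsets = [frozenset(c) for r in range(len(names) + 1)
               for c in combinations(names, r)]
    estimates = []
    for w in mains_watts:
        best = min(subsets, key=lambda s: abs(sum(APPLIANCES[n] for n in s) - w))
        estimates.append({n: (APPLIANCES[n] if n in best else 0.0) for n in names})
    return estimates

def mean_absolute_error(estimates, truth, appliance):
    """One of the accuracy metrics a toolkit would standardise."""
    return sum(abs(e[appliance] - t[appliance])
               for e, t in zip(estimates, truth)) / len(truth)

truth = [{"kettle": 0.0, "fridge": 100.0, "tv": 60.0},
         {"kettle": 2000.0, "fridge": 100.0, "tv": 0.0}]
mains = [sum(t.values()) for t in truth]        # 160 W and 2100 W
est = combinatorial_optimisation(mains)
kettle_err = mean_absolute_error(est, truth, "kettle")
```

Fixing the data set and the metric, as here, is what makes two disaggregation algorithms directly comparable, which is the toolkit's central point.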

Conference paper

Huang WC, Knottenbelt W, 2014, Low-overhead development of scalable resource- efficient software systems, Handbook of Research on Emerging Advancements and Technologies in Software Engineering, Pages: 81-105, ISBN: 9781466660274

As the variety of execution environments and application contexts increases exponentially, modern software is often repeatedly refactored to meet ever-changing non-functional requirements. Although programmer effort can be reduced through the use of standardised libraries, software adjustment for scalability, reliability, and performance remains a time-consuming and manual job that requires high levels of expertise. Previous research has proposed three broad classes of techniques to overcome these difficulties in specific application domains: probabilistic techniques, out of core storage, and parallelism. However, due to limited cross-pollination of knowledge between domains, the same or very similar techniques have been reinvented all over again, and the application of techniques still requires manual effort. This chapter introduces the vision of self-adaptive scalable resource-efficient software that is able to reconfigure itself with little other than programmer-specified Service-Level Objectives and a description of the resource constraints of the current execution environment. The approach is designed to be low-overhead from the programmer's perspective - indeed a naïve implementation should suffice. To illustrate the vision, the authors have implemented in C++ a prototype library of self-adaptive containers, which dynamically adjust themselves to meet non-functional requirements at run time and which automatically deploy mitigating techniques when resource limits are reached. The authors describe the architecture of the library and the functionality of each component, as well as the process of self-adaptation. They explore the potential of the library in the context of a case study, which shows that the library can allow a naïve program to accept large-scale input and become resource-aware with very little programmer overhead.
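The authors' library is a C++ library of self-adaptive containers; the following is a loose Python analogue of one mitigating technique the chapter names, out-of-core storage. All names are mine: when an in-memory element budget is exceeded, older elements spill transparently to disk, so a naive caller's code never changes.

```python
import pickle
import tempfile

class AdaptiveList:
    """List-like container that spills its oldest elements to disk
    once the in-memory element budget (the 'resource constraint') is hit."""
    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.hot = []                               # recent items, kept in RAM
        self.cold = tempfile.TemporaryFile()        # spilled items, on disk
        self.cold_count = 0

    def append(self, item):
        self.hot.append(item)
        if len(self.hot) > self.max_in_memory:      # budget exceeded:
            pickle.dump(self.hot.pop(0), self.cold) # spill oldest item to disk
            self.cold_count += 1

    def __len__(self):
        return self.cold_count + len(self.hot)

    def __iter__(self):
        self.cold.seek(0)
        for _ in range(self.cold_count):
            yield pickle.load(self.cold)
        yield from self.hot

xs = AdaptiveList(max_in_memory=3)
for i in range(10):
    xs.append(i)
items = list(xs)    # all 10 items back, though only 3 ever live in memory
```

The caller's loop is exactly what it would be for a plain list, which is the "low-overhead from the programmer's perspective" property the chapter argues for.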

Book chapter

Tsimashenka I, Knottenbelt WJ, Harrison PG, 2014, Controlling variability in split-merge systems and its impact on performance, Annals of Operations Research, Vol: 239, Pages: 569-588, ISSN: 1572-9338

We consider split–merge systems with heterogeneous subtask service times and limited output buffer space in which to hold completed but as yet unmerged subtasks. An important practical problem in such systems is to limit utilisation of the output buffer. This can be achieved by judiciously delaying the processing of subtasks in order to cluster subtask completion times. In this paper we present a methodology to find those deterministic subtask processing delays which minimise any given percentile of the difference in times of appearance of the first and the last subtasks in the output buffer. Technically this is achieved in three main steps: firstly, we define an expression for the distribution of the range of samples drawn from n independent heterogeneous service time distributions. This is a generalisation of the well-known order statistic result for the distribution of the range of n samples taken from the same distribution. Secondly, we extend our model to incorporate deterministic delays applied to the processing of subtasks. Finally, we present an optimisation scheme to find that vector of delays which minimises a given percentile of the range of arrival times of subtasks in the output buffer. We show the impact of applying the optimal delays on system stability and task response time. Two case studies illustrate the applicability of our approach.
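The effect of deterministic delays on a percentile of the range can be checked with a small Monte Carlo sketch (the paper works with the range distribution analytically; the exponential rates, delay value and function names below are invented): a subtask completes at d_i + X_i, and holding back the fast subtask clusters the completion times.

```python
import random

def range_percentile(rates, delays, p=0.9, n=20000, seed=42):
    """Estimate the p-th percentile of (last - first) subtask completion
    times, where subtask i completes at delays[i] + Exp(rates[i])."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n):
        done = [d + rng.expovariate(r) for r, d in zip(rates, delays)]
        ranges.append(max(done) - min(done))
    ranges.sort()
    return ranges[int(p * (n - 1))]

rates = [10.0, 1.0]                              # one fast, one slow subtask server
undelayed = range_percentile(rates, [0.0, 0.0])
delayed = range_percentile(rates, [0.9, 0.0])    # hold back the fast subtask
```

Delaying the fast subtask substantially shrinks the 90th percentile of the range here; the paper's optimisation scheme searches for the delay vector that minimises exactly such a percentile.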

Journal article

Chen X, Ho CP, Osman R, Harrison PG, Knottenbelt WJ et al., 2014, Understanding, Modelling and Improving the Performance of Web Applications in Multi-core Virtualised Environments, 5th ACM/SPEC International Conference on Performance Engineering (ICPE 2014), Publisher: ACM Digital Library

Conference paper

Kelly J, Knottenbelt WJ, 2014, 'UK-DALE': A dataset recording UK Domestic Appliance-Level Electricity demand and whole-house demand, CoRR, Vol: abs/1404.0284

Journal article

Kelly J, Knottenbelt WJ, 2014, Metadata for Energy Disaggregation, CoRR, Vol: abs/1403.5946

Journal article

Kelly J, Batra N, Parson O, Dutta H, Knottenbelt WJ, Rogers A, Singh A, Srivastava MB et al., 2014, NILMTK v0.2: a non-intrusive load monitoring toolkit for large scale data sets: demo abstract, Publisher: ACM, Pages: 182-183

Conference paper

Osman R, Coulden D, Knottenbelt WJ, 2013, Performance modelling of concurrency control schemes for relational databases, Pages: 337-351, ISSN: 0302-9743

The performance of relational database systems is influenced by complex interdependent factors, which makes developing accurate models to evaluate their performance a challenging task. This paper presents a novel case study in which we develop a simple queueing Petri net model of a relational database system. The performance of the database system is evaluated for three different concurrency control schemes and compared to the results predicted by a queueing Petri net model. The results demonstrate the potential of our modelling approach in modelling database systems using relatively simple models that require minimal parameterization. Our models gave accurate approximations of the mean response times for shared and exclusive transactions with average prediction errors of 10% for high contention scenarios.

Conference paper

Marin A, Balsamo MS, Knottenbelt W, 2013, Preface, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol: 8168 LNCS, ISSN: 0302-9743

Journal article

Harrison PG, Hayden RA, Knottenbelt WJ, 2013, Product-forms in batch networks: Approximation and asymptotics, PERFORMANCE EVALUATION, Vol: 70, Pages: 822-840, ISSN: 0166-5316

Journal article

Tsimashenka I, Knottenbelt WJ, 2013, Trading off subtask dispersion and response time in split-merge systems, ASMTA 2013, Publisher: Springer Berlin Heidelberg, Pages: 431-442, ISSN: 0302-9743

In many real-world systems incoming tasks split into subtasks which are processed by a set of parallel servers. In such systems two metrics are of potential interest: response time and subtask dispersion. Previous research has been focused on the minimisation of one, but not both, of these metrics. In particular, in our previous work, we showed how the processing of selected subtasks can be delayed in order to minimise expected subtask dispersion and percentiles of subtask dispersion in the context of split-merge systems. However, the introduction of subtask delays obviously impacts adversely on task response time and maximum sustainable system throughput. In the present work, we describe a methodology for managing the trade off between subtask dispersion and task response time. The objective function of the minimisation is based on the product of expected subtask dispersion and expected task response time. Compared with our previous methodology, we show how our new technique can achieve comparable subtask dispersion with substantial improvements in expected task response time.

Conference paper

Anastasiou N, Knottenbelt W, 2013, PEPERCORN: Inferring performance models from location tracking data, Pages: 169-172, ISSN: 0302-9743

Stochastic performance models are widely used to analyse the performance of systems that process customers and resources. However, the construction of such models is traditionally manual and therefore expensive, intrusive and prone to human error. In this paper we introduce PEPERCORN, a Petri Net Performance Model (PNPM) construction tool, which, given a dataset of raw location tracking traces obtained from a customer-processing system, automatically formulates and parameterises a corresponding Coloured Generalised Stochastic Petri Net (CGSPN) performance model.

Conference paper

Bradley J, Heljanko K, Knottenbelt W, Thomas N et al., 2013, Preface, Electronic Notes in Theoretical Computer Science, Vol: 296, Pages: 1-5, ISSN: 1571-0661

Journal article

Bar P, Benfredj R, Marks J, Ulevinov D, Wozniak B, Casale G, Knottenbelt WJ et al., 2013, Towards a monitoring feedback loop for cloud applications, Pages: 43-44

Performance monitoring is fundamental to track cloud application health and service-level agreement compliance, but with the emergence of multi-cloud deployments, it may become increasingly important also to create a feedback loop between runtime operation in multi-clouds and design-time reasoning. This is because the developer needs to acquire more information on the specific performance features of a cloud platform to better leverage its specificities. To support this goal, we have developed a set of open source components that extract quality-of-service (QoS) data from a target Java application using JMX, aggregate it in a time-series database, and finally deliver it in a prototype Java dashboard that may be integrated in a development environment, such as Eclipse, to display either live or historical QoS data. The architecture is not only limited to collection, aggregation, and display of QoS data, but it also allows the evaluation of hierarchical queries expressed using the Performance Trees graphical language. It is our intention that this will provide a cloud-independent uniform interface for developers to specify monitoring queries. Initial evaluation suggests that Cube on MongoDB provides appropriate scalability for this application.
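The collect-then-aggregate step described above can be sketched in miniature. This is not the project's actual component stack (which uses JMX, a time-series database and Performance Tree queries); the function names, window size and sample values below are invented to show how raw QoS samples become windowed aggregates that a dashboard query can consume.

```python
def aggregate(samples, window_s):
    """Bucket (timestamp_s, value) samples into fixed windows and
    return {window_start: mean value}, the time-series-DB style view."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(int(t // window_s) * window_s, []).append(v)
    return {w: sum(vs) / len(vs) for w, vs in sorted(buckets.items())}

# raw response-time samples (seconds, milliseconds), e.g. as extracted via JMX
samples = [(0.5, 120.0), (1.2, 80.0), (1.8, 100.0), (2.4, 300.0)]
means = aggregate(samples, window_s=1)
worst = max(means.values())    # a simple hierarchical query: worst windowed mean
```

A hierarchical query language such as Performance Trees would compose nodes like the `max` over the windowed `mean` here, rather than hard-coding them.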

Conference paper

Spanias D, Knottenbelt WJ, 2013, Predicting the outcomes of tennis matches using a low-level point model, IMA JOURNAL OF MANAGEMENT MATHEMATICS, Vol: 24, Pages: 311-320, ISSN: 1471-678X

Journal article

Noon E, Knottenbelt WJ, Kuhn D, 2013, Kelly's fractional staking updated for betting exchanges, IMA JOURNAL OF MANAGEMENT MATHEMATICS, Vol: 24, Pages: 283-299, ISSN: 1471-678X

Journal article

Coulden D, Osman R, Knottenbelt WJ, 2013, Performance modelling of database contention using queueing Petri nets, Pages: 331-334

Most performance evaluation studies of database systems are high-level studies limited by the expressiveness of their modelling formalisms. In this paper, we illustrate the potential of Queueing Petri Nets as a successor of traditionally-adopted modelling formalisms in evaluating the complexities of database systems. This is demonstrated through the construction and analysis of a Queueing Petri Net model of table-level database locking. We show that this model predicts mean response times better than a corresponding Petri net model.

Conference paper

Anastasiou N, Knottenbelt WJ, 2013, Deriving Coloured Generalised Stochastic Petri Net Performance Models from High-Precision Location Tracking Data, 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013), Publisher: ACM, Pages: 375-386

Stochastic performance models are widely used to analyse the performability of systems that involve the flow and processing of customers and resources. However, model formulation and parameterisation are traditionally manual and thus expensive, intrusive and error-prone. Our earlier work has demonstrated the feasibility of automated performance model construction from location tracking data. In particular, we presented a methodology based on a four-stage data processing pipeline, which automatically constructs Generalised Stochastic Petri Net (GSPN) performance models from an input dataset consisting of raw location tracking traces. This pipeline was subsequently enhanced with a presence-based synchronisation detection mechanism. This paper introduces Coloured Generalised Stochastic Petri Nets (CGSPNs) into our methodology in order to provide support for multiple customer classes and service cycles. Distinct token types are used to model customers of different classes, while Johnson's algorithm for enumerating elementary cycles in a directed graph is employed to detect the presence of service cycles. Coloured tokens are also used to accurately reflect customer routing after the completion of a service cycle. A third extension enables the calculation and representation of the inter-routing probability of the customer flow between the system's service centres. We evaluate these extensions and their integration in our existing methodology via a case study of a simplified model of an Accident and Emergency (A&E) department. The case study is based on synthetic location tracking data, generated using an extended version of the LocTrackJINQS location-aware queueing network simulator.
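The service-cycle detection step can be illustrated in isolation. The paper uses Johnson's elementary-cycle algorithm; the snippet below is a small DFS stand-in that enumerates simple cycles in a toy customer-routing graph, with hypothetical service-centre names standing in for those inferred from location data.

```python
def simple_cycles(graph):
    """Enumerate elementary cycles as canonically rotated node tuples."""
    cycles = set()
    def dfs(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:                           # closed a cycle back to start
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i])) # rotate so smallest node leads
            elif nxt not in path and nxt >= start:     # explore only nodes >= start,
                dfs(start, nxt, path + [nxt])          # so each cycle is found once
    for v in graph:
        dfs(v, v, [v])
    return cycles

# hypothetical routing graph between service centres
routing = {"triage": ["xray"], "xray": ["triage", "ward"], "ward": []}
cycles = simple_cycles(routing)    # the triage <-> xray service cycle
```

Once a cycle such as triage/xray is detected, coloured tokens can record how many times a customer has traversed it, which is what drives the routing-after-cycle-completion modelling in the paper.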

Conference paper

Dingle N, Knottenbelt W, Spanias D, 2013, On the (page) ranking of professional tennis players, Pages: 237-247, ISSN: 0302-9743

We explore the relationship between official rankings of professional tennis players and rankings computed using a variant of the PageRank algorithm as proposed by Radicchi in 2011. We show Radicchi's equations follow a natural interpretation of the PageRank algorithm and present up-to-date comparisons of official rankings with PageRank-based rankings for both the Association of Tennis Professionals (ATP) and Women's Tennis Association (WTA) tours. For top-ranked players these two rankings are broadly in line; however, there is wide variation in the tail which leads us to question the degree to which the official ranking mechanism reflects true player ability. For a 390-day sample of recent tennis matches, PageRank-based rankings are found to be better predictors of match outcome than the official rankings.
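A hedged sketch of the PageRank-over-matches idea: each loss becomes a directed edge from loser to winner, and "prestige" flows along defeats. This is a generic power-iteration PageRank in the spirit of Radicchi's variant, not his exact equations, and the players and results are invented.

```python
def pagerank(wins, q=0.85, iters=100):
    """wins: list of (winner, loser) pairs. Returns {player: score} summing to 1."""
    players = sorted({p for match in wins for p in match})
    n = len(players)
    losses = {p: sum(1 for _, l in wins if l == p) for p in players}
    rank = {p: 1.0 / n for p in players}
    for _ in range(iters):
        new = {p: (1 - q) / n for p in players}
        for winner, loser in wins:                   # each defeat passes on prestige
            new[winner] += q * rank[loser] / losses[loser]
        dangling = q * sum(rank[p] for p in players if losses[p] == 0)
        for p in players:
            new[p] += dangling / n                   # undefeated players' mass spread evenly
        rank = new
    return rank

results = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"), ("Carol", "Dan")]
scores = pagerank(results)
best = max(scores, key=scores.get)    # Alice: beat everyone she played
```

Note that, unlike a simple win count, scores depend on whom you beat: a win over a highly ranked player is worth more, which is the property that distinguishes PageRank-based rankings from the official points system.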

Conference paper

Tsimashenka I, Knottenbelt WJ, 2013, Reduction of subtask dispersion in fork-join systems, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol: 8168 LNCS, Pages: 325-336, ISSN: 0302-9743

Fork-join and split-merge queueing systems are well-known abstractions of parallel systems in which each incoming task splits into subtasks that are processed by a set of parallel servers. A task exits the system when all of its subtasks have completed service. Two key metrics of interest in such systems are task response time and subtask dispersion. This paper presents a technique applicable to a class of fork-join systems with heterogeneous exponentially distributed service times that is able to reduce subtask dispersion with only a marginal increase in task response time. Achieving this is challenging since the unsynchronised operation of fork-join systems naturally militates against low subtask dispersion. Our approach builds on our earlier research examining subtask dispersion and response time in split-merge systems, and involves the frequent application and updating of delays to the subtasks at the head of the parallel service queues. Numerical results show the ability to reduce dispersion in fork-join systems to levels comparable with or below that observed in all varieties of split-merge systems while retaining the response time and throughput benefits of a fork-join system.

Journal article

Nika M, Ivanova G, Knottenbelt WJ, 2013, On celebrity, epidemiology and the internet, Pages: 175-183

The proliferation of the internet has created new opportunities to study the mechanisms behind the emergence and dynamic behaviour of online popularity and celebrity. In this paper we examine how common epidemic models, specifically SIR and SEIR models, can be applied to model the evolution of outbreaks of celebrity interest on the internet. A major challenge when using such models is to parameterise them to fit data as an outbreak unfolds over time, without knowing the initial number of susceptibles in the target population. We present a methodology capable of fitting the model's parameters from a single trace, while the outbreak unfolds, and of forecasting the epidemic's progression in the coming days. We present results on three kinds of data: simulated epidemic data, data from a real Influenza virus outbreak, and music artists' BitTorrent download and YouTube video view activity.

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
