Publications from our Researchers
Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.
Journal article: Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, et al., 2019,
A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings, IEEE Access, Vol: 7, Pages: 38748-38765, ISSN: 2169-3536
Despite the increasing capabilities of information technologies for data acquisition and processing, building energy management systems still require manual configuration and supervision to achieve optimal performance. Model predictive control (MPC) aims to leverage equipment control – particularly heating, ventilation and air conditioning (HVAC) – by using a model of the building to capture its dynamic characteristics and to predict its response to alternative control scenarios. Usually, MPC approaches are based on simplified linear models, which support faster computation but also present some limitations regarding interpretability, solution diversification and longer-term optimization. In this work, we propose a novel MPC algorithm that uses a full-complexity grey-box simulation model to optimize HVAC operation in non-residential buildings. Our system generates hundreds of candidate operation plans, typically for the next day, and evaluates them in terms of consumption and comfort by means of a parallel simulator configured according to the expected building conditions (weather, occupancy, etc.). The system has been implemented and tested in an office building in Helsinki, both in a simulated environment and in the real building, yielding energy savings of around 35% during the intermediate winter season and 20% over the whole winter season with respect to the current operation of the heating equipment.
Journal article: Rassouli B, Rosas FE, Gunduz D, 2019,
Data disclosure under perfect sample privacy
Perfect data privacy seems to be in fundamental opposition to the economical and scientific opportunities associated with extensive data exchange. Defying this intuition, this paper develops a framework that allows the disclosure of collective properties of datasets without compromising the privacy of individual data samples. We present an algorithm to build an optimal disclosure strategy/mapping, and discuss its fundamental limits on finite and asymptotically large datasets. Furthermore, we present explicit expressions for the asymptotic performance of this scheme in some scenarios, and study cases where our approach attains maximal efficiency. We finally discuss suboptimal schemes to provide sample privacy guarantees to large datasets with a reduced computational cost.
Journal article: Creswell A, Bharath AA, 2019,
Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabeled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabeled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularization during training to shape the distribution of the encoded data in the latent space. We suggest denoising adversarial autoencoders (AAEs), which combine denoising and regularization, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of AAEs. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance and can synthesize samples that are more consistent with the input data than those trained without a corruption process.
Journal article: Rueda R, Cuéllar M, Molina-Solana M, et al., 2019,
This work addresses the problem of energy consumption time series forecasting. In our approach, a set of time series containing energy consumption data is used to train a single, parameterised prediction model that can be used to predict future values for all the input time series. As a result, the proposed method is able to learn the common behaviour of all time series in the set (i.e., a fingerprint) and use this knowledge to perform the prediction task, and to explain this common behaviour as an algebraic formula. To that end, we use symbolic regression methods trained with both single- and multi-objective algorithms. Experimental results validate this approach to learn and model shared properties of different time series, which can then be used to obtain a generalised regression model encapsulating the global behaviour of different energy consumption time series.
Journal article: Robinson R, Valindria VV, Bai W, et al., 2019,
Automated quality control in image segmentation: application to the UK Biobank cardiac MR imaging study, Journal of Cardiovascular Magnetic Resonance, Vol: 21, ISSN: 1097-6647
Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails so as to avoid inclusion of wrong measurements into subsequent analyses which could lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy (RCA), which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale manually-annotated set of 4,800 cardiac magnetic resonance scans. We then apply our method to a large cohort of 7,250 cardiac MRI on which we have performed manual QC. Results: We report results used for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low and high quality segmentations using predicted DSC scores. As further validation we show high correlation between real and predicted scores and 95% classification accuracy on 4,800 scans for which manual segmentations were available. We mimic real-world application of the method on 7,250 cardiac MRI where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that RCA has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging as in the UK Biobank Imaging Study.
Conference paper: Chen C, Bai W, Rueckert D, 2019,
Segmentation of the left atrium (LA) is crucial for assessing its anatomy in both pre-operative atrial fibrillation (AF) ablation planning and post-operative follow-up studies. In this paper, we present a fully automated framework for left atrial segmentation in gadolinium-enhanced magnetic resonance images (GE-MRI) based on deep learning. We propose a fully convolutional neural network and explore the benefits of multi-task learning for performing both atrial segmentation and pre/post ablation classification. Our results show that, by sharing features between related tasks, the network can gain additional anatomical information and achieve more accurate atrial segmentation, leading to a mean Dice score of 0.901 on a test set of 20 3D MRI images. Code for our proposed algorithm is available at https://github.com/cherise215/atria_segmentation_2018/.
Journal article: Gilbert K, Bai W, Mauger C, et al., 2019,
Independent left ventricular morphometric atlases show consistent relationships with cardiovascular risk factors: A UK Biobank study, Scientific Reports, Vol: 9, ISSN: 2045-2322
Left ventricular (LV) mass and volume are important indicators of clinical and pre-clinical disease processes. However, much of the shape information present in modern imaging examinations is currently ignored. Morphometric atlases enable precise quantification of shape and function, but there has been no objective comparison of different atlases in the same cohort. We compared two independent LV atlases using MRI scans of 4547 UK Biobank participants: (i) a volume atlas derived by automatic non-rigid registration of image volumes to a common template, and (ii) a surface atlas derived from manually drawn epicardial and endocardial surface contours. The strength of associations between atlas principal components and cardiovascular risk factors (smoking, diabetes, high blood pressure, high cholesterol and angina) was quantified with logistic regression models and five-fold cross validation, using area under the ROC curve (AUC) and Akaike Information Criterion (AIC) metrics. Both atlases exhibited similar principal components, showed similar relationships with risk factors, and had stronger associations (higher AUC and lower AIC) than a reference model based on LV mass and volume, for all risk factors (DeLong p < 0.05). Morphometric variations associated with each risk factor could be quantified and visualized and were similar between atlases. UK Biobank LV shape atlases are robust to construction method and show stronger relationships with cardiovascular risk factors than mass and volume.
Journal article: Jevnikar Z, Östling J, Ax E, et al., 2019,
BACKGROUND: Although several studies link high levels of IL-6 and soluble IL-6 receptor (sIL-6R) with asthma severity and decreased lung function, the role of IL-6 trans-signaling (IL-6TS) in asthma is unclear. OBJECTIVE: To explore the association between epithelial IL-6TS pathway activation and molecular and clinical phenotypes in asthma. METHODS: An IL-6TS gene signature, obtained from air-liquid interface (ALI) cultures of human bronchial epithelial cells stimulated with IL-6 and sIL-6R, was used to stratify lung epithelium transcriptomic data (U-BIOPRED cohorts) by hierarchical clustering. IL-6TS-specific protein markers were used to stratify sputum biomarker data (Wessex cohort). Molecular phenotyping was based on transcriptional profiling of epithelial brushings, pathway analysis and immunohistochemical analysis of bronchial biopsies. RESULTS: Activation of IL-6TS in ALI cultures reduced epithelial integrity and induced a specific gene signature enriched in genes associated with airway remodeling. The IL-6TS signature identified a subset of IL-6TS High asthma patients with increased epithelial expression of IL-6TS inducible genes in the absence of systemic inflammation. The IL-6TS High subset had an overrepresentation of frequent exacerbators, blood eosinophilia, and submucosal infiltration of T cells and macrophages. In bronchial brushings, TLR pathway genes were up-regulated while the expression of tight junction genes was reduced. Sputum sIL-6R and IL-6 levels correlated with sputum markers of remodeling and innate immune activation, in particular YKL-40, MMP3, MIP-1β, IL-8 and IL-1β. CONCLUSIONS: Local lung epithelial IL-6TS activation in the absence of type 2 airway inflammation defines a novel subset of asthmatics and may drive airway inflammation and epithelial dysfunction in these patients.
Journal article: Simpson AJ, Hekking P-P, Shaw DE, et al., 2019,
Conference paper: Fernando S, Birch D, Molina-Solana M, et al., 2019,
Journal article: Oehmichen A, Hua K, Diaz Lopez JA, et al., 2019,
Journal article: Balaban G, Halliday BP, Costa CM, et al., 2018,
Fibrosis Microstructure Modulates Reentry in Non-ischemic Dilated Cardiomyopathy: Insights From Imaged Guided 2D Computational Modeling, Frontiers in Physiology, Vol: 9, ISSN: 1664-042X
Aims: Patients who present with non-ischemic dilated cardiomyopathy (NIDCM) and enhancement on late gadolinium magnetic resonance imaging (LGE-CMR), are at high risk of sudden cardiac death (SCD). Further risk stratification of these patients based on LGE-CMR may be improved through better understanding of fibrosis microstructure. Our aim is to examine variations in fibrosis microstructure based on LGE imaging, and quantify the effect on reentry inducibility and mechanism. Furthermore, we examine the relationship between transmural activation time differences and reentry. Methods and Results: 2D Computational models were created from a single short axis LGE-CMR image, with 401 variations in fibrosis type (interstitial, replacement) and density, as well as presence or absence of reduced conductivity (RC). Transmural activation times (TAT) were measured, as well as reentry incidence and mechanism. Reentries were inducible above specific density thresholds (0.8, 0.6 for interstitial, replacement fibrosis). RC reduced these thresholds (0.3, 0.4 for interstitial, replacement fibrosis) and increased reentry incidence (48 no RC vs. 133 with RC). Reentries were classified as rotor, micro-reentry, or macro-reentry and depended on fibrosis micro-structure. Differences in TAT at coupling intervals 210 and 500 ms predicted reentry in the models (sensitivity 89%, specificity 93%). A sensitivity analysis of TAT and reentry incidence showed that these quantities were robust to small changes in the pacing location. Conclusion: Computational models of fibrosis micro-structure underlying areas of LGE in NIDCM provide insight into the mechanisms and inducibility of reentry, and their dependence upon the type and density of fibrosis. Transmural activation times, measured at the central extent of the scar, can potentially differentiate microstructures which support reentry.
Journal article: de Montjoye Y-A, Gambs S, Blondel V, et al., 2018,
The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.
Journal article: Gomez-Romero J, Molina-Solana MJ, Oehmichen A, et al., 2018,
Knowledge graphs are an increasingly important source of data and context information in Data Science. A first step in data analysis is data exploration, in which visualization plays a key role. Currently, Semantic Web technologies are prevalent for modelling and querying knowledge graphs; however, most visualization approaches in this area tend to be overly simplified and targeted to small-sized representations. In this work, we describe and evaluate the performance of a Big Data architecture applied to large-scale knowledge graph visualization. To do so, we have implemented a graph processing pipeline in the Apache Spark framework and carried out several experiments with real-world and synthetic graphs. We show that distributed implementations of the graph building, metric calculation and layout stages can efficiently manage very large graphs, even without applying partitioning or incremental processing strategies.
Journal article: Molina-Solana M, Kennedy M, Amador Diaz Lopez J, 2018,
The organization of companies and their HR departments is being strongly affected by recent advancements in computational power and Artificial Intelligence, and this trend is likely to rise dramatically in the next few years. This work presents foo.castr, a tool we are developing to visualise, communicate and facilitate the understanding of the impact of these advancements on the future of the workforce. It builds upon the idea that particular tasks within job descriptions will be progressively taken over by computers, forcing human jobs to be reshaped. In its current version, foo.castr presents three different scenarios to help HR departments plan for potential changes and disruptions brought by the adoption of Artificial Intelligence.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.