Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.


  • Journal article
    Cofré R, Videla L, Rosas F, 2019,

    An introduction to the non-equilibrium steady states of maximum entropy spike trains

    , Entropy, Vol: 21, Pages: 1-28, ISSN: 1099-4300

    Although most biological processes are characterized by a strong temporal asymmetry, several popular mathematical models neglect this issue. Maximum entropy methods provide a principled way of addressing time irreversibility, which leverages powerful results and ideas from the literature of non-equilibrium statistical mechanics. This tutorial provides a comprehensive overview of these issues, with a focus in the case of spike train statistics. We provide a detailed account of the mathematical foundations and work out examples to illustrate the key concepts and results from non-equilibrium statistical mechanics.

  • Journal article
    Oehmichen A, Hua K, Lopez JAD, Molina-Solana M, Gomez-Romero J, Guo Y et al., 2019,

    Not all lies are equal. A study into the engineering of political misinformation in the 2016 US presidential election

    , IEEE Access, Vol: 7, Pages: 126305-126314, ISSN: 2169-3536

    We investigated whether and how political misinformation is engineered using a dataset of four months worth of tweets related to the 2016 presidential election in the United States. The data contained tweets that achieved a significant level of exposure and was manually labelled into misinformation and regular information. We found that misinformation was produced by accounts that exhibit different characteristics and behaviour from regular accounts. Moreover, the content of misinformation is more novel, polarised and appears to change through coordination. Our findings suggest that engineering of political misinformation seems to exploit human traits such as reciprocity and confirmation bias. We argue that investigating how misinformation is created is essential to understand human biases, diffusion and ultimately better produce public policy.

  • Conference paper
    Truong N, Sun K, Guo Y,

    Blockchain-based personal data management: from fiction to solution

    , The 18th IEEE International Symposium on Network Computing and Applications (NCA 2019), Publisher: IEEE

    The emerging blockchain technology has enabled various decentralised applications in a trustless environment without relying on a trusted intermediary. It is expected as a promising solution to tackle sophisticated challenges on personal data management, thanks to its advanced features such as immutability, decentralisation and transparency. Although certain approaches have been proposed to address technical difficulties in personal data management, most of them only provided preliminary methodological exploration. Alarmingly, when utilising Blockchain for developing a personal data management system, fictions have occurred in existing approaches and been promulgated in the literature. Such fictions are theoretically doable; however, by thoroughly breaking down consensus protocols and transaction validation processes, we clarify that such existing approaches are either impractical or highly inefficient due to the natural limitations of the blockchain and Smart Contracts technologies. This encourages us to propose a feasible solution in which such fictions are reduced by designing a novel system architecture with a blockchain-based “proof of permission” protocol. We demonstrate the feasibility and efficiency of the proposed models by implementing a clinical data sharing service built on top of a public blockchain platform. We believe that our research resolves existing ambiguity and takes a step further on providing a practically feasible solution for decentralised personal data management.

  • Conference paper
    Gadotti A, Houssiau F, Rocher L, Livshits B, de Montjoye Y-A et al.,

    When the Signal is in the Noise: Exploiting Diffix's Sticky Noise

    , 28th USENIX Security Symposium (USENIX Security '19), Publisher: USENIX

    Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of “sticky noise”, Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article 29 Working Party’s definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP’s definition of anonymization. We conclude by discussing how non-provable privacy-preserving systems can be combined with fundamental security principles such as defense-in

  • Journal article
    Rocher L, Hendrickx J, de Montjoye Y-A, 2019,

    Estimating the success of re-identifications in incomplete datasets using generative models

    , Nature Communications, ISSN: 2041-1723
  • Conference paper
    Fernando S, Birch D, Molina-Solana M, McIlwraith D, Guo Y et al., 2019,

    Compositional Microservices for Immersive Social Visual Analytics

    , Pages: 216-223, ISSN: 1093-9547

    © 2019 IEEE. As humans, we have developed to process highly complex visual data from our surroundings. This is why data visualization and interaction is one of the quickest ways to facilitate investigation and communicate understanding. To perform visual analytics effectively at the big data scale it is crucial that we develop an integrated processing and visualization ecosystem. However, to date, in Large High-Resolution Display (LHRD) environments the worlds of data processing and visualization remain largely disconnected. In this paper, we propose a common architectural approach to enable integrated data processing and distributed visualization via the composition of discrete microservices. Each of these microservices provides a very specific clearly-defined function, such as analyzing data, creating a visualization, sharding data or providing a synchronization source. By defining common transport, data and API formats we enable the composition of these microservices from processing raw data through to analytics, visualization and rendering. This compositionality, inspired by successful data-driven visualization frameworks provides a common platform for immersive social visual analytics.

  • Report
    Crémer J, de Montjoye Y-A, Schweitzer H, 2019,

    Competition policy for the digital era

    , Competition policy for the digital era, Brussels, Publisher: EU Publications
  • Conference paper
    Jain S, Bensaid E, de Montjoye Y-A, 2019,

    UNVEIL: capture and visualise WiFi data leakages

    , The Web Conference 2019, Publisher: ACM, Pages: 3550-3554

    In the past few years, numerous privacy vulnerabilities have been discovered in the WiFi standards and their implementations for mobile devices. These vulnerabilities allow an attacker to collect large amounts of data on the device user, which could be used to infer sensitive information such as religion, gender, and sexual orientation. Solutions for these vulnerabilities are often hard to design and typically require many years to be widely adopted, leaving many devices at risk. In this paper, we present UNVEIL - an interactive and extendable platform to demonstrate the consequences of these attacks. The platform performs passive and active attacks on smartphones to collect and analyze data leaked through WiFi and communicate the analysis results to users through simple and interactive visualizations. The platform currently performs two attacks. First, it captures probe requests sent by nearby devices and combines them with public WiFi location databases to generate a map of locations previously visited by the device users. Second, it creates rogue access points with SSIDs of popular public WiFis (e.g. Heathrow WiFi, Railways WiFi) and records the resulting internet traffic. This data is then analyzed and presented in a format that highlights the privacy leakage. The platform has been designed to be easily extendable to include more attacks and to be easily deployable in public spaces. We hope that UNVEIL will help raise public awareness of privacy risks of WiFi networks.

  • Journal article
    Brinkman P, Wagener AH, Hekking P-P, Bansal AT, Maitland-van der Zee A-H, Wang Y, Weda H, Knobel HH, Vink TJ, Rattray NJ, D'Amico A, Pennazza G, Santonico M, Lefaudeux D, De Meulder B, Auffray C, Bakke PS, Caruso M, Chanez P, Chung KF, Corfield J, Dahlen S-E, Djukanovic R, Geiser T, Horvath I, Krug N, Musial J, Sun K, Riley JH, Shaw DE, Sandstrom T, Sousa AR, Montuschi P, Fowler SJ, Sterk PJ et al., 2019,

    Identification and prospective stability of electronic nose (eNose)-derived inflammatory phenotypes in patients with severe asthma

    , Journal of Allergy and Clinical Immunology, Vol: 143, Pages: 1811-+, ISSN: 0091-6749
  • Journal article
    Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, Molina-Solana M, Campana JR, Ruiz MD, Martin-Bautista MJ et al., 2019,

    A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings

    , IEEE Access, Vol: 7, Pages: 38748-38765, ISSN: 2169-3536

    Despite the increasing capabilities of information technologies for data acquisition and processing, building energy management systems still require manual configuration and supervision to achieve optimal performance. Model predictive control (MPC) aims to leverage equipment control – particularly heating, ventilation and air conditioning (HVAC) – by using a model of the building to capture its dynamic characteristics and to predict its response to alternative control scenarios. Usually, MPC approaches are based on simplified linear models, which support faster computation but also present some limitations regarding interpretability, solution diversification and longer-term optimization. In this work, we propose a novel MPC algorithm that uses a full-complexity grey-box simulation model to optimize HVAC operation in non-residential buildings. Our system generates hundreds of candidate operation plans, typically for the next day, and evaluates them in terms of consumption and comfort by means of a parallel simulator configured according to the expected building conditions (weather, occupancy, etc.). The system has been implemented and tested in an office building in Helsinki, both in a simulated environment and in the real building, yielding energy savings around 35% during the intermediate winter season and 20% in the whole winter season with respect to the current operation of the heating equipment.

  • Journal article
    Rueda R, Cuéllar M, Molina-Solana M, Guo Y, Pegalajar M et al., 2019,

    Generalised regression hypothesis induction for energy consumption forecasting

    , Energies, Vol: 12, Pages: 1069-1069, ISSN: 1996-1073

    This work addresses the problem of energy consumption time series forecasting. In our approach, a set of time series containing energy consumption data is used to train a single, parameterised prediction model that can be used to predict future values for all the input time series. As a result, the proposed method is able to learn the common behaviour of all time series in the set (i.e., a fingerprint) and use this knowledge to perform the prediction task, and to explain this common behaviour as an algebraic formula. To that end, we use symbolic regression methods trained with both single- and multi-objective algorithms. Experimental results validate this approach to learn and model shared properties of different time series, which can then be used to obtain a generalised regression model encapsulating the global behaviour of different energy consumption time series.

  • Journal article
    Jevnikar Z, Östling J, Ax E, Calvén J, Thörn K, Israelsson E, Öberg L, Singhania A, Lau LCK, Wilson SJ, Ward JA, Chauhan A, Sousa AR, De Meulder B, Loza MJ, Baribaud F, Sterk PJ, Chung KF, Sun K, Guo Y, Adcock IM, Payne D, Dahlen B, Chanez P, Shaw DE, Krug N, Hohlfeld JM, Sandström T, Djukanovic R, James A, Hinks TSC, Howarth PH, Vaarala O, van Geest M, Olsson HK, U-BIOPRED study group et al., 2019,

    Epithelial IL-6 trans-signaling defines a new asthma phenotype with increased airway inflammation

    , Journal of Allergy and Clinical Immunology, Vol: 143, Pages: 577-590, ISSN: 0091-6749

    BACKGROUND: Although several studies link high levels of IL-6 and soluble IL-6 receptor (sIL-6R) with asthma severity and decreased lung function, the role of IL-6 trans-signaling (IL-6TS) in asthma is unclear. OBJECTIVE: To explore the association between epithelial IL-6TS pathway activation and molecular and clinical phenotypes in asthma. METHODS: An IL-6TS gene signature, obtained from air-liquid interface (ALI) cultures of human bronchial epithelial cells stimulated with IL-6 and sIL-6R, was used to stratify lung epithelium transcriptomic data (U-BIOPRED cohorts) by hierarchical clustering. IL-6TS-specific protein markers were used to stratify sputum biomarker data (Wessex cohort). Molecular phenotyping was based on transcriptional profiling of epithelial brushings, pathway analysis and immunohistochemical analysis of bronchial biopsies. RESULTS: Activation of IL-6TS in ALI cultures reduced epithelial integrity and induced a specific gene signature enriched in genes associated with airway remodeling. The IL-6TS signature identified a subset of IL-6TS High asthma patients with increased epithelial expression of IL-6TS inducible genes in absence of systemic inflammation. The IL-6TS High subset had an overrepresentation of frequent exacerbators, blood eosinophilia, and submucosal infiltration of T cells and macrophages. In bronchial brushings, TLR pathway genes were up-regulated while the expression of tight junction genes was reduced. Sputum sIL-6R and IL-6 levels correlated with sputum markers of remodeling and innate immune activation, in particular YKL-40, MMP3, MIP-1β, IL-8 and IL-1β. CONCLUSIONS: Local lung epithelial IL-6TS activation in absence of type 2 airway inflammation defines a novel subset of asthmatics and may drive airway inflammation and epithelial dysfunction in these patients.

  • Journal article
    Simpson AJ, Hekking P-P, Shaw DE, Fleming LJ, Roberts G, Riley JH, Bates S, Sousa AR, Bansal AT, Pandis I, Sun K, Bakke PS, Caruso M, Dahlén B, Dahlén S-E, Horvath I, Krug N, Montuschi P, Sandstrom T, Singer F, Adcock IM, Wagers SS, Djukanovic R, Chung KF, Sterk PJ, Fowler SJ, U-BIOPRED Study Group et al., 2019,

    Treatable traits in the European U-BIOPRED adult asthma cohorts

    , Allergy, Vol: 74, Pages: 406-411, ISSN: 0105-4538
  • Journal article
    de Montjoye Y-A, Gambs S, Blondel V, Canright G, de Cordes N, Deletaille S, Engø-Monsen K, Garcia-Herranz M, Kendall J, Kerry C, Krings G, Letouzé E, Luengo-Oroz M, Oliver N, Rocher L, Rutherford A, Smoreda Z, Steele J, Wetter E, Pentland AS, Bengtsson L et al., 2018,

    On the privacy-conscientious use of mobile phone data

    , Scientific Data, Vol: 5, ISSN: 2052-4463

    The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.

  • Journal article
    Gomez-Romero J, Molina-Solana MJ, Oehmichen A, Guo Y et al., 2018,

    Visualizing large knowledge graphs: a performance analysis

    , Future Generation Computer Systems, Vol: 89, Pages: 224-238, ISSN: 0167-739X

    Knowledge graphs are an increasingly important source of data and context information in Data Science. A first step in data analysis is data exploration, in which visualization plays a key role. Currently, Semantic Web technologies are prevalent for modelling and querying knowledge graphs; however, most visualization approaches in this area tend to be overly simplified and targeted to small-sized representations. In this work, we describe and evaluate the performance of a Big Data architecture applied to large-scale knowledge graph visualization. To do so, we have implemented a graph processing pipeline in the Apache Spark framework and carried out several experiments with real-world and synthetic graphs. We show that distributed implementations of the graph building, metric calculation and layout stages can efficiently manage very large graphs, even without applying partitioning or incremental processing strategies.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
