Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

  • Journal article
    Fernando S, Amador Díaz López J, Şerban O, Gómez-Romero J, Molina-Solana M, Guo Y et al., 2019,

    Towards a large-scale twitter observatory for political events

    , Future Generation Computer Systems, ISSN: 0167-739X

    Explosion in usage of social media has made its analysis a relevant topic of interest, and particularly so in the political science area. Within Data Science, no other techniques are more widely accepted and appealing than visualisation. However, with datasets growing in size, visualisation tools also require a paradigm shift to remain useful in big data contexts. This work presents our proposal for a Large-Scale Twitter Observatory that enables researchers to efficiently retrieve, analyse and visualise data from this social network to gain actionable insights and knowledge related with political events. In addition to describing the supporting technologies, we put forward a working pipeline and validate the setup with different examples.

  • Journal article
    Rajpal H, Rosas De Andraca FE, Jensen HJ, 2019,

    Tangled worldview model of opinion dynamics

    , Frontiers in Physics, Vol: 7, ISSN: 2296-424X

    We study the joint evolution of worldviews by proposing a model of opinion dynamics, which is inspired in notions from evolutionary ecology. Agents update their opinion on a specific issue based on their propensity to change – asserted by the social neighbours – weighted by their mutual similarity on other issues. Agents are, therefore, more influenced by neighbours with similar worldviews (set of opinions on various issues), resulting in a complex co-evolution of each opinion. Simulations show that the worldview evolution exhibits events of intermittent polarization when the social network is scale-free. This, in turn, triggers extreme crashes and surges in the popularity of various opinions. Using the proposed model, we highlight the role of network structure, bounded rationality of agents, and the role of key influential agents in causing polarization and intermittent reformation of worldviews on scale-free networks.

  • Journal article
    Cofré R, Videla L, Rosas F, 2019,

    An introduction to the non-equilibrium steady states of maximum entropy spike trains

    , Entropy, Vol: 21, Pages: 1-28, ISSN: 1099-4300

    Although most biological processes are characterized by a strong temporal asymmetry, several popular mathematical models neglect this issue. Maximum entropy methods provide a principled way of addressing time irreversibility, which leverages powerful results and ideas from the literature of non-equilibrium statistical mechanics. This tutorial provides a comprehensive overview of these issues, with a focus in the case of spike train statistics. We provide a detailed account of the mathematical foundations and work out examples to illustrate the key concepts and results from non-equilibrium statistical mechanics.

  • Conference paper
    Truong N, Sun K, Guo Y,

    Blockchain-based personal data management: from fiction to solution

    , The 18th IEEE International Symposium on Network Computing and Applications (NCA 2019), Publisher: IEEE

    The emerging blockchain technology has enabled various decentralised applications in a trustless environment without relying on a trusted intermediary. It is expected as a promising solution to tackle sophisticated challenges on personal data management, thanks to its advanced features such as immutability, decentralisation and transparency. Although certain approaches have been proposed to address technical difficulties in personal data management; most of them only provided preliminary methodological exploration. Alarmingly, when utilising Blockchain for developing a personal data management system, fictions have occurred in existing approaches and been promulgated in the literature. Such fictions are theoretically doable; however, by thoroughly breaking down consensus protocols and transaction validation processes, we clarify that such existing approaches are either impractical or highly inefficient due to the natural limitations of the blockchain and Smart Contracts technologies. This encourages us to propose a feasible solution in which such fictions are reduced by designing a novel system architecture with a blockchain-based “proof of permission” protocol. We demonstrate the feasibility and efficiency of the proposed models by implementing a clinical data sharing service built on top of a public blockchain platform. We believe that our research resolves existing ambiguity and take a step further on providing a practically feasible solution for decentralised personal data management.

  • Conference paper
    Gadotti A, Houssiau F, Rocher L, Livshits B, de Montjoye Y-A et al.,

    When the Signal is in the Noise: Exploiting Diffix's Sticky Noise

    , 28th USENIX Security Symposium (USENIX Security '19), Publisher: USENIX

    Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of “sticky noise”, Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article 29 Working Party’s definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP’s definition of anonymization. We conclude by discussing how non-provable privacy-preserving systems can be combined with fundamental security principles such as defense-in

  • Journal article
    Rocher L, Hendrickx J, de Montjoye Y-A, 2019,

    Estimating the success of re-identifications in incomplete datasets using generative models

    , Nature Communications, Vol: 10, ISSN: 2041-1723

    While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.

  • Conference paper
    Fernando S, Birch D, Molina-Solana M, McIlwraith D, Guo Y et al., 2019,

    Compositional Microservices for Immersive Social Visual Analytics

    , Pages: 216-223, ISSN: 1093-9547

    © 2019 IEEE. As humans, we have developed to process highly complex visual data from our surroundings. This is why data visualization and interaction is one of the quickest ways to facilitate investigation and communicate understanding. To perform visual analytics effectively at the big data scale it is crucial that we develop an integrated processing and visualization ecosystem. However, to date, in Large High-Resolution Display (LHRD) environments the worlds of data processing and visualization remain largely disconnected. In this paper, we propose a common architectural approach to enable integrated data processing and distributed visualization via the composition of discrete microservices. Each of these microservices provides a very specific clearly-defined function, such as analyzing data, creating a visualization, sharding data or providing a synchronization source. By defining common transport, data and API formats we enable the composition of these microservices from processing raw data through to analytics, visualization and rendering. This compositionality, inspired by successful data-driven visualization frameworks provides a common platform for immersive social visual analytics.

  • Report
    Crémer J, de Montjoye Y-A, Schweitzer H, 2019,

    Competition policy for the digital era

    , Competition policy for the digital era, Brussels, Publisher: EU Publications

  • Conference paper
    Jain S, Bensaid E, de Montjoye Y-A, 2019,

    UNVEIL: capture and visualise WiFi data leakages

    , The Web Conference 2019, Publisher: ACM, Pages: 3550-3554

    In the past few years, numerous privacy vulnerabilities have been discovered in the WiFi standards and their implementations for mobile devices. These vulnerabilities allow an attacker to collect large amounts of data on the device user, which could be used to infer sensitive information such as religion, gender, and sexual orientation. Solutions for these vulnerabilities are often hard to design and typically require many years to be widely adopted, leaving many devices at risk. In this paper, we present UNVEIL - an interactive and extendable platform to demonstrate the consequences of these attacks. The platform performs passive and active attacks on smartphones to collect and analyze data leaked through WiFi and communicate the analysis results to users through simple and interactive visualizations. The platform currently performs two attacks. First, it captures probe requests sent by nearby devices and combines them with public WiFi location databases to generate a map of locations previously visited by the device users. Second, it creates rogue access points with SSIDs of popular public WiFis (e.g. Heathrow WiFi, Railways WiFi) and records the resulting internet traffic. This data is then analyzed and presented in a format that highlights the privacy leakage. The platform has been designed to be easily extendable to include more attacks and to be easily deployable in public spaces. We hope that UNVEIL will help raise public awareness of privacy risks of WiFi networks.

  • Journal article
    Brinkman P, Wagener AH, Hekking P-P, Bansal AT, Maitland-van der Zee A-H, Wang Y, Weda H, Knobel HH, Vink TJ, Rattray NJ, D'Amico A, Pennazza G, Santonico M, Lefaudeux D, De Meulder B, Auffray C, Bakke PS, Caruso M, Chanez P, Chung KF, Corfield J, Dahlen S-E, Djukanovic R, Geiser T, Horvath I, Krug N, Musial J, Sun K, Riley JH, Shaw DE, Sandstrom T, Sousa AR, Montuschi P, Fowler SJ, Sterk PJ et al., 2019,

    Identification and prospective stability of electronic nose (eNose)-derived inflammatory phenotypes in patients with severe asthma

    , Journal of Allergy and Clinical Immunology, Vol: 143, Pages: 1811-+, ISSN: 0091-6749

  • Journal article
    Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, Molina-Solana M, Campana JR, Ruiz MD, Martin-Bautista MJ et al., 2019,

    A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings

    , IEEE Access, Vol: 7, Pages: 38748-38765, ISSN: 2169-3536

    Despite the increasing capabilities of information technologies for data acquisition and processing, building energy management systems still require manual configuration and supervision to achieve optimal performance. Model predictive control (MPC) aims to leverage equipment control – particularly heating, ventilation and air conditioning (HVAC) – by using a model of the building to capture its dynamic characteristics and to predict its response to alternative control scenarios. Usually, MPC approaches are based on simplified linear models, which support faster computation but also present some limitations regarding interpretability, solution diversification and longer-term optimization. In this work, we propose a novel MPC algorithm that uses a full-complexity grey-box simulation model to optimize HVAC operation in non-residential buildings. Our system generates hundreds of candidate operation plans, typically for the next day, and evaluates them in terms of consumption and comfort by means of a parallel simulator configured according to the expected building conditions (weather, occupancy, etc.). The system has been implemented and tested in an office building in Helsinki, both in a simulated environment and in the real building, yielding energy savings around 35% during the intermediate winter season and 20% in the whole winter season with respect to the current operation of the heating equipment.

  • Journal article
    Creswell A, Bharath AA, 2019,

    Denoising adversarial autoencoders

    , IEEE Transactions on Neural Networks and Learning Systems, Vol: 30, Pages: 968-984, ISSN: 2162-2388

    Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabelled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabelled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularisation during training to shape the distribution of the encoded data in the latent space. We suggest denoising adversarial autoencoders, which combine denoising and regularisation, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of adversarial autoencoders. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance, and can synthesise samples that are more consistent with the input data than those trained without a corruption process.

  • Journal article
    Rueda R, Cuéllar M, Molina-Solana M, Guo Y, Pegalajar M et al., 2019,

    Generalised regression hypothesis induction for energy consumption forecasting

    , Energies, Vol: 12, Pages: 1069-1069, ISSN: 1996-1073

    This work addresses the problem of energy consumption time series forecasting. In our approach, a set of time series containing energy consumption data is used to train a single, parameterised prediction model that can be used to predict future values for all the input time series. As a result, the proposed method is able to learn the common behaviour of all time series in the set (i.e., a fingerprint) and use this knowledge to perform the prediction task, and to explain this common behaviour as an algebraic formula. To that end, we use symbolic regression methods trained with both single- and multi-objective algorithms. Experimental results validate this approach to learn and model shared properties of different time series, which can then be used to obtain a generalised regression model encapsulating the global behaviour of different energy consumption time series.

  • Journal article
    Jevnikar Z, Östling J, Ax E, Calvén J, Thörn K, Israelsson E, Öberg L, Singhania A, Lau LCK, Wilson SJ, Ward JA, Chauhan A, Sousa AR, De Meulder B, Loza MJ, Baribaud F, Sterk PJ, Chung KF, Sun K, Guo Y, Adcock IM, Payne D, Dahlen B, Chanez P, Shaw DE, Krug N, Hohlfeld JM, Sandström T, Djukanovic R, James A, Hinks TSC, Howarth PH, Vaarala O, van Geest M, Olsson HK, U-BIOPRED study group et al., 2019,

    Epithelial IL-6 trans-signaling defines a new asthma phenotype with increased airway inflammation

    , Journal of Allergy and Clinical Immunology, Vol: 143, Pages: 577-590, ISSN: 0091-6749

    BACKGROUND: Although several studies link high levels of IL-6 and soluble IL-6 receptor (sIL-6R) with asthma severity and decreased lung function, the role of IL-6 trans-signaling (IL-6TS) in asthma is unclear. OBJECTIVE: To explore the association between epithelial IL-6TS pathway activation and molecular and clinical phenotypes in asthma. METHODS: An IL-6TS gene signature, obtained from air-liquid interface (ALI) cultures of human bronchial epithelial cells stimulated with IL-6 and sIL-6R, was used to stratify lung epithelium transcriptomic data (U-BIOPRED cohorts) by hierarchical clustering. IL-6TS-specific protein markers were used to stratify sputum biomarker data (Wessex cohort). Molecular phenotyping was based on transcriptional profiling of epithelial brushings, pathway analysis and immunohistochemical analysis of bronchial biopsies. RESULTS: Activation of IL-6TS in ALI cultures reduced epithelial integrity and induced a specific gene signature enriched in genes associated with airway remodeling. The IL-6TS signature identified a subset of IL-6TS High asthma patients with increased epithelial expression of IL-6TS inducible genes in absence of systemic inflammation. The IL-6TS High subset had an overrepresentation of frequent exacerbators, blood eosinophilia, and submucosal infiltration of T cells and macrophages. In bronchial brushings, TLR pathway genes were up-regulated while the expression of tight junction genes was reduced. Sputum sIL-6R and IL-6 levels correlated with sputum markers of remodeling and innate immune activation, in particular YKL-40, MMP3, MIP-1β, IL-8 and IL-1β. CONCLUSIONS: Local lung epithelial IL-6TS activation in absence of type 2 airway inflammation defines a novel subset of asthmatics and may drive airway inflammation and epithelial dysfunction in these patients.

  • Journal article
    Simpson AJ, Hekking P-P, Shaw DE, Fleming LJ, Roberts G, Riley JH, Bates S, Sousa AR, Bansal AT, Pandis I, Sun K, Bakke PS, Caruso M, Dahlén B, Dahlén S-E, Horvath I, Krug N, Montuschi P, Sandstrom T, Singer F, Adcock IM, Wagers SS, Djukanovic R, Chung KF, Sterk PJ, Fowler SJ, U-BIOPRED Study Group et al., 2019,

    Treatable traits in the European U-BIOPRED adult asthma cohorts

    , Allergy, Vol: 74, Pages: 406-411, ISSN: 0105-4538

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.