Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

  • Journal article
    Fernando S, AmadorDíazLópez J, Şerban O, Gómez-Romero J, Molina-Solana M, Guo Y et al., 2020,

    Towards a large-scale twitter observatory for political events

    , Future Generation Computer Systems, Vol: 110, Pages: 976-983, ISSN: 0167-739X

    Explosion in usage of social media has made its analysis a relevant topic of interest, and particularly so in the political science area. Within Data Science, no other techniques are more widely accepted and appealing than visualisation. However, with datasets growing in size, visualisation tools also require a paradigm shift to remain useful in big data contexts. This work presents our proposal for a Large-Scale Twitter Observatory that enables researchers to efficiently retrieve, analyse and visualise data from this social network to gain actionable insights and knowledge related with political events. In addition to describing the supporting technologies, we put forward a working pipeline and validate the setup with different examples.

  • Journal article
    Fernando S, Scott-Brown J, Şerban O, Birch D, Akroyd D, Molina-Solana M, Heinis T, Guo Y et al., 2020,

    Open Visualization Environment (OVE): A web framework for scalable rendering of data visualizations

    , Future Generation Computer Systems, Vol: 112, Pages: 785-799, ISSN: 0167-739X

    Scalable resolution display environments, including immersive data observatories, are emerging as equitable and socially engaging platforms for collaborative data exploration and decision making. These environments require specialized middleware to drive them, but, due to various limitations, there is still a gap in frameworks capable of scalable rendering of data visualizations. To overcome these limitations, we introduce a new modular open-source middleware, the Open Visualization Environment (OVE). This framework uses web technologies to provide an ecosystem for visualizing data using web browsers that span hundreds of displays. In this paper, we discuss the key design features and architecture of our framework as well as its limitations. This is followed by an extensive study on performance and scalability, which validates its design and compares it to the popular SAGE2 middleware. We show how our framework solves three key limitations in SAGE2. Thereafter, we present two of our projects that used OVE and show how it can extend SAGE2 to overcome limitations and simplify the user experience for common data visualization use-cases.

  • Journal article
    Martínez V, Fernando S, Molina-Solana M, Guo Y et al., 2020,

    Tuoris: A middleware for visualizing dynamic graphics in scalable resolution display environments

    , Future Generation Computer Systems, Vol: 106, Pages: 559-571, ISSN: 0167-739X

    In the era of big data, large-scale information visualization has become an important challenge. Scalable resolution display environments (SRDEs) have emerged as a technological solution for building high-resolution display systems by tiling lower resolution screens. These systems bring serious advantages, including lower construction cost and better maintainability compared to other alternatives. However, they require specialized software but also purpose-built content to suit the inherently complex underlying systems. This creates several challenges when designing visualizations for big data, such that can be reused across several SRDEs of varying dimensions. This is not yet a common practice but is becoming increasingly popular among those who engage in collaborative visual analytics in data observatories. In this paper, we define three key requirements for systems suitable for such environments, point out limitations of existing frameworks, and introduce Tuoris, a novel open-source middleware for visualizing dynamic graphics in SRDEs. Tuoris manages the complexity of distributing and synchronizing the information among different components of the system, eliminating the need for purpose-built content. This makes it possible for users to seamlessly port existing graphical content developed using standard web technologies, and simplifies the process of developing advanced, dynamic and interactive web applications for large-scale information visualization. Tuoris is designed to work with Scalable Vector Graphics (SVG), reducing bandwidth consumption and achieving high frame rates in visualizations with dynamic animations. It scales independent of the display wall resolution and contrasts with other frameworks that transmit visual information as blocks of images.

  • Journal article
    Jolliffe DA, Stefanidis C, Wang Z, Kermani NZ, Dimitrov V, White JH, McDonough JE, Janssens W, Pfeffer P, Griffiths CJ, Bush A, Guo Y, Christenson S, Adcock IM, Chung KF, Thummel KE, Martineau AR et al., 2020,

    Vitamin D Metabolism is Dysregulated in Asthma and Chronic Obstructive Pulmonary Disease.

    , Am J Respir Crit Care Med

    RATIONALE: Vitamin D deficiency is common in patients with asthma and COPD. Low 25-hydroxyvitamin D (25[OH]D) levels may represent a cause or a consequence of these conditions. OBJECTIVE: To determine whether vitamin D metabolism is altered in asthma or COPD. METHODS: We conducted a longitudinal study in 186 adults to determine whether the 25(OH)D response to six oral doses of 3 mg vitamin D3, administered over one year, differed between those with asthma or COPD vs. controls. Serum concentrations of vitamin D3, 25(OH)D3 and 1α,25-dihydroxyvitamin D3 (1α,25[OH]2D3) were determined pre- and post-supplementation in 93 adults with asthma, COPD or neither condition, and metabolite-to-parent compound molar ratios were compared between groups to estimate hydroxylase activity. Additionally, we analyzed fourteen datasets to compare expression of 1α,25[OH]2D3-inducible gene expression signatures in clinical samples taken from adults with asthma or COPD vs. controls. MEASUREMENTS AND MAIN RESULTS: The mean post-supplementation 25(OH)D increase in participants with asthma (20.9 nmol/L) and COPD (21.5 nmol/L) was lower than in controls (39.8 nmol/L; P=0.001). Compared with controls, patients with asthma and COPD had lower molar ratios of 25(OH)D3-to-vitamin D3 and higher molar ratios of 1α,25(OH)2D3-to-25(OH)D3 both pre- and post-supplementation (P≤0.005). Inter-group differences in 1α,25[OH]2D3-inducible gene expression signatures were modest and variable where statistically significant. CONCLUSIONS: Attenuation of the 25(OH)D response to vitamin D supplementation in asthma and COPD associated with reduced molar ratios of 25(OH)D3-to-vitamin D3 and increased molar ratios of 1α,25(OH)2D3-to-25(OH)D3 in serum, suggesting that vitamin D metabolism is dysregulated in these conditions.

  • Journal article
    Ali MK, Kim RY, Brown AC, Mayall JR, Karim R, Pinkerton JW, Liu G, Martin KL, Starkey MR, Pillar A, Donovan C, Pathinayake PS, Carroll OR, Trinder D, Tay HL, Badi YE, Kermani NZ, Guo Y-K, Aryal R, Mumby S, Pavlidis S, Adcock IM, Weaver J, Xenaki D, Oliver BG, Holliday EG, Foster PS, Wark PA, Johnstone DM, Milward EA, Hansbro PM, Horvat JC et al., 2020,

    Crucial role for lung iron level and regulation in the pathogenesis and severity of asthma.

    , European Respiratory Journal, Vol: 55, ISSN: 0903-1936

    Accumulating evidence highlights links between iron regulation and respiratory disease. Here, we assessed the relationship between iron levels and regulatory responses in clinical and experimental asthma. We show that cell-free iron levels are reduced in the bronchoalveolar lavage (BAL) supernatant of severe or mild-moderate asthma patients and correlate with lower forced expiratory volume in 1 s (FEV1). Conversely, iron-loaded cell numbers were increased in BAL in these patients and with lower FEV1/forced vital capacity (FEV1/FVC). The airway tissue expression of the iron sequestration molecules divalent metal transporter 1 (DMT1) and transferrin receptor 1 (TFR1) are increased in asthma with TFR1 expression correlating with reduced lung function and increased type 2 (T2) inflammatory responses in the airways. Furthermore, pulmonary iron levels are increased in a house dust mite (HDM)-induced model of experimental asthma in association with augmented Tfr1 expression in airway tissue, similar to human disease. We show that macrophages are the predominant source of increased Tfr1 and Tfr1+ macrophages have increased Il13 expression. We also show that increased iron levels induce increased pro-inflammatory cytokine and/or extracellular matrix (ECM) responses in human airway smooth muscle (ASM) cells and fibroblasts ex vivo and induce key features of asthma, including airway hyper-responsiveness and fibrosis and T2 inflammatory responses, in vivo. Together these complementary clinical and experimental data highlight the importance of altered pulmonary iron levels and regulation in asthma, and the need for a greater focus on the role and potential therapeutic targeting of iron in the pathogenesis and severity of disease.

  • Journal article
    Rajpal H, Rosas De Andraca FE, Jensen HJ, 2019,

    Tangled worldview model of opinion dynamics

    , Frontiers in Physics, Vol: 7, ISSN: 2296-424X

    We study the joint evolution of worldviews by proposing a model of opinion dynamics, which is inspired in notions from evolutionary ecology. Agents update their opinion on a specific issue based on their propensity to change – asserted by the social neighbours – weighted by their mutual similarity on other issues. Agents are, therefore, more influenced by neighbours with similar worldviews (set of opinions on various issues), resulting in a complex co-evolution of each opinion. Simulations show that the worldview evolution exhibits events of intermittent polarization when the social network is scale-free. This, in turn, triggers extreme crashes and surges in the popularity of various opinions. Using the proposed model, we highlight the role of network structure, bounded rationality of agents, and the role of key influential agents in causing polarization and intermittent reformation of worldviews on scale-free networks.

  • Journal article
    Cofré R, Herzog R, Corcoran D, Rosas FE et al., 2019,

    A comparison of the maximum entropy principle across biological spatial scales

    , Entropy: international and interdisciplinary journal of entropy and information studies, Vol: 21, Pages: 1-20, ISSN: 1099-4300

    Despite their differences, biological systems at different spatial scales tend to exhibit common organizational patterns. Unfortunately, these commonalities are often hard to grasp due to the highly specialized nature of modern science and the parcelled terminology employed by various scientific sub-disciplines. To explore these common organizational features, this paper provides a comparative study of diverse applications of the maximum entropy principle, which has found many uses at different biological spatial scales ranging from amino acids up to societies. By presenting these studies under a common approach and language, this paper aims to establish a unified view over these seemingly highly heterogeneous scenarios.

  • Journal article
    Tiotiu A, Kermani NZ, Agapow P, Saqi M, Guo Y-K, Djukanovic R, Chung KF, Adcock IM et al., 2019,

    Differential macrophage activation in asthmatic sputum using U-BIOPRED transcriptomics

  • Journal article
    Kermani NZ, Pavlidis S, Riley JH, Chung FK, Adcock IM, Guo Y-K et al., 2019,

    Prediction of longitudinal inflammatory phenotypes using baseline sputum transcriptomics in UBIOPRED

  • Journal article
    Cofré R, Videla L, Rosas F, 2019,

    An introduction to the non-equilibrium steady states of maximum entropy spike trains

    , Entropy, Vol: 21, Pages: 1-28, ISSN: 1099-4300

    Although most biological processes are characterized by a strong temporal asymmetry, several popular mathematical models neglect this issue. Maximum entropy methods provide a principled way of addressing time irreversibility, which leverages powerful results and ideas from the literature of non-equilibrium statistical mechanics. This tutorial provides a comprehensive overview of these issues, with a focus in the case of spike train statistics. We provide a detailed account of the mathematical foundations and work out examples to illustrate the key concepts and results from non-equilibrium statistical mechanics.

  • Conference paper
    Truong N, Sun K, Guo Y, 2019,

    Blockchain-based personal data management: from fiction to solution

    , The 18th IEEE International Symposium on Network Computing and Applications (NCA 2019), Publisher: IEEE

    The emerging blockchain technology has enabled various decentralised applications in a trustless environment without relying on a trusted intermediary. It is expected as a promising solution to tackle sophisticated challenges on personal data management, thanks to its advanced features such as immutability, decentralisation and transparency. Although certain approaches have been proposed to address technical difficulties in personal data management; most of them only provided preliminary methodological exploration. Alarmingly, when utilising Blockchain for developing a personal data management system, fictions have occurred in existing approaches and been promulgated in the literature. Such fictions are theoretically doable; however, by thoroughly breaking down consensus protocols and transaction validation processes, we clarify that such existing approaches are either impractical or highly inefficient due to the natural limitations of the blockchain and Smart Contracts technologies. This encourages us to propose a feasible solution in which such fictions are reduced by designing a novel system architecture with a blockchain-based “proof of permission” protocol. We demonstrate the feasibility and efficiency of the proposed models by implementing a clinical data sharing service built on top of a public blockchain platform. We believe that our research resolves existing ambiguity and take a step further on providing a practically feasible solution for decentralised personal data management.

  • Conference paper
    Gadotti A, Houssiau F, Rocher L, Livshits B, de Montjoye Y-A et al., 2019,

    When the Signal is in the Noise: Exploiting Diffix's Sticky Noise

    , 28th USENIX Security Symposium (USENIX Security '19), Publisher: USENIX

    Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of “sticky noise”, Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article 29 Working Party’s definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP’s definition of anonymization. We conclude by discussing how non-provable privacy-preserving systems can be combined with fundamental security principles such as defense-in

  • Journal article
    Rocher L, Hendrickx J, de Montjoye Y-A, 2019,

    Estimating the success of re-identifications in incomplete datasets using generative models

    , Nature Communications, Vol: 10, ISSN: 2041-1723

    While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.

  • Report
    Crémer J, de Montjoye Y-A, Schweitzer H, 2019,

    Competition policy for the digital era

    , Competition policy for the digital era, Brussels, Publisher: EU Publications

  • Conference paper
    Jain S, Bensaid E, de Montjoye Y-A, 2019,

    UNVEIL: capture and visualise WiFi data leakages

    , The Web Conference 2019, Publisher: ACM, Pages: 3550-3554

    In the past few years, numerous privacy vulnerabilities have been discovered in the WiFi standards and their implementations for mobile devices. These vulnerabilities allow an attacker to collect large amounts of data on the device user, which could be used to infer sensitive information such as religion, gender, and sexual orientation. Solutions for these vulnerabilities are often hard to design and typically require many years to be widely adopted, leaving many devices at risk. In this paper, we present UNVEIL - an interactive and extendable platform to demonstrate the consequences of these attacks. The platform performs passive and active attacks on smartphones to collect and analyze data leaked through WiFi and communicate the analysis results to users through simple and interactive visualizations. The platform currently performs two attacks. First, it captures probe requests sent by nearby devices and combines them with public WiFi location databases to generate a map of locations previously visited by the device users. Second, it creates rogue access points with SSIDs of popular public WiFis (e.g. _Heathrow WiFi, Railways WiFi) and records the resulting internet traffic. This data is then analyzed and presented in a format that highlights the privacy leakage. The platform has been designed to be easily extendable to include more attacks and to be easily deployable in public spaces. We hope that UNVEIL will help raise public awareness of privacy risks of WiFi networks.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.