Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

  • Journal article
    Hekking PP, Loza MJ, Pavlidis S, De Meulder B, Lefaudeux D, Baribaud F, Auffray C, Wagener AH, Brinkman P, Lutter R, Bansal AT, Sousa AR, Bates S, Pandis Y, Fleming LJ, Shaw DE, Fowler SJ, Guo Y, Meiser A, Sun K, Corfield J, Howarth P, Bel EH, Adcock IM, Chung KF, Djukanovic R, Sterk PJ, U-BIOPRED Study Group et al., 2017,

    Transcriptomic gene signatures associated with persistent airflow limitation in patients with severe asthma

    European Respiratory Journal, Vol: 50, ISSN: 1399-3003

    Rationale: A proportion of severe asthma patients suffers from persistent airflow limitation, often associated with more symptoms and exacerbations. Little is known about the underlying mechanisms. Aiming for discovery of unexplored potential mechanisms, we used Gene Set Variation Analysis (GSVA), a sensitive technique that can detect underlying pathways in heterogeneous samples. Methods: Severe asthma patients from the U-BIOPRED cohort with persistent airflow limitation (post-bronchodilator FEV1/FVC ratio < lower limit of normal) were compared to those without persistent airflow limitation. Gene expression was assessed on the total RNA of sputum cells, nasal brushings and endobronchial brushings and biopsies. GSVA was applied to identify differentially-enriched pre-defined gene signatures based on all available gene expression publications and data on airways disease. Results: Differentially-enriched gene signatures were identified in nasal brushings (1), sputum (9), bronchial brushings (1) and bronchial biopsies (4), that were associated with response to inhaled steroids, eosinophils, IL-13, IFN-alpha, specific CD4+ T-cells and airway remodeling. Conclusion: Persistent airflow limitation in severe asthma has distinguishable underlying gene networks that are associated with treatment, inflammatory pathways and airway remodeling. These results point towards targets for the therapy of persistent airflow limitation in severe asthma.

  • Journal article
    Hekking PP, Loza MJ, Pavlidis S, De Meulder B, Lefaudeux D, Baribaud F, Auffray C, Wagener A, Brinkman P, Lutter I, Bansal A, Sousa A, Bates S, Pandis Y, Fleming L, Shaw DE, Fowler SJ, Guo Y, Meiser A, Sun K, Corfield J, Howarth P, Bel EH, Adcock IM, Chung KF, Djukanovic R, Sterk PJ, U-BIOPRED Study Group et al., 2017,

    Pathway discovery using transcriptomic profiles in adult-onset severe asthma

    Journal of Allergy and Clinical Immunology, Vol: 141, Pages: 1280-1290, ISSN: 1097-6825

    Rationale: Adult-onset severe asthma is characterized by highly symptomatic disease despite high-intensity asthma treatments. Understanding the underlying pathways of this heterogeneous disease is needed for the development of targeted treatments. Gene Set Variation Analysis (GSVA) is a statistical technique to identify gene profiles in heterogeneous samples. Objective: To identify gene profiles associated with adult-onset severe asthma. Methods: This was a cross-sectional, observational study in which adult patients with adult-onset asthma (defined as onset at ≥18 yrs old) were compared with patients with childhood-onset severe asthma (<18 yrs), all selected from the U-BIOPRED cohort. Gene expression was assessed on the total RNA of induced sputum (n=83), nasal brushings (n=41), and endobronchial brushings (n=65) and biopsies (n=47) (Affymetrix HT HG-U133+ PM). GSVA was used to identify differentially enriched pre-defined gene signatures of leukocyte lineage, inflammatory and induced lung injury pathways. Results: Significant differentially enriched gene signatures in patients with adult-onset as compared to childhood-onset severe asthma were identified in nasal brushings (5 signatures), sputum (3 signatures) and endobronchial brushings (6 signatures). Signatures associated with eosinophilic airway inflammation, mast cells and group 3 innate lymphoid cells (ILC3) were more enriched in adult-onset severe asthma, whereas signatures associated with induced lung injury were less enriched in adult-onset severe asthma. Conclusions: Adult-onset severe asthma is characterized by inflammatory pathways involving eosinophils, mast cells and ILC3s. These pathways could represent useful targets for the treatment of adult-onset severe asthma.

  • Journal article
    Rossios C, Pavlidis S, Hoda U, Kuo CH, Wiegman C, Russell K, Sun K, Loza MJ, Baribaud F, Durham AL, Ojo O, Lutter R, Rowe A, Bansal A, Auffray C, Sousa A, Corfield J, Djukanovic R, Guo Y, Sterk PJ, Chung KF, Adcock IM, Unbiased Biomarkers for the Prediction of Respiratory Diseases Outcomes U-BIOPRED Consortia Project Team et al., 2017,

    Sputum transcriptomics reveal upregulation of IL-1 receptor family members in patients with severe asthma

    Journal of Allergy and Clinical Immunology, Vol: 141, Pages: 560-570, ISSN: 1097-6825

    BACKGROUND: Sputum analysis in asthmatic patients is used to define airway inflammatory processes and might guide therapy. OBJECTIVE: We sought to determine differential gene and protein expression in sputum samples from patients with severe asthma (SA) compared with nonsmoking patients with mild/moderate asthma. METHODS: Induced sputum was obtained from nonsmoking patients with SA, smokers/ex-smokers with severe asthma, nonsmoking patients with mild/moderate asthma (MMAs), and healthy nonsmoking control subjects. Differential cell counts, microarray analysis of cell pellets, and SOMAscan analysis of sputum analytes were performed. CRID3 was used to inhibit the inflammasome in a mouse model of SA. RESULTS: Eosinophilic and mixed neutrophilic/eosinophilic inflammation were more prevalent in patients with SA compared with MMAs. Forty-two gene probes were upregulated (>2-fold) in nonsmoking patients with severe asthma compared with MMAs, including IL-1 receptor (IL-1R) family and nucleotide-binding oligomerization domain, leucine-rich repeat and pyrin domain containing 3 (NLRP3) inflammasome members (false discovery rate < 0.05). The inflammasome proteins nucleotide-binding oligomerization domain, leucine-rich repeat and pyrin domain containing 1 (NLRP1), NLRP3, and nucleotide-binding oligomerization domain (NOD)-like receptor C4 (NLRC4) were associated with neutrophilic asthma and with sputum IL-1β protein levels, whereas eosinophilic asthma was associated with an IL-13-induced TH2 signature and IL-1 receptor-like 1 (IL1RL1) mRNA expression. These differences were sputum specific because no activation of NLRP3 or enrichment of IL-1R family genes in bronchial brushings or biopsy specimens in patients with SA was observed. Expression of NLRP3 and of the IL-1R family genes was validated in the Airway Disease Endotyping for Personalized Therapeutics cohort. Inflammasome inhibition using CRID3 prevented airway hyperresponsiveness and airway inflammati

  • Journal article
    Jahani E, Sundsøy P, Bjelland J, Bengtsson L, Pentland AS, de Montjoye Y-A et al., 2017,

    Improving official statistics in emerging markets using machine learning and mobile phone data

    EPJ Data Science, Vol: 6, ISSN: 2193-1127

    Mobile phones are one of the fastest growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called CDR (call detail records), are generated every time phones are used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, because most phones in developing countries are prepaid, the data lack key information about the user, including gender and other demographic variables. This precludes numerous uses of this data in social science and development economics research. It also severely hinders the development of humanitarian applications such as the use of mobile phone data to target aid towards the most vulnerable groups during crises. We developed a framework to extract more than 1400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. We here present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show how standard machine learning algorithms trained on only 10,000 users are sufficient to predict an individual’s gender with an accuracy ranging from 74.3 to 88.4% in a developed country and from 74.5 to 79.7% in a developing country using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training size to 5,000, but decreases significantly with smaller training sets. We finally show that our indicators capture a large range of behavioral traits using factor analysis and that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status. M

  • Journal article
    Steele JE, Sundsoy PR, Pezzulo C, Alegana VA, Bird TJ, Blumenstock J, Bjelland J, Engo-Monsen K, de Montjoye YKJV, Iqbal AM, Hadiuzzaman KN, Lu X, Wetter E, Tatem AJ, Bengtsson L et al., 2017,

    Mapping poverty using mobile phone and satellite data

    Journal of the Royal Society Interface, Vol: 14, ISSN: 1742-5689

    Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest r² = 0.78) and lowest error, but generally models employing mobile data only yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.

  • Journal article
    Molina-Solana MJ, Guo Y, Birch D, 2017,

    Improving data exploration in graphs with fuzzy logic and large-scale visualisation

    Applied Soft Computing, Vol: 53, Pages: 227-235, ISSN: 1872-9681

    This work presents three case studies of how fuzzy logic can be combined with large-scale immersive visualisation to enhance graph sensemaking, enabling interactive fuzzy filtering of large global views of graphs. The aim is to provide users with a mechanism to quickly identify interesting nodes for further analysis. Fuzzy logic provides a flexible framework for asking human-like, curiosity-driven questions of the data, and visualisation supports communicating and understanding the answers. Together, these two technologies help both novices and experts reach a faster and deeper understanding of the underlying patterns in big datasets than traditional desktop displays with crisp queries. Among other examples, we provide evidence of how these two technologies enable the identification of relevant transaction patterns in the Bitcoin network.

  • Journal article
    Molina-Solana M, Ros M, Ruiz MD, Gómez-Romero J, Martin-Bautista MJ et al., 2016,

    Data science for building energy management: A review

    Renewable and Sustainable Energy Reviews, Vol: 70, Pages: 598-609, ISSN: 1364-0321

    The energy consumption of residential and commercial buildings has risen steadily in recent years, an increase largely due to their HVAC systems. Expected energy loads, transportation, and storage as well as user behavior influence the quantity and quality of the energy consumed daily in buildings. However, technology is now available that can accurately monitor, collect, and store the huge amount of data involved in this process. Furthermore, this technology is capable of analyzing and exploiting such data in meaningful ways. Not surprisingly, the use of data science techniques to increase energy efficiency is currently attracting a great deal of attention and interest. This paper reviews how Data Science has been applied to address the most difficult problems faced by practitioners in the field of Energy Management, especially in the building sector. The work also discusses the challenges and opportunities that will arise with the advent of fully connected devices and new computational technologies.

  • Journal article
    Lefaudeux D, De Meulder B, Loza MJ, Peffer N, Rowe A, Baribaud F, Bansal AT, Lutter R, Sousa AR, Corfield J, Pandis I, Bakke PS, Caruso M, Chanez P, Dahlen S-E, Fleming LJ, Fowler SJ, Horvath I, Krug N, Montuschi P, Sanak M, Sandstrom T, Shaw DE, Singer F, Sterk PJ, Roberts G, Adcock IM, Djukanovic R, Auffray C, Chung KF, U-BIOPRED Study Group et al., 2016,

    U-BIOPRED clinical adult asthma clusters linked to a subset of sputum -omics

    Journal of Allergy and Clinical Immunology, Vol: 139, Pages: 1797-1807, ISSN: 1097-6825

  • Journal article
    Wilson SJ, Ward JA, Sousa AR, Corfield J, Bansal AT, De Meulder B, Lefaudeux D, Auffray C, Loza MJ, Baribaud F, Fitch N, Sterk PJ, Chung KF, Gibeon D, Sun K, Guo YK, Adcock I, Djukanovic R, Dahlen B, Chanez P, Shaw D, Krug N, Hohlfeld J, Sandström T, Howarth PH, U-BIOPRED Study Group et al., 2016,

    Severe asthma exists despite suppressed tissue inflammation: findings of the U-BIOPRED study

    European Respiratory Journal, Vol: 48, Pages: 1307-1319, ISSN: 1399-3003

    The U-BIOPRED study is a multicentre European study aimed at a better understanding of severe asthma. It included three steroid-treated adult asthma groups (severe nonsmokers (SAn group), severe current/ex-smokers (SAs/ex group) and those with mild-moderate disease (MMA group)) and healthy controls (HC group). The aim of this cross-sectional, bronchoscopy substudy was to compare bronchial immunopathology between these groups. In 158 participants, bronchial biopsies and bronchial epithelial brushings were collected for immunopathologic and transcriptomic analysis. Immunohistochemical analysis of glycol methacrylate resin-embedded biopsies showed there were more mast cells in submucosa of the HC group (33.6 mm⁻²) compared with both severe asthma groups (SAn: 17.4 mm⁻², p<0.001; SAs/ex: 22.2 mm⁻², p=0.01) and with the MMA group (21.2 mm⁻², p=0.01). The number of CD4+ lymphocytes was decreased in the SAs/ex group (4.7 mm⁻²) compared with the SAn (11.6 mm⁻², p=0.002), MMA (10.1 mm⁻², p=0.008) and HC (10.6 mm⁻², p<0.001) groups. No other differences were observed. Affymetrix microarray analysis identified seven probe sets in the bronchial brushing samples that had a positive relationship with submucosal eosinophils. These mapped to COX-2 (cyclo-oxygenase-2), ADAM-7 (disintegrin and metalloproteinase domain-containing protein 7), SLCO1A2 (solute carrier organic anion transporter family member 1A2), TMEFF2 (transmembrane protein with epidermal growth factor like and two follistatin like domains 2) and TRPM-1 (transient receptor potential cation channel subfamily M member 1); the remaining two are unnamed. We conclude that in nonsmoking and smoking patients on currently recommended therapy, severe asthma exists despite suppressed tissue inflammation within the proximal airway wall.

  • Journal article
    de Montjoye YKJV, Rocher L, Pentland AS, 2016,

    bandicoot: an open-source Python toolbox to analyze mobile phone metadata

    Journal of Machine Learning Research, Vol: 17, ISSN: 1532-4435

    bandicoot is an open-source Python toolbox to extract more than 1442 features from standard mobile phone metadata. bandicoot makes it easy for machine learning researchers and practitioners to load mobile phone data, to analyze and visualize them, and to extract robust features which can be used for various classification and clustering tasks. Emphasis is put on ease of use, consistency, and documentation. bandicoot has no dependencies and is distributed under the MIT license.

  • Journal article
    Taquet M, Quoidbach J, de Montjoye Y-A, Desseilles M, Gross JJ et al., 2016,

    Hedonism and the choice of everyday activities

    Proceedings of the National Academy of Sciences, Vol: 113, Pages: 9769-9773, ISSN: 0027-8424

    Most theories of motivation have highlighted that human behavior is guided by the hedonic principle, according to which our choices of daily activities aim to minimize negative affect and maximize positive affect. However, it is not clear how to reconcile this idea with the fact that people routinely engage in unpleasant yet necessary activities. To address this issue, we monitored in real time the activities and moods of over 28,000 people across an average of 27 d using a multiplatform smartphone application. We found that people’s choices of activities followed a hedonic flexibility principle. Specifically, people were more likely to engage in mood-increasing activities (e.g., play sports) when they felt bad, and to engage in useful but mood-decreasing activities (e.g., housework) when they felt good. These findings clarify how hedonic considerations shape human behavior. They may explain how humans overcome the allure of short-term gains in happiness to maximize long-term welfare.

  • Journal article
    McGinn D, Birch DA, Akroyd D, Molina-Solana M, Guo Y, Knottenbelt W et al., 2016,

    Visualizing Dynamic Bitcoin Transaction Patterns

    Big Data, Vol: 4, Pages: 109-119, ISSN: 2167-647X

    This work presents a systemic top-down visualization of Bitcoin transaction activity to explore dynamically generated patterns of algorithmic behavior. Bitcoin dominates the cryptocurrency markets and presents researchers with a rich source of real-time transactional data. The pseudonymous yet public nature of the data presents opportunities for the discovery of human and algorithmic behavioral patterns of interest to many parties such as financial regulators, protocol designers, and security analysts. However, maintaining visual fidelity to the underlying data, and with it a fuller understanding of activity within the network, remains challenging, particularly in real time. We present an effective force-directed graph visualization, employed in our large-scale data observation facility, that accelerates this data exploration and yields useful insight for domain experts and the general public alike. The high-fidelity visualizations demonstrated in this article allowed for collaborative discovery of unexpected high-frequency transaction patterns, including automated laundering operations, and the evolution of multiple distinct algorithmic denial-of-service attacks on the Bitcoin network.

  • Journal article
    Bertone G, Calore F, Caron S, Austri RRD, Kim JS, Trotta R, Weniger C et al., 2016,

    Global analysis of the pMSSM in light of the Fermi GeV excess: prospects for the LHC Run-II and astroparticle experiments

    Journal of Cosmology and Astroparticle Physics, Vol: 2016, ISSN: 1475-7516

  • Journal article
    Ma ZB, Yang Y, Liu YX, Bharath AA et al., 2016,

    Recurrently decomposable 2-D convolvers for FPGA-based digital image processing

    IEEE Transactions on Circuits and Systems, Vol: 63, Pages: 979-983, ISSN: 1549-7747

    Two-dimensional (2-D) convolution is a widely used operation in image processing and computer vision, characterized by intensive computation and frequent memory accesses. Previous efforts to improve the performance of field-programmable gate array (FPGA) convolvers focused on the design of buffering schemes and on minimizing the use of multipliers. A recently proposed recurrently decomposable (RD) filter design method can reduce the computational complexity of 2-D convolutions by splitting the convolution between an image and a large mask into a sequence of convolutions using several smaller masks. This brief explores how to efficiently implement RD-based 2-D convolvers on FPGAs. Three FPGA architectures based on RD filters are proposed, each with a different buffering scheme. The conclusion is that RD-based architectures achieve higher area efficiency than previously reported state-of-the-art methods, especially for larger convolution masks. An area-efficiency metric is also suggested, which allows the most appropriate architecture to be selected.

  • Journal article
    Creswell A, Bharath AA, 2016,

    Task Specific Adversarial Cost Function

    The cost function used to train a generative model should fit the purpose of the model. If the model is intended for tasks such as generating perceptually correct samples, it is beneficial to maximise the likelihood of a sample drawn from the model, Q, coming from the same distribution as the training data, P. This is equivalent to minimising the Kullback-Leibler (KL) distance, KL[Q||P]. However, if the model is intended for tasks such as retrieval or classification, it is beneficial to maximise the likelihood that a sample drawn from the training data is captured by the model, equivalent to minimising KL[P||Q]. The cost function used in adversarial training optimises the Jensen-Shannon entropy which can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here, we propose an alternative adversarial cost function which allows easy tuning of the model for either task. Our task specific cost function is evaluated on a dataset of hand-written characters in the following tasks: Generation, retrieval and one-shot learning.
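
The abstract by Creswell and Bharath contrasts KL[Q||P] and KL[P||Q]. The Python sketch below is only an illustration of why the order matters, using small discrete distributions chosen for the example. KL[Q||P] mainly penalises the model Q for putting mass where the data P has little, while KL[P||Q] mainly penalises Q for missing regions where P has mass. It also computes the standard Jensen-Shannon divergence, JS = ½KL[P||M] + ½KL[Q||M] with M = (P + Q)/2, which is symmetric and at most ln 2. The distributions and function names are assumptions for illustration; this is not the task-specific cost function proposed in the paper.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL[p || q] = sum_i p_i * ln(p_i / q_i), in nats; terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # p puts mass where q has none: divergence is infinite
        total += pi * math.log(pi / qi)
    return total


def js(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their equal-weight mixture m."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Illustrative distributions: P plays the role of the data, Q the model.
P = [0.7, 0.2, 0.1]
Q = [0.4, 0.4, 0.2]

print(f"KL[Q||P] = {kl(Q, P):.4f}")
print(f"KL[P||Q] = {kl(P, Q):.4f}")
print(f"JS(P, Q) = {js(P, Q):.4f} (symmetric, at most ln 2 = {math.log(2):.4f})")
```

The two KL values differ, which is the asymmetry the abstract relies on when matching the cost function to the task.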
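
The abstract by Molina-Solana, Guo and Birch describes interactive fuzzy filtering of graph nodes but gives no membership functions or operators. The following is a minimal Python sketch of what such a query can look like. It makes several assumptions. The node attributes (`degree`, `value`) are hypothetical. The membership functions are linear ramps. The fuzzy AND uses the minimum t-norm, and a threshold `alpha` (an alpha-cut) decides which nodes to keep. It is not the authors' implementation and includes no visualisation.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def ramp_up(lo: float, hi: float) -> Membership:
    """Membership rising linearly from 0 at `lo` to 1 at `hi` (e.g. "high degree")."""
    def mu(x: float) -> float:
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


def ramp_down(lo: float, hi: float) -> Membership:
    """Complement of ramp_up (e.g. "small value")."""
    up = ramp_up(lo, hi)
    return lambda x: 1.0 - up(x)


def fuzzy_filter(nodes: Dict[str, Dict[str, float]],
                 high_degree: Membership,
                 small_value: Membership,
                 alpha: float) -> List[Tuple[str, float]]:
    """Keep nodes whose membership in "high degree AND small value" is at least alpha."""
    scored = []
    for name, attrs in nodes.items():
        # Fuzzy AND using the minimum t-norm.
        mu = min(high_degree(attrs["degree"]), small_value(attrs["value"]))
        if mu >= alpha:
            scored.append((name, mu))
    # Most strongly matching nodes first, e.g. to highlight them in a view.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical transaction-graph nodes: number of counterparties and mean amount moved.
nodes = {
    "a": {"degree": 25, "value": 0.5},
    "b": {"degree": 12, "value": 3.0},
    "c": {"degree": 3, "value": 0.2},
    "d": {"degree": 30, "value": 50.0},
}

hits = fuzzy_filter(nodes, ramp_up(5, 20), ramp_down(1.0, 10.0), alpha=0.3)
for name, mu in hits:
    print(f"{name}: {mu:.2f}")
```

Unlike a crisp query such as "degree > 10 and value < 5", each node receives a graded score. Lowering `alpha` therefore widens the selection smoothly instead of switching nodes in or out at a hard boundary.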
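
The convolver abstract by Ma, Yang, Liu and Bharath rests on splitting a convolution with a large mask into a sequence of convolutions with smaller masks. Full 2-D linear convolution is associative, so (I ∗ a) ∗ b = I ∗ (a ∗ b). When a large mask can be written as a ∗ b, the cascade therefore gives the same output. For a 5×5 mask built from two 3×3 masks, it needs roughly 9 + 9 = 18 multiplications per sample instead of 25. The Python sketch below checks that identity on small integer arrays. The masks and image are arbitrary illustrative values. It shows only this general decomposition principle, not the RD filter design method or any of the FPGA architectures in the paper.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_full(image: Matrix, mask: Matrix) -> Matrix:
    """Full 2-D linear convolution; output size is (H + h - 1) x (W + w - 1)."""
    ih, iw = len(image), len(image[0])
    mh, mw = len(mask), len(mask[0])
    out = [[0] * (iw + mw - 1) for _ in range(ih + mh - 1)]
    # Scatter each input sample, weighted by every mask tap, into the output.
    for i in range(ih):
        for j in range(iw):
            for u in range(mh):
                for v in range(mw):
                    out[i + u][j + v] += image[i][j] * mask[u][v]
    return out


# Two small 3x3 masks (arbitrary illustrative values).
a = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
b = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

# Convolving the two small masks yields one 5x5 mask.
large = conv2d_full(a, b)

# A small integer image, so the comparison below is exact.
image = [[(3 * r + c) % 7 for c in range(6)] for r in range(5)]

direct = conv2d_full(image, large)                # one pass: 25 multiplies per sample
cascaded = conv2d_full(conv2d_full(image, a), b)  # two passes: about 9 + 9 per sample

print("large mask size:", len(large), "x", len(large[0]))
print("cascade matches direct:", direct == cascaded)
```

Only some large masks factor exactly into smaller ones. As the abstract describes, the point of a dedicated filter design method is to obtain masks that do decompose this way.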

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
