Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

  • Journal article
    Gomez-Romero J, Molina-Solana MJ, Oehmichen A, Guo Y et al., 2018,

    Visualizing large knowledge graphs: a performance analysis

    Future Generation Computer Systems, Vol: 89, Pages: 224-238, ISSN: 0167-739X

    Knowledge graphs are an increasingly important source of data and context information in Data Science. A first step in data analysis is data exploration, in which visualization plays a key role. Currently, Semantic Web technologies are prevalent for modelling and querying knowledge graphs; however, most visualization approaches in this area tend to be overly simplified and targeted to small-sized representations. In this work, we describe and evaluate the performance of a Big Data architecture applied to large-scale knowledge graph visualization. To do so, we have implemented a graph processing pipeline in the Apache Spark framework and carried out several experiments with real-world and synthetic graphs. We show that distributed implementations of the graph building, metric calculation and layout stages can efficiently manage very large graphs, even without applying partitioning or incremental processing strategies.

  • Journal article
    Molina-Solana M, Kennedy M, Amador Diaz Lopez J, 2018,

    foo.castr: visualising the future AI workforce

    Big Data Analytics, Vol: 3, ISSN: 2058-6345

    The organization of companies and their HR departments is being hugely affected by recent advancements in computational power and Artificial Intelligence, and this trend is likely to rise dramatically in the next few years. This work presents foo.castr, a tool we are developing to visualise, communicate and facilitate the understanding of the impact of these advancements on the future workforce. It builds upon the idea that particular tasks within job descriptions will be progressively taken over by computers, forcing the reshaping of human jobs. In its current version, foo.castr presents three different scenarios to help HR departments plan for potential changes and disruptions brought by the adoption of Artificial Intelligence.

  • Journal article
    Dolan D, Jensen H, Martinez Mediano P, Molina-Solana MJ, Rajpal H, Rosas De Andraca F, Sloboda JA et al., 2018,

    The improvisational state of mind: a multidisciplinary study of an improvisatory approach to classical music repertoire performance

    Frontiers in Psychology, Vol: 9, ISSN: 1664-1078

    The recent re-introduction of improvisation as a professional practice within classical music, however cautious and still rare, allows direct and detailed contemporary comparison between improvised and “standard” approaches to performances of the same composition, comparisons which hitherto could only be inferred from impressionistic historical accounts. This study takes an interdisciplinary multi-method approach to discovering the contrasting nature and effects of prepared and improvised approaches during live chamber-music concert performances of a movement from Franz Schubert’s “Shepherd on the Rock”, given by a professional trio consisting of voice, flute, and piano, in the presence of an invited audience of 22 adults with varying levels of musical experience and training. The improvised performances were found to differ systematically from prepared performances in their timing, dynamic, and timbral features as well as in the degree of risk-taking and “mind reading” between performers including during moments of added extemporised notes. Post-performance critical reflection by the performers characterised distinct mental states underlying the two modes of performance. The amount of overall body movement was reduced in the improvised performances, which showed fewer unco-ordinated movements between performers when compared to the prepared performance. Audience members, who were told only that the two performances would be different, but not how, rated the improvised version as more emotionally compelling and musically convincing than the prepared version. The size of this effect was not affected by whether or not the audience could see the performers, or by levels of musical training. EEG measurements from 19 scalp locations showed higher levels of Lempel-Ziv complexity (associated with awareness and alertness) in the improvised version in both performers and audience. Results are discussed in terms of their potential

  • Journal article
    Creswell A, Bharath AA, 2018,

    Denoising adversarial autoencoders

    IEEE Transactions on Neural Networks and Learning Systems, ISSN: 2162-2388

    Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabelled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabelled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularisation during training to shape the distribution of the encoded data in the latent space. We suggest denoising adversarial autoencoders, which combine denoising and regularisation, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of adversarial autoencoders. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance, and can synthesise samples that are more consistent with the input data than those trained without a corruption process.

  • Journal article
    Song J, Fan S, Lin W, Mottet L, Woodward H, Wykes MD, Arcucci R, Xiao D, Debay J-E, ApSimon H, Aristodemou E, Birch D, Carpentieri M, Fang F, Herzog M, Hunt GR, Jones RL, Pain C, Pavlidis D, Robins AG, Short CA, Linden PF et al., 2018,

    Natural ventilation in cities: the implications of fluid mechanics

    Building Research and Information, Vol: 46, Pages: 809-828, ISSN: 0961-3218

  • Journal article
    Jahani E, Sundsøy P, Bjelland J, Bengtsson L, Pentland AS, de Montjoye Y-A et al., 2017,

    Improving official statistics in emerging markets using machine learning and mobile phone data

    EPJ Data Science, Vol: 6, ISSN: 2193-1127

    Mobile phones are one of the fastest growing technologies in the developing world with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time phones are used and recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, the fact that most phones in developing countries are prepaid means that the data lacks key information about the user, including gender and other demographic variables. This precludes numerous uses of this data in social science and development economic research. It furthermore severely hinders the development of humanitarian applications such as the use of mobile phone data to target aid towards the most vulnerable groups during crisis. We developed a framework to extract more than 1400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. We here present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show how standard machine learning algorithms trained on only 10,000 users are sufficient to predict individuals’ gender with an accuracy ranging from 74.3 to 88.4% in a developed country and from 74.5 to 79.7% in a developing country using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training size to 5,000, but decreases significantly with smaller training sets. We finally show that our indicators capture a large range of behavioral traits using factor analysis and that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status.

  • Journal article
    Steele JE, Sundsoy PR, Pezzulo C, Alegana VA, Bird TJ, Blumenstock J, Bjelland J, Engo-Monsen K, de Montjoye YKJV, Iqbal AM, Hadiuzzaman KN, Lu X, Wetter E, Tatem AJ, Bengtsson L et al., 2017,

    Mapping poverty using mobile phone and satellite data

    Journal of the Royal Society Interface, Vol: 14, ISSN: 1742-5689

    Poverty is one of the most important determinants of adverse health outcomes globally, a major cause of societal instability and one of the largest causes of lost human potential. Traditional approaches to measuring and targeting poverty rely heavily on census data, which in most low- and middle-income countries (LMICs) are unavailable or out-of-date. Alternate measures are needed to complement and update estimates between censuses. This study demonstrates how public and private data sources that are commonly available for LMICs can be used to provide novel insight into the spatial distribution of poverty. We evaluate the relative value of modelling three traditional poverty measures using aggregate data from mobile operators and widely available geospatial data. Taken together, models combining these data sources provide the best predictive power (highest r² = 0.78) and lowest error, but generally models employing mobile data only yield comparable results, offering the potential to measure poverty more frequently and at finer granularity. Stratifying models into urban and rural areas highlights the advantage of using mobile data in urban areas and different data in different contexts. The findings indicate the possibility to estimate and continually monitor poverty rates at high spatial resolution in countries with limited capacity to support traditional methods of data collection.

  • Journal article
    Molina-Solana MJ, Guo Y, Birch D, 2017,

    Improving data exploration in graphs with fuzzy logic and large-scale visualisation

    Applied Soft Computing, Vol: 53, Pages: 227-235, ISSN: 1872-9681

    This work presents three case-studies of how fuzzy logic can be combined with large-scale immersive visualisation to enhance the process of graph sensemaking, enabling interactive fuzzy filtering of large global views of graphs. The aim is to provide users with a mechanism to quickly identify interesting nodes for further analysis. Fuzzy logic provides a flexible framework to ask human-like curiosity-driven questions over the data, and visualisation allows its communication and understanding. Together, these two technologies successfully empower novices and experts to a faster and deeper understanding of the underlying patterns in big datasets compared to traditional means on a desktop screen with crisp queries. Among other examples, we provide evidence of how these two technologies successfully enable the identification of relevant transaction patterns in the Bitcoin network.

  • Journal article
    de Montjoye YKJV, Rocher L, Pentland AS, 2016,

    bandicoot: an open-source Python toolbox to analyze mobile phone metadata

    Journal of Machine Learning Research, Vol: 17, ISSN: 1532-4435

    bandicoot is an open-source Python toolbox to extract more than 1442 features from standard mobile phone metadata. bandicoot makes it easy for machine learning researchers and practitioners to load mobile phone data, to analyze and visualize them, and to extract robust features which can be used for various classification and clustering tasks. Emphasis is put on ease of use, consistency, and documentation. bandicoot has no dependencies and is distributed under the MIT license.

  • Journal article
    Taquet M, Quoidbach J, de Montjoye Y-A, Desseilles M, Gross JJ et al., 2016,

    Hedonism and the choice of everyday activities

    Proceedings of the National Academy of Sciences, Vol: 113, Pages: 9769-9773, ISSN: 0027-8424

    Most theories of motivation have highlighted that human behavior is guided by the hedonic principle, according to which our choices of daily activities aim to minimize negative affect and maximize positive affect. However, it is not clear how to reconcile this idea with the fact that people routinely engage in unpleasant yet necessary activities. To address this issue, we monitored in real time the activities and moods of over 28,000 people across an average of 27 days using a multiplatform smartphone application. We found that people’s choices of activities followed a hedonic flexibility principle. Specifically, people were more likely to engage in mood-increasing activities (e.g., play sports) when they felt bad, and to engage in useful but mood-decreasing activities (e.g., housework) when they felt good. These findings clarify how hedonic considerations shape human behavior. They may explain how humans overcome the allure of short-term gains in happiness to maximize long-term welfare.

  • Journal article
    McGinn D, Birch DA, Akroyd D, Molina-Solana M, Guo Y, Knottenbelt W et al., 2016,

    Visualizing Dynamic Bitcoin Transaction Patterns

    Big Data, Vol: 4, Pages: 109-119, ISSN: 2167-647X

    This work presents a systemic top-down visualization of Bitcoin transaction activity to explore dynamically generated patterns of algorithmic behavior. Bitcoin dominates the cryptocurrency markets and presents researchers with a rich source of real-time transactional data. The pseudonymous yet public nature of the data presents opportunities for the discovery of human and algorithmic behavioral patterns of interest to many parties such as financial regulators, protocol designers, and security analysts. However, retaining visual fidelity to the underlying data to retain a fuller understanding of activity within the network remains challenging, particularly in real time. We present an effective force-directed graph visualization employed in our large-scale data observation facility to accelerate this data exploration and derive useful insight among domain experts and the general public alike. The high-fidelity visualizations demonstrated in this article allowed for collaborative discovery of unexpected high frequency transaction patterns, including automated laundering operations, and the evolution of multiple distinct algorithmic denial of service attacks on the Bitcoin network.

  • Journal article
    Bertone G, Calore F, Caron S, Austri RRD, Kim JS, Trotta R, Weniger C et al., 2016,

    Global analysis of the pMSSM in light of the Fermi GeV excess: prospects for the LHC Run-II and astroparticle experiments

    Journal of Cosmology and Astroparticle Physics, Vol: 2016, ISSN: 1475-7516

  • Journal article
    Ma ZB, Yang Y, Liu YX, Bharath AA et al., 2016,

    Recurrently decomposable 2-D convolvers for FPGA-based digital image processing

    IEEE Transactions on Circuits and Systems, Vol: 63, Pages: 979-983, ISSN: 1549-7747

    Two-dimensional (2-D) convolution is a widely used operation in image processing and computer vision, characterized by intensive computation and frequent memory accesses. Previous efforts to improve the performance of field-programmable gate array (FPGA) convolvers focused on the design of buffering schemes and on minimizing the use of multipliers. A recently proposed recurrently decomposable (RD) filter design method can reduce the computational complexity of 2-D convolutions by splitting the convolution between an image and a large mask into a sequence of convolutions using several smaller masks. This brief explores how to efficiently implement RD-based 2-D convolvers on FPGAs. Three FPGA architectures are proposed based on RD filters, each with a different buffering scheme. The conclusion is that RD-based architectures achieve higher area efficiency than other previously reported state-of-the-art methods, especially for larger convolution masks. An area efficiency metric is also suggested, which allows the most appropriate architecture to be selected.

  • Journal article
    de Montjoye YKJV,

    Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics

    arXiv

    The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals' private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguards in the analytics value chain. ENISA, with the current report, supports this approach and the position that the challenges of ...

  • Journal article
    Rivera-Rubio J, Alexiou I, Bharath AA, 2015,

    Appearance-based indoor localization: a comparison of patch descriptor performance

    Pattern Recognition Letters, Vol: 66, Pages: 109-117, ISSN: 1872-7344

    Vision is one of the most important of the senses, and humans use it extensively during navigation. We evaluated different types of image and video frame descriptors that could be used to determine distinctive visual landmarks for localizing a person based on what is seen by a camera that they carry. To do this, we created a database containing over 3 km of video sequences with ground-truth in the form of distance travelled along different corridors. Using this database, the accuracy of localization, both in terms of knowing which route a user is on and in terms of position along a certain route, can be evaluated. For each type of descriptor, we also tested different techniques to encode visual structure and to search between journeys to estimate a user’s position. The techniques include single-frame descriptors, those using sequences of frames, and both color and achromatic descriptors. We found that single-frame indexing worked better within this particular dataset. This might be because the motion of the person holding the camera makes the video too dependent on individual steps and motions of one particular journey. Our results suggest that appearance-based information could be an additional source of navigational data indoors, augmenting that provided by, say, radio signal strength indicators (RSSIs). Such visual information could be collected by crowdsourcing low-resolution video feeds, allowing journeys made by different users to be associated with each other, and location to be inferred without requiring explicit mapping. This offers a complementary approach to methods based on simultaneous localization and mapping (SLAM) algorithms.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
