Publications
-
Conference paper
Kamthe S, Deisenroth MP, 2018,
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
, International Conference on Artificial Intelligence and Statistics
Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms either rely on engineered features or a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications. For example, robots are subject to wear and tear and, hence, millions of interactions may change or damage the system. Moreover, practical systems have limitations in the form of the maximum torque that can be safely applied. To reduce the number of system interactions while naturally handling constraints, we propose a model-based RL framework based on Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainties into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. The proposed framework demonstrates superior data efficiency and learning rates compared to the current state of the art.
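To make the core loop concrete, here is a minimal sketch of pairing a learned GP transition model with MPC. It uses random-shooting over sampled GP rollouts rather than the paper's deterministic approximate inference, and the toy dynamics and all names are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy 1-D system: learn dynamics from logged (state, action) -> next-state data.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(50, 2))           # columns: state, action
y = 0.9 * X[:, 0] + 0.5 * np.sin(X[:, 1])      # stand-in for the unknown dynamics
gp = GaussianProcessRegressor().fit(X, y)      # probabilistic transition model

def mpc_action(state, horizon=5, n_candidates=100):
    """Random-shooting MPC: sample action sequences, roll the GP forward
    (propagating uncertainty crudely by sampling), return the best first action."""
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon))
    costs = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            mu, std = gp.predict([[s, a]], return_std=True)
            s = rng.normal(mu[0], std[0])      # sample next state from the GP posterior
            costs[i] += s ** 2                 # accumulate a quadratic state cost
    return candidates[np.argmin(costs), 0]

print(mpc_action(state=1.0))                   # first action of the cheapest rollout
```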
-
Journal article
Arulkumaran K, Deisenroth MP, Brundage M, et al., 2017,
A brief survey of deep reinforcement learning
, IEEE Signal Processing Magazine, Vol: 34, Pages: 26-38, ISSN: 1053-5888
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
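Since DQN is one of the survey's central algorithms, a minimal sketch of its regression target may help fix ideas; this is the standard textbook form, not code from the survey:

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """DQN targets: r + gamma * max_a' Q(s', a'), with bootstrapping
    cut off at terminal states (dones == 1)."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Tiny batch of 3 transitions with 2 actions; in a real DQN the next-state
# Q-values come from a separate target network evaluated on a replay batch.
rewards = np.array([1.0, 0.0, -1.0])
next_q = np.array([[0.5, 0.8], [0.1, 0.0], [0.3, 0.2]])
dones = np.array([0.0, 0.0, 1.0])
print(dqn_targets(rewards, next_q, dones))     # values the Q-network is trained toward
```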
-
Conference paper
Olofsson S, Mehrian M, Geris L, et al., 2017,
Bayesian multi-objective optimisation of neotissue growth in a perfusion bioreactor set-up
, European Symposium on Computer Aided Process Engineering (ESCAPE 27), Publisher: Elsevier
We consider optimising bone neotissue growth in a 3D scaffold during dynamic perfusion bioreactor culture. The goal is to choose design variables by optimising two conflicting objectives: (i) maximising neotissue growth and (ii) minimising operating cost. Our contribution is a novel extension of Bayesian multi-objective optimisation to the case of one black-box (neotissue growth) and one analytical (operating cost) objective function, which helps determine, within a reasonable amount of time, what design variables best manage the trade-off between neotissue growth and operating cost. Our method is tested against and outperforms the most common approach in the literature, genetic algorithms, and shows its important real-world applicability to problems that combine black-box models with easy-to-quantify objectives like cost.
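A minimal sketch of the key asymmetry: one objective is known only through expensive evaluations and modelled by a GP, while the other is available in closed form. The acquisition rule and toy functions below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def growth(x):                                  # black-box objective (expensive)
    return np.exp(-(x - 0.6) ** 2 / 0.05)

def cost(x):                                    # analytical objective (known exactly)
    return 2.0 * x

X = rng.uniform(0, 1, size=(8, 1))              # only a few expensive evaluations
gp = GaussianProcessRegressor().fit(X, growth(X[:, 0]))

# Score a dense grid: GP posterior for growth, exact formula for cost.
grid = np.linspace(0, 1, 200).reshape(-1, 1)
mu, std = gp.predict(grid, return_std=True)
g_opt = mu + std                                # optimistic growth estimate
c = cost(grid[:, 0])

# Keep designs not dominated by another with higher growth AND lower cost.
pareto = [i for i in range(len(grid))
          if not any(g_opt[j] > g_opt[i] and c[j] < c[i] for j in range(len(grid)))]
print(grid[pareto[:5], 0])                      # candidate trade-off designs
```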
-
Conference paper
Chamberlain BP, Cardoso A, Liu CHB, et al., 2017,
Customer lifetime value prediction using embeddings
, International Conference on Knowledge Discovery and Data Mining, Publisher: ACM, Pages: 1753-1762
We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce, where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high-value customers, and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever-changing product catalogue, and obtain a significant improvement over an exhaustive set of handcrafted features.
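One plausible way to make the embedding idea concrete: derive customer vectors from product-interaction counts and compress them with an SVD, hashing product IDs into fixed buckets so the representation survives a changing catalogue. This is an illustrative sketch, not the system described in the paper:

```python
import numpy as np

N_BUCKETS = 16                                  # fixed width despite a changing catalogue

def bucket(product_id):
    """Hash arbitrary product IDs into a fixed set of buckets."""
    return hash(product_id) % N_BUCKETS

# Each customer is a sequence of product IDs they interacted with.
sessions = {"cust_a": ["sku_1", "sku_2", "sku_9"],
            "cust_b": ["sku_2", "sku_3"],
            "cust_c": ["sku_9", "sku_1"]}

counts = np.zeros((len(sessions), N_BUCKETS))
for row, items in enumerate(sessions.values()):
    for pid in items:
        counts[row, bucket(pid)] += 1

# Low-rank customer embeddings via a truncated SVD of the count matrix.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :2] * S[:2]                   # 2-D vector per customer
print(dict(zip(sessions, embeddings.round(2))))
```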
-
Conference paper
Joulani P, Gyorgy A, Szepesvari C, 2017,
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
, 28th International Conference on Algorithmic Learning Theory
-
Conference paper
Somuyiwa S, Gyorgy A, Gunduz D, 2017,
Improved policy representation and policy search for proactive content caching in wireless networks
, 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, Publisher: IEEE
We study the problem of proactively pushing contents into a finite-capacity cache memory of a user equipment in order to reduce the long-term average energy consumption in a wireless network. We consider an online social network (OSN) framework, in which new contents are generated over time and each content remains relevant to the user for a random time period, called the lifetime of the content. The user accesses the OSN through a wireless network at random time instants to download and consume all the relevant contents. Downloading contents has an energy cost that depends on the channel state and the number of downloaded contents. Our aim is to reduce the long-term average energy consumption by proactively caching contents at favorable channel conditions. In previous work, it was shown that the optimal caching policy is infeasible to compute (even with the complete knowledge of a stochastic model describing the system), and a simple family of threshold policies was introduced and optimised using the finite difference method. In this paper we improve upon both components of this approach: we use linear function approximation (LFA) to better approximate the considered family of caching policies, and apply the REINFORCE algorithm to optimise its parameters. Numerical simulations show that the new approach provides a reduction in both the average energy cost and the running time for policy optimisation.
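A minimal sketch of the two named ingredients, a policy linear in state features (LFA) and a REINFORCE gradient step, on a toy cache-or-wait decision; the features, reward, and learning rate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def policy(theta, features):
    """Probability of caching: linear in the features (LFA) through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-features @ theta))

def reinforce_step(theta, features, action, reward, lr=0.1):
    """REINFORCE: step theta along grad log pi(action | features) times the return."""
    p = policy(theta, features)
    grad_log_pi = (action - p) * features       # Bernoulli log-likelihood gradient
    return theta + lr * reward * grad_log_pi

theta = np.zeros(2)
for _ in range(2000):
    feats = np.array([rng.uniform(0, 1), 1.0])  # [channel quality, bias]
    act = float(rng.random() < policy(theta, feats))  # 1 = cache now, 0 = wait
    rew = (feats[0] - 0.5) if act else 0.0      # toy reward: cache in good channels
    theta = reinforce_step(theta, feats, act, rew)

print(theta)  # positive weight on channel quality: cache when the channel is good
```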
-
Conference paper
Somuyiwa S, Gyorgy A, Gunduz D, 2017,
Energy-efficient wireless content delivery with proactive caching
, 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Publisher: IEEE
We propose an intelligent proactive content caching scheme to reduce the energy consumption in the wireless downlink. We consider an online social network (OSN) setting where new contents are generated over time, and remain relevant to the user for a random lifetime. Contents are downloaded to the user equipment (UE) through a time-varying wireless channel at an energy cost that depends on the channel state and the number of contents downloaded. The user accesses the OSN at random time instants, and consumes all the relevant contents. To reduce the energy consumption, we propose proactive caching of contents under favorable channel conditions to a finite-capacity cache memory. Assuming that the channel quality (or equivalently, the cost of downloading data) is memoryless over time slots, we show that the optimal caching policy, which may replace contents in the cache that have shorter remaining lifetimes with contents at the server that remain relevant longer, has a threshold structure with respect to the channel quality. Since the optimal policy is computationally demanding in practice, we introduce a simplified caching scheme and optimize its parameters using policy search. We also present two lower bounds on the energy consumption. We demonstrate through numerical simulations that the proposed caching scheme significantly reduces the energy consumption compared to traditional reactive caching tools, and achieves close-to-optimal performance for a wide variety of system parameters.
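The proved threshold structure can be stated in a few lines: cache proactively exactly when the current download cost falls below a threshold that grows with the content's remaining lifetime. A hedged sketch with invented threshold values:

```python
def should_cache(channel_cost, remaining_lifetime, thresholds):
    """Threshold policy: proactively cache a content item iff the current
    per-unit download cost is below the threshold for its remaining lifetime.
    `thresholds` maps remaining lifetime -> cost threshold (longer-lived
    contents justify caching at higher cost). Names and values are illustrative."""
    return channel_cost < thresholds.get(remaining_lifetime, 0.0)

thresholds = {1: 0.2, 2: 0.4, 3: 0.6}           # example values, not from the paper
print(should_cache(channel_cost=0.3, remaining_lifetime=3, thresholds=thresholds))  # True
print(should_cache(channel_cost=0.3, remaining_lifetime=1, thresholds=thresholds))  # False
```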
-
Journal article
Eleftheriadis S, Rudovic O, Deisenroth MP, et al., 2017,
Gaussian process domain experts for modeling of facial affect
, IEEE Transactions on Image Processing, Vol: 26, Pages: 4697-4711, ISSN: 1941-0042
Most existing models for facial behavior analysis rely on generic classifiers, which fail to generalize well to previously unseen data. This is because of inherent differences in source (training) and target (test) data, mainly caused by variation in subjects’ facial morphology, camera views, and so on. All of these account for different contexts in which target and source data are recorded, and thus may adversely affect the performance of the models learned solely from source data. In this paper, we exploit the notion of domain adaptation and propose a data-efficient approach to adapt already learned classifiers to new unseen contexts. Specifically, we build upon the probabilistic framework of Gaussian processes (GPs), and introduce domain-specific GP experts (e.g., for each subject). The model adaptation is facilitated in a probabilistic fashion, by conditioning the target expert on the predictions from multiple source experts. We further exploit the predictive variance of each expert to define an optimal weighting during inference. We evaluate the proposed model on three publicly available data sets for multi-class (MultiPIE) and multi-label (DISFA, FERA2015) facial expression analysis by performing adaptation of two contextual factors: “where” (view) and “who” (subject). In our experiments, the proposed approach consistently outperforms: 1) both source and target classifiers, while using a small number of target examples during the adaptation, and 2) related state-of-the-art approaches for supervised domain adaptation.
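The variance-based weighting can be sketched compactly: each domain expert returns a Gaussian prediction, and predictions are fused with inverse-variance weights so confident experts dominate. This is a generic product-of-experts-style sketch, not the paper's exact conditioning scheme:

```python
import numpy as np

def fuse_experts(means, variances):
    """Combine per-expert Gaussian predictions by inverse-variance weighting:
    experts with low predictive variance dominate the fused estimate."""
    w = 1.0 / np.asarray(variances)
    w = w / w.sum()
    return float(np.dot(w, means))

# Three domain experts disagree; the most confident one carries the most weight.
means = [0.9, 0.2, 0.5]
variances = [0.01, 0.5, 0.1]
print(fuse_experts(means, variances))           # close to 0.9
```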
-
Conference paper
Zhang Q, Filippi SL, Flaxman S, et al., 2017,
Feature-to-feature regression for a two-step conditional independence test
, Uncertainty in Artificial Intelligence
The algorithms for causal discovery, and more broadly for learning the structure of graphical models, require well-calibrated and consistent conditional independence (CI) tests. We revisit CI tests based on two-step procedures, which involve regression with a subsequent (unconditional) independence test (RESIT) on regression residuals, and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
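A minimal sketch of the two-step RESIT recipe the paper builds on: regress X and Y on the conditioning set Z, then test the residuals for (unconditional) dependence. Linear regression and a correlation statistic stand in here for the paper's RKHS-valued regression and kernel test:

```python
import numpy as np

rng = np.random.default_rng(3)

def resit_ci_test(x, y, z):
    """Two-step CI test: regress x on z and y on z (linear, for illustration),
    then test the residuals for dependence via their correlation.
    Small values suggest X is independent of Y given Z."""
    Z = np.column_stack([z, np.ones_like(z)])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return abs(np.corrcoef(res_x, res_y)[0, 1])

z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)
y = z + 0.1 * rng.normal(size=500)              # X and Y related only through Z
print(resit_ci_test(x, y, z))                   # near 0: independence not rejected
```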
-
Journal article
Kupcsik A, Deisenroth MP, Peters J, et al., 2017,
Model-based contextual policy search for data-efficient generalization of robot skills
, Artificial Intelligence, Vol: 247, Pages: 415-439, ISSN: 0004-3702
In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learning such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies.
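The hierarchy in the abstract can be sketched as an upper-level policy mapping a context to a distribution over lower-level controller parameters; this is a generic contextual policy search skeleton, not the paper's GP-model-based algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

class UpperLevelPolicy:
    """Linear-Gaussian upper-level policy: given a context (e.g. target
    coordinates), sample parameters for a lower-level controller."""
    def __init__(self, context_dim, param_dim):
        self.W = np.zeros((param_dim, context_dim))
        self.sigma = 1.0                        # exploration noise

    def sample_params(self, context):
        mean = self.W @ context
        return rng.normal(mean, self.sigma)

def lower_level_controller(params, state):
    """Lower-level policy: a fixed-structure controller, here linear feedback."""
    return params @ state

policy = UpperLevelPolicy(context_dim=2, param_dim=2)
context = np.array([0.3, -0.7])                 # e.g. the target to hit
theta = policy.sample_params(context)           # controller for this context
print(lower_level_controller(theta, state=np.array([1.0, 0.5])))
```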