Publications
-
Conference paper
Kamthe S, Deisenroth MP, 2018,
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
, International Conference on Artificial Intelligence and Statistics
Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms either rely on engineered features or a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications. For example, robots are subject to wear and tear and, hence, millions of interactions may change or damage the system. Moreover, practical systems have limitations in the form of the maximum torque that can be safely applied. To reduce the number of system interactions while naturally handling constraints, we propose a model-based RL framework based on Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainties into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. The proposed framework demonstrates superior data efficiency and learning rates compared to the current state of the art.
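To make the core loop concrete, here is a minimal sketch of pairing a learned GP transition model with MPC. It uses random-shooting over sampled GP rollouts rather than the paper's deterministic approximate inference, and the toy dynamics and all names are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy 1-D system: learn dynamics from logged (state, action) -> next-state data.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(50, 2))           # columns: state, action
y = 0.9 * X[:, 0] + 0.5 * np.sin(X[:, 1])      # stand-in for the unknown dynamics
gp = GaussianProcessRegressor().fit(X, y)      # probabilistic transition model

def mpc_action(state, horizon=5, n_candidates=100):
    """Random-shooting MPC: sample action sequences, roll the GP forward
    (propagating uncertainty crudely by sampling), return the best first action."""
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon))
    costs = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            mu, std = gp.predict([[s, a]], return_std=True)
            s = rng.normal(mu[0], std[0])      # sample next state from the GP posterior
            costs[i] += s ** 2                 # accumulate a quadratic state cost
    return candidates[np.argmin(costs), 0]

print(mpc_action(state=1.0))                   # first action of the cheapest rollout
```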
-
Journal article
Arulkumaran K, Deisenroth MP, Brundage M, et al., 2017,
A brief survey of deep reinforcement learning
, IEEE Signal Processing Magazine, Vol: 34, Pages: 26-38, ISSN: 1053-5888
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
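Since DQN is one of the survey's central algorithms, a minimal sketch of its regression target may help fix ideas; this is the standard textbook form, not code from the survey:

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """DQN targets: r + gamma * max_a' Q(s', a'), with bootstrapping
    cut off at terminal states (dones == 1)."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Tiny batch of 3 transitions with 2 actions; in a real DQN the next-state
# Q-values come from a separate target network evaluated on a replay batch.
rewards = np.array([1.0, 0.0, -1.0])
next_q = np.array([[0.5, 0.8], [0.1, 0.0], [0.3, 0.2]])
dones = np.array([0.0, 0.0, 1.0])
print(dqn_targets(rewards, next_q, dones))     # values the Q-network is trained toward
```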
-
Conference paper
Olofsson S, Mehrian M, Geris L, et al., 2017,
Bayesian multi-objective optimisation of neotissue growth in a perfusion bioreactor set-up
, European Symposium on Computer Aided Process Engineering (ESCAPE 27), Publisher: Elsevier
We consider optimising bone neotissue growth in a 3D scaffold during dynamic perfusion bioreactor culture. The goal is to choose design variables by optimising two conflicting objectives: (i) maximising neotissue growth and (ii) minimising operating cost. Our contribution is a novel extension of Bayesian multi-objective optimisation to the case of one black-box (neotissue growth) and one analytical (operating cost) objective function, which helps determine, within a reasonable amount of time, what design variables best manage the trade-off between neotissue growth and operating cost. Our method is tested against and outperforms the most common approach in the literature, genetic algorithms, and shows its important real-world applicability to problems that combine black-box models with easy-to-quantify objectives like cost.
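A minimal sketch of the key asymmetry: one objective is known only through expensive evaluations and modelled by a GP, while the other is available in closed form. The acquisition rule and toy functions below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def growth(x):                                  # black-box objective (expensive)
    return np.exp(-(x - 0.6) ** 2 / 0.05)

def cost(x):                                    # analytical objective (known exactly)
    return 2.0 * x

X = rng.uniform(0, 1, size=(8, 1))              # only a few expensive evaluations
gp = GaussianProcessRegressor().fit(X, growth(X[:, 0]))

# Score a dense grid: GP posterior for growth, exact formula for cost.
grid = np.linspace(0, 1, 200).reshape(-1, 1)
mu, std = gp.predict(grid, return_std=True)
g_opt = mu + std                                # optimistic growth estimate
c = cost(grid[:, 0])

# Keep designs not dominated by another with higher growth AND lower cost.
pareto = [i for i in range(len(grid))
          if not any(g_opt[j] > g_opt[i] and c[j] < c[i] for j in range(len(grid)))]
print(grid[pareto[:5], 0])                      # candidate trade-off designs
```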
-
Conference paper
Chamberlain BP, Cardoso A, Liu CHB, et al., 2017,
Customer lifetime value prediction using embeddings
, International Conference on Knowledge Discovery and Data Mining, Publisher: ACM, Pages: 1753-1762
We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS.com, a global online fashion retailer. CLTV prediction is an important problem in e-commerce, where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high-value customers, and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever-changing product catalogue, and obtain a significant improvement over an exhaustive set of handcrafted features.
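One plausible way to make the embedding idea concrete: derive customer vectors from product-interaction counts and compress them with an SVD, hashing product IDs into fixed buckets so the representation survives a changing catalogue. This is an illustrative sketch, not the system described in the paper:

```python
import numpy as np

N_BUCKETS = 16                                  # fixed width despite a changing catalogue

def bucket(product_id):
    """Hash arbitrary product IDs into a fixed set of buckets."""
    return hash(product_id) % N_BUCKETS

# Each customer is a sequence of product IDs they interacted with.
sessions = {"cust_a": ["sku_1", "sku_2", "sku_9"],
            "cust_b": ["sku_2", "sku_3"],
            "cust_c": ["sku_9", "sku_1"]}

counts = np.zeros((len(sessions), N_BUCKETS))
for row, items in enumerate(sessions.values()):
    for pid in items:
        counts[row, bucket(pid)] += 1

# Low-rank customer embeddings via a truncated SVD of the count matrix.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :2] * S[:2]                   # 2-D vector per customer
print(dict(zip(sessions, embeddings.round(2))))
```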
-
Conference paper
Joulani P, Gyorgy A, Szepesvari C, 2017,
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
, 28th International Conference on Algorithmic Learning Theory
-
Conference paper
Somuyiwa S, Gyorgy A, Gunduz D, 2017,
Improved policy representation and policy search for proactive content caching in wireless networks
, 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, Publisher: IEEE
We study the problem of proactively pushing contents into a finite-capacity cache memory of a user equipment in order to reduce the long-term average energy consumption in a wireless network. We consider an online social network (OSN) framework, in which new contents are generated over time and each content remains relevant to the user for a random time period, called the lifetime of the content. The user accesses the OSN through a wireless network at random time instants to download and consume all the relevant contents. Downloading contents has an energy cost that depends on the channel state and the number of downloaded contents. Our aim is to reduce the long-term average energy consumption by proactively caching contents at favorable channel conditions. In previous work, it was shown that the optimal caching policy is infeasible to compute (even with the complete knowledge of a stochastic model describing the system), and a simple family of threshold policies was introduced and optimised using the finite difference method. In this paper we improve upon both components of this approach: we use linear function approximation (LFA) to better approximate the considered family of caching policies, and apply the REINFORCE algorithm to optimise its parameters. Numerical simulations show that the new approach provides a reduction in both the average energy cost and the running time for policy optimisation.
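A minimal sketch of the two named ingredients, a policy linear in state features (LFA) and a REINFORCE gradient step, on a toy cache-or-wait decision; the features, reward, and learning rate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def policy(theta, features):
    """Probability of caching: linear in the features (LFA) through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-features @ theta))

def reinforce_step(theta, features, action, reward, lr=0.1):
    """REINFORCE: step theta along grad log pi(action | features) times the return."""
    p = policy(theta, features)
    grad_log_pi = (action - p) * features       # Bernoulli log-likelihood gradient
    return theta + lr * reward * grad_log_pi

theta = np.zeros(2)
for _ in range(2000):
    feats = np.array([rng.uniform(0, 1), 1.0])  # [channel quality, bias]
    act = float(rng.random() < policy(theta, feats))  # 1 = cache now, 0 = wait
    rew = (feats[0] - 0.5) if act else 0.0      # toy reward: cache in good channels
    theta = reinforce_step(theta, feats, act, rew)

print(theta)  # positive weight on channel quality: cache when the channel is good
```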
-
Conference paper
Somuyiwa S, Gyorgy A, Gunduz D, 2017,
Energy-efficient wireless content delivery with proactive caching
, 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Publisher: IEEE
We propose an intelligent proactive content caching scheme to reduce the energy consumption in the wireless downlink. We consider an online social network (OSN) setting where new contents are generated over time, and remain relevant to the user for a random lifetime. Contents are downloaded to the user equipment (UE) through a time-varying wireless channel at an energy cost that depends on the channel state and the number of contents downloaded. The user accesses the OSN at random time instants, and consumes all the relevant contents. To reduce the energy consumption, we propose proactive caching of contents under favorable channel conditions to a finite-capacity cache memory. Assuming that the channel quality (or equivalently, the cost of downloading data) is memoryless over time slots, we show that the optimal caching policy, which may replace contents in the cache that have shorter remaining lifetimes with contents at the server that remain relevant longer, has a threshold structure with respect to the channel quality. Since the optimal policy is computationally demanding in practice, we introduce a simplified caching scheme and optimize its parameters using policy search. We also present two lower bounds on the energy consumption. We demonstrate through numerical simulations that the proposed caching scheme significantly reduces the energy consumption compared to traditional reactive caching tools, and achieves close-to-optimal performance for a wide variety of system parameters.
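The proved threshold structure can be stated in a few lines: cache proactively exactly when the current download cost falls below a threshold that grows with the content's remaining lifetime. A hedged sketch with invented threshold values:

```python
def should_cache(channel_cost, remaining_lifetime, thresholds):
    """Threshold policy: proactively cache a content item iff the current
    per-unit download cost is below the threshold for its remaining lifetime.
    `thresholds` maps remaining lifetime -> cost threshold (longer-lived
    contents justify caching at higher cost). Names and values are illustrative."""
    return channel_cost < thresholds.get(remaining_lifetime, 0.0)

thresholds = {1: 0.2, 2: 0.4, 3: 0.6}           # example values, not from the paper
print(should_cache(channel_cost=0.3, remaining_lifetime=3, thresholds=thresholds))  # True
print(should_cache(channel_cost=0.3, remaining_lifetime=1, thresholds=thresholds))  # False
```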
-
Journal article
Eleftheriadis S, Rudovic O, Deisenroth MP, et al., 2017,
Gaussian process domain experts for modeling of facial affect
, IEEE Transactions on Image Processing, Vol: 26, Pages: 4697-4711, ISSN: 1941-0042
Most existing models for facial behavior analysis rely on generic classifiers, which fail to generalize well to previously unseen data. This is because of inherent differences in source (training) and target (test) data, mainly caused by variation in subjects’ facial morphology, camera views, and so on. All of these account for different contexts in which target and source data are recorded, and thus may adversely affect the performance of the models learned solely from source data. In this paper, we exploit the notion of domain adaptation and propose a data-efficient approach to adapt already learned classifiers to new unseen contexts. Specifically, we build upon the probabilistic framework of Gaussian processes (GPs), and introduce domain-specific GP experts (e.g., for each subject). The model adaptation is facilitated in a probabilistic fashion, by conditioning the target expert on the predictions from multiple source experts. We further exploit the predictive variance of each expert to define an optimal weighting during inference. We evaluate the proposed model on three publicly available data sets for multi-class (MultiPIE) and multi-label (DISFA, FERA2015) facial expression analysis by performing adaptation of two contextual factors: “where” (view) and “who” (subject). In our experiments, the proposed approach consistently outperforms: 1) both source and target classifiers, while using a small number of target examples during the adaptation, and 2) related state-of-the-art approaches for supervised domain adaptation.
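The variance-based weighting can be sketched compactly: each domain expert returns a Gaussian prediction, and predictions are fused with inverse-variance weights so confident experts dominate. This is a generic product-of-experts-style sketch, not the paper's exact conditioning scheme:

```python
import numpy as np

def fuse_experts(means, variances):
    """Combine per-expert Gaussian predictions by inverse-variance weighting:
    experts with low predictive variance dominate the fused estimate."""
    w = 1.0 / np.asarray(variances)
    w = w / w.sum()
    return float(np.dot(w, means))

# Three domain experts disagree; the most confident one carries the most weight.
means = [0.9, 0.2, 0.5]
variances = [0.01, 0.5, 0.1]
print(fuse_experts(means, variances))           # close to 0.9
```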
-
Conference paper
Zhang Q, Filippi SL, Flaxman S, et al., 2017,
Feature-to-feature regression for a two-step conditional independence test
, Uncertainty in Artificial Intelligence
The algorithms for causal discovery, and more broadly for learning the structure of graphical models, require well-calibrated and consistent conditional independence (CI) tests. We revisit CI tests based on two-step procedures, which involve regression with a subsequent (unconditional) independence test (RESIT) on regression residuals, and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
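A minimal sketch of the two-step RESIT recipe the paper builds on: regress X and Y on the conditioning set Z, then test the residuals for (unconditional) dependence. Linear regression and a correlation statistic stand in here for the paper's RKHS-valued regression and kernel test:

```python
import numpy as np

rng = np.random.default_rng(3)

def resit_ci_test(x, y, z):
    """Two-step CI test: regress x on z and y on z (linear, for illustration),
    then test the residuals for dependence via their correlation.
    Small values suggest X is independent of Y given Z."""
    Z = np.column_stack([z, np.ones_like(z)])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return abs(np.corrcoef(res_x, res_y)[0, 1])

z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)
y = z + 0.1 * rng.normal(size=500)              # X and Y related only through Z
print(resit_ci_test(x, y, z))                   # near 0: independence not rejected
```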
-
Journal article
Kupcsik A, Deisenroth MP, Peters J, et al., 2017,
Model-based contextual policy search for data-efficient generalization of robot skills
, Artificial Intelligence, Vol: 247, Pages: 415-439, ISSN: 0004-3702
In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learning such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies.
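The hierarchy in the abstract can be sketched as an upper-level policy mapping a context to a distribution over lower-level controller parameters; this is a generic contextual policy search skeleton, not the paper's GP-model-based algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

class UpperLevelPolicy:
    """Linear-Gaussian upper-level policy: given a context (e.g. target
    coordinates), sample parameters for a lower-level controller."""
    def __init__(self, context_dim, param_dim):
        self.W = np.zeros((param_dim, context_dim))
        self.sigma = 1.0                        # exploration noise

    def sample_params(self, context):
        mean = self.W @ context
        return rng.normal(mean, self.sigma)

def lower_level_controller(params, state):
    """Lower-level policy: a fixed-structure controller, here linear feedback."""
    return params @ state

policy = UpperLevelPolicy(context_dim=2, param_dim=2)
context = np.array([0.3, -0.7])                 # e.g. the target to hit
theta = policy.sample_params(context)           # controller for this context
print(lower_level_controller(theta, state=np.array([1.0, 0.5])))
```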