Search results

  • Conference paper
    Tavakoli A, Pardo F, Kormushev P, 2017,

    Action Branching Architectures for Deep Reinforcement Learning

    , Deep Reinforcement Learning Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017)
  • Conference paper
    Eleftheriadis S, Nicholson TFW, Deisenroth MP, Hensman J et al., 2017,

    Identification of Gaussian Process State Space Models

    , Advances in Neural Information Processing Systems (NIPS) 2017, Publisher: Neural Information Processing Systems Foundation, Inc., Pages: 5310-5320, ISSN: 1049-5258

    The Gaussian process state space model (GPSSM) is a non-linear dynamical system, where unknown transition and/or measurement mappings are described by GPs. Most research in GPSSMs has focussed on the state estimation problem. However, the key challenge in GPSSMs has not been satisfactorily addressed yet: system identification. To address this challenge, we impose a structured Gaussian variational posterior distribution over the latent states, which is parameterised by a recognition model in the form of a bi-directional recurrent neural network. Inference with this structure allows us to recover a posterior smoothed over the entire sequence(s) of data. We provide a practical algorithm for efficiently computing a lower bound on the marginal likelihood using the reparameterisation trick. This additionally allows arbitrary kernels to be used within the GPSSM. We demonstrate that we can efficiently generate plausible future trajectories of the system we seek to model with the GPSSM, requiring only a small number of interactions with the true system.

  • Journal article
    Huang R, Lattimore T, Gyorgy A, Szepesvari C et al., 2017,

    Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities

    , Journal of Machine Learning Research, Vol: 18(145), Pages: 1-31, ISSN: 1532-4435

    Follow the leader (FTL) is a simple online learning algorithm that is known to perform well when the loss functions are convex and positively curved. In this paper we ask whether there are other settings when FTL achieves low regret. In particular, we study the fundamental problem of linear prediction over a convex, compact domain with non-empty interior. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys logarithmic regret, while for polytope domains and stochastic data it enjoys finite expected regret. The former result is also extended to strongly convex domains by establishing an equivalence between the strong convexity of sets and the minimum curvature of their boundary, which may be of independent interest. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the smaller regret of FTL when the data is ‘easy’. Finally, we show that such guarantees are achievable directly (e.g., by the follow the regularized leader algorithm or by a shrinkage-based variant of FTL) when the constraint set is an ellipsoid.

  • Conference paper
    Kamthe S, Deisenroth MP,

    Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

    , International Conference on Artificial Intelligence and Statistics

    Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms either rely on engineered features or a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications. For example, robots are subject to wear and tear and, hence, millions of interactions may change or damage the system. Moreover, practical systems have limitations in the form of the maximum torque that can be safely applied. To reduce the number of system interactions while naturally handling constraints, we propose a model-based RL framework based on Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainties into long-term predictions, thereby, reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for the first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. The proposed framework demonstrates superior data efficiency and learning rates compared to the current state of the art.

  • Journal article
    Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA et al., 2017,

    A brief survey of deep reinforcement learning

    , IEEE Signal Processing Magazine, Vol: 34, Pages: 26-38, ISSN: 1053-5888

    Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.

  • Conference paper
    Olofsson S, Mehrian M, Geris L, Calandra R, Deisenroth MP, Misener R et al., 2017,

    Bayesian multi-objective optimisation of neotissue growth in a perfusion bioreactor set-up

    , European Symposium on Computer Aided Process Engineering (ESCAPE 27), Publisher: Elsevier

    We consider optimising bone neotissue growth in a 3D scaffold during dynamic perfusion bioreactor culture. The goal is to choose design variables by optimising two conflicting objectives: (i) maximising neotissue growth and (ii) minimising operating cost. Our contribution is a novel extension of Bayesian multi-objective optimisation to the case of one black-box (neotissue growth) and one analytical (operating cost) objective function, that helps determine, within a reasonable amount of time, what design variables best manage the trade-off between neotissue growth and operating cost. Our method is tested against and outperforms the most common approach in literature, genetic algorithms, and shows its important real-world applicability to problems that combine black-box models with easy-to-quantify objectives like cost.

  • Conference paper
    Chamberlain BP, Cardoso A, Liu CHB, Pagliari R, Deisenroth MP et al., 2017,

    Customer lifetime value prediction using embeddings

    , International Conference on Knowledge Discovery and Data Mining, Publisher: ACM, Pages: 1753-1762

    We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever changing product catalogue and obtain a significant improvement over an exhaustive set of handcrafted features.

  • Conference paper
    Joulani P, Gyorgy A, Szepesvari C,

    A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

    , 28th International Conference on Algorithmic Learning Theory
  • Conference paper
    Somuyiwa S, Gyorgy A, Gunduz D, 2017,

    Energy-efficient wireless content delivery with proactive caching

    , 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Publisher: IEEE

    We propose an intelligent proactive content caching scheme to reduce the energy consumption in wireless downlink. We consider an online social network (OSN) setting where new contents are generated over time, and remain relevant to the user for a random lifetime. Contents are downloaded to the user equipment (UE) through a time-varying wireless channel at an energy cost that depends on the channel state and the number of contents downloaded. The user accesses the OSN at random time instants, and consumes all the relevant contents. To reduce the energy consumption, we propose proactive caching of contents under favorable channel conditions to a finite capacity cache memory. Assuming that the channel quality (or equivalently, the cost of downloading data) is memoryless over time slots, we show that the optimal caching policy, which may replace contents in the cache with shorter remaining lifetime with contents at the server that remain relevant longer, has a threshold structure with respect to the channel quality. Since the optimal policy is computationally demanding in practice, we introduce a simplified caching scheme and optimize its parameters using policy search. We also present two lower bounds on the energy consumption. We demonstrate through numerical simulations that the proposed caching scheme significantly reduces the energy consumption compared to traditional reactive caching tools, and achieves close-to-optimal performance for a wide variety of system parameters.

  • Conference paper
    Somuyiwa S, Gyorgy A, Gunduz D, 2017,

    Improved policy representation and policy search for proactive content caching in wireless networks

    , 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, Publisher: IEEE

    We study the problem of proactively pushing contents into a finite capacity cache memory of a user equipment in order to reduce the long-term average energy consumption in a wireless network. We consider an online social network (OSN) framework, in which new contents are generated over time and each content remains relevant to the user for a random time period, called the lifetime of the content. The user accesses the OSN through a wireless network at random time instants to download and consume all the relevant contents. Downloading contents has an energy cost that depends on the channel state and the number of downloaded contents. Our aim is to reduce the long-term average energy consumption by proactively caching contents at favorable channel conditions. In previous work, it was shown that the optimal caching policy is infeasible to compute (even with the complete knowledge of a stochastic model describing the system), and a simple family of threshold policies was introduced and optimised using the finite difference method. In this paper we improve upon both components of this approach: we use linear function approximation (LFA) to better approximate the considered family of caching policies, and apply the REINFORCE algorithm to optimise its parameters. Numerical simulations show that the new approach provides reduction in both the average energy cost and the running time for policy optimisation.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
