
    Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA, et al.,

    A Brief Survey of Deep Reinforcement Learning

    , IEEE Signal Processing Magazine, ISSN: 1053-5888

    Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
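    The value-based methods covered by this survey, including the deep Q-network, are built around the one-step Q-learning target. As a minimal illustration (not code from the survey), the sketch below shows the tabular form of that update; in a deep Q-network the table entry is replaced by a neural network regressed towards the same kind of target. The function names and the default `gamma` and `lr` values are hypothetical choices for illustration.

```python
def q_learning_target(reward, next_q_values, gamma=0.99, terminal=False):
    """One-step target r + gamma * max_a' Q(s', a'); just r at a terminal state."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_sa, target, lr=0.1):
    """Tabular step moving Q(s, a) a fraction lr of the way towards the target."""
    return q_sa + lr * (target - q_sa)


# Hypothetical transition: reward 1, next-state action values [1.0, 4.0].
target = q_learning_target(1.0, [1.0, 4.0], gamma=0.5)  # 1 + 0.5 * 4 = 3.0
print(q_update(0.0, target, lr=0.5))  # 1.5
```

    Only the max over next-state action values makes this off-policy; the terminal flag stops bootstrapping at the end of an episode.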

    Creswell A, Bharath AA,

    Task Specific Adversarial Cost Function

    The cost function used to train a generative model should fit the purpose of the model. If the model is intended for tasks such as generating perceptually correct samples, it is beneficial to maximise the likelihood of a sample drawn from the model, Q, coming from the same distribution as the training data, P. This is equivalent to minimising the Kullback-Leibler (KL) divergence, KL[Q||P]. However, if the model is intended for tasks such as retrieval or classification, it is beneficial to maximise the likelihood that a sample drawn from the training data is captured by the model, equivalent to minimising KL[P||Q]. The cost function used in adversarial training optimises the Jensen-Shannon divergence, which can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here, we propose an alternative adversarial cost function which allows easy tuning of the model for either task. Our task-specific cost function is evaluated on a dataset of handwritten characters in the following tasks: generation, retrieval and one-shot learning.
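    The asymmetry between KL[Q||P] and KL[P||Q] can be checked numerically. The sketch below assumes small discrete distributions (two bins, with a hypothetical P standing for the data and Q for a model that under-weights one bin) and uses the standard definitions of the KL and Jensen-Shannon divergences with natural logarithms. It illustrates the quantities named in the abstract, not the task-specific cost function the paper proposes.

```python
import math


def kl(p, q):
    """KL[p||q] = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0.
    Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


P = [0.5, 0.5]  # hypothetical data distribution
Q = [0.9, 0.1]  # hypothetical model that under-covers the second bin

print(round(kl(Q, P), 3))  # reverse KL[Q||P], about 0.368
print(round(kl(P, Q), 3))  # forward KL[P||Q], about 0.511: mass the model misses costs more
print(round(js(P, Q), 3))  # symmetric in P and Q, bounded above by log 2
```

    Because KL[P||Q] weights each term by the data probability, it grows when the model fails to cover regions where the data has mass, which is why it suits retrieval-style tasks.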

    Creswell A, Bharath AA,

    Denoising Adversarial Autoencoders

    Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabelled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabelled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularisation during training to shape the distribution of the encoded data in latent space. We suggest denoising adversarial autoencoders, which combine denoising and regularisation, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of adversarial autoencoders. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance, and can synthesise samples that are more consistent with the input data than those trained without a corruption process.
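    The denoising criterion trains an autoencoder to map a corrupted input back to its clean version. A minimal sketch of how such training pairs can be formed, assuming additive Gaussian noise as the corruption process (the abstract does not specify the corruption used, and `corrupt`, `denoising_pairs` and `sigma` are hypothetical names):

```python
import random


def corrupt(x, sigma, rng):
    """Additive Gaussian corruption: x_tilde_i = x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def denoising_pairs(data, sigma=0.1, seed=0):
    """(corrupted input, clean target) pairs: the encoder sees the corrupted
    version, while the reconstruction loss compares against the clean one."""
    rng = random.Random(seed)
    return [(corrupt(x, sigma, rng), x) for x in data]


def squared_error(x_hat, x):
    """Reconstruction loss between the decoder output and the clean input."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))


data = [[0.0, 1.0, 0.5], [1.0, 1.0, 0.0]]  # hypothetical unlabelled samples
for noisy, clean in denoising_pairs(data, sigma=0.2):
    # An identity "autoencoder" would pay this loss; a denoiser should do better.
    print(squared_error(noisy, clean))
```

    Setting `sigma=0.0` recovers the ordinary (non-denoising) autoencoder objective, which is the baseline the abstract compares against.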

    Eleftheriadis S, Nicholson TFW, Deisenroth MP, Hensman J, et al.,

    Identification of Gaussian Process State Space Models

    , Advances in Neural Information Processing Systems

    The Gaussian process state space model (GPSSM) is a non-linear dynamical system, where unknown transition and/or measurement mappings are described by GPs. Most research in GPSSMs has focussed on the state estimation problem. However, the key challenge in GPSSMs has not been satisfactorily addressed yet: system identification. To address this challenge, we impose a structured Gaussian variational posterior distribution over the latent states, which is parameterised by a recognition model in the form of a bi-directional recurrent neural network. Inference with this structure allows us to recover a posterior smoothed over the entire sequence(s) of data. We provide a practical algorithm for efficiently computing a lower bound on the marginal likelihood using the reparameterisation trick. This additionally allows arbitrary kernels to be used within the GPSSM. We demonstrate that we can efficiently generate plausible future trajectories of the system we seek to model with the GPSSM, requiring only a small number of interactions with the true system.
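    The reparameterisation trick writes a sample from a Gaussian variational distribution as a deterministic function of its parameters plus independent noise, z = mu + sigma * eps with eps ~ N(0, 1), so Monte Carlo estimates of a bound can be differentiated with respect to mu and sigma. The one-dimensional sketch below is a generic illustration under that assumption, not the paper's GPSSM algorithm; it uses f(z) = z^2 as a stand-in objective, for which E[f(z)] = mu^2 + sigma^2 is known in closed form, and all function names are hypothetical.

```python
import math
import random


def reparam_sample(mu, log_sigma, eps):
    """z = mu + sigma * eps: for fixed eps, z is differentiable in (mu, log_sigma)."""
    return mu + math.exp(log_sigma) * eps


def mc_objective_and_grad(f, df, mu, log_sigma, n=10_000, seed=0):
    """Monte Carlo estimates of E[f(z)] and its pathwise gradient w.r.t.
    (mu, log_sigma), using dz/dmu = 1 and dz/dlog_sigma = sigma * eps."""
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    value = g_mu = g_log_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = reparam_sample(mu, log_sigma, eps)
        value += f(z)
        g_mu += df(z)
        g_log_sigma += df(z) * sigma * eps
    return value / n, g_mu / n, g_log_sigma / n


# Stand-in objective f(z) = z^2 with mu = 1, sigma = 0.5:
# exact values are E = 1.25, dE/dmu = 2 * mu = 2.0, dE/dlog_sigma = 2 * sigma^2 = 0.5.
est = mc_objective_and_grad(lambda z: z * z, lambda z: 2 * z, 1.0, math.log(0.5))
print([round(v, 2) for v in est])
```

    Parameterising the scale as `log_sigma` keeps sigma positive during gradient-based optimisation; this is a common convention, not something the abstract specifies.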

    Chamberlain B, Liu CHB, Cardoso A, Pagliari R, Deisenroth MP, et al., 2017,

    Customer life time value prediction using embeddings

    , 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Publisher: ACM

    We describe the Customer Life Time Value (CLTV) prediction system deployed at ASOS, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high-value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. We describe our system, which adopts this approach, and our ongoing efforts to further improve it. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We show that learning feature representations is a promising extension to the state of the art in CLTV modeling. We propose a novel way to generate embeddings of customers which addresses the issue of the ever-changing product catalogue and obtain a significant improvement over an exhaustive set of handcrafted features.

    Eleftheriadis S, Rudovic O, Deisenroth MP, Pantic M, et al., 2017,

    Variational Gaussian process auto-encoder for ordinal prediction of facial action units

    , Pages: 154-170, ISSN: 0302-9743

    © Springer International Publishing AG 2017. We address the task of simultaneous feature fusion and modeling of discrete ordinal outputs. We propose a novel Gaussian process (GP) autoencoder modeling approach. In particular, we introduce GP encoders to project multiple observed features onto a latent space, while GP decoders are responsible for reconstructing the original features. Inference is performed in a novel variational framework, where the recovered latent representations are further constrained by the ordinal output labels. In this way, we seamlessly integrate the ordinal structure in the learned manifold, while attaining robust fusion of the input features. We demonstrate the representation abilities of our model on benchmark datasets from machine learning and affect analysis. We further evaluate the model on the tasks of feature fusion and joint ordinal prediction of facial action units. Our experiments demonstrate the benefits of the proposed approach compared to the state of the art.

    Eleftheriadis S, Rudovic O, Deisenroth MP, Pantic M, et al., 2017,

    Gaussian Process Domain Experts for Modeling of Facial Affect

    , IEEE Transactions on Image Processing, Vol: 26, Pages: 4697-4711, ISSN: 1057-7149

    Filippi SL, Zhang Q, Flaxman S, Sejdinovic D, et al., 2017,

    Feature-to-Feature Regression for a Two-Step Conditional Independence Test

    , Uncertainty in Artificial Intelligence

    Jahani E, Sundsøy P, Bjelland J, Bengtsson L, Pentland AS, de Montjoye Y-A, et al., 2017,

    Improving official statistics in emerging markets using machine learning and mobile phone data

    , EPJ Data Science, Vol: 6

    Kupcsik A, Deisenroth MP, Peters J, Loh AP, Vadakkepat P, Neumann G, et al., 2017,

    Model-based contextual policy search for data-efficient generalization of robot skills

    , Artificial Intelligence, Vol: 247, Pages: 415-439, ISSN: 0004-3702

    © 2014 Elsevier B.V. In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learn such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a large number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers, and is data-efficient. Our approach is based on learned probabilistic forward models and information theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
