Search or filter publications

Filter by type:

Filter by publication type

Filter by year:



  • Showing results for:
  • Reset all filters

Search results

  • Conference paper
    Lane DM, Maurelli F, Kormushev P, Carreras M, Fox M, Kyriakopoulos Ket al., 2012,

    Persistent Autonomy: the Challenges of the PANDORA Project

  • Conference paper
    Kormushev P, Ugurlu B, Calinon S, Tsagarakis N, Caldwell DGet al., 2011,

    Bipedal Walking Energy Minimization by Reinforcement Learning with Evolving Policy Parameterization

    , Pages: 318-324
  • Journal article
    Filippi S, Cappe O, Garivier A, 2011,

    Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds

    , IEEE Journal of Selected Topics in Signal Processing
  • Journal article
    Kormushev P, Nomoto K, Dong F, Hirota Ket al., 2011,

    Time Hopping Technique for Faster Reinforcement Learning in Simulations

    , International Journal of Cybernetics and Information Technologies, Vol: 11, Pages: 42-59
  • Conference paper
    Filippi S, Cappe O, Garivier A, 2010,

    Optimism in Reinforcement Learning and Kullback-Leibler Divergence

    , ALLERTON 2010

    We consider model-based reinforcement learning in finite Markov De- cisionProcesses (MDPs), focussing on so-called optimistic strategies. In MDPs,optimism can be implemented by carrying out extended value it- erations under aconstraint of consistency with the estimated model tran- sition probabilities.The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows thisstrategy, has recently been shown to guarantee near-optimal regret bounds. Inthis paper, we strongly argue in favor of using the Kullback-Leibler (KL)divergence for this purpose. By studying the linear maximization problem underKL constraints, we provide an ef- ficient algorithm, termed KL-UCRL, forsolving KL-optimistic extended value iteration. Using recent deviation boundson the KL divergence, we prove that KL-UCRL provides the same guarantees asUCRL2 in terms of regret. However, numerical experiments on classicalbenchmarks show a significantly improved behavior, particularly when the MDPhas reduced connectivity. To support this observation, we provide elements ofcom- parison between the two algorithms based on geometric considerations.

  • Conference paper
    Filippi S, Cappe O, Garivier A, Szepesvari Cet al., 2010,

    Parametric bandits: The generalized linear case

    , Neural Information Processing Systems (NIPS’2010)
  • Journal article
    Creswell A, Bharath AA,

    Task Specific Adversarial Cost Function

    The cost function used to train a generative model should fit the purpose ofthe model. If the model is intended for tasks such as generating perceptuallycorrect samples, it is beneficial to maximise the likelihood of a sample drawnfrom the model, Q, coming from the same distribution as the training data, P.This is equivalent to minimising the Kullback-Leibler (KL) distance, KL[Q||P].However, if the model is intended for tasks such as retrieval or classificationit is beneficial to maximise the likelihood that a sample drawn from thetraining data is captured by the model, equivalent to minimising KL[P||Q]. Thecost function used in adversarial training optimises the Jensen-Shannon entropywhich can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here,we propose an alternative adversarial cost function which allows easy tuning ofthe model for either task. Our task specific cost function is evaluated on adataset of hand-written characters in the following tasks: Generation,retrieval and one-shot learning.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.

Request URL: Request URI: /respub/WEB-INF/jsp/search-t4-html.jsp Query String: id=954&limit=10&page=15&respub-action=search.html Current Millis: 1594448640484 Current Time: Sat Jul 11 07:24:00 BST 2020