Search results

  • Journal article
    Filippi S, Cappe O, Garivier A, 2011, Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds, IEEE Journal of Selected Topics in Signal Processing
  • Journal article
    Kormushev P, Calinon S, Caldwell DG, 2011, Imitation Learning of Positional and Force Skills Demonstrated via Kinesthetic Teaching and Haptic Input, Advanced Robotics, Vol: 25, Pages: 581-603
  • Journal article
    Kormushev P, Nomoto K, Dong F, Hirota K et al., 2011, Time Hopping Technique for Faster Reinforcement Learning in Simulations, International Journal of Cybernetics and Information Technologies, Vol: 11, Pages: 42-59
  • Conference paper
    Kormushev P, Calinon S, Caldwell DG, 2010, Robot Motor Skill Coordination with EM-based Reinforcement Learning, Pages: 3232-3237
  • Conference paper
    Filippi S, Cappe O, Garivier A, 2010, Optimism in Reinforcement Learning and Kullback-Leibler Divergence, ALLERTON 2010

    We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.

  • Conference paper
    Filippi S, Cappe O, Garivier A, Szepesvari C et al., 2010, Parametric bandits: The generalized linear case, Neural Information Processing Systems (NIPS’2010)

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
