
Search results

    Filippi S, Cappé O, Garivier A, 2010, Optimism in Reinforcement Learning and Kullback-Leibler Divergence, ALLERTON 2010

    We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
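    The core subroutine the abstract refers to, maximizing a linear function of the transition probabilities under a KL constraint, can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: it assumes the empirical distribution `p_hat` has full support and that the KL radius `eps` is small enough for the constraint to be active. The optimum takes the form q[i] ∝ p_hat[i] / (nu − V[i]) for a multiplier nu > max(V), and since KL(p_hat, q(nu)) decreases monotonically in nu, the multiplier can be found by bisection.

    ```python
    import math

    def kl_opt_q(p_hat, V, eps):
        """Maximize sum_i q[i] * V[i] over distributions q with KL(p_hat, q) <= eps.

        Sketch of the KL-constrained linear maximization inside KL-optimistic
        extended value iteration. Assumes p_hat has full support and eps is
        small enough that the KL constraint binds at the optimum.
        """
        vmax = max(V)

        def kl_at(nu):
            # KL(p_hat, q(nu)) for q(nu)[i] = (p_hat[i] / (nu - V[i])) / Z
            Z = sum(p / (nu - v) for p, v in zip(p_hat, V))
            return sum(p * math.log(nu - v) for p, v in zip(p_hat, V)) + math.log(Z)

        # Bracket the root: KL blows up as nu -> max(V)+ and vanishes as nu -> inf.
        span = 1.0
        while kl_at(vmax + span) > eps:
            span *= 2.0
        lo, hi = vmax + 1e-12, vmax + span
        for _ in range(200):  # bisection down to float precision
            mid = 0.5 * (lo + hi)
            if kl_at(mid) > eps:
                lo = mid
            else:
                hi = mid
        nu = hi
        w = [p / (nu - v) for p, v in zip(p_hat, V)]
        Z = sum(w)
        return [wi / Z for wi in w]
    ```

    For example, with `p_hat = [0.5, 0.3, 0.2]`, `V = [1.0, 0.0, 0.0]` and `eps = 0.1`, the returned distribution shifts probability mass toward the high-value state while staying inside the KL ball around `p_hat`, which is exactly the "optimistic" transition model used in the extended value iteration step.
    
    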

    Filippi S, Cappé O, Garivier A, Szepesvári C, et al., 2010, Parametric bandits: The generalized linear case, Neural Information Processing Systems (NIPS 2010)

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
