Conference paper: Kormushev P, Calinon S, Caldwell DG, 2010,
Robot Motor Skill Coordination with EM-based Reinforcement Learning, Pages: 3232-3237
Conference paper: Filippi S, Cappe O, Garivier A, 2010,
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
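The core step the abstract refers to — maximizing a linear function of the next-state distribution over a KL-divergence ball around the empirical transition estimate — can be illustrated with a generic numerical sketch. The paper derives a dedicated efficient algorithm for this subproblem; the version below simply hands the constrained problem to a general-purpose solver, and the function name and tolerances are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def kl_optimistic_value(p_hat, V, eps):
    """Sketch of the KL-optimistic inner step: maximize q . V over the
    probability simplex subject to KL(p_hat || q) <= eps.

    Generic numerical illustration only, not the closed-form procedure
    developed in the KL-UCRL paper.
    """
    n = len(p_hat)

    def neg_obj(q):
        # Maximize expected value under q by minimizing its negative.
        return -np.dot(q, V)

    def kl_slack(q):
        # eps - KL(p_hat || q); only components with p_hat > 0 contribute.
        mask = p_hat > 0
        kl = np.sum(p_hat[mask] * np.log(p_hat[mask] / np.maximum(q[mask], 1e-12)))
        return eps - kl

    constraints = [
        {"type": "eq", "fun": lambda q: np.sum(q) - 1.0},   # q on the simplex
        {"type": "ineq", "fun": kl_slack},                  # inside the KL ball
    ]
    bounds = [(1e-9, 1.0)] * n
    res = minimize(neg_obj, p_hat.copy(), bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x, float(np.dot(res.x, V))
```

The optimistic value returned is at least as large as the plain estimate `p_hat @ V`, since `p_hat` itself is feasible; the solver shifts probability mass toward high-value successor states as far as the KL budget `eps` allows.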
Conference paper: Filippi S, Cappe O, Garivier A, et al., 2010,
Parametric bandits: The generalized linear case, Neural Information Processing Systems (NIPS’2010)
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.