Search results

  • Conference paper
    Lane DM, Maurelli F, Kormushev P, Carreras M, Fox M, Kyriakopoulos K et al., 2012,
    Persistent Autonomy: the Challenges of the PANDORA Project

  • Conference paper
    Kormushev P, Ugurlu B, Calinon S, Tsagarakis N, Caldwell DG et al., 2011,
    Bipedal Walking Energy Minimization by Reinforcement Learning with Evolving Policy Parameterization, Pages: 318-324
  • Journal article
    Kormushev P, Calinon S, Caldwell DG, 2011,
    Imitation Learning of Positional and Force Skills Demonstrated via Kinesthetic Teaching and Haptic Input, Advanced Robotics, Vol: 25, Pages: 581-603
  • Journal article
    Kormushev P, Nomoto K, Dong F, Hirota K et al., 2011,
    Time Hopping Technique for Faster Reinforcement Learning in Simulations, International Journal of Cybernetics and Information Technologies, Vol: 11, Pages: 42-59
  • Journal article
    Filippi S, Cappe O, Garivier A, 2011,
    Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds, IEEE Journal of Selected Topics in Signal Processing
  • Conference paper
    Kormushev P, Calinon S, Caldwell DG, 2010,
    Robot Motor Skill Coordination with EM-based Reinforcement Learning, Pages: 3232-3237
  • Conference paper
    Filippi S, Cappe O, Garivier A, 2010,
    Optimism in Reinforcement Learning and Kullback-Leibler Divergence, ALLERTON 2010

    We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations. (A minimal code sketch of the KL-optimistic step appears after this list.)

  • Conference paper
    Filippi S, Cappe O, Garivier A, Szepesvari C et al., 2010,
    Parametric bandits: The generalized linear case, Neural Information Processing Systems (NIPS’2010)
  • Journal article
    Chappell D, Wang K, Kormushev P,
    Asynchronous Real-Time Optimization of Footstep Placement and Timing in Bipedal Walking Robots

    Online footstep planning is essential for bipedal walking robots to be able to walk in the presence of disturbances. Until recently this has been achieved by only optimizing the placement of the footstep, keeping the duration of the step constant. In this paper we introduce a footstep planner capable of optimizing footstep placement and timing in real-time by asynchronously combining two optimizers, which we refer to as asynchronous real-time optimization (ARTO). The first optimizer, which runs at approximately 25 Hz, utilizes a fourth-order Runge-Kutta (RK4) method to accurately approximate the dynamics of the linear inverted pendulum (LIP) model for bipedal walking, then uses non-linear optimization to find optimal footsteps and duration at a lower frequency. The second optimizer, which runs at approximately 250 Hz, uses analytical gradients derived from the full dynamics of the LIP model and constraint penalty terms to perform gradient descent, which finds approximately optimal footstep placement and timing at a higher frequency. By combining the two optimizers asynchronously, ARTO has the benefits of fast reactions to disturbances from the gradient descent optimizer, accurate solutions that avoid local optima from the RK4 optimizer, and an increased probability that a feasible solution will be found from the two optimizers. Experimentally, we show that ARTO is able to recover from considerably larger pushes and produces feasible solutions to larger reference velocity changes than a standard footstep location optimizer, and outperforms using just the RK4 optimizer alone. (A minimal sketch of the LIP/RK4 building block appears below.)
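
For the KL-UCRL paper listed above (Filippi, Cappe and Garivier, ALLERTON 2010), the inner step the abstract describes is a linear maximization over candidate transition distributions under a KL constraint: maximize q·V subject to KL(p_hat || q) <= eps. The Python sketch below solves that subproblem directly with a generic constrained optimizer; it is an illustration only, not the paper's efficient dedicated algorithm, and the values of p_hat, V and eps are assumed.

    import numpy as np
    from scipy.optimize import minimize

    def kl_optimistic_value(p_hat, V, eps):
        # Maximize q @ V over probability vectors q subject to
        # KL(p_hat || q) <= eps -- the "KL-optimistic" inner problem.
        support = p_hat > 0

        def neg_value(q):
            return -np.dot(q, V)

        def kl_slack(q):
            # eps - KL(p_hat || q); feasible while this stays >= 0.
            return eps - np.sum(p_hat[support] * np.log(p_hat[support] / q[support]))

        res = minimize(neg_value, p_hat, method="SLSQP",
                       bounds=[(1e-9, 1.0)] * len(p_hat),
                       constraints=[{"type": "eq", "fun": lambda q: np.sum(q) - 1.0},
                                    {"type": "ineq", "fun": kl_slack}])
        return res.x, -res.fun

    # Illustrative numbers: optimism shifts probability mass toward the
    # highest-value state, as far as the KL ball around p_hat allows.
    p_hat = np.array([0.7, 0.2, 0.1])
    V = np.array([0.0, 1.0, 5.0])
    q_opt, v_opt = kl_optimistic_value(p_hat, V, eps=0.1)
    print(q_opt, v_opt)

Starting the search from p_hat itself guarantees a feasible initial point, since KL(p_hat || p_hat) = 0.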
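
For the ARTO paper (Chappell, Wang and Kormushev), the first optimizer in the abstract integrates the linear inverted pendulum (LIP) model with a fourth-order Runge-Kutta (RK4) scheme. Below is a minimal Python sketch of just that LIP/RK4 building block; the centre-of-mass height, initial state, foot position and step duration are assumed for illustration, and the surrounding non-linear footstep optimization is not reproduced.

    import numpy as np

    G = 9.81          # gravity [m/s^2]
    Z0 = 0.8          # assumed constant centre-of-mass height [m]
    OMEGA2 = G / Z0   # LIP model: x'' = (g / z0) * (x - p_foot)

    def lip_dynamics(state, foot_x):
        # state = (CoM position, CoM velocity) along one axis.
        x, xdot = state
        return np.array([xdot, OMEGA2 * (x - foot_x)])

    def rk4_step(state, foot_x, dt):
        # One classical fourth-order Runge-Kutta step of the LIP model.
        k1 = lip_dynamics(state, foot_x)
        k2 = lip_dynamics(state + 0.5 * dt * k1, foot_x)
        k3 = lip_dynamics(state + 0.5 * dt * k2, foot_x)
        k4 = lip_dynamics(state + dt * k3, foot_x)
        return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

    # Roll the CoM forward over one assumed step: stance foot at 0.1 m,
    # step duration T = 0.4 s, integration step dt = 10 ms.
    state = np.array([0.0, 0.3])   # position [m], velocity [m/s]
    dt, T = 0.01, 0.4
    for _ in range(int(round(T / dt))):
        state = rk4_step(state, 0.1, dt)
    print(state)

A planner in the spirit of ARTO would wrap many such rollouts in an optimization over the foot position and step duration; the abstract's second optimizer instead exploits the LIP model's closed-form gradients to adjust both quantities at a much higher rate.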
