Imperial College London

Dr Petar Kormushev

Faculty of Engineering, Dyson School of Design Engineering

Lecturer
 
 
 

Contact

 

+44 (0)20 7594 9235
p.kormushev

 
 

Location

 

10-12 Prince's Gardens, South Kensington Campus



 

Publications


74 results found

Ahmadzadeh SR, Mastrogiovanni F, Kormushev P, 2016, Visuospatial Skill Learning for Robots

A novel skill learning approach is proposed that allows a robot to acquire human-like visuospatial skills for object manipulation tasks. Visuospatial skills are attained by observing spatial relationships among objects through demonstrations. The proposed Visuospatial Skill Learning (VSL) is a goal-based approach that focuses on achieving a desired goal configuration of objects relative to one another while maintaining the sequence of operations. VSL is capable of learning and generalizing multi-operation skills from a single demonstration, while requiring minimum prior knowledge about the objects and the environment. In contrast to many existing approaches, VSL offers simplicity, efficiency and user-friendly human-robot interaction. We also show that VSL can be easily extended towards 3D object manipulation tasks, simply by employing point cloud processing techniques. In addition, a robot learning framework, VSL-SP, is proposed by integrating VSL, Imitation Learning, and a conventional planning method. In VSL-SP, the sequence of performed actions is learned using VSL, while the sensorimotor skills are learned using a conventional trajectory-based learning approach. Such integration easily extends robot capabilities to novel situations, even by users without programming ability. In VSL-SP the internal planner of VSL is integrated with an existing action-level symbolic planner. Using the underlying constraints of the task and extracted symbolic predicates, identified by VSL, the symbolic representation of the task is updated. Therefore the planner maintains a generalized representation of each skill as a reusable action, which can be used in planning and performed independently during the learning phase. The proposed approach is validated through several real-world experiments.

BOOK CHAPTER

Jamisola RS, Kormushev PS, Roberts RG, Caldwell DG et al., 2016, Task-Space Modular Dynamics for Dual-Arms Expressed through a Relative Jacobian, JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, Vol: 83, Pages: 205-218, ISSN: 0921-0296

JOURNAL ARTICLE

Kormushev PS, 2016, Robot Learning for Persistent Autonomy, Handling Uncertainty and Networked Structure in Robot Control, Publisher: Springer, ISBN: 9783319263274

Chapter 1: Robot Learning for Persistent Autonomy. Petar Kormushev and Seyed Reza Ahmadzadeh. Abstract: Autonomous robots are not very good at being autonomous. They work well in structured environments, but fail quickly in the real ...

BOOK CHAPTER

Palomeras N, Carrera A, Hurtós N, Karras GC, Bechlioulis CP, Cashmore M, Magazzeni D, Long D, Fox M, Kyriakopoulos KJ, Kormushev P, Salvi J, Carreras M et al., 2016, Toward persistent autonomous intervention in a subsea panel, Autonomous Robots, Vol: 40, Pages: 1279-1306, ISSN: 0929-5593

JOURNAL ARTICLE

Ahmadzadeh SR, Paikan A, Mastrogiovanni F, Natale L, Kormushev P, Caldwell DG et al., 2015, Learning Symbolic Representations of Actions from Human Demonstrations, ICRA 2015, Publisher: IEEE, Pages: 3801-3808

In this paper, a robot learning approach is proposed which integrates Visuospatial Skill Learning, Imitation Learning, and conventional planning methods. In our approach, the sensorimotor skills (i.e., actions) are learned through a learning from demonstration strategy. The sequence of performed actions is learned through demonstrations using Visuospatial Skill Learning. A standard action-level planner is used to represent a symbolic description of the skill, which allows the system to represent the skill in a discrete, symbolic form. The Visuospatial Skill Learning module identifies the underlying constraints of the task and extracts symbolic predicates (i.e., action preconditions and effects), thereby updating the planner representation while the skills are being learned. Therefore the planner maintains a generalized representation of each skill as a reusable action, which can be planned and performed independently during the learning phase. Preliminary experimental results on the iCub robot are presented.

CONFERENCE PAPER

Bimbo J, Kormushev P, Althoefer K, Liu H et al., 2015, Global estimation of an object’s pose using tactile sensing, Advanced Robotics, Vol: 29, Pages: 363-374, ISSN: 0169-1864

JOURNAL ARTICLE

Carrera A, Palomeras N, Hurtos N, Kormushev P, Carreras M et al., 2015, Learning multiple strategies to perform a valve turning with underwater currents using an I-AUV, MTS/IEEE OCEANS 2015, Publisher: IEEE

Recent efforts in the field of intervention autonomous underwater vehicles (I-AUVs) have started to show promising results in simple manipulation tasks. However, there is still a long way to go to reach the complexity of the tasks carried out by ROV pilots. This paper proposes an intervention framework based on parametric Learning by Demonstration (p-LbD) techniques in order to acquire multiple strategies to perform an autonomous intervention task adapted to different environment conditions. The manipulation skills of a pilot are acquired through a set of demonstrations done under different environment circumstances, in our case different levels of water current. The proposed algorithm is able to learn these different strategies and, depending on the estimated water current, autonomously reproduce a combined strategy to perform the task. The p-LbD algorithm as well as its interplay with the rest of the modules that take part in the proposed framework are described in this paper. We also present results on a free-floating valve turning task, using the Girona 500 I-AUV equipped with a manipulator and a customized end-effector. The obtained results show the feasibility of the p-LbD algorithm to perform autonomous intervention tasks combining the learned strategies depending on the environment conditions.

CONFERENCE PAPER

Carrera A, Palomeras N, Hurtós N, Kormushev P, Carreras M et al., 2015, Cognitive system for autonomous underwater intervention, Pattern Recognition Letters, Vol: 67, Pages: 91-99, ISSN: 0167-8655

JOURNAL ARTICLE

Jamali N, Kormushev P, Carrera A, Carreras M, Caldwell DG et al., 2015, Underwater robot-object contact perception using machine learning on force/torque sensor feedback, ICRA 2015, Publisher: IEEE, Pages: 3915-3920

Autonomous manipulation of objects requires reliable information on robot-object contact state. Underwater environments can adversely affect sensing modalities such as vision, making them unreliable. In this paper we investigate underwater robot-object contact perception between an autonomous underwater vehicle and a T-bar valve using a force/torque sensor and the robot’s proprioceptive information. We present an approach in which machine learning is used to learn a classifier for different contact states, namely, a contact aligned with the central axis of the valve, an edge contact and no contact. To distinguish between different contact states, the robot performs an exploratory behavior that produces distinct patterns in the force/torque sensor. The sensor output forms a multidimensional time-series. A probabilistic clustering algorithm is used to analyze the time-series. The algorithm dissects the multidimensional time-series into clusters, producing a one-dimensional sequence of symbols. The symbols are used to train a hidden Markov model, which is subsequently used to predict novel contact conditions. We show that the learned classifier can successfully distinguish the three contact states with an accuracy of 72% ± 12%.

CONFERENCE PAPER

Jamisola RS, Kormushev P, Caldwell DG, Ibikunle F et al., 2015, Modular Relative Jacobian for Dual-Arms and the Wrench Transformation Matrix, 7th IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and the 7th IEEE International Conference on Robotics, Automation and Mechatronics (RAM), Publisher: IEEE

A modular relative Jacobian was recently derived and is expressed in terms of the individual Jacobians of stand-alone manipulators. It includes a wrench transformation matrix, which was not shown in earlier expressions. This paper is an experimental extension of that recent work, which showed that at higher angular end-effector velocities the contribution of the wrench transformation matrix cannot be ignored. In this work, we investigate the dual-arm force control performance, without necessarily driving the end-effectors at higher angular velocities. We compare experimental results for two cases: modular relative Jacobian with and without the wrench transformation matrix. The experimental setup is a dual-arm system consisting of two KUKA LWR robots. Two experimental tasks are used: relative end-effector motion and coordinated independent tasks, where a force controller is implemented in both tasks. Furthermore, we show in an experimental design that the use of a relative Jacobian affords less accurate task specifications for a highly complicated task requirement for both end-effectors of the dual-arm. Experimental results on the force control performance are compared and analyzed.

CONFERENCE PAPER

Kormushev P, Demiris Y, Caldwell DG, 2015, Encoderless Position Control of a Two-Link Robot Manipulator, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 943-949, ISSN: 1050-4729

CONFERENCE PAPER

Kormushev P, Demiris Y, Caldwell DG, 2015, Kinematic-free Position Control of a 2-DOF Planar Robot Arm, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 5518-5525, ISSN: 2153-0858

CONFERENCE PAPER

Kryczka P, Kormushev P, Tsagarakis N, Caldwell DG et al., 2015, Online regeneration of bipedal walking gait optimizing footstep placement and timing, IROS 2015, Publisher: IEEE

We propose a new algorithm capable of online regeneration of walking gait patterns. The algorithm uses a nonlinear optimization technique to find step parameters that will bring the robot from the present state to a desired state. It modifies online not only the footstep positions, but also the step timing in order to maintain dynamic stability during walking. Inclusion of step time modification extends the robustness against rarely addressed disturbances, such as pushes towards the stance foot. The controller is able to recover dynamic stability regardless of the source of the disturbance (e.g. model inaccuracy, reference tracking error or external disturbance). We describe the robot state estimation and center-of-mass feedback controller necessary to realize stable locomotion on our humanoid platform COMAN. We also present a set of experiments performed on the platform that show the performance of the feedback controller and of the gait pattern regenerator. We show how the robot is able to cope with a series of pushes, by adjusting step times and positions.

CONFERENCE PAPER

Lane DM, Maurelli F, Kormushev P, Carreras M, Fox M, Kyriakopoulos K et al., 2015, PANDORA - Persistent Autonomy through Learning, Adaptation, Observation and Replanning, IFAC Workshop on Navigation, Guidance and Control of Underwater Vehicles (NGCUV’2015)

PANDORA is an EU FP7 project that is developing new computational methods to make underwater robots Persistently Autonomous, significantly reducing the frequency of assistance requests. The aim of the project is to extend the range of tasks that can be carried out autonomously and increase their complexity while reducing the need for operator assistance. Dynamic adaptation to changing conditions is very important when addressing autonomy in the real world and not just in well-known situations. The key to PANDORA is the ability to recognise failure and respond to it, at all levels of abstraction. Under the guidance of major industrial players, validation tasks of inspection, cleaning and valve turning have been trialled with partners’ AUVs in Scotland and Spain.

CONFERENCE PAPER

Takano W, Asfour T, Kormushev P, 2015, Special Issue on Humanoid Robotics, Advanced Robotics, Vol: 29

JOURNAL ARTICLE

Ahmadzadeh SR, Carrera A, Leonetti M, Kormushev P, Caldwell DG et al., 2014, Online Discovery of AUV Control Policies to Overcome Thruster Failures, ICRA 2014, Publisher: IEEE, Pages: 6522-6528

We investigate methods to improve fault-tolerance of Autonomous Underwater Vehicles (AUVs) to increase their reliability and persistent autonomy. We propose a learning-based approach that is able to discover new control policies to overcome thruster failures as they happen. The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the AUV. The model is adapted to a new condition when a fault is detected and isolated. Since the approach generates an optimal trajectory, the learned fault-tolerant policy is able to navigate the AUV towards a specified target with minimum cost. Finally, the learned policy is executed on the real robot in a closed-loop using the state feedback of the AUV. Unlike most existing methods which rely on the redundancy of thrusters, our approach is also applicable when the AUV becomes under-actuated in the presence of a fault. To validate the feasibility and efficiency of the presented approach, we evaluate it with three learning algorithms and three policy representations with increasing complexity. The proposed method is tested on a real AUV, Girona500.

CONFERENCE PAPER

Ahmadzadeh SR, Jamisola RS, Kormushev P, Caldwell DG et al., 2014, Learning reactive robot behavior for autonomous valve turning, 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), Publisher: IEEE, Pages: 366-373

CONFERENCE PAPER

Ahmadzadeh SR, Kormushev P, Caldwell DG, 2014, Multi-Objective Reinforcement Learning for AUV Thruster Failure Recovery, ADPRL 2014, Publisher: IEEE, Pages: 1-8

This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUV). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions a multi-objective reinforcement learning approach is employed which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that is able to navigate the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in a closed-loop using AUV's state feedback. Unlike most existing methods which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable when the AUV either becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.

CONFERENCE PAPER

Carrera A, Karras G, Bechlioulis C, Palomeras N, Hurtos N, Kyriakopoulos K, Kormushev P, Carreras M et al., 2014, Improving a Learning by Demonstration framework for Intervention AUVs by means of an UVMS controller

CONFERENCE PAPER

Carrera A, Palomeras N, Hurtos N, Kormushev P, Carreras M et al., 2014, Learning by demonstration applied to underwater intervention, Seventeenth International Conference of the Catalan Association of Artificial Intelligence (CCIA 2014)

Performing subsea intervention tasks is a challenge due to the complexities of the underwater domain. We propose to use a learning by demonstration algorithm to intuitively teach an intervention autonomous underwater vehicle (IAUV) how to perform a given task. Taking as input a few operator demonstrations, the algorithm generalizes the task into a model and simultaneously controls the vehicle and the manipulator (using 8 degrees of freedom) to reproduce the task. A complete framework has been implemented in order to integrate the LbD algorithm with the different onboard sensors and actuators. A valve turning intervention task is used to validate the full framework through real experiments conducted in a water tank.

CONFERENCE PAPER

Carrera A, Palomeras N, Ribas D, Kormushev P, Carreras M et al., 2014, An Intervention-AUV learns how to perform an underwater valve turning, OCEANS 2014, Publisher: IEEE, Pages: 1-7

Intervention autonomous underwater vehicles (I-AUVs) are a promising platform to perform intervention tasks in underwater environments, replacing current methods like remotely operated underwater vehicles (ROVs) and manned submersibles that are more expensive. This article proposes a complete system including all the necessary elements to perform a valve turning task using an I-AUV. The knowledge of an operator to perform the task is transmitted to an I-AUV by a learning by demonstration (LbD) algorithm. The algorithm learns the trajectory of the vehicle and the end-effector to accomplish the valve turning. The method has shown its feasibility in a controlled environment repeating the learned task with different valves and configurations.

CONFERENCE PAPER

Dallali H, Kormushev P, Tsagarakis N, Caldwell DG et al., 2014, Can Active Impedance Protect Robots from Landing Impact?, 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), Publisher: IEEE, Pages: 1022-1027

This paper studies the effect of passive and active impedance for protecting jumping robots from landing impacts. The theory of force transmissibility is used for selecting the passive impedance of the system to minimize the shock propagation. The active impedance is regulated online by a joint-level controller. On top of this controller, a reflex-based leg retraction scheme is implemented which is optimized using direct policy search reinforcement learning based on particle filtering. Experiments are conducted both in simulation and on a real-world hopping leg. We show that although the impact dynamics is fast, the addition of passive impedance provides enough time for the active impedance controller to react to the impact and protect the robot from damage.

CONFERENCE PAPER

Jamali N, Kormushev P, Ahmadzadeh SR, Caldwell DG et al., 2014, Covariance Analysis as a Measure of Policy Robustness in Reinforcement Learning, OCEANS'14 MTS/IEEE

In this paper we propose covariance analysis as a metric for reinforcement learning to improve the robustness of a learned policy. The local optima found during the exploration are analyzed in terms of the total cumulative reward and the local behavior of the system in the neighborhood of the optima. The analysis is performed in the solution space to select a policy that exhibits robustness in uncertain and noisy environments. We demonstrate the utility of the method using our previously developed system where an autonomous underwater vehicle (AUV) has to recover from a thruster failure. When a failure is detected the recovery system is invoked, which uses simulations to learn a new controller that utilizes the remaining functioning thrusters to achieve the goal of the AUV, that is, to reach a target position. In this paper, we use covariance analysis to examine the performance of the top n policies output by the previous algorithm. We propose a scoring metric that uses the output of the covariance analysis, the time it takes the AUV to reach the target position and the distance between the target position and the AUV’s final position. The top policies are simulated in a noisy environment and evaluated using the proposed scoring metric to analyze the effect of noise on their performance. The policy that exhibits more tolerance to noise is selected. We show experimental results where covariance analysis successfully selects a more robust policy that was ranked lower by the original algorithm.

CONFERENCE PAPER

Jamali N, Kormushev P, Caldwell DG, 2014, Robot-Object Contact Perception using Symbolic Temporal Pattern Learning, ICRA 2014, Publisher: IEEE, Pages: 6542-6548

This paper investigates application of machine learning to the problem of contact perception between a robot's gripper and an object. The input data comprises a multidimensional time-series produced by a force/torque sensor at the robot's wrist, the robot's proprioceptive information, namely, the position of the end-effector, as well as the robot's control command. These data are used to train a hidden Markov model (HMM) classifier. The output of the classifier is a prediction of the contact state, which includes no contact, a contact aligned with the central axis of the valve, and an edge contact. To distinguish between contact states, the robot performs exploratory behaviors that produce distinct patterns in the time-series data. The patterns are discovered by first analyzing the data using a probabilistic clustering algorithm that transforms the multidimensional data into a one-dimensional sequence of symbols. The symbols produced by the clustering algorithm are used to train the HMM classifier. We examined two exploratory behaviors: a rotation around the x-axis, and a rotation around the y-axis of the gripper. We show that using these two exploratory behaviors we can successfully predict a contact state with an accuracy of 88 ± 5 % and 81 ± 10 %, respectively.

CONFERENCE PAPER

Jamisola RS, Kormushev P, Bicchi A, Caldwell DG et al., 2014, Haptic Exploration of Unknown Surfaces with Discontinuities, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), Publisher: IEEE, Pages: 1255-1260

This work presents an approach for exploring unknown surfaces with discontinuities using only force/torque information. The motivation is to build an information map of an unknown object or environment by performing a fully-autonomous haptic exploration. Examples of discontinuities considered here are contours with sharp turns (such as wall corners) and abrupt dips (such as cliffs). Compliant motion control using force information has the ability to conform to unknown, smooth surfaces but not to discontinuous surfaces. This paper investigates solutions to address the limitation in compliant motion control over discontinuities while maintaining a desired normal force along the surface. We propose two methods to address the problem: (1) superposition of motion and force control and (2) rotation of axes for force and motion control. The theoretical principles are discussed and experimental results with a KUKA lightweight arm moving in 2D space are presented. Both approaches successfully negotiate objects with sharp 90-degree and 120-degree turns while still maintaining good tracking of the desired force.

CONFERENCE PAPER

Ahmadzadeh SR, Kormushev P, Caldwell DG, 2013, Visuospatial skill learning for object reconfiguration tasks, Pages: 685-691, ISSN: 2153-0858

We present a novel robot learning approach based on visual perception that allows a robot to acquire new skills by observing a demonstration from a tutor. Unlike most existing learning from demonstration approaches, where the focus is placed on the trajectories, in our approach the focus is on achieving a desired goal configuration of objects relative to one another. Our approach is based on visual perception which captures the object's context for each demonstrated action. This context is the basis of the visuospatial representation and encodes implicitly the relative positioning of the object with respect to multiple other objects simultaneously. The proposed approach is capable of learning and generalizing multi-operation skills from a single demonstration, while requiring minimum a priori knowledge about the environment. The learned skills comprise a sequence of operations that aim to achieve the desired goal configuration using the given objects. We illustrate the capabilities of our approach using three object reconfiguration tasks with a Barrett WAM robot. © 2013 IEEE.

CONFERENCE PAPER

Ahmadzadeh SR, Kormushev P, Caldwell DG, 2013, Autonomous robotic valve turning: A hierarchical learning approach, 2013 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 4629-4634, ISSN: 1050-4729

Autonomous valve turning is an extremely challenging task for an Autonomous Underwater Vehicle (AUV). To resolve this challenge, this paper proposes a set of different computational techniques integrated in a three-layer hierarchical scheme. Each layer realizes specific subtasks to improve the persistent autonomy of the system. In the first layer, the robot acquires the motor skills of approaching and grasping the valve by kinesthetic teaching. A Reactive Fuzzy Decision Maker (RFDM) is devised in the second layer which reacts to the relative movement between the valve and the AUV, and alters the robot's movement accordingly. An apprenticeship learning method, implemented in the third layer, performs tuning of the RFDM based on expert knowledge. Although the long-term goal is to perform the valve turning task on a real AUV, as a first step the proposed approach is tested in a laboratory environment. © 2013 IEEE.

CONFERENCE PAPER

Ahmadzadeh SR, Kormushev P, Caldwell DG, 2013, Interactive robot learning of visuospatial skills, ICAR 2013, Publisher: IEEE, Pages: 1-8

This paper proposes a novel interactive robot learning approach for acquiring visuospatial skills. It allows a robot to acquire new capabilities by observing a demonstration while interacting with a human caregiver. Most existing learning from demonstration approaches focus on the trajectories, whereas in our approach the focus is placed on achieving a desired goal configuration of objects relative to one another. Our approach is based on visual perception which captures the object's context for each demonstrated action. The context embodies implicitly the visuospatial representation including the relative positioning of the object with respect to multiple other objects simultaneously. The proposed approach is capable of learning and generalizing different skills such as object reconfiguration, classification, and turn-taking interaction. The robot learns to achieve the goal from a single demonstration while requiring minimum a priori knowledge about the environment. We illustrate the capabilities of our approach using four real world experiments with a Barrett WAM robot.

CONFERENCE PAPER

Ahmadzadeh SR, Leonetti M, Kormushev P, 2013, Online Direct Policy Search for Thruster Failure Recovery in Autonomous Underwater Vehicles, ECAL 2013 12th European Conference on Artificial Life

Autonomous underwater vehicles are prone to various factors that may lead a mission to fail and cause unrecoverable damage. Even robust controllers cannot make sure that the robot is able to navigate to a safe location in such situations. In this paper we propose an online learning method for reconfiguring the controller, which tries to recover the robot and survive the mission using the current asset of the system. The proposed method is framed in the reinforcement learning setting, and in particular as a model-based direct policy search approach. Since learning on a damaged vehicle would be impossible owing to time and energy constraints, learning is performed on a model which is identified and kept updated online. We evaluate the applicability of our method with different policy representations and learning algorithms, on the model of the vehicle Girona500.

CONFERENCE PAPER

Calinon S, Kormushev P, Caldwell DG, 2013, Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning, Robotics and Autonomous Systems, Vol: 61, Pages: 369-379, ISSN: 0921-8890

The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged drastically contrast with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed or stiffness. Learning a control policy to actuate such robots is characterized by the search of a single solution for the task, with a representation of the policy consisting of moving the robot through a set of points to follow a trajectory. With new environments such as homes and offices populated with humans, the reproduction performance is portrayed differently. These robots are expected to acquire rich motor skills that can be generalized to new situations, while behaving safely in the vicinity of users. Skills acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt and refine policies. The family of search strategies based on expectation-maximization (EM) looks particularly promising to cope with these new requirements. The exploration can be performed directly in the policy parameters space, by refining the policy together with exploration parameters represented in the form of covariances. With this formulation, RL can be extended to a multi-optima search problem in which several policy alternatives can be considered. We present here two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems, and by using Gaussian mixture models for the search of multiple policy alternatives. © 2012 Elsevier B.V. All rights reserved.

JOURNAL ARTICLE

