Kormushev P, Ugurlu B, Caldwell DG, et al., 2018, Learning to exploit passive compliance for energy-efficient gait generation on a compliant humanoid, Autonomous Robots, Pages: 1-17, ISSN: 0929-5593
© 2018 Springer Science+Business Media, LLC, part of Springer Nature. Modern humanoid robots include not only active compliance but also passive compliance. Apart from improved safety and dependability, availability of passive elements, such as springs, opens up new possibilities for improving the energy efficiency. With this in mind, this paper addresses the challenging open problem of exploiting the passive compliance for the purpose of energy efficient humanoid walking. To this end, we develop a method comprising two parts: an optimization part that finds an optimal vertical center-of-mass trajectory, and a walking pattern generator part that uses this trajectory to produce a dynamically-balanced gait. For the optimization part, we propose a reinforcement learning approach that dynamically evolves the policy parametrization during the learning process. By gradually increasing the representational power of the policy parametrization, it manages to find better policies in a faster and computationally efficient way. For the walking generator part, we develop a variable-center-of-mass-height ZMP-based bipedal walking pattern generator. The method is tested in real-world experiments with the bipedal robot COMAN and achieves a significant 18% reduction in the electric energy consumption by learning to efficiently use the passive compliance of the robot.
Pardo F, Tavakoli A, Levdik V, et al., 2018, Time limits in reinforcement learning, International Conference on Machine Learning, Pages: 4042-4051
In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state-aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent’s input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
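The two cases described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `time_aware_observation` and `td_target`, the `done`/`timeout` flags, and the choice to normalise the remaining time to [0, 1] are all assumptions made for illustration.

```python
def time_aware_observation(obs, step, max_steps):
    """Case (i): append the remaining time to the agent's input.

    The normalisation to [0, 1] is an illustrative assumption.
    """
    remaining = (max_steps - step) / max_steps
    return list(obs) + [remaining]


def td_target(reward, next_value, gamma, done, timeout):
    """Case (ii): one-step value target that keeps bootstrapping on time-outs.

    `done` marks the end of an episode for any reason; `timeout` marks that
    the end was caused only by the training time limit (hypothetical flags).
    """
    if done and not timeout:
        # Genuine environment termination: no future return to bootstrap from.
        return reward
    # Ordinary step, or an episode cut short by the time limit:
    # bootstrap from the value of the next state.
    return reward + gamma * next_value
```

For example, `td_target(1.0, 10.0, 0.9, done=True, timeout=True)` still includes the discounted next-state value, whereas the same call with `timeout=False` returns only the reward.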
Saputra RP, Kormushev P, 2018, ResQbot: A Mobile Rescue Robot for Casualty Extraction, Pages: 239-240
© 2018 Authors. Performing search and rescue missions in disaster-struck environments is challenging. Despite the advances in the robotic search phase of the rescue missions, few works have been focused on the physical casualty extraction phase. In this work, we propose a mobile rescue robot that is capable of performing a safe casualty extraction routine. To perform this routine, this robot adopts a loco-manipulation approach. We have designed and built a mobile rescue robot platform called ResQbot as a proof of concept of the proposed system. We have conducted preliminary experiments using a sensorised human-sized dummy as a victim, to confirm that the platform is capable of performing a safe casualty extraction procedure.
Saputra RP, Kormushev P, 2018, Casualty Detection from 3D Point Cloud Data for Autonomous Ground Mobile Rescue Robots, SSRR 2018
Saputra RP, Kormushev P, 2018, Casualty Detection for Mobile Rescue Robots via Ground-Projected Point Clouds
Saputra RP, Kormushev P, 2018, ResQbot: A Mobile Rescue Robot with Immersive Teleperception for Casualty Extraction
Tavakoli A, Pardo F, Kormushev P, 2018, Action Branching Architectures for Deep Reinforcement Learning
Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension. This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension. To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). We evaluate the performance of our agent on a set of challenging continuous control tasks. The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches. Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).
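The scaling argument in this abstract can be made concrete with a small sketch. It only counts network outputs and picks a greedy action per branch; it is not the BDQ agent itself, and the function names and the example of 11 bins over 6 action dimensions are hypothetical choices for illustration.

```python
def joint_output_count(bins, dims):
    """Outputs needed when every combination of sub-actions is one discrete action."""
    return bins ** dims


def branching_output_count(bins, dims):
    """Outputs needed with one branch of `bins` outputs per action dimension."""
    return bins * dims


def greedy_branch_actions(branch_q_values):
    """Pick the highest-valued sub-action independently in each branch."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q_values]


if __name__ == "__main__":
    # Assumed example: 6 action dimensions, each discretised into 11 bins.
    print(joint_output_count(11, 6))      # combinatorial growth
    print(branching_output_count(11, 6))  # linear growth
    print(greedy_branch_actions([[0.1, 0.9], [0.5, 0.2, 0.3]]))
```

Under these assumed numbers the joint discretisation needs 11^6 outputs while the branching layout needs 11 × 6, which is the linear growth the abstract refers to.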
Wang K, Shah A, Kormushev P, 2018, SLIDER: A Bipedal Robot with Knee-less Legs and Vertical Hip Sliding Motion
Wang K, Shah A, Kormushev P, 2018, SLIDER: A Novel Bipedal Walking Robot without Knees
Kanajar P, Caldwell DG, Kormushev P, 2017, Climbing over Large Obstacles with a Humanoid Robot via Multi-Contact Motion Planning
Pardo F, Tavakoli A, Levdik V, et al., 2017, Time Limits in Reinforcement Learning, Deep Reinforcement Learning Symposium (DRLS), 31st Conference on Neural Information Processing Systems (NIPS 2017)
In reinforcement learning, it is common to let an agent interact with its environment for a fixed amount of time before resetting the environment and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we investigate theoretically how time limits could effectively be handled in each of the two cases. In the first one, we argue that the terminations due to time limits are in fact part of the environment, and propose to include a notion of the remaining time as part of the agent’s input. In the second case, the time limits are not part of the environment and are only used to facilitate learning. We argue that such terminations should not be treated as environmental ones and propose a method, specific to value-based algorithms, that incorporates this insight by continuing to bootstrap at the end of each partial episode. To illustrate the significance of our proposals, we perform several experiments on a range of environments from simple few-state transition graphs to complex control tasks, including novel and standard benchmark domains. Our results show that the proposed methods improve the performance and stability of existing reinforcement learning algorithms.
Rakicevic N, Kormushev P, 2017, Efficient Robot Task Learning and Transfer via Informed Search in Movement Parameter Space
Tavakoli A, Pardo F, Kormushev P, 2017, Action Branching Architectures for Deep Reinforcement Learning
Jamisola RS, Kormushev PS, Roberts RG, et al., 2016, Task-Space Modular Dynamics for Dual-Arms Expressed through a Relative Jacobian, JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, Vol: 83, Pages: 205-218, ISSN: 0921-0296
Maurelli F, Lane D, Kormushev P, et al., 2016, The PANDORA project: a success story in AUV autonomy, OCEANS Conference, Publisher: IEEE, ISSN: 0197-7385
Palomeras N, Carrera A, Hurtos N, et al., 2016, Toward persistent autonomous intervention in a subsea panel, AUTONOMOUS ROBOTS, Vol: 40, Pages: 1279-1306, ISSN: 0929-5593
Ahmadzadeh SR, Kormushev P, 2015, Visuospatial skill learning, Studies in Systems, Decision and Control, Pages: 75-99
© Springer International Publishing Switzerland 2015. This chapter introduces Visuospatial Skill Learning (VSL), which is a novel interactive robot learning approach. VSL is based on visual perception that allows a robot to acquire new skills by observing a single demonstration while interacting with a tutor. The focus of VSL is placed on achieving a desired goal configuration of objects relative to another. VSL captures the object’s context for each demonstrated action. This context is the basis of the visuospatial representation and encodes implicitly the relative positioning of the object with respect to multiple other objects simultaneously. VSL is capable of learning and generalizing multi-operation skills from a single demonstration, while requiring minimum a priori knowledge about the environment. Different capabilities of VSL such as learning and generalization of object reconfiguration, classification, and turn-taking interaction are illustrated through both simulation and real-world experiments.
Ahmadzadeh SR, Paikan A, Mastrogiovanni F, et al., 2015, Learning Symbolic Representations of Actions from Human Demonstrations, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 3801-3808, ISSN: 1050-4729
Bimbo J, Kormushev P, Althoefer K, et al., 2015, Global estimation of an object's pose using tactile sensing, ADVANCED ROBOTICS, Vol: 29, Pages: 363-374, ISSN: 0169-1864
Carrera A, Palomeras N, Hurtos N, et al., 2015, Cognitive system for autonomous underwater intervention, PATTERN RECOGNITION LETTERS, Vol: 67, Pages: 91-99, ISSN: 0167-8655
Carrera A, Palomeras N, Hurtos N, et al., 2015, Learning multiple strategies to perform a valve turning with underwater currents using an I-AUV, Oceans 2015 Genova, Publisher: IEEE
Jamali N, Kormushev P, Vinas AC, et al., 2015, Underwater Robot-Object Contact Perception using Machine Learning on Force/Torque Sensor Feedback, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 3915-3920, ISSN: 1050-4729
Jamisola RS, Kormushev P, Caldwell DG, et al., 2015, Modular Relative Jacobian for Dual-Arms and the Wrench Transformation Matrix, Proceedings of the 2015 7th IEEE International Conference on Cybernetics and Intelligent Systems (CIS) And Robotics, Automation and Mechatronics (RAM), Publisher: IEEE
Kormushev P, Ahmadzadeh SR, 2015, Robot learning for persistent autonomy, Studies in Systems, Decision and Control, Pages: 3-28
© Springer International Publishing Switzerland 2015. Autonomous robots are not very good at being autonomous. They work well in structured environments, but fail quickly in the real world facing uncertainty and dynamically changing conditions. In this chapter, we describe robot learning approaches that help to elevate robot autonomy to the next level, the so-called ‘persistent autonomy’. For a robot to be ‘persistently autonomous’ means to be able to perform missions over extended time periods (e.g. days or months) in dynamic, uncertain environments without need for human assistance. In particular, persistent autonomy is extremely important for robots in difficult-to-reach environments such as underwater, rescue, and space robotics. There are many facets of persistent autonomy, such as: coping with uncertainty, reacting to changing conditions, disturbance rejection, fault tolerance, energy efficiency and so on. This chapter presents a collection of robot learning approaches that address many of these facets. Experiments with robot manipulators and autonomous underwater vehicles demonstrate the usefulness of these learning approaches in real world scenarios.
Kormushev P, Demiris Y, Caldwell DG, 2015, Encoderless Position Control of a Two-Link Robot Manipulator, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE COMPUTER SOC, Pages: 943-949, ISSN: 1050-4729
Kormushev P, Demiris Y, Caldwell DG, 2015, Kinematic-free Position Control of a 2-DOF Planar Robot Arm, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 5518-5525, ISSN: 2153-0858
Kryczka P, Kormushev P, Tsagarakis NG, et al., 2015, Online Regeneration of Bipedal Walking Gait Pattern Optimizing Footstep Placement and Timing, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 3352-3357, ISSN: 2153-0858
Lane DM, Maurelli F, Kormushev P, et al., 2015, PANDORA - Persistent autonomy through learning, adaptation, observation and replanning, Pages: 238-243
© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. PANDORA is an EU FP7 project that is developing new computational methods to make underwater robots Persistently Autonomous, significantly reducing the frequency of assistance requests. The aim of the project is to extend the range of tasks that can be carried out autonomously and increase their complexity while reducing the need for operator assistance. Dynamic adaptation to changing conditions is very important when addressing autonomy in the real world and not just in well-known situations. The key to PANDORA is the ability to recognise failure and respond to it, at all levels of abstraction. Under the guidance of major industrial players, validation tasks of inspection, cleaning and valve turning have been trialled with partners' AUVs in Scotland and Spain.
Takano W, Asfour T, Kormushev P, 2015, Special Issue on Humanoid Robotics PREFACE, ADVANCED ROBOTICS, Vol: 29, Pages: 301-301, ISSN: 0169-1864
Ahmadzadeh SR, Kormushev P, Caldwell DG, 2014, Multi-objective reinforcement learning for AUV thruster failure recovery
© 2014 IEEE. This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUV). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions a multi-objective reinforcement learning approach is employed which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that is able to navigate the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in a closed-loop using AUV's state feedback. Unlike most existing methods which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable when the AUV either becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.