Cully A, 2021, Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters, Genetic and Evolutionary Computation Conference (GECCO)
Quality-Diversity (QD) optimisation is a new family of learning algorithms that aims at generating collections of diverse and high-performing solutions. Among those algorithms, MAP-Elites is a simple yet powerful approach that has shown promising results in numerous applications. In this paper, we introduce a novel algorithm named Multi-Emitter MAP-Elites (ME-MAP-Elites) that improves the quality, diversity and convergence speed of MAP-Elites. It is based on the recently introduced concept of emitters, which are used to drive the algorithm's exploration according to predefined heuristics. ME-MAP-Elites leverages the diversity of a heterogeneous set of emitters, in which each emitter type is designed to improve differently the optimisation process. Moreover, a bandit algorithm is used to dynamically find the best emitter set depending on the current situation. We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics. Our comparisons against MAP-Elites and existing approaches using emitters show that ME-MAP-Elites is faster at providing collections of solutions that are significantly more diverse and higher performing. Moreover, in the rare cases where no fruitful synergy can be found between the different emitters, ME-MAP-Elites is equivalent to the best of the compared algorithms.
Chatzilygeroudis K, Cully A, Vassiliades V, et al., 2021, Quality-Diversity Optimization: a novel branch of stochastic optimization, Black Box Optimization, Machine Learning, and No-Free Lunch Theorems, Editors: Pardalos, Rasskazova, Vrahatis, Publisher: Springer International Publishing, ISBN: 978-3-030-66515-9
Rakicevic N, Cully A, Kormushev P, 2020, Policy manifold search for improving diversity-based neuroevolution, Publisher: arXiv
Diversity-based approaches have recently gained popularity as an alternative paradigm to performance-based policy search. A popular approach from this family, Quality-Diversity (QD), maintains a collection of high-performing policies separated in the diversity-metric space, defined based on policies' rollout behaviours. When policies are parameterised as neural networks, i.e. Neuroevolution, QD tends to not scale well with parameter space dimensionality. Our hypothesis is that there exists a low-dimensional manifold embedded in the policy parameter space, containing a high density of diverse and feasible policies. We propose a novel approach to diversity-based policy search via Neuroevolution, that leverages learned latent representations of the policy parameters which capture the local structure of the data. Our approach iteratively collects policies according to the QD framework, in order to (i) build a collection of diverse policies, (ii) use it to learn a latent representation of the policy parameters, (iii) perform policy search in the learned latent space. We use the Jacobian of the inverse transformation (i.e. reconstruction function) to guide the search in the latent space. This ensures that the generated samples remain in the high-density regions of the original space, after reconstruction. We evaluate our contributions on three continuous control tasks in simulated environments, and compare to diversity-based baselines. The findings suggest that our approach yields a more efficient and robust policy search process.
Kusters R, Misevic D, Berry H, et al., 2020, Interdisciplinary research in artificial intelligence: challenges and opportunities, Frontiers in Big Data, Vol: 3, Pages: 1-7, ISSN: 2624-909X
The use of artificial intelligence (AI) in a variety of research fields is speeding up multiple digital revolutions, from shifting paradigms in healthcare, precision medicine and wearable sensing, to public services and education offered to the masses around the world, to future cities made optimally efficient by autonomous driving. When a revolution happens, the consequences are not obvious straight away, and to date, there is no uniformly adapted framework to guide AI research to ensure a sustainable societal transition. To answer this need, here we analyze three key challenges to interdisciplinary AI research, and deliver three broad conclusions: 1) future development of AI should not only impact other scientific domains but should also take inspiration and benefit from other fields of science, 2) AI research must be accompanied by decision explainability, dataset bias transparency as well as development of evaluation methodologies and creation of regulatory agencies to ensure responsibility, and 3) AI education should receive more attention, efforts and innovation from the educational and scientific communities. Our analysis is of interest not only to AI practitioners but also to other researchers and the general public as it offers ways to guide the emerging collaborations and interactions toward the most fruitful outcomes.
Flageat M, Cully A, 2020, Fast and stable MAP-Elites in noisy domains using deep grids, 2020 Conference on Artificial Life, Publisher: Massachusetts Institute of Technology, Pages: 273-282
Quality-Diversity optimisation algorithms enable the evolution of collections of both high-performing and diverse solutions. These collections offer the possibility to quickly adapt and switch from one solution to another in case it is not working as expected. It therefore finds many applications in real-world domain problems such as robotic control. However, QD algorithms, like most optimisation algorithms, are very sensitive to uncertainty on the fitness function, but also on the behavioural descriptors. Yet, such uncertainties are frequent in real-world applications. Few works have explored this issue in the specific case of QD algorithms, and inspired by the literature in Evolutionary Computation, mainly focus on using sampling to approximate the "true" value of the performances of a solution. However, sampling approaches require a high number of evaluations, which in many applications such as robotics, can quickly become impractical. In this work, we propose Deep-Grid MAP-Elites, a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution. We compare our approach to previously explored ones on three noisy tasks: a standard optimisation task, the control of a redundant arm and a simulated Hexapod robot. The experimental results show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation, and being more sample-efficient than other existing approaches.
Lehman J, Clune J, Misevic D, et al., 2020, The surprising creativity of digital evolution: a collection of anecdotes from the evolutionary computation and artificial life research communities, Artificial Life, Vol: 26, Pages: 274-306, ISSN: 1064-5462
Evolution provides a creative fount of complex and subtle adaptations that often surprise the scientists who discover them. However, the creativity of evolution is not limited to the natural world: Artificial organisms evolving in computational environments have also elicited surprise and wonder from the researchers studying them. The process of evolution is an algorithmic process that transcends the substrate in which it occurs. Indeed, many researchers in the field of digital evolution can provide examples of how their evolving algorithms and organisms have creatively subverted their expectations or intentions, exposed unrecognized bugs in their code, produced unexpected adaptations, or engaged in behaviors and outcomes, uncannily convergent with ones found in nature. Such stories routinely reveal surprise and creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. Bugs are fixed, experiments are refocused, and one-off surprises are collapsed into a single data point. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This article is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and impor
Zambelli M, Cully A, Demiris Y, 2020, Multimodal representation models for prediction and control from partial information, Robotics and Autonomous Systems, Vol: 123, ISSN: 0921-8890
Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motion through the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity given by the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improved performance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoid robot, showing high performance in multiple reconstruction, prediction and imitation tasks.
Zhang F, Cully A, Demiris Y, 2019, Probabilistic real-time user posture tracking for personalized robot-assisted dressing, IEEE Transactions on Robotics, Vol: 35, Pages: 873-888, ISSN: 1552-3098
Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.
Arulkumaran K, Cully A, Togelius J, 2019, AlphaStar: an evolutionary computation perspective, The Genetic and Evolutionary Computation Conference 2019, Publisher: ACM, Pages: 314-315
In January 2019, DeepMind revealed AlphaStar to the world, the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II, representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects: the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
Cully A, 2019, Autonomous skill discovery with quality-diversity and unsupervised descriptors, Genetic and Evolutionary Computation Conference (GECCO '19), Publisher: ACM, Pages: 81-89
Quality-Diversity optimization is a new family of optimization algorithms that, instead of searching for a single optimal solution to solving a task, searches for a large collection of solutions that all solve the task in a different way. This approach is particularly promising for learning behavioral repertoires in robotics, as such a diversity of behaviors enables robots to be more versatile and resilient. However, these algorithms require the user to manually define behavioral descriptors, which are used to determine whether two solutions are different or similar. The choice of a behavioral descriptor is crucial, as it completely changes the solution types that the algorithm derives. In this paper, we introduce a new method to automatically define this descriptor by combining Quality-Diversity algorithms with unsupervised dimensionality reduction algorithms. This approach enables robots to autonomously discover the range of their capabilities while interacting with their environment. The results from two experimental scenarios demonstrate that robots can autonomously discover a large range of possible behaviors, without any prior knowledge about their morphology and environment. Furthermore, these behaviors are deemed to be similar to hand-crafted solutions that use domain knowledge and significantly more diverse than when using existing unsupervised methods.
Cully A, Demiris Y, 2019, Online knowledge level tracking with data-driven student models and collaborative filtering, IEEE Transactions on Knowledge and Data Engineering, Vol: 32, Pages: 2000-2013, ISSN: 1041-4347
Intelligent Tutoring Systems are promising tools for delivering optimal and personalised learning experiences to students. A key component for their personalisation is the student model, which infers the knowledge level of the students to balance the difficulty of the exercises. While important advances have been achieved, several challenges remain. In particular, the models should be able to track in real-time the evolution of the students' knowledge levels. These evolutions are likely to follow different profiles for each student, while measuring the exact knowledge level remains difficult given the limited and noisy information provided by the interactions. This paper introduces a novel model that addresses these challenges with three contributions: 1) the model relies on Gaussian Processes to track online the evolution of the student's knowledge level over time, 2) it uses collaborative filtering to rapidly provide long-term predictions by leveraging the information from previous users, and 3) it automatically generates abstract representations of knowledge components via automatic relevance determination of covariance matrices. The model is evaluated on three datasets, including real users. The results demonstrate that the model converges to accurate predictions on average 4 times faster than the compared methods.
Arulkumaran K, Cully A, Togelius J, 2019, AlphaStar: An Evolutionary Computation Perspective
Cully AHR, Demiris Y, 2018, Hierarchical behavioral repertoires with unsupervised descriptors, Genetic and Evolutionary Computation Conference 2018, Publisher: ACM
Enabling artificial agents to automatically learn complex, versatile and high-performing behaviors is a long-lasting challenge. This paper presents a step in this direction with hierarchical behavioral repertoires that stack several behavioral repertoires to generate sophisticated behaviors. Each repertoire of this architecture uses the lower repertoires to create complex behaviors as sequences of simpler ones, while only the lowest repertoire directly controls the agent's movements. This paper also introduces a novel approach to automatically define behavioral descriptors thanks to an unsupervised neural network that organizes the produced high-level behaviors. The experiments show that the proposed architecture enables a robot to learn how to draw digits in an unsupervised manner after having learned to draw lines and arcs. Compared to traditional behavioral repertoires, the proposed architecture reduces the dimensionality of the optimization problems by orders of magnitude and provides behaviors with a twice better fitness. More importantly, it enables the transfer of knowledge between robots: a hierarchical repertoire evolved for a robotic arm to draw digits can be transferred to a humanoid robot by simply changing the lowest layer of the hierarchy. This enables the humanoid to draw digits although it has never been trained for this task.
Cully A, Chatzilygeroudis K, Allocati F, et al., 2018, Limbo: A Flexible High-performance Library for Gaussian Processes modeling and Data-Efficient Optimization
Limbo (LIbrary for Model-Based Optimization) is an open-source C++11 library for Gaussian Processes and data-efficient optimization (e.g., Bayesian optimization) that is designed to be both highly flexible and very fast. It can be used as a state-of-the-art optimization library or to experiment with novel algorithms with “plugin” components. Limbo is currently mostly used for data-efficient policy search in robot learning and online adaptation because computation time matters when using the low-power embedded computers of robots. For example, Limbo was the key library to develop a new algorithm that allows a legged robot to learn a new gait after a mechanical damage in about 10-15 trials (2 minutes), and a 4-DOF manipulator to learn neural networks policies for goal reaching in about 5 trials. The implementation of Limbo follows a policy-based design that leverages C++ templates: this allows it to be highly flexible without the cost induced by classic object-oriented designs (cost of virtual functions). The regression benchmarks show that the query time of Limbo’s Gaussian processes is several orders of magnitude better than the one of GPy (a state-of-the-art Python library for Gaussian processes) for a similar accuracy (the learning time highly depends on the optimization algorithm chosen to optimize the hyper-parameters). The black-box optimization benchmarks demonstrate that Limbo is about 2 times faster than BayesOpt (a C++ library for data-efficient optimization) for a similar accuracy and data-efficiency. In practice, changing one of the components of the algorithms in Limbo (e.g., changing the acquisition function) usually requires changing only a template definition in the source code. This design allows users to rapidly experiment and test new ideas while keeping the software as fast as specialized code. Limbo takes advantage of multi-core architectures to parallelize the internal optimization processes (optimization of the acquisition funct
Cully AHR, Demiris Y, 2018, Quality and diversity optimization: a unifying modular framework, IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026
The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms are searching for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and the Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection management that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.
Zhang F, Cully A, Demiris Y, 2017, Personalized Robot-assisted Dressing using User Modeling in Latent Spaces, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866
Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance usually view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to the failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.
Zambelli M, Fischer T, Petit M, et al., 2016, Towards Anchoring Self-Learned Representations to Those of Other Agents, Workshop on Bio-inspired Social Robot Learning in Home Scenarios IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: Institute of Electrical and Electronics Engineers (IEEE)
In the future, robots will support humans in their everyday activities. One particular challenge that robots will face is understanding and reasoning about the actions of other agents in order to cooperate effectively with humans. We propose to tackle this using a developmental framework, where the robot incrementally acquires knowledge, and in particular 1) self-learns a mapping between motor commands and sensory consequences, 2) rapidly acquires primitives and complex actions by verbal descriptions and instructions from a human partner, 3) discovers correspondences between the robot's body and other articulated objects and agents, and 4) employs these correspondences to transfer the knowledge acquired from the robot's point of view to the viewpoint of the other agent. We show that our approach requires very little a-priori knowledge to achieve imitation learning, to find correspondent body parts of humans, and allows taking the perspective of another agent. This represents a step towards the emergence of a mirror-neuron-like system based on self-learned representations.
Tarapore D, Clune J, Cully AHR, et al., 2016, How do different encodings influence the performance of the MAP-Elites algorithm?, Proceedings of the Genetic and Evolutionary Computation Conference 2016, Publisher: ACM, Pages: 173-180
The recently introduced Intelligent Trial and Error algorithm (IT&E) both improves the ability to automatically generate controllers that transfer to real robots, and enables robots to creatively adapt to damage in less than 2 minutes. A key component of IT&E is a new evolutionary algorithm called MAP-Elites, which creates a behavior-performance map that is provided as a set of "creative" ideas to an online learning algorithm. To date, all experiments with MAP-Elites have been performed with a directly encoded list of parameters: it is therefore unknown how MAP-Elites would behave with more advanced encodings, like HyperNeat and SUPG. In addition, because we ultimately want robots that respond to their environments via sensors, we investigate the ability of MAP-Elites to evolve closed-loop controllers, which are more complicated, but also more powerful. Our results show that the encoding critically impacts the quality of the results of MAP-Elites, and that the differences are likely linked to the locality of the encoding (the likelihood of generating a similar behavior after a single mutation). Overall, these results improve our understanding of both the dynamics of the MAP-Elites algorithm and how to best harness MAP-Elites to evolve effective and adaptable robotic controllers.
Cully A, Mouret J-B, 2016, Evolving a behavioral repertoire for a walking robot, Evolutionary Computation, Vol: 24, Pages: 59-88, ISSN: 1063-6560
Numerous algorithms have been proposed to allow legged robots to learn to walk. However, most of these algorithms are devised to learn walking in a straight line, which is not sufficient to accomplish any real-world mission. Here we introduce the Transferability-based Behavioral Repertoire Evolution algorithm (TBR-Evolution), a novel evolutionary algorithm that simultaneously discovers several hundreds of simple walking controllers, one for each possible direction. By taking advantage of solutions that are usually discarded by evolutionary processes, TBR-Evolution is substantially faster than independently evolving each controller. Our technique relies on two methods: (1) novelty search with local competition, which searches for both high-performing and diverse solutions, and (2) the transferability approach, which combines simulations and real tests to evolve controllers for a physical robot. We evaluate this new technique on a hexapod robot. Results show that with only a few dozen short experiments performed on the robot, the algorithm learns a repertoire of controllers that allows the robot to reach every point in its reachable space. Overall, TBR-Evolution introduced a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.
Maestre C, Cully AHR, Gonzales C, et al., 2015, Bootstrapping interactions with objects from raw sensorimotor data: a Novelty Search based approach, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Publisher: IEEE
Determining in advance all objects that a robot will interact with in an open environment is very challenging, if not impossible. This makes it difficult to develop models that allow the robot to perceive and recognize objects, to interact with them and to predict how these objects will react to interactions with other objects or with the robot. Developmental robotics proposes to make robots learn such models by themselves through a dedicated exploration step. This raises a chicken-and-egg problem: the robot needs to learn about objects to discover how to interact with them and, to this end, it needs to interact with them. In this work, we propose Novelty-driven Evolutionary Babbling (NovEB), an approach that bootstraps this process and acquires knowledge about objects in the surrounding environment without requiring a priori knowledge about the environment, including objects, or about the means to interact with them. Our approach consists of using an evolutionary algorithm driven by a novelty criterion defined in the raw sensorimotor flow: behaviours, described by a trajectory of the robot end effector, are generated with the goal of maximizing the novelty of raw perceptions. The approach is tested on a simulated PR2 robot and is compared to random motor babbling.
As robots leave the controlled environments of factories to autonomously function in more complex, natural environments, they will have to respond to the inevitable fact that they will become damaged. However, while animals can quickly adapt to a wide variety of injuries, current robots cannot "think outside the box" to find a compensatory behavior when damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots. Here we introduce an intelligent trial and error algorithm that allows robots to adapt to damage in less than two minutes, without requiring self-diagnosis or pre-specified contingency plans. Before deployment, a robot exploits a novel algorithm to create a detailed map of the space of high-performing behaviors: This map represents the robot's intuitions about what behaviors it can perform and their value. If the robot is damaged, it uses these intuitions to guide a trial-and-error learning algorithm that conducts intelligent experiments to rapidly discover a compensatory behavior that works in spite of the damage. Experiments reveal successful adaptations for a legged robot injured in five different ways, including damaged, broken, and missing legs, and for a robotic arm with joints broken in 14 different ways. This new technique will enable more robust, effective, autonomous robots, and suggests principles that animals may use to adapt to injury.
Jehanno J-M, Cully A, Grand C, et al., 2014, Design of a wheel-legged hexapod robot for creative adaptation, 17th International Conference on Climbing and Walking Robots (CLAWAR), Publisher: World Scientific Publ Co Pte Ltd, Pages: 267-+
Koos S, Cully A, Mouret J-B, 2013, Fast damage recovery in robotics with the T-resilience algorithm, The International Journal of Robotics Research, Vol: 32, Pages: 1700-1723, ISSN: 0278-3649
Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behavior in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behavior by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but it implicitly searches for efficient behavior that does not use them. We evaluate the T-resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs and motor failures; we compare it to stochastic local search, policy gradient and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 min, T-resilience consistently leads to substantially better results than the other approaches.
Cully AHR, Mouret J-B, 2013, Behavioral repertoire learning in robotics, Proceedings of the 15th annual conference on Genetic and evolutionary computation, Publisher: ACM, Pages: 175-182
Learning in robotics typically involves choosing a simple goal (e.g. walking) and assessing the performance of each controller with regard to this task (e.g. walking speed). However, learning advanced, input-driven controllers (e.g. walking in each direction) requires testing each controller on a large sample of the possible input signals. This costly process makes it difficult to learn useful low-level controllers in robotics. Here we introduce BR-Evolution, a new evolutionary learning technique that generates a behavioral repertoire by taking advantage of the candidate solutions that are usually discarded. Instead of evolving a single, general controller, BR-Evolution thus evolves a collection of simple controllers, one for each variant of the target behavior; to distinguish similar controllers, it uses a performance objective that allows it to produce a collection of diverse but high-performing behaviors. We evaluated this new technique by evolving gait controllers for a simulated hexapod robot. Results show that a single run of the EA quickly finds a collection of controllers that allows the robot to reach each point of the reachable space. Overall, BR-Evolution opens a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.
Brych S, Cully A, Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations
The increasing importance of robots and automation creates a demand for learnable controllers which can be obtained through various approaches such as Evolutionary Algorithms (EAs) or Reinforcement Learning (RL). Unfortunately, these two families of algorithms have mainly developed independently and there are only a few works comparing modern EAs with deep RL algorithms. We show that Multidimensional Archive of Phenotypic Elites (MAP-Elites), which is a modern EA, can deliver better-performing solutions than one of the state-of-the-art RL methods, Proximal Policy Optimization (PPO), in the generation of locomotion controllers for a simulated hexapod robot. Additionally, extensive hyper-parameter tuning shows that MAP-Elites displays greater robustness across seeds and hyper-parameter sets. Generally, this paper demonstrates that EAs combined with modern computational resources display promising characteristics and have the potential to contribute to the state-of-the-art in controller learning.
Chatzilygeroudis K, Cully A, Mouret J-B, Towards semi-episodic learning for robot damage recovery
The recently introduced Intelligent Trial and Error algorithm (IT&E) enables robots to creatively adapt to damage in a matter of minutes by combining an off-line evolutionary algorithm and an on-line learning algorithm based on Bayesian Optimization. We extend the IT&E algorithm to allow robots to learn to compensate for damage while executing their task(s). This leads to a semi-episodic learning scheme that increases the robot's lifetime autonomy and adaptivity. Preliminary experiments on a toy simulation and a 6-legged robot locomotion task show promising results.
Wang R, Cully A, Chang HJ, et al., MAGAN: Margin Adaptation for Generative Adversarial Networks
We propose the Margin Adaptation for Generative Adversarial Networks (MAGANs) algorithm, a novel training procedure for GANs to improve stability and performance by using an adaptive hinge loss function. We estimate the appropriate hinge loss margin with the expected energy of the target distribution, and derive principled criteria for when to update the margin. We prove that our method converges to its global optimum under certain assumptions. Evaluated on the task of unsupervised image generation, the proposed training procedure is simple yet robust on a diverse set of data, and achieves qualitative and quantitative improvements compared to the state-of-the-art.