Imperial College London

Dr Antoine Cully

Faculty of Engineering, Department of Computing

Senior Lecturer
 
 
 

Contact

 

+44 (0)20 7594 8204 | a.cully | Website

 
 

Location

 

354, ACE Extension, South Kensington Campus



 

Publications


37 results found

Lim BWT, Grillotti L, Bernasconi L, Cully A et al., 2022, Dynamics-aware quality-diversity for efficient learning of skill repertoires, IEEE International Conference on Robotics and Automation, Publisher: IEEE, Pages: 5360-5366

Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from experience obtained when performing skill discovery using QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show DA-QD is 20 times more sample efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.

Conference paper

Pierrot T, Macé V, Chalumeau F, Flajolet A, Cideron G, Beguir K, Cully A, Sigaud O, Perrin-Gilbert N et al., 2022, Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization, The Genetic and Evolutionary Computation Conference (GECCO)

Conference paper

Lim BWT, Reichenbach A, Cully A, 2022, Learning to walk autonomously via reset-free quality-diversity, The Genetic and Evolutionary Computation Conference (GECCO)

Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types of solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at this https URL.

Conference paper

Lim B, Allard M, Grillotti L, Cully A et al., 2022, QDax: On the Benefits of Massive Parallelization for Quality-Diversity, Pages: 128-131

Quality-Diversity (QD) algorithms are a well-known approach to generate large collections of diverse and high-quality policies. However, QD algorithms are also known to be data-inefficient: they require large amounts of computational resources and are slow when used in practice for robotics tasks. Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine as most physics simulators run on CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two orders of magnitude, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.

Conference paper

Allard M, Smith Bize S, Chatzilygeroudis K, Cully A et al., 2022, Hierarchical Quality-Diversity For Online Damage Recovery, The Genetic and Evolutionary Computation Conference, Publisher: ACM

Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damages in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better are the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth of the number of required solutions. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% fewer actions in the most challenging scenarios than the best baseline while having 57% fewer complete failures.

Conference paper

Grillotti L, Cully A, 2022, Relevance-guided unsupervised discovery of abilities with quality-diversity algorithms, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 77-85

Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a hand-coded behavioural descriptor to characterise the diversity, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.

Conference paper

Grillotti L, Cully A, 2022, Unsupervised Behaviour Discovery with Quality-Diversity Optimisation, IEEE Transactions on Evolutionary Computation, ISSN: 1089-778X

Quality-Diversity algorithms refer to a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used for generating a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor with each of these behaviours. Each behavioural descriptor is used for estimating the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, thus requiring prior knowledge about the task to solve. In this paper, we introduce Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors based on raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without the requirement to provide any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm which is robust to the dimensionality of the behavioural descriptor space.

Journal article
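
The abstract above mentions using behavioural descriptors to estimate the novelty of one behaviour relative to the others. As an illustration only, the sketch below uses a common choice from the novelty search literature: the mean Euclidean distance to the k nearest descriptors. The function name `novelty`, the parameter `k` and the example descriptors are assumptions made here and are not taken from the paper.

```python
import math

def novelty(descriptor, others, k=3):
    """Mean Euclidean distance from `descriptor` to its k nearest neighbours.

    Hypothetical helper: `descriptor` and every element of `others` are
    equal-length sequences of floats (behavioural descriptors).
    """
    distances = sorted(math.dist(descriptor, other) for other in others)
    nearest = distances[:k]
    if not nearest:
        # With nothing to compare against, treat the behaviour as maximally novel.
        return float("inf")
    return sum(nearest) / len(nearest)

# Made-up 2-D descriptors, e.g. final (x, y) positions of a robot.
existing = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
score = novelty((0.0, 0.0), existing, k=2)
```

A higher score means the behaviour lies further from those already collected, so it is a stronger candidate for being kept.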

Rakicevic N, Cully A, Kormushev P, 2021, Policy manifold search: exploring the manifold hypothesis for diversity-based neuroevolution, Genetic and Evolutionary Computation Conference (GECCO '21), Pages: 901-909

Neuroevolution is an alternative to gradient-based optimisation that has the potential to avoid local minima and allows parallelisation. The main limiting factor is that usually it does not scale well with parameter space dimensionality. Inspired by recent work examining neural network intrinsic dimension and loss landscapes, we hypothesise that there exists a low-dimensional manifold, embedded in the policy network parameter space, around which a high density of diverse and useful policies are located. This paper proposes a novel method for diversity-based policy search via Neuroevolution, that leverages learned representations of the policy network parameters, by performing policy search in this learned representation space. Our method relies on the Quality-Diversity (QD) framework which provides a principled approach to policy search, and maintains a collection of diverse policies, used as a dataset for learning policy representations. Further, we use the Jacobian of the inverse-mapping function to guide the search in the representation space. This ensures that the generated samples remain in the high-density regions, after mapping back to the original space. Finally, we evaluate our contributions on four continuous-control tasks in simulated environments, and compare to diversity-based baselines.

Conference paper

Cully A, 2021, Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 84-92

Quality-Diversity (QD) optimisation is a new family of learning algorithms that aims at generating collections of diverse and high-performing solutions. Among those algorithms, MAP-Elites is a simple yet powerful approach that has shown promising results in numerous applications. In this paper, we introduce a novel algorithm named Multi-Emitter MAP-Elites (ME-MAP-Elites) that improves the quality, diversity and convergence speed of MAP-Elites. It is based on the recently introduced concept of emitters, which are used to drive the algorithm's exploration according to predefined heuristics. ME-MAP-Elites leverages the diversity of a heterogeneous set of emitters, in which each emitter type is designed to improve differently the optimisation process. Moreover, a bandit algorithm is used to dynamically find the best emitter set depending on the current situation. We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics. Our comparisons against MAP-Elites and existing approaches using emitters show that ME-MAP-Elites is faster at providing collections of solutions that are significantly more diverse and higher performing. Moreover, in the rare cases where no fruitful synergy can be found between the different emitters, ME-MAP-Elites is equivalent to the best of the compared algorithms.

Conference paper
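
The abstract above says a bandit algorithm chooses among emitters but does not say which one. The sketch below assumes the textbook UCB1 rule and picks a single emitter per round, which is a simplification of selecting an emitter set. The names `ucb1_select` and `run_bandit`, the success probabilities, and the binary reward (1 when an emitter's offspring would be added to the archive) are all hypothetical.

```python
import math
import random

def ucb1_select(counts, total_rewards):
    """Return the index of the arm (emitter) with the highest UCB1 score."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)

    def score(arm):
        mean = total_rewards[arm] / counts[arm]
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=score)

def run_bandit(success_probs, rounds, seed=0):
    """Simulate emitter selection; each emitter succeeds with a fixed probability."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    total_rewards = [0.0] * len(success_probs)
    for _ in range(rounds):
        arm = ucb1_select(counts, total_rewards)
        # Assumed reward: 1 if the emitter's offspring improves the archive.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        total_rewards[arm] += reward
    return counts

# Three made-up emitter types with different success rates.
counts = run_bandit([0.1, 0.6, 0.3], rounds=5000)
```

In this toy setting the selection budget concentrates on the emitter with the highest success rate, while the others are still sampled occasionally.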

Chatzilygeroudis K, Cully A, Vassiliades V, Mouret J-B et al., 2021, Quality-diversity optimization: a novel branch of stochastic optimization, Black Box Optimization, Machine Learning, and No-Free Lunch Theorems, Editors: Pardalos, Rasskazova, Vrahatis, Publisher: Springer International Publishing, Pages: 109-135, ISBN: 978-3-030-66515-9

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space that can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.

Book chapter
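
To make the idea of filling the behaviour space concrete, here is a minimal grid-based MAP-Elites sketch. It is a toy illustration, not the chapter's implementation: the function names, the Gaussian mutation, the `n_init` random warm-up and the toy objective are all assumptions chosen for brevity.

```python
import random

def map_elites(evaluate, dim, bins, iterations, n_init=100, sigma=0.1, seed=0):
    """Toy MAP-Elites: keep the best solution found in each descriptor cell.

    `evaluate(x)` is a hypothetical user function returning (fitness, descriptor),
    where x and descriptor are sequences of floats in [0, 1].
    """
    rng = random.Random(seed)
    archive = {}  # cell index tuple -> (fitness, solution)

    def cell_of(descriptor):
        # Discretise each descriptor dimension into `bins` equal-width cells.
        return tuple(min(int(d * bins), bins - 1) for d in descriptor)

    for i in range(iterations):
        if i < n_init or not archive:
            # Warm-up: sample random solutions.
            x = [rng.random() for _ in range(dim)]
        else:
            # Select a random elite and apply a clipped Gaussian mutation.
            _, parent = rng.choice(list(archive.values()))
            x = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
        fitness, descriptor = evaluate(x)
        cell = cell_of(descriptor)
        # A cell is filled if empty, or overwritten if the newcomer is better,
        # even when that cell is not a peak of the fitness landscape.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)
    return archive

def toy_evaluate(x):
    """Made-up task: fitness peaks at 0.5 everywhere; descriptor is the first two genes."""
    fitness = -sum((g - 0.5) ** 2 for g in x)
    return fitness, (x[0], x[1])

archive = map_elites(toy_evaluate, dim=4, bins=5, iterations=2000)
```

The returned archive maps each occupied cell to its best solution, which is the "holistic view" of the search space that the abstract describes.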

Nilsson O, Cully A, 2021, Policy Gradient Assisted MAP-Elites, 2nd Genetic and Evolutionary Computation Conference (GECCO), Publisher: Association for Computing Machinery, Pages: 866-875

Conference paper

Rakicevic N, Cully A, Kormushev P, 2020, Policy manifold search for improving diversity-based neuroevolution, Publisher: arXiv

Diversity-based approaches have recently gained popularity as an alternative paradigm to performance-based policy search. A popular approach from this family, Quality-Diversity (QD), maintains a collection of high-performing policies separated in the diversity-metric space, defined based on policies' rollout behaviours. When policies are parameterised as neural networks, i.e. Neuroevolution, QD tends to not scale well with parameter space dimensionality. Our hypothesis is that there exists a low-dimensional manifold embedded in the policy parameter space, containing a high density of diverse and feasible policies. We propose a novel approach to diversity-based policy search via Neuroevolution, that leverages learned latent representations of the policy parameters which capture the local structure of the data. Our approach iteratively collects policies according to the QD framework, in order to (i) build a collection of diverse policies, (ii) use it to learn a latent representation of the policy parameters, (iii) perform policy search in the learned latent space. We use the Jacobian of the inverse transformation (i.e. reconstruction function) to guide the search in the latent space. This ensures that the generated samples remain in the high-density regions of the original space, after reconstruction. We evaluate our contributions on three continuous control tasks in simulated environments, and compare to diversity-based baselines. The findings suggest that our approach yields a more efficient and robust policy search process.

Conference paper

Kusters R, Misevic D, Berry H, Cully A, Le Cunff Y, Dandoy L, Díaz-Rodríguez N, Ficher M, Grizou J, Othmani A, Palpanas T, Komorowski M, Loiseau P, Moulin Frier C, Nanini S, Quercia D, Sebag M, Soulié Fogelman F, Taleb S, Tupikina L, Sahu V, Vie J-J, Wehbi F et al., 2020, Interdisciplinary research in artificial intelligence: challenges and opportunities, Frontiers in Big Data, Vol: 3, Pages: 1-7, ISSN: 2624-909X

The use of artificial intelligence (AI) in a variety of research fields is speeding up multiple digital revolutions, from shifting paradigms in healthcare, precision medicine and wearable sensing, to public services and education offered to the masses around the world, to future cities made optimally efficient by autonomous driving. When a revolution happens, the consequences are not obvious straight away, and to date, there is no uniformly adapted framework to guide AI research to ensure a sustainable societal transition. To answer this need, here we analyze three key challenges to interdisciplinary AI research, and deliver three broad conclusions: 1) future development of AI should not only impact other scientific domains but should also take inspiration and benefit from other fields of science, 2) AI research must be accompanied by decision explainability, dataset bias transparency as well as development of evaluation methodologies and creation of regulatory agencies to ensure responsibility, and 3) AI education should receive more attention, efforts and innovation from the educational and scientific communities. Our analysis is of interest not only to AI practitioners but also to other researchers and the general public as it offers ways to guide the emerging collaborations and interactions toward the most fruitful outcomes.

Journal article

Flageat M, Cully A, 2020, Fast and stable MAP-Elites in noisy domains using deep grids, 2020 Conference on Artificial Life, Publisher: Massachusetts Institute of Technology, Pages: 273-282

Quality-Diversity optimisation algorithms enable the evolution of collections of both high-performing and diverse solutions. These collections offer the possibility to quickly adapt and switch from one solution to another in case it is not working as expected. It therefore finds many applications in real-world domain problems such as robotic control. However, QD algorithms, like most optimisation algorithms, are very sensitive to uncertainty on the fitness function, but also on the behavioural descriptors. Yet, such uncertainties are frequent in real-world applications. Few works have explored this issue in the specific case of QD algorithms, and inspired by the literature in Evolutionary Computation, mainly focus on using sampling to approximate the “true” value of the performances of a solution. However, sampling approaches require a high number of evaluations, which in many applications such as robotics, can quickly become impractical. In this work, we propose Deep-Grid MAP-Elites, a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution. We compare our approach to previously explored ones on three noisy tasks: a standard optimisation task, the control of a redundant arm and a simulated Hexapod robot. The experimental results show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation, and being more sample-efficient than other existing approaches.

Conference paper

Lehman J, Clune J, Misevic D, Adami C, Altenberg L, Beaulieu J, Bentley PJ, Bernard S, Beslon G, Bryson DM, Cheney N, Chrabaszcz P, Cully A, Doncieux S, Dyer FC, Ellefsen KO, Feldt R, Fischer S, Forrest S, Frenoy A, Gagné C, Le Goff L, Grabowski LM, Hodjat B, Hutter F, Keller L, Knibbe C, Krcah P, Lenski RE, Lipson H, MacCurdy R, Maestre C, Miikkulainen R, Mitri S, Moriarty DE, Mouret J-B, Anh N, Ofria C, Parizeau M, Parsons D, Pennock RT, Punch WF, Ray TS, Schoenauer M, Shulte E, Sims K, Stanley KO, Taddei F, Tarapore D, Thibault S, Watson R, Weimer W, Yosinski J et al., 2020, The surprising creativity of digital evolution: a collection of anecdotes from the evolutionary computation and artificial life research communities, Artificial Life, Vol: 26, Pages: 274-306, ISSN: 1064-5462

Evolution provides a creative fount of complex and subtle adaptations that often surprise the scientists who discover them. However, the creativity of evolution is not limited to the natural world: Artificial organisms evolving in computational environments have also elicited surprise and wonder from the researchers studying them. The process of evolution is an algorithmic process that transcends the substrate in which it occurs. Indeed, many researchers in the field of digital evolution can provide examples of how their evolving algorithms and organisms have creatively subverted their expectations or intentions, exposed unrecognized bugs in their code, produced unexpected adaptations, or engaged in behaviors and outcomes uncannily convergent with ones found in nature. Such stories routinely reveal surprise and creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. Bugs are fixed, experiments are refocused, and one-off surprises are collapsed into a single data point. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This article is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and impor

Journal article

Zambelli M, Cully A, Demiris Y, 2020, Multimodal representation models for prediction and control from partial information, Robotics and Autonomous Systems, Vol: 123, ISSN: 0921-8890

Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motion through the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity given by the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improved performance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoid robot, showing high performance in multiple reconstruction, prediction and imitation tasks.

Journal article

Flageat M, Cully A, 2020, Fast and stable MAP-Elites in noisy domains using deep grids, The 2020 Conference on Artificial Life, Publisher: MIT Press

Conference paper

Zhang F, Cully A, Demiris Y, 2019, Probabilistic real-time user posture tracking for personalized robot-assisted dressing, IEEE Transactions on Robotics, Vol: 35, Pages: 873-888, ISSN: 1552-3098

Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.

Journal article

Cully A, 2019, Autonomous skill discovery with quality-diversity and unsupervised descriptors, Genetic and Evolutionary Computation Conference (GECCO '19), Publisher: ACM, Pages: 81-89

Quality-Diversity optimization is a new family of optimization algorithms that, instead of searching for a single optimal solution to solve a task, searches for a large collection of solutions that all solve the task in a different way. This approach is particularly promising for learning behavioral repertoires in robotics, as such a diversity of behaviors enables robots to be more versatile and resilient. However, these algorithms require the user to manually define behavioral descriptors, which are used to determine whether two solutions are different or similar. The choice of a behavioral descriptor is crucial, as it completely changes the solution types that the algorithm derives. In this paper, we introduce a new method to automatically define this descriptor by combining Quality-Diversity algorithms with unsupervised dimensionality reduction algorithms. This approach enables robots to autonomously discover the range of their capabilities while interacting with their environment. The results from two experimental scenarios demonstrate that robots can autonomously discover a large range of possible behaviors, without any prior knowledge about their morphology and environment. Furthermore, these behaviors are deemed to be similar to hand-crafted solutions that use domain knowledge and significantly more diverse than when using existing unsupervised methods.

Conference paper

Arulkumaran K, Cully A, Togelius J, 2019, AlphaStar: an evolutionary computation perspective, The Genetic and Evolutionary Computation Conference 2019, Publisher: ACM, Pages: 314-315

In January 2019, DeepMind revealed AlphaStar to the world, the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II, representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects: the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.

Conference paper

Cully A, Demiris Y, 2019, Online knowledge level tracking with data-driven student models and collaborative filtering, IEEE Transactions on Knowledge and Data Engineering, Vol: 32, Pages: 2000-2013, ISSN: 1041-4347

Intelligent Tutoring Systems are promising tools for delivering optimal and personalised learning experiences to students. A key component for their personalisation is the student model, which infers the knowledge level of the students to balance the difficulty of the exercises. While important advances have been achieved, several challenges remain. In particular, the models should be able to track in real-time the evolution of the students' knowledge levels. These evolutions are likely to follow different profiles for each student, while measuring the exact knowledge level remains difficult given the limited and noisy information provided by the interactions. This paper introduces a novel model that addresses these challenges with three contributions: 1) the model relies on Gaussian Processes to track online the evolution of the student's knowledge level over time, 2) it uses collaborative filtering to rapidly provide long-term predictions by leveraging the information from previous users, and 3) it automatically generates abstract representations of knowledge components via automatic relevance determination of covariance matrices. The model is evaluated on three datasets, including real users. The results demonstrate that the model converges to accurate predictions on average 4 times faster than the compared methods.

Journal article

Arulkumaran K, Cully A, Togelius J, 2019, AlphaStar: An Evolutionary Computation Perspective

Working paper

Cully AHR, Demiris Y, 2018, Hierarchical behavioral repertoires with unsupervised descriptors, Genetic and Evolutionary Computation Conference 2018, Publisher: ACM

Enabling artificial agents to automatically learn complex, versatile and high-performing behaviors is a long-lasting challenge. This paper presents a step in this direction with hierarchical behavioral repertoires that stack several behavioral repertoires to generate sophisticated behaviors. Each repertoire of this architecture uses the lower repertoires to create complex behaviors as sequences of simpler ones, while only the lowest repertoire directly controls the agent's movements. This paper also introduces a novel approach to automatically define behavioral descriptors thanks to an unsupervised neural network that organizes the produced high-level behaviors. The experiments show that the proposed architecture enables a robot to learn how to draw digits in an unsupervised manner after having learned to draw lines and arcs. Compared to traditional behavioral repertoires, the proposed architecture reduces the dimensionality of the optimization problems by orders of magnitude and provides behaviors with a twice better fitness. More importantly, it enables the transfer of knowledge between robots: a hierarchical repertoire evolved for a robotic arm to draw digits can be transferred to a humanoid robot by simply changing the lowest layer of the hierarchy. This enables the humanoid to draw digits although it has never been trained for this task.

Conference paper

Cully A, Chatzilygeroudis K, Allocati F, Mouret J-B, Rama R, Papaspyros V et al., 2018, Limbo: A Flexible High-performance Library for Gaussian Processes modeling and Data-Efficient Optimization

Limbo (LIbrary for Model-Based Optimization) is an open-source C++11 library for Gaussian Processes and data-efficient optimization (e.g., Bayesian optimization) that is designed to be both highly flexible and very fast. It can be used as a state-of-the-art optimization library or to experiment with novel algorithms with “plugin” components. Limbo is currently mostly used for data-efficient policy search in robot learning and online adaptation because computation time matters when using the low-power embedded computers of robots. For example, Limbo was the key library to develop a new algorithm that allows a legged robot to learn a new gait after a mechanical damage in about 10-15 trials (2 minutes), and a 4-DOF manipulator to learn neural networks policies for goal reaching in about 5 trials. The implementation of Limbo follows a policy-based design that leverages C++ templates: this allows it to be highly flexible without the cost induced by classic object-oriented designs (cost of virtual functions). The regression benchmarks show that the query time of Limbo’s Gaussian processes is several orders of magnitude better than the one of GPy (a state-of-the-art Python library for Gaussian processes) for a similar accuracy (the learning time highly depends on the optimization algorithm chosen to optimize the hyper-parameters). The black-box optimization benchmarks demonstrate that Limbo is about 2 times faster than BayesOpt (a C++ library for data-efficient optimization) for a similar accuracy and data-efficiency. In practice, changing one of the components of the algorithms in Limbo (e.g., changing the acquisition function) usually requires changing only a template definition in the source code. This design allows users to rapidly experiment and test new ideas while keeping the software as fast as specialized code. Limbo takes advantage of multi-core architectures to parallelize the internal optimization processes (optimization of the acquisition funct

Software

Cully AHR, Demiris Y, 2018, Quality and diversity optimization: a unifying modular framework, IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026

The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms are searching for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and the Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection management that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.

Journal article

Zhang F, Cully A, Demiris YIANNIS, 2017, Personalized Robot-assisted Dressing using User Modeling in Latent Spaces, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance usually view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to the failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.

Conference paper

Zambelli M, Fischer T, Petit M, Chang HJ, Cully A, Demiris Y et al., 2016, Towards Anchoring Self-Learned Representations to Those of Other Agents, Workshop on Bio-inspired Social Robot Learning in Home Scenarios IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: Institute of Electrical and Electronics Engineers (IEEE)

In the future, robots will support humans in their everyday activities. One particular challenge that robots will face is understanding and reasoning about the actions of other agents in order to cooperate effectively with humans. We propose to tackle this using a developmental framework, where the robot incrementally acquires knowledge, and in particular 1) self-learns a mapping between motor commands and sensory consequences, 2) rapidly acquires primitives and complex actions by verbal descriptions and instructions from a human partner, 3) discovers correspondences between the robot's body and other articulated objects and agents, and 4) employs these correspondences to transfer the knowledge acquired from the robot's point of view to the viewpoint of the other agent. We show that our approach requires very little a-priori knowledge to achieve imitation learning and to find corresponding body parts of humans, and that it allows taking the perspective of another agent. This represents a step towards the emergence of a mirror-neuron-like system based on self-learned representations.

Conference paper

Tarapore D, Clune J, Cully AHR, Mouret J-B et al., 2016, How do different encodings influence the performance of the MAP-Elites algorithm?, Proceedings of the Genetic and Evolutionary Computation Conference 2016, Publisher: ACM, Pages: 173-180

The recently introduced Intelligent Trial and Error algorithm (IT&E) both improves the ability to automatically generate controllers that transfer to real robots, and enables robots to creatively adapt to damage in less than 2 minutes. A key component of IT&E is a new evolutionary algorithm called MAP-Elites, which creates a behavior-performance map that is provided as a set of "creative" ideas to an online learning algorithm. To date, all experiments with MAP-Elites have been performed with a directly encoded list of parameters: it is therefore unknown how MAP-Elites would behave with more advanced encodings, like HyperNeat and SUPG. In addition, because we ultimately want robots that respond to their environments via sensors, we investigate the ability of MAP-Elites to evolve closed-loop controllers, which are more complicated, but also more powerful. Our results show that the encoding critically impacts the quality of the results of MAP-Elites, and that the differences are likely linked to the locality of the encoding (the likelihood of generating a similar behavior after a single mutation). Overall, these results improve our understanding of both the dynamics of the MAP-Elites algorithm and how to best harness MAP-Elites to evolve effective and adaptable robotic controllers.

Conference paper
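
The behavior-performance map mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the grid discretization, the Gaussian mutation, and the toy task are all assumptions chosen for illustration. It uses a direct encoding (a plain list of parameters), keeps the best solution found so far in each cell of a behavior grid, and generates new candidates by mutating randomly chosen elites.

```python
import random


def map_elites(evaluate, n_params, bins, iterations, n_init=50, sigma=0.1, seed=0):
    """Minimal MAP-Elites-style loop over a direct encoding (floats in [0, 1]).

    evaluate(params) must return (fitness, descriptor), where descriptor is a
    tuple of floats in [0, 1]. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    archive = {}  # grid cell (tuple of ints) -> (fitness, params)

    def cell_of(descriptor):
        # Discretize each descriptor dimension into `bins` equal-width cells.
        return tuple(min(int(d * bins), bins - 1) for d in descriptor)

    for i in range(iterations):
        if i < n_init or not archive:
            # Bootstrap phase: sample random parameter vectors.
            params = [rng.random() for _ in range(n_params)]
        else:
            # Pick a random elite and perturb it (Gaussian mutation, clipped).
            _, parent = rng.choice(list(archive.values()))
            params = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in parent]

        fitness, descriptor = evaluate(params)
        cell = cell_of(descriptor)
        # Keep the candidate only if its cell is empty or it beats the current elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, params)
    return archive


def toy_task(params):
    # Hypothetical task: the descriptor is the first two parameters and the
    # fitness rewards parameter vectors close to 0.5 in every dimension.
    fitness = -sum((p - 0.5) ** 2 for p in params)
    return fitness, (params[0], params[1])


if __name__ == "__main__":
    result = map_elites(toy_task, n_params=4, bins=5, iterations=2000)
    print(f"filled cells: {len(result)} of {5 * 5}")
```

With a direct encoding like this one, a small mutation usually produces a nearby descriptor, which is the kind of locality the abstract links to result quality; indirect encodings would replace the parameter list and the mutation step.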

Cully A, Mouret J-B, 2016, Evolving a behavioral repertoire for a walking robot, Evolutionary Computation, Vol: 24, Pages: 59-88, ISSN: 1063-6560

Numerous algorithms have been proposed to allow legged robots to learn to walk. However, most of these algorithms are devised to learn walking in a straight line, which is not sufficient to accomplish any real-world mission. Here we introduce the Transferability-based Behavioral Repertoire Evolution algorithm (TBR-Evolution), a novel evolutionary algorithm that simultaneously discovers several hundreds of simple walking controllers, one for each possible direction. By taking advantage of solutions that are usually discarded by evolutionary processes, TBR-Evolution is substantially faster than independently evolving each controller. Our technique relies on two methods: (1) novelty search with local competition, which searches for both high-performing and diverse solutions, and (2) the transferability approach, which combines simulations and real tests to evolve controllers for a physical robot. We evaluate this new technique on a hexapod robot. Results show that with only a few dozen short experiments performed on the robot, the algorithm learns a repertoire of controllers that allows the robot to reach every point in its reachable space. Overall, TBR-Evolution introduced a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.

Journal article

Maestre C, Cully AHR, Gonzales C, Doncieux S et al., 2015, Bootstrapping interactions with objects from raw sensorimotor data: a Novelty Search based approach, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Publisher: IEEE

Determining in advance all the objects that a robot will interact with in an open environment is very challenging, if not impossible. This makes it difficult to develop models that allow the robot to perceive and recognize objects, to interact with them, and to predict how these objects will react to interactions with other objects or with the robot. Developmental robotics proposes that robots learn such models by themselves through a dedicated exploration step. This raises a chicken-and-egg problem: the robot needs to learn about objects to discover how to interact with them and, to this end, it needs to interact with them. In this work, we propose Novelty-driven Evolutionary Babbling (NovEB), an approach that bootstraps this process and acquires knowledge about objects in the surrounding environment without requiring a priori knowledge about the environment, including objects, or about the means to interact with them. Our approach consists in using an evolutionary algorithm driven by a novelty criterion defined in the raw sensorimotor flow: behaviours, described by a trajectory of the robot end effector, are generated with the goal of maximizing the novelty of raw perceptions. The approach is tested on a simulated PR2 robot and is compared to random motor babbling.

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
