Conference paper Lim BWT, Flageat M, Cully A, 2023,
Exploration is a key challenge in Reinforcement Learning, especially in long-horizon, deceptive and sparse-reward environments. For such applications, population-based approaches have proven effective. Methods such as Quality-Diversity deal with this by encouraging novel solutions and producing a diversity of behaviours. However, these methods are driven by either undirected sampling (i.e. mutations) or approximated gradients (i.e. Evolution Strategies) in the parameter space, which makes them highly sample-inefficient. In this paper, we propose Dynamics-Aware QD-Ext (DA-QD-ext) and Gradient and Dynamics Aware QD (GDA-QD), two model-based Quality-Diversity approaches. They extend existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration. Our approach takes advantage of the effectiveness of QD algorithms as good data generators to train deep models and uses these models to learn diverse and high-performing populations. We demonstrate that they outperform baseline RL approaches on tasks with deceptive rewards, and maintain the divergent search capabilities of QD approaches while exceeding their performance by ∼1.5 times and reaching the same results with 5 times fewer samples.
Conference paper Grillotti L, Flageat M, Lim B, et al., 2023,
Don't bet on luck alone: enhancing behavioral reproducibility of quality-diversity solutions in uncertain domains, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM
Quality-Diversity (QD) algorithms are designed to generate collections of high-performing solutions while maximizing their diversity in a given descriptor space. However, in the presence of unpredictable noise, the fitness and descriptor of the same solution can differ significantly from one evaluation to another, leading to uncertainty in the estimation of such values. Given the elitist nature of QD algorithms, they commonly end up with many degenerate solutions in such noisy settings. In this work, we introduce Archive Reproducibility Improvement Algorithm (ARIA), a plug-and-play approach that improves the reproducibility of the solutions present in an archive. We propose it as a separate optimization module, relying on natural evolution strategies, that can be executed on top of any QD algorithm. Our module mutates solutions to (1) optimize their probability of belonging to their niche, and (2) maximize their fitness. The performance of our method is evaluated on various tasks, including a classical optimization problem and two high-dimensional control tasks in simulated robotic environments. We show that our algorithm enhances the quality and descriptor space coverage of any given archive by at least 50%.
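To make the notion of "probability of belonging to a niche" concrete, the following is a minimal sketch that estimates it by repeated noisy evaluation. It is not the ARIA implementation: the grid discretisation, the toy task `noisy_evaluate`, the noise level and all function names are assumptions made for illustration only.

```python
import random

def niche_index(descriptor, cells_per_dim=10):
    """Map a descriptor in [0, 1]^d to a tuple of grid-cell indices."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in descriptor)

def niche_probability(evaluate, solution, target_niche, n_samples=1000, rng=None):
    """Monte Carlo estimate of P(descriptor of `solution` falls in `target_niche`)."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_samples):
        _, descriptor = evaluate(solution, rng)
        hits += niche_index(descriptor) == target_niche
    return hits / n_samples

def noisy_evaluate(solution, rng, noise_std=0.05):
    """Hypothetical noisy task: descriptor = solution + Gaussian noise, clipped to [0, 1]."""
    descriptor = [min(max(x + rng.gauss(0.0, noise_std), 0.0), 1.0) for x in solution]
    fitness = -sum((x - 0.5) ** 2 for x in solution)
    return fitness, descriptor

centre = [0.55, 0.55]   # middle of cell (5, 5)
edge = [0.501, 0.501]   # just inside the border of cell (5, 5)
p_centre = niche_probability(noisy_evaluate, centre, (5, 5))
p_edge = niche_probability(noisy_evaluate, edge, (5, 5))
print(f"centre: {p_centre:.2f}, edge: {p_edge:.2f}")
```

In this toy setting, a solution sitting near a cell border lands in its niche less often than one at the cell centre, which is the kind of quantity a reproducibility-oriented optimiser would try to increase.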
Conference paper Faldor M, Chalumeau F, Flageat M, et al., 2023,
Quality-Diversity algorithms, such as MAP-Elites, are a branch of Evolutionary Computation generating collections of diverse and high-performing solutions, and have been successfully applied to a variety of domains, particularly in evolutionary robotics. However, MAP-Elites performs a divergent search based on random mutations originating from Genetic Algorithms, and thus is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation by integrating a gradient-based variation operator inspired by Deep Reinforcement Learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based operator does not direct mutations towards archive-improving solutions. In this work, we present two contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that improves the archive across the entire descriptor space, (2) we exploit the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the archive into one single versatile policy that can execute the entire range of behaviors contained in the archive. Our algorithm, DCG-MAP-Elites, improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
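For readers unfamiliar with the baseline, here is a minimal sketch of plain MAP-Elites with random Gaussian mutations, i.e. the divergent search the abstract refers to. It does not include the policy-gradient or descriptor-conditioned operators of PGA-MAP-Elites or DCG-MAP-Elites; the toy task, grid resolution and hyperparameters are assumptions chosen for illustration.

```python
import random

def map_elites(evaluate, dim, cells_per_dim=10, iterations=5000, sigma=0.1, seed=0):
    """Plain MAP-Elites on a 2-D descriptor grid with Gaussian mutations."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)

    def try_insert(solution):
        fitness, descriptor = evaluate(solution)
        cell = tuple(min(int(d * cells_per_dim), cells_per_dim - 1) for d in descriptor)
        # Elitism: keep the new solution only if the cell is empty or it is fitter.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, solution)

    # Initialise the archive with random solutions in [0, 1]^dim.
    for _ in range(100):
        try_insert([rng.random() for _ in range(dim)])

    # Main loop: select a random elite, mutate it, try to add it back.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [min(max(x + rng.gauss(0.0, sigma), 0.0), 1.0) for x in parent]
        try_insert(child)
    return archive

def toy_task(solution):
    """Hypothetical task: fitness peaks at 0.5; descriptor is the first two genes."""
    fitness = -sum((x - 0.5) ** 2 for x in solution)
    return fitness, solution[:2]

archive = map_elites(toy_task, dim=4)
coverage = len(archive) / 100  # fraction of the 10 x 10 grid that is filled
print(f"coverage: {coverage:.0%}")
```

Because variation here is undirected, the loop relies on many evaluations to fill and improve cells, which is the limitation that gradient-based variation operators aim to address for high-dimensional solutions such as neural networks.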
Journal article Flageat M, Chalumeau F, Cully A, 2023,
Quality-Diversity algorithms, including MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches as they enable generating collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by pairing the traditional Genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator guides mutations toward high-performing solutions using policy gradients. In this work, we propose an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy gradients on the performance of the algorithm and the reproducibility of the generated solutions when considering uncertain domains. We first prove that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments, decorrelating the two challenges it tackles. Secondly, we show that in addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we propose an ablation and in-depth analysis of the dynamics of the policy-gradient-based variation. We demonstrate that the policy-gradient variation operator is critical to the performance of PGA-MAP-Elites but is only essential during the early stage of the process, where it finds high-performing regions of the search space.
Conference paper Chalumeau F, Boige R, Lim BWT, et al., 2023,
Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery, The 11th International Conference on Learning Representations (ICLR) 2023
Conference paper Surana S, Lim BWT, Cully A, 2023,
Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors, IEEE International Conference on Robotics and Automation, ISSN: 2152-4092
Journal article Grillotti L, Cully A, 2022,
Quality-Diversity algorithms refer to a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used for generating a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor with each of these behaviours. Each behavioural descriptor is used for estimating the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, thus requiring prior knowledge about the task to solve. In this paper, we introduce Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors based on raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without the requirement to provide any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm which is robust to the dimensionality of the behavioural descriptor space.
Conference paper Cretu A-M, Houssiau F, Cully A, et al., 2022,
Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities. Evaluating the privacy provided by QBSes must thus be done by evaluating the accuracy of a wide range of privacy attacks. However, existing attacks against QBSes require time and expertise to develop, need to be manually tailored to the specific systems attacked, and are limited in scope. In this paper, we develop QuerySnout, the first method to automatically discover vulnerabilities in query-based systems. QuerySnout takes as input a target record and the QBS as a black box, analyzes its behavior on one or more datasets, and outputs a multiset of queries together with a rule to combine answers to them in order to reveal the sensitive attribute of the target record. QuerySnout uses evolutionary search techniques based on a novel mutation operator to find a multiset of queries susceptible to lead to an attack, and a machine learning classifier to infer the sensitive attribute from answers to the queries selected. We showcase the versatility of QuerySnout by applying it to two attack scenarios (assuming access to either the private dataset or to a different dataset from the same distribution), three real-world datasets, and a variety of protection mechanisms. We show the attacks found by QuerySnout to consistently equate or outperform, sometimes by a large margin, the best attacks from the literature. We finally show how QuerySnout can be extended to QBSes that require a budget, and apply QuerySnout to a simple QBS based on the Laplace mechanism. Taken together, our results show how powerful and accurate attacks against QBSes can already be found by an automated system, allo
Conference paper Lim BWT, Grillotti L, Bernasconi L, et al., 2022,
Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from experience obtained when performing skill discovery using QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show DA-QD is 20 times more sample efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.
Conference paper Lim BWT, Reichenbach A, Cully A, 2022,
Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types of solutions available for the behaviour selection policy, rather than solutions optimised with a specific objective. Videos and code available at this https URL.
Conference paper Pierrot T, Macé V, Chalumeau F, et al., 2022,
Conference paper Lim B, Allard M, Grillotti L, et al., 2022,
Quality-Diversity (QD) algorithms are a well-known approach to generate large collections of diverse and high-quality policies. However, QD algorithms are also known to be data-inefficient, to require large amounts of computational resources, and to be slow when used in practice for robotics tasks. Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine as most physics simulators run on CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two orders of magnitude, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.
Conference paper Allard M, Smith Bize S, Chatzilygeroudis K, et al., 2022,
Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damages in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better are the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth of the number of required solutions. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% fewer actions in the most challenging scenarios than the best baseline while having 57% fewer complete failures.
Conference paper Grillotti L, Cully A, 2022,
Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a hand-coded behavioural descriptor to characterise the diversity, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.
Conference paper Cully A, 2021,
Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 84-92
Quality-Diversity (QD) optimisation is a new family of learning algorithms that aims at generating collections of diverse and high-performing solutions. Among those algorithms, MAP-Elites is a simple yet powerful approach that has shown promising results in numerous applications. In this paper, we introduce a novel algorithm named Multi-Emitter MAP-Elites (ME-MAP-Elites) that improves the quality, diversity and convergence speed of MAP-Elites. It is based on the recently introduced concept of emitters, which are used to drive the algorithm's exploration according to predefined heuristics. ME-MAP-Elites leverages the diversity of a heterogeneous set of emitters, in which each emitter type is designed to improve the optimisation process in a different way. Moreover, a bandit algorithm is used to dynamically find the best emitter set depending on the current situation. We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics. Our comparisons against MAP-Elites and existing approaches using emitters show that ME-MAP-Elites is faster at providing collections of solutions that are significantly more diverse and higher performing. Moreover, in the rare cases where no fruitful synergy can be found between the different emitters, ME-MAP-Elites is equivalent to the best of the compared algorithms.
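As an illustration of how a bandit can allocate effort between emitter types, the sketch below uses the standard UCB1 rule to pick one of three emitters per step. This is a simplification and an assumption: the abstract does not state which bandit is used, the paper selects emitter sets rather than a single emitter, and the emitters are replaced here by hypothetical fixed success rates (the chance that an emitted solution improves the archive).

```python
import math
import random

def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Return the arm with the highest UCB1 score; try each arm once first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        rewards[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

rng = random.Random(0)
# Hypothetical probability that each emitter type improves the archive.
success_rates = [0.05, 0.30, 0.10]
counts = [0] * len(success_rates)     # times each emitter was used
rewards = [0.0] * len(success_rates)  # archive improvements credited to each emitter

for _ in range(5000):
    arm = ucb1_select(counts, rewards)
    reward = 1.0 if rng.random() < success_rates[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward

print(counts)
```

Over time the selection concentrates on the emitter that most often improves the archive while still occasionally trying the others, which is useful when the best emitter changes as the archive fills.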
Conference paper Flageat M, Cully A, 2020,
Quality-Diversity optimisation algorithms enable the evolution of collections of both high-performing and diverse solutions. These collections offer the possibility to quickly adapt and switch from one solution to another in case it is not working as expected. They therefore find many applications in real-world domain problems such as robotic control. However, QD algorithms, like most optimisation algorithms, are very sensitive to uncertainty on the fitness function, but also on the behavioural descriptors. Yet, such uncertainties are frequent in real-world applications. Few works have explored this issue in the specific case of QD algorithms, and inspired by the literature in Evolutionary Computation, mainly focus on using sampling to approximate the "true" value of the performances of a solution. However, sampling approaches require a high number of evaluations, which in many applications such as robotics, can quickly become impractical. In this work, we propose Deep-Grid MAP-Elites, a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution. We compare our approach to previously explored ones on three noisy tasks: a standard optimisation task, the control of a redundant arm and a simulated Hexapod robot. The experimental results show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation, and being more sample-efficient than other existing approaches.
Journal article Zhang F, Cully A, Demiris Y, 2019,
Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.
Conference paper Cully A, 2019,
Quality-Diversity optimization is a new family of optimization algorithms that, instead of searching for a single optimal solution for solving a task, searches for a large collection of solutions that all solve the task in a different way. This approach is particularly promising for learning behavioral repertoires in robotics, as such a diversity of behaviors enables robots to be more versatile and resilient. However, these algorithms require the user to manually define behavioral descriptors, which are used to determine whether two solutions are different or similar. The choice of a behavioral descriptor is crucial, as it completely changes the solution types that the algorithm derives. In this paper, we introduce a new method to automatically define this descriptor by combining Quality-Diversity algorithms with unsupervised dimensionality reduction algorithms. This approach enables robots to autonomously discover the range of their capabilities while interacting with their environment. The results from two experimental scenarios demonstrate that robots can autonomously discover a large range of possible behaviors, without any prior knowledge about their morphology and environment. Furthermore, these behaviors are deemed to be similar to hand-crafted solutions that use domain knowledge and significantly more diverse than when using existing unsupervised methods.
Conference paper Arulkumaran K, Cully A, Togelius J, 2019,
In January 2019, DeepMind revealed AlphaStar to the world, the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II, representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects: the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
Journal article Cully A, Demiris Y, 2019,
Intelligent Tutoring Systems are promising tools for delivering optimal and personalised learning experiences to students. A key component for their personalisation is the student model, which infers the knowledge level of the students to balance the difficulty of the exercises. While important advances have been achieved, several challenges remain. In particular, the models should be able to track in real time the evolution of the students' knowledge levels. These evolutions are likely to follow different profiles for each student, while measuring the exact knowledge level remains difficult given the limited and noisy information provided by the interactions. This paper introduces a novel model that addresses these challenges with three contributions: 1) the model relies on Gaussian Processes to track online the evolution of the student's knowledge level over time, 2) it uses collaborative filtering to rapidly provide long-term predictions by leveraging the information from previous users, and 3) it automatically generates abstract representations of knowledge components via automatic relevance determination of covariance matrices. The model is evaluated on three datasets, including real users. The results demonstrate that the model converges to accurate predictions on average 4 times faster than the compared methods.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.