52 results found
Lim BWT, Flageat M, Cully A, 2023, Efficient exploration using model-based quality-diversity with gradients, Conference on Artificial Life, Publisher: MIT Press, Pages: 1-10
Exploration is a key challenge in Reinforcement Learning,especially in long-horizon, deceptive and sparse-reward environments. For such applications, population-based approaches have proven effective. Methods such as Quality-Diversity deals with this by encouraging novel solutions and producing a diversity of behaviours. However, these methods are driven by either undirected sampling (i.e. mutations) or use approximated gradients (i.e. Evolution Strategies) in the parameter space, which makes them highly sample-inefficient. In this paper, we propose Dynamics-Aware QD-Ext (DA-QD-ext) and Gradient and Dynamics Aware QD (GDA-QD), two model-based Quality-Diversity approaches. They extend existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration.Our approach takes advantage of the effectiveness of QD algorithms as good data generators to train deep models and use these models to learn diverse and high-performing populations. We demonstrate that they outperform baseline RL approaches on tasks with deceptive rewards, and maintain the divergent search capabilities of QD approaches while exceeding their performance by ∼ 1.5 times and reaching the same results in 5 times less samples.
Boige R, Richard G, Dona J, et al., 2023, Gradient-informed quality diversity for the illumination of discrete spaces, Pages: 119-128
Quality Diversity (qd) algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. While early qd algorithms view the objective and descriptor functions as black-box functions, novel tools have been introduced to use gradient information to accelerate the search and improve overall performance of those algorithms over continuous input spaces. However a broad range of applications involve discrete spaces, such as drug discovery or image generation. Exploring those spaces is challenging as they are combinatorially large and gradients cannot be used in the same manner as in continuous spaces. We introduce map-elites with a Gradient-Informed Discrete Emitter (me-gide), which extends qd optimisation with differentiable functions over discrete search spaces. me-gide leverages the gradient information of the objective and descriptor functions with respect to its discrete inputs to propose gradient-informed updates that guide the search towards a diverse set of high quality solutions. We evaluate our method on challenging benchmarks including protein design and discrete latent space illumination and find that our method outperforms state-of-the-art qd algorithms in all benchmarks.
Flageat M, Grillotti L, Cully A, 2023, Benchmark Tasks for Quality-Diversity Applied to Uncertain Domains, GECCO '23 Companion: Companion Conference on Genetic and Evolutionary Computation, Publisher: ACM
Janmohamed H, Pierrot T, Cully A, 2023, Improving the data efficiency of multi-objective quality-diversity through gradient assistance and crowding exploration, GECCO 2023, Publisher: ACM, Pages: 165-173
Quality-Diversity (QD) algorithms have recently gained traction as optimisation methods due to their effectiveness at escaping local optima and capability of generating wide-ranging and high-performing solutions. Recently, Multi-Objective MAP-Elites (MOME) extended the QD paradigm to the multi-objective setting by maintaining a Pareto front in each cell of a MAP-ELITES grid. MOME achieved a global performance that competed with NSGA-11 and SPEA2, two well-established multi-objective evolutionary algorithms, while also acquiring a diverse repertoire of solutions. However, MOME is limited by non-directed genetic search mechanisms which struggle in high-dimensional search spaces. In this work, we present Multi-Objective MAP-Elites with Policy-Gradient Assistance and Crowding-based Exploration (MOME-PGX: a new QD algorithm that extends MOME to improve its data efficiency and performance. MOME-PGX uses gradient-based optimisation to efficiently drive solutions towards higher performance. It also introduces crowding-based mechanisms to create an improved exploration strategy and to encourage greater uniformity across Pareto fronts. We evaluate MOME-PGX in four simulated robot locomotion tasks and demonstrate that it converges faster and to a higher performance than all other baselines. We show that MOME-PGX is between 4.3 and 42 times more data-efficient than MOME and doubles the performance of MOME, NSGA-11 and SPEA2 in challenging environments.
Grillotti L, Flageat M, Lim B, et al., 2023, Don't bet on luck alone: enhancing behavioral reproducibility of quality-diversity solutions in uncertain domains, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM
Quality-Diversity (QD) algorithms are designed to generate collections of high-performing solutions while maximizing their diversity in a given descriptor space. However, in the presence of unpredictable noise, the fitness and descriptor of the same solution can differ significantly from one evaluation to another, leading to uncertainty in the estimation of such values. Given the elitist nature of QD algorithms, they commonly end up with many degeneratesolutions in such noisy settings. In this work, we introduce Archive Reproducibility Improvement Algorithm (ARIA); a plug-and-play approach that improves the reproducibility of the solutions present in an archive. We propose it as a separate optimization module, relying on natural evolution strategies, that can be executed on top of any QD algorithm. Our module mutates solutions to (1) optimize their probability of belonging to their niche, and (2) maximize their fitness. The performance of our method is evaluated on various tasks, including a classical optimization problem and two high-dimensional control tasks in simulated robotic environments. We show that our algorithm enhances the quality and descriptor space coverage of any given archive by at least 50%.
Faldor M, Chalumeau F, Flageat M, et al., 2023, MAP-elites with descriptor-conditioned gradients and archive distillation into a single policy, The Genetic and Evolutionary Computation Conference, Publisher: Association for Computing Machinery, Pages: 138-146
Quality-Diversity algorithms, such as MAP-Elites, are a branch of Evolutionary Computation generating collections of diverse and high-performing solutions, that have been successfully applied to a variety of domains and particularly in evolutionary robotics. However, MAP-Elites performs a divergent search based on random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation by integrating a gradient-based variation operator inspired by Deep Reinforcement Learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based operator does not direct mutations towards archive-improving solutions. In this work, we present two contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that improves the archive across the entire descriptor space, (2) we exploit the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the archive into one single versatile policy that can execute the entire range of behaviors contained in the archive. Our algorithm, DCG-MAP-Elites improves the QD score over PGA-MAP-Elites by 82% on average, on a set of challenging locomotion tasks.
Lim BWT, Flageat M, Cully A, 2023, Understanding the synergies between quality-diversity and deep reinforcement learning, The Genetic and Evolutionary Computation Conference (GECCO '23), Publisher: ACM, Pages: 1212-1220
The synergies between Quality-Diversity (QD) and Deep Reinforcement Learning (RL) have led to powerful hybrid QD-RL algorithms that have shown tremendous potential, and bring the best of both fields. However, only a single deep RL algorithm (TD3) has been used in prior hybrid methods despite notable progress made by other RL algorithms. Additionally, there are fundamental differencesin the optimization procedures between QD and RL which would benefit from a more principled approach. We propose Generalized Actor-Critic QD-RL, a unified modular framework for actor-critic deep RL methods in the QD-RL setting. This framework provides a path to study insights from Deep RL in the QD-RL setting, which is an important and efficient way to make progress in QD-RL. Weintroduce two new algorithms, PGA-ME (SAC) and PGA-ME (DroQ) which apply recent advancements in Deep RL to the QD-RL setting, and solve the humanoid environment which was not possible using existing QD-RL algorithms. However, we also find that not all insights from Deep RL can be effectively translated to QD-RL. Critically, this work also demonstrates that the actor-critic models in QD-RL are generally insufficiently trained and performance gainscan be achieved without any additional environment evaluations.
Allard M, Smith SC, Chatzilygeroudis K, et al., 2023, Online damage recovery for physical robots with hierarchical quality-diversity, ACM Transactions on Evolutionary Learning and Optimization, Vol: 3, Pages: 1-23, ISSN: 2688-299X
In real-world environments, robots need to be resilient to damages and robust to unforeseen scenarios. Quality-Diversity (QD) algorithms have been successfully used to make robots adapt to damages in seconds by leveraging a diverse set of learned skills. A high diversity of skills increases the chances of a robot to succeed at overcoming new situations since there are more potential alternatives to solve a new task. However, finding and storing a large behavioural diversity of multiple skills often leads to an increase in computational complexity. Furthermore, robot planning in a large skill space is an additional challenge that arises with an increased number of skills. Hierarchical structures can help to reduce this search and storage complexity by breaking down skills into primitive skills. In this paper, we extend the analysis of the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot adapt quickly in the physical world. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. Experiments with a hexapod robot both in simulation and the physical world show that our method solves a maze navigation task with up to 20% respectively 43% less actions than the best baselines while having 78% less complete failures.
Jain S, Cretu A-M, Cully A, et al., 2023, Deep perceptual hashing algorithms with hidden dual purpose: when client-side scanning does facial recognition, IEEE Symposium on Security and Privacy, Publisher: IEEE, Pages: 234-252
End-to-end encryption (E2EE) provides strong technical protections to individuals from interferences. Governments and law enforcement agencies around the world have however raised concerns that E2EE also allows illegal content to be shared undetected. Client-side scanning (CSS), using perceptual hashing (PH) to detect known illegal content before it is shared, is seen as a promising solution to prevent the diffusion of illegal content while preserving encryption. While these proposals raise strong privacy concerns, proponents of the solutions have argued that the risk is limited as the technology has a limited scope: detecting known illegal content. In this paper, we show that modern perceptual hashing algorithms are actually fairly flexible pieces of technology and that this flexibility could be used by an adversary to add a secondary hidden feature to a client-side scanning system. More specifically, we show that an adversary providing the PH algorithm can “hide” a secondary purpose of face recognition of a target individual alongside its primary purpose of image copy detection. We first propose a procedure to train a dual-purpose deep perceptual hashing model by jointly optimizing for both the image copy detection and the targeted facial recognition task. Second, we extensively evaluate our dual-purpose model and show it to be able to reliably identify a target individual 67% of the time while not impacting its performance at detecting illegal content. We also show that our model is neither a general face detection nor a facial recognition model, allowing its secondary purpose to be hidden. Finally, we show that the secondary purpose can be enabled by adding a single illegal looking image to the database. Taken together, our results raise concerns that a deep perceptual hashing-based CSS system could turn billions of user devices into tools to locate targeted individuals.
Flageat M, Chalumeau F, Cully A, 2023, Empirical analysis of PGA-MAP-Elites for neuroevolution in uncertain domains, ACM Transactions on Evolutionary Learning and Optimization, Vol: 3, Pages: 1-32, ISSN: 2688-299X
Quality-Diversity algorithms, among which MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches as they enable generating collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by pairing the traditional Genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator guides mutations toward high-performing solutions using policy-gradients. In this work, we propose an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy-gradients on the performance of the algorithm and the reproducibility of the generated solutions when considering uncertain domains. We first prove that PGA-MAP-Elites is highly performant in both deterministic and uncertain high-dimensional environments, decorrelating the two challenges it tackles. Secondly, we show that in addition to outperforming all the considered baselines, the collections of solutions generated by PGA-MAP-Elites are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we propose an ablation and in-depth analysis of the dynamic of the policy-gradients-based variation. We demonstrate that the policy-gradient variation operator is determinant to guarantee the performance of PGA-MAP-Elites but is only essential during the early stage of the process, where it finds high-performing regions of the search space.
Lim BWT, Allard M, Grillotti L, et al., 2023, Accelerated quality-diversity through massive parallelism, Transactions on Machine Learning Research, ISSN: 2835-8856
Quality-Diversity (QD) optimization algorithms are a well-known approach to generate large collections of diverse and high-quality solutions. However, derived from evolutionary computation, QD algorithms are population-based methods which are known to be data-inefficient and requires large amounts of computational resources. This makes QD algorithms slow when used in applications where solution evaluations are computationally costly. A common approach to speed up QD algorithms is to evaluate solutions in parallel, for instance by using physical simulators in robotics. Yet, this approach is limited to several dozen of parallel evaluations as most physics simulators can only be parallelized more with a greater number of CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can now be performed in parallel on single GPU/TPU. In this paper, we present QDax, an accelerated implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We show that QD algorithms are ideal candidates to take advantage of progress in hardware acceleration. We demonstrate that QD algorithms can scale with massive parallelism to be run at interactive timescales without any significant effect on the performance. Results across standard optimization functions and four neuroevolution benchmark environments shows that experiment runtimes are reduced by two factors of magnitudes, turning days of computation into minutes. More surprising, we observe that reducing the number of generations by two orders of magnitude, and thus having significantly shorter lineage does not impact the performance of QD algorithms. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.
Chalumeau F, Boige R, Lim BWT, et al., 2023, Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery, The 11th International Conference on Learning Representations (ICLR) 2023
Surana S, Lim BWT, Cully A, 2023, Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors, IEEE International Conference on Robotics and Automation, ISSN: 2152-4092
Flageat M, Cully A, 2023, Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains, IEEE Transactions on Evolutionary Computation, Pages: 1-1, ISSN: 1089-778X
Grillotti L, Cully A, 2022, Unsupervised behaviour discovery with quality-diversity optimisation, IEEE Transactions on Evolutionary Computation, Vol: 26, Pages: 1539-1552, ISSN: 1089-778X
Quality-Diversity algorithms refer to a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used for generating a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor to each of these behaviours. Each behavioural descriptor is used for estimating the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, thus requiring prior knowledge about the task to solve. In this paper, we introduce: Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors based on raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without the requirement to provide any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm which is robust to the dimensionality of the behavioural descriptor space.
Cretu A-M, Houssiau F, Cully A, et al., 2022, QuerySnout: automating the discovery of attribute inference attacks against query-based systems, CCS '22: 2022 ACM SIGSAC Conference on Computer and Communications Security, Publisher: ACM, Pages: 623-637
Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities. Evaluating the privacy provided by QBSes must thus be done by evaluating the accuracy of a wide range of privacy attacks. However, existing attacks against QBSes require time and expertise to develop, need to be manually tailored to the specific systems attacked, and are limited in scope. In this paper, we develop QuerySnout, the first method to automatically discover vulnerabilities in query-based systems. QuerySnout takes as input a target record and the QBS as a black box, analyzes its behavior on one or more datasets, and outputs a multiset of queries together with a rule to combine answers to them in order to reveal the sensitive attribute of the target record. QuerySnout uses evolutionary search techniques based on a novel mutation operator to find a multiset of queries susceptible to lead to an attack, and a machine learning classifier to infer the sensitive attribute from answers to the queries selected. We showcase the versatility of QuerySnout by applying it to two attack scenarios (assuming access to either the private dataset or to a different dataset from the same distribution), three real-world datasets, and a variety of protection mechanisms. We show the attacks found by QuerySnout to consistently equate or outperform, sometimes by a large margin, the best attacks from the literature. We finally show how QuerySnout can be extended to QBSes that require a budget, and apply QuerySnout to a simple QBS based on the Laplace mechanism. Taken together, our results show how powerful and accurate attacks against QBSes can already be found by an automated system, allo
Lim BWT, Grillotti L, Bernasconi L, et al., 2022, Dynamics-aware quality-diversity for efficient learning of skill repertoires, IEEE International Conference on Robotics and Automation, Publisher: IEEE, Pages: 5360-5366
Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample inefficient and require millionsof evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for continual acquisition of new skill repertoires. To do so, weincrementally train a deep dynamics model from experience obtained when performing skill discovery using QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show DA-QD is 20 timesmore sample efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.
Pierrot T, Macé V, Chalumeau F, et al., 2022, Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization, The Genetic and Evolutionary Computation Conference (GECCO)
Lim BWT, Reichenbach A, Cully A, 2022, Learning to walk autonomously via reset-free quality-diversity, The Genetic and Evolutionary Computation Conference (GECCO)
Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select of behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at this https URL.
Allard M, Smith Bize S, Chatzilygeroudis K, et al., 2022, Hierarchical Quality-Diversity For Online Damage Recovery, The Genetic and Evolutionary Computation Conference, Publisher: ACM
Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damages in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better are the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth of the number of required solutions.In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% less actions in the most challenging scenarios than the best baseline while having 57% less complete failures.
Grillotti L, Cully A, 2022, Relevance-guided unsupervised discovery of abilities with quality-diversity algorithms, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 77-85
Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a behavioural descriptor to characterise the diversity that is hand-coded, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities; a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.
Kaptein F, Kiefer B, Cully A, et al., 2022, A cloud-based robot system for long-term interaction: principles, implementation, lessons learned, ACM Transactions on Human-Robot Interaction, Vol: 11, ISSN: 2573-9522
Making the transition to long-term interaction with social-robot systems has been identified as one of the main challenges in human-robot interaction. This article identifies four design principles to address this challenge and applies them in a real-world implementation: cloud-based robot control, a modular design, one common knowledge base for all applications, and hybrid artificial intelligence for decision making and reasoning. The control architecture for this robot includes a common Knowledge-base (ontologies), Data-base, “Hybrid Artificial Brain” (dialogue manager, action selection and explainable AI), Activities Centre (Timeline, Quiz, Break and Sort, Memory, Tip of the Day, ), Embodied Conversational Agent (ECA, i.e., robot and avatar), and Dashboards (for authoring and monitoring the interaction). Further, the ECA is integrated with an expandable set of (mobile) health applications. The resulting system is a Personal Assistant for a healthy Lifestyle (PAL), which supports diabetic children with self-management and educates them on health-related issues (48 children, aged 6–14, recruited via hospitals in the Netherlands and in Italy). It is capable of autonomous interaction “in the wild” for prolonged periods of time without the need for a “Wizard-of-Oz” (up until 6 months online). PAL is an exemplary system that provides personalised, stable and diverse, long-term human-robot interaction.
Lim B, Allard M, Grillotti L, et al., 2022, QDax: On the Benefits of Massive Parallelization for Quality-Diversity, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ASSOC COMPUTING MACHINERY, Pages: 128-131
Cully A, 2021, Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters, Genetic and Evolutionary Computation Conference (GECCO), Publisher: ACM, Pages: 84-92
Quality-Diversity (QD) optimisation is a new family of learning algorithmsthat aims at generating collections of diverse and high-performing solutions.Among those algorithms, MAP-Elites is a simple yet powerful approach that hasshown promising results in numerous applications. In this paper, we introduce anovel algorithm named Multi-Emitter MAP-Elites (ME-MAP-Elites) that improvesthe quality, diversity and convergence speed of MAP-Elites. It is based on therecently introduced concept of emitters, which are used to drive thealgorithm's exploration according to predefined heuristics. ME-MAP-Elitesleverages the diversity of a heterogeneous set of emitters, in which eachemitter type is designed to improve differently the optimisation process.Moreover, a bandit algorithm is used to dynamically find the best emitter setdepending on the current situation. We evaluate the performance ofME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100dimensions) to complex locomotion tasks in robotics. Our comparisons againstMAP-Elites and existing approaches using emitters show that ME-MAP-Elites isfaster at providing collections of solutions that are significantly morediverse and higher performing. Moreover, in the rare cases where no fruitfulsynergy can be found between the different emitters, ME-MAP-Elites isequivalent to the best of the compared algorithms.
Rakicevic N, Cully A, Kormushev P, 2021, Policy manifold search: exploring the manifold hypothesis for diversity-based neuroevolution, Genetic and Evolutionary Computation Conference (GECCO '21), Pages: 901-909
Neuroevolution is an alternative to gradient-based optimisation that has the potential to avoid local minima and allows parallelisation. The main limiting factor is that usually it does not scale well with parameter space dimensionality. Inspired by recent work examining neural network intrinsic dimension and loss landscapes, we hypothesise that there exists a low-dimensional manifold, embedded in the policy network parameter space, around which a high-density of diverse and useful policies are located. This paper proposes a novel method for diversity-based policy search via Neuroevolution, that leverages learned representations of the policy network parameters, by performing policy search in this learned representation space. Our method relies on the Quality-Diversity (QD) framework which provides a principled approach to policy search, and maintains a collection of diverse policies, used as a dataset for learning policy representations. Further, we use the Jacobian of the inverse-mapping function to guide the search in the representation space. This ensures that the generated samples remain in the high-density regions, after mapping back to the original space. Finally, we evaluate our contributions on four continuous-control tasks in simulated environments, and compare to diversity-based baselines.
Zhou L, Qin K, Cully A, et al., 2021, On the just-in-time discovery of profit-generating transactions in DeFi Protocols, Pages: 919-936, ISSN: 1081-6011
Decentralized Finance (DeFi) is a blockchain-asset-enabled finance ecosystem with millions of daily USD transaction volume, billions of locked up USD, as well as a plethora of newly emerging protocols (for lending, staking, and exchanges). Because all transactions, user balances, and total value locked in DeFi are publicly readable, a natural question that arises is: how can we automatically craft profitable transactions across the intertwined DeFi platforms?In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DeFiPoser-ARB and then create logical DeFi protocol models for a theorem prover in DeFiPoser-SMT. While DeFiPoser-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DeFiPoser-SMT can detect more complicated profitable transactions. We estimate that DeFiPoser-ARB and DeFiPoser-SMT can generate an average weekly revenue of 191.48 ETH (76, 592 USD) and 72.44 ETH (28, 976 USD) respectively, with the highest transaction revenue being 81.31 ETH (32, 524 USD) and 22.40 ETH (8, 960 USD) respectively. We further show that DeFiPoser-SMT finds the known economic bZx attack from February 2020, which yields 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days, given 96 DeFi protocol actions, and 25 assets.Looking beyond the financial gains mentioned above, forks deteriorate the blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DeFiPoser-ARB and DeFiPoser-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874×. Given optimal adversarial strategies provided by a M
Chatzilygeroudis K, Cully A, Vassiliades V, et al., 2021, Quality-diversity optimization: a novel branch of stochastic optimization, Black Box Optimization, Machine Learning, and No-Free Lunch Theorems, Editors: Pardalos, Rasskazova, Vrahatis, Publisher: Springer International Publishing, Pages: 109-135, ISBN: 978-3-030-66515-9
Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space that can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.
Nilsson O, Cully A, 2021, Policy Gradient Assisted MAP-Elites, 2nd Genetic and Evolutionary Computation Conference (GECCO), Publisher: ASSOC COMPUTING MACHINERY, Pages: 866-875
Rakicevic N, Cully A, Kormushev P, 2020, Policy manifold search for improving diversity-based neuroevolution, Publisher: arXiv
Diversity-based approaches have recently gained popularity as an alternativeparadigm to performance-based policy search. A popular approach from thisfamily, Quality-Diversity (QD), maintains a collection of high-performingpolicies separated in the diversity-metric space, defined based on policies'rollout behaviours. When policies are parameterised as neural networks, i.e.Neuroevolution, QD tends to not scale well with parameter space dimensionality.Our hypothesis is that there exists a low-dimensional manifold embedded in thepolicy parameter space, containing a high density of diverse and feasiblepolicies. We propose a novel approach to diversity-based policy search viaNeuroevolution, that leverages learned latent representations of the policyparameters which capture the local structure of the data. Our approachiteratively collects policies according to the QD framework, in order to (i)build a collection of diverse policies, (ii) use it to learn a latentrepresentation of the policy parameters, (iii) perform policy search in thelearned latent space. We use the Jacobian of the inverse transformation(i.e.reconstruction function) to guide the search in the latent space. Thisensures that the generated samples remain in the high-density regions of theoriginal space, after reconstruction. We evaluate our contributions on threecontinuous control tasks in simulated environments, and compare todiversity-based baselines. The findings suggest that our approach yields a moreefficient and robust policy search process.
Kusters R, Misevic D, Berry H, et al., 2020, Interdisciplinary research in artificial intelligence: challenges and opportunities, Frontiers in Big Data, Vol: 3, Pages: 1-7, ISSN: 2624-909X
The use of artificial intelligence (AI) in a variety of research fields is speeding up multiple digital revolutions, from shifting paradigms in healthcare, precision medicine and wearable sensing, to public services and education offered to the masses around the world, to future cities made optimally efficient by autonomous driving. When a revolution happens, the consequences are not obvious straight away, and to date, there is no uniformly adapted framework to guide AI research to ensure a sustainable societal transition. To answer this need, here we analyze three key challenges to interdisciplinary AI research, and deliver three broad conclusions: 1) future development of AI should not only impact other scientific domains but should also take inspiration and benefit from other fields of science, 2) AI research must be accompanied by decision explainability, dataset bias transparency as well as development of evaluation methodologies and creation of regulatory agencies to ensure responsibility, and 3) AI education should receive more attention, efforts and innovation from the educational and scientific communities. Our analysis is of interest not only to AI practitioners but also to other researchers and the general public as it offers ways to guide the emerging collaborations and interactions toward the most fruitful outcomes.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.