Dai T, Liu H, Arulkumaran K, et al., 2021, Diversity-based trajectory and goal selection with hindsight experience replay, 18th Pacific Rim International Conference on Artificial Intelligence (PRICAI)
Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
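The determinant-as-diversity idea behind DPPs can be illustrated in a few lines. This is a generic sketch with a made-up RBF similarity kernel and toy goal states, not the authors' implementation:

```python
import numpy as np

def dpp_diversity(goals, subset, length_scale=1.0):
    """Score a subset of goal states by the determinant of an RBF kernel
    submatrix. A DPP assigns a subset probability proportional to this
    determinant, so diverse (mutually dissimilar) subsets score higher."""
    pts = goals[np.asarray(subset)]
    sq_dists = np.sum((pts[:, None] - pts[None, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * length_scale ** 2))
    return np.linalg.det(K)

goals = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
spread = dpp_diversity(goals, [0, 2])   # dissimilar pair: near-identity kernel
clumped = dpp_diversity(goals, [0, 1])  # near-duplicate pair: near-singular kernel
```

Sampling subsets in proportion to this score, as a DPP does, therefore favours trajectories whose goal states cover the space rather than cluster together.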
Arulkumaran K, Lillrank DO, 2021, A pragmatic look at deep imitation learning, Publisher: arXiv
The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning approaches using deep neural networks. The GAIL objective can be thought of as 1) matching the expert policy's state distribution; 2) penalising the learned policy's state distribution; and 3) maximising entropy. While theoretically motivated, in practice GAIL can be difficult to apply, not least due to the instabilities of adversarial training. In this paper, we take a pragmatic look at GAIL and related imitation learning algorithms. We implement and automatically tune a range of algorithms in a unified experimental setup, presenting a fair evaluation between the competing methods. From our results, our primary recommendation is to consider non-adversarial methods. Furthermore, we discuss the common components of imitation learning objectives, and present promising avenues for future research.
Pan K, Hurault G, Arulkumaran K, et al., 2020, EczemaNet: automating detection and severity assessment of atopic dermatitis, International Workshop on Machine Learning in Medical Imaging, Publisher: Springer Verlag, Pages: 220-230, ISSN: 0302-9743
Atopic dermatitis (AD), also known as eczema, is one of the most common chronic skin diseases. AD severity is primarily evaluated based on visual inspections by clinicians, but this is subjective and has large inter- and intra-observer variability in many clinical study settings. To help standardise and automate the evaluation of AD severity, this paper introduces a CNN computer vision pipeline, EczemaNet, that first detects areas of AD from photographs and then makes probabilistic predictions on the severity of the disease. EczemaNet combines transfer and multitask learning, ordinal classification, and ensembling over crops to make its final predictions. We test EczemaNet using a set of images acquired in a published clinical trial, and demonstrate low RMSE with well-calibrated prediction intervals. We show the effectiveness of using CNNs for non-neoplastic dermatological diseases with a medium-size dataset, and their potential for more efficiently and objectively evaluating AD severity, which has greater clinical relevance than mere classification.
Kamienny P-A, Arulkumaran K, Behbahani F, et al., 2020, Privileged information dropout in reinforcement learning, Publisher: arXiv
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate privileged information dropout (PID) for achieving the latter, which can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that PID outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.
Dai T, Arulkumaran K, Gerbert T, et al., 2020, Analysing deep reinforcement learning agents trained with domain randomisation, Publisher: arXiv
Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents - which are deployed in the real world - beyond task performance. In this work we examine such agents, through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different testing conditions. Finally, we investigate the internals of the trained agents by using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied with larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain randomised agents require higher sample complexity, can overfit and more heavily rely on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating the combination of inspection tools in order to provide sufficient insights into the behaviour of trained agents.
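At its core, domain randomisation is just resampling simulator parameters, typically once per episode. A minimal sketch (the parameter names and ranges are illustrative, not taken from the paper):

```python
import random

def randomise_domain(base_params, ranges, rng=random):
    """Return a perturbed copy of simulator parameters: each parameter with a
    declared range is resampled uniformly; the rest keep their base values."""
    return {k: rng.uniform(*ranges[k]) if k in ranges else v
            for k, v in base_params.items()}

# Hypothetical visual and physical parameters of a simulated scene.
base = {"light_intensity": 0.5, "table_texture_hue": 0.2, "friction": 1.0}
ranges = {"light_intensity": (0.1, 1.0), "table_texture_hue": (0.0, 1.0)}
episode_params = randomise_domain(base, ranges)  # fresh draw per episode
```

Training across many such draws is what forces the agent's representations to become robust to the randomised factors, at the cost of the higher sample complexity the abstract reports.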
Sarrico M, Arulkumaran K, Agostinelli A, et al., 2019, Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control, Publisher: arXiv
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
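The exploration mechanism, a Boltzmann (softmax) policy over value estimates, can be sketched as follows. Here the temperature is a free parameter, whereas MEMEC derives a state-dependent temperature via the mellowmax operator, which this sketch does not reproduce:

```python
import numpy as np

def boltzmann_probs(q_values, temperature):
    """Softmax over value estimates: low temperatures act near-greedily,
    high temperatures explore near-uniformly."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = [1.0, 2.0, 3.0]
near_greedy = boltzmann_probs(q, 0.1)     # mass concentrates on the argmax
near_uniform = boltzmann_probs(q, 100.0)  # mass spreads towards 1/3 each
```

Actions would then be sampled from these probabilities, trading off exploitation of high-value actions against exploration of the rest.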
Agostinelli A, Arulkumaran K, Sarrico M, et al., 2019, Memory-efficient episodic control reinforcement learning with dynamic online k-means
Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up the learning and avoid catastrophic forgetting. Unfortunately, EC methods have a large space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally-efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.
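A minimal version of the online k-means update (each new point nudges its nearest centroid by a count-based step size) can be sketched as below; the paper's dynamic variant additionally adapts the clustering online, which is not reproduced here:

```python
import numpy as np

class OnlineKMeans:
    """Minimal online k-means: each incoming point is assigned to its nearest
    centroid, which then moves towards the point with step size 1/count."""

    def __init__(self, centroids):
        self.centroids = np.asarray(centroids, dtype=float)
        self.counts = np.ones(len(self.centroids))

    def update(self, x):
        x = np.asarray(x, dtype=float)
        i = int(np.argmin(np.sum((self.centroids - x) ** 2, axis=1)))
        self.counts[i] += 1
        self.centroids[i] += (x - self.centroids[i]) / self.counts[i]
        return i

km = OnlineKMeans([[0.0, 0.0], [10.0, 10.0]])
km.update([1.0, 1.0])  # nudges the first centroid towards the new point
```

In an episodic-control setting, the centroids would summarise stored state embeddings, keeping memory bounded while preserving representative entries.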
Arulkumaran K, Cully A, Togelius J, 2019, AlphaStar: an evolutionary computation perspective, The Genetic and Evolutionary Computation Conference 2019, Publisher: ACM, Pages: 314-315
In January 2019, DeepMind revealed AlphaStar to the world: the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II, representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects: the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
Tanno R, Arulkumaran K, Alexander DC, et al., 2019, Adaptive neural trees
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs), which incorporate representation learning into the edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). We demonstrate that, whilst achieving competitive performance on classification and regression datasets, ANTs benefit from (i) lightweight inference via conditional computation, (ii) hierarchical separation of features useful to the task, e.g. learning meaningful class associations, such as separating natural vs. man-made objects, and (iii) a mechanism to adapt the architecture to the size and complexity of the training dataset.
Arulkumaran N, McLaren CS, Arulkumaran K, et al., 2018, An analysis of emergency tracheal intubations in critically ill patients by critical care trainees., J Intensive Care Soc, Vol: 19, Pages: 180-187, ISSN: 1751-1437
Introduction: We evaluated intensive care medicine trainees' practice of emergency intubations in the United Kingdom. Methods: Retrospective analysis of 881 in-hospital emergency intubations over a three-year period using an online trainee logbook. Results: Emergency intubations out-of-hours were less frequent than in-hours, both on weekdays and weekends. Complications occurred in 9% of cases, with no association with time of day/day of week (p = 0.860). Complications were associated with higher Cormack and Lehane grades (p = 0.004) and number of intubation attempts (p < 0.001), but not American Society of Anesthesiologist grade. Capnography usage was ≥99% in all locations except in wards (85%; p = 0.001). Ward patients were the oldest (p < 0.001), had higher American Society of Anesthesiologist grades (p < 0.001) and lowest Glasgow Coma Scale scores (p < 0.001). Conclusions: Complications of intubations are associated with higher Cormack and Lehane grades and number of attempts, but not time of day/day of week. The uptake of capnography is reassuring, although there is scope for improvement on the ward.
Tschiatschek S, Arulkumaran K, Stühmer J, et al., 2018, Variational inference for data-efficient model learning in POMDPs
Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real world tasks. Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains. In this paper we propose DELIP, an approach to model learning for POMDPs that utilizes amortized structured variational inference. We empirically show that our model leads to effective control strategies when coupled with state-of-the-art planners. Intuitively, model-based approaches should be particularly beneficial in environments with changing reward structures, or where rewards are initially unknown. Our experiments confirm that DELIP is particularly effective in this setting.
Creswell A, White T, Dumoulin V, et al., 2018, Generative adversarial networks: an overview, IEEE Signal Processing Magazine
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this by deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution, and classification. The aim of this review article is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
Arulkumaran K, Deisenroth MP, Brundage M, et al., 2017, Deep reinforcement learning: a brief survey, IEEE Signal Processing Magazine
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
Creswell A, Arulkumaran K, Bharath AA, 2017, On denoising autoencoders trained to minimise binary cross-entropy
Denoising autoencoders (DAEs) are powerful deep learning models used for feature extraction, data generation and network pre-training. DAEs consist of an encoder and decoder which may be trained simultaneously to minimise a loss function between an input and the reconstruction of a corrupted version of the input. There are two common loss functions used for training autoencoders: the mean-squared error (MSE) and the binary cross-entropy (BCE). When training autoencoders on image data a natural choice of loss function is BCE, since pixel values may be normalised to take values in [0,1] and the decoder model may be designed to generate samples that take values in (0,1). We show theoretically that DAEs trained to minimise BCE may be used to take gradient steps in the data space towards regions of high probability under the data-generating distribution. Previously this had only been shown for DAEs trained using MSE. As a consequence of the theory, iterative application of a trained DAE moves a data sample from regions of low probability to regions of higher probability under the data-generating distribution. Firstly, we validate the theory by showing that novel data samples, consistent with the training data, may be synthesised when the initial data samples are random noise. Secondly, we motivate the theory by showing that initial data samples synthesised via other methods may be improved via iterative application of a trained DAE to those initial samples.
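The iterative-application idea can be seen with a stand-in denoiser. Here `toy_dae` is a hypothetical function that contracts inputs towards a single mode at 0.5, playing the role of a trained DAE whose data distribution concentrates there; a real DAE would be a learned encoder/decoder pair:

```python
import numpy as np

def toy_dae(x, mode=0.5, step=0.5):
    """Hypothetical stand-in for a trained DAE: pulls each input a fixed
    fraction of the way towards a mode of the (toy) data distribution."""
    return x + step * (mode - x)

x = np.array([0.05, 0.95])  # initial samples in low-probability regions
for _ in range(10):
    x = toy_dae(x)  # each pass moves the samples towards higher density
```

Repeated application converges on the mode, mirroring the paper's claim that iterating a BCE-trained DAE moves samples towards high-probability regions of the data-generating distribution.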
Dilokthanakul N, Mediano PAM, Garnelo M, et al., 2016, Deep unsupervised clustering with Gaussian mixture variational autoencoders
We study a variant of the variational autoencoder model with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the standard variational approach in these models is unsuited for unsupervised clustering, and mitigate this problem by leveraging a principled information-theoretic regularisation term known as consistency violation. Adding this term to the standard variational optimisation objective yields networks with both meaningful internal representations and well-defined clusters. We demonstrate the performance of this scheme on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct, interpretable and result in achieving higher performance on unsupervised clustering classification than previous approaches.
Rivera-Rubio J, Arulkumaran K, Rishi H, et al., 2016, An assistive haptic interface for appearance-based indoor navigation, Computer Vision and Image Understanding, Vol: 149, Pages: 126-145, ISSN: 1077-3142
Computer vision remains an under-exploited technology for assistive devices. Here, we propose a navigation technique using low-resolution images from wearable or hand-held cameras to identify landmarks that are indicative of a user’s position along crowdsourced paths. We test the components of a system that is able to provide blindfolded users with information about location via tactile feedback. We assess the accuracy of vision-based localisation by making comparisons with estimates of location derived from both a recent SLAM-based algorithm and from indoor surveying equipment. We evaluate the precision and reliability by which location information can be conveyed to human subjects by analysing their ability to infer position from electrostatic feedback in the form of textural (haptic) cues on a tablet device. Finally, we describe a relatively lightweight systems architecture that enables images to be captured and location results to be served back to the haptic device based on journey information from multiple users and devices.
Arulkumaran K, Dilokthanakul N, Shanahan M, et al., 2016, Classifying Options for Deep Reinforcement Learning
In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
Arulkumaran K, 2016, FGLab: Machine Learning Dashboard
FGLab is a machine learning dashboard, designed to make prototyping experiments easier. Experiment results are stored in a database, allowing analytics to be performed after their completion. FGLab is designed to be used with existing code in a way that is independent of either programming language or operating system, and scales from a single machine to a cluster. By addressing a practical need over a broad class of machine learning problems, FGLab facilitates—but does not enforce—more methodological research.
Creswell A, Arulkumaran K, Bharath AA, 2016, Improving Sampling from Generative Autoencoders with Markov Chains
We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples the learned latent distribution, which may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since the generative model learns to map from the learned latent distribution, rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion.
Garnelo M, Arulkumaran K, Shanahan M, 2016, Towards Deep Symbolic Reinforcement Learning
Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system, though just a prototype, learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.