Imperial College London

Professor Murray Shanahan

Faculty of Engineering, Department of Computing

Professor in Cognitive Robotics
 
 
 

Contact

 

+44 (0)20 7594 8262
m.shanahan
Website

 
 

Location

 

407B Huxley Building, South Kensington Campus



 

Publications


119 results found

Shanahan M, 2024, Talking about large language models, Communications of the ACM, Vol: 67, Pages: 68-79, ISSN: 0001-0782

Interacting with a contemporary LLM-based conversational agent can create an illusion of being in the presence of a thinking creature. Yet, in their very nature, such systems are fundamentally not like us.

Journal article

Shanahan M, McDonell K, Reynolds L, 2023, Role-play with large language models, Nature, Vol: 623, Pages: 493-498, ISSN: 0028-0836

As dialogue agents become increasingly human-like in their performance, we must develop effective ways to describe their behaviour in high-level terms without falling into the trap of anthropomorphism. In this paper, we foreground the concept of role-play. Casting dialogue agent behaviour in terms of role-play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models that they in fact lack. Two important cases of dialogue agent behaviour are addressed this way, namely (apparent) deception and (apparent) self-awareness.

Journal article

Burnell R, Schellaert W, Burden J, Ullman TD, Martinez-Plumed F, Tenenbaum JB, Rutar D, Cheke LG, Sohl-Dickstein J, Mitchell M, Kiela D, Shanahan M, Voorhees EM, Cohn AG, Leibo JZ, Hernandez-Orallo J et al., 2023, Rethink reporting of evaluation results in AI: Aggregate metrics and lack of access to results limit understanding, Science, Vol: 380, Pages: 136-138, ISSN: 0036-8075

Journal article

Shanahan M, Mitchell M, 2022, Abstraction for deep reinforcement learning, International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022), Publisher: IJCAI, Pages: 5588-5596, ISSN: 1045-0823

We characterise the problem of abstraction in the context of deep reinforcement learning. Various well-established approaches to analogical reasoning and associative memory might be brought to bear on this issue, but they present difficulties because of the need for end-to-end differentiability. We review developments in AI and machine learning that could facilitate their adoption.

Conference paper

Fountas Z, Sylaidi A, Nikiforou K, Seth AK, Shanahan M, Roseboom W et al., 2022, A Predictive Processing Model of Episodic Memory and Time Perception, Neural Computation, Vol: 34, Pages: 1501-1544, ISSN: 0899-7667

Journal article

Voudouris K, Crosby M, Beyret B, Hernandez-Orallo J, Shanahan M, Halina M, Cheke LG et al., 2022, Direct Human-AI Comparison in the Animal-AI Environment, Frontiers in Psychology, Vol: 13, ISSN: 1664-1078

Journal article

León BG, Shanahan M, Belardinelli F, 2022, In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications

We address the problem of building agents whose goal is to learn to execute out-of-distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the agent's neural architecture is a key feature when DRL agents are learning to solve OOD tasks in TL. Yet, the studies on this topic are still in their infancy. In this work, we propose a new deep learning configuration with inductive biases that lead agents to generate latent representations of their current goal, yielding a stronger generalization performance. We use these latent-goal networks within a neuro-symbolic framework that executes multi-task formally-defined instructions and contrast the performance of the proposed neural networks against employing different state-of-the-art (SOTA) architectures when generalizing to unseen instructions in OOD environments.

Conference paper

Mediano PAM, Rosas FE, Farah JC, Shanahan M, Bor D, Barrett AB et al., 2022, Integrated information as a common signature of dynamical and information-processing complexity, Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol: 32, Pages: 1-12, ISSN: 1054-1500

The apparent dichotomy between information-processing and dynamical approaches to complexity science forces researchers to choose between two diverging sets of tools and explanations, creating conflict and often hindering scientific progress. Nonetheless, given the shared theoretical goals between both approaches, it is reasonable to conjecture the existence of underlying common signatures that capture interesting behavior in both dynamical and information-processing systems. Here, we argue that a pragmatic use of integrated information theory (IIT), originally conceived in theoretical neuroscience, can provide a potential unifying framework to study complexity in general multivariate systems. By leveraging metrics put forward by the integrated information decomposition framework, our results reveal that integrated information can effectively capture surprisingly heterogeneous signatures of complexity—including metastability and criticality in networks of coupled oscillators as well as distributed computation and emergent stable particles in cellular automata—without relying on idiosyncratic, ad hoc criteria. These results show how an agnostic use of IIT can provide important steps toward bridging the gap between informational and dynamical approaches to complex systems.

Originally conceived within theoretical neuroscience, integrated information theory (IIT) has been rarely used in other fields—such as complex systems or non-linear dynamics—despite the great value it has to offer. In this article, we inspect the basics of IIT, dissociating it from its contentious claims about the nature of consciousness. Relieved of this philosophical burden, IIT presents itself as an appealing formal framework to study complexity in biological or artificial systems, applicable in a wide range of domains. To illustrate this, we present an exploration of integrated information in complex systems and relate it to other notions of complexity commonly used in sys

Journal article
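
The measures in this paper come from the integrated information decomposition framework; as a much simpler illustration of the basic "whole minus sum of parts" intuition behind integrated information, the sketch below simulates a two-node coupled autoregressive process and compares the time-delayed mutual information of the whole system with that of its parts under a Gaussian approximation. This is an illustrative toy, not the paper's measure, and the coupling matrix and noise scale are arbitrary assumptions.

```python
import numpy as np

def gaussian_mi(a, b):
    """Mutual information (nats) between multivariate Gaussian samples a and b,
    each of shape (T, d), estimated from sample covariances:
    I = 0.5 * (log|Cov(a)| + log|Cov(b)| - log|Cov(a,b)|)."""
    def logdet(x):
        return np.linalg.slogdet(np.atleast_2d(np.cov(x, rowvar=False)))[1]
    return 0.5 * (logdet(a) + logdet(b) - logdet(np.hstack([a, b])))

# Two-node coupled first-order autoregressive process (a toy stand-in for
# coupled oscillators); the off-diagonal terms couple the nodes.
rng = np.random.default_rng(0)
A = np.array([[0.4, 0.3],
              [0.3, 0.4]])
T = 20000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(scale=0.1, size=2)

# "Whole minus sum" integrated information over one time step: information the
# whole system carries about its own future beyond what the parts carry
# individually about their own futures.
past, future = x[:-1], x[1:]
phi_wms = gaussian_mi(past, future) - sum(
    gaussian_mi(past[:, [i]], future[:, [i]]) for i in range(2))
print(phi_wms)
```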


Shanahan M, Crosby M, Beyret B, Cheke L et al., 2020, Artificial Intelligence and the Common Sense of Animals, Trends in Cognitive Sciences, Vol: 24, Pages: 862-872, ISSN: 1364-6613

Journal article

Fountas Z, Sylaidi A, Nikiforou K, Seth AK, Shanahan M, Roseboom W et al., 2020, A predictive processing model of episodic memory and time perception

Abstract: Human perception and experience of time is strongly influenced by ongoing stimulation, memory of past experiences, and required task context. When paying attention to time, time experience seems to expand; when distracted, it seems to contract. When considering time based on memory, the experience may be different than in the moment, exemplified by sayings like “time flies when you’re having fun”. Experience of time also depends on the content of perceptual experience – rapidly changing or complex perceptual scenes seem longer in duration than less dynamic ones. The complexity of interactions between attention, memory, and perceptual stimulation is a likely reason that an overarching theory of time perception has been difficult to achieve. Here, we introduce a model of perceptual processing and episodic memory that makes use of hierarchical predictive coding, short-term plasticity, spatio-temporal attention, and episodic memory formation and recall, and apply this model to the problem of human time perception. In an experiment with ~13,000 human participants we investigated the effects of memory, cognitive load, and stimulus content on duration reports of dynamic natural scenes up to ~1 minute long. Using our model to generate duration estimates, we compared human and model performance. Model-based estimates replicated key qualitative biases, including differences by cognitive load (attention), scene type (stimulation), and whether the judgement was made based on current or remembered experience (memory). Our work provides a comprehensive model of human time perception and a foundation for exploring the computational basis of episodic memory within a hierarchical predictive coding framework. Author summary: Experience of the duration of present or past events is a central aspect of human experience, the

Journal article

Shanahan M, Nikiforou K, Creswell A, Kaplanis C, Barrett D, Garnelo M et al., 2020, An explicitly relational neural network architecture

With a view to bridging the gap between deep learning and symbolic AI, we present a novel end-to-end neural network architecture that learns to form propositional representations with an explicitly relational structure from raw pixel data. In order to evaluate and analyse the architecture, we introduce a family of simple visual relational reasoning tasks of varying complexity. We show that the proposed architecture, when pre-trained on a curriculum of such tasks, learns to generate reusable representations that better facilitate subsequent learning on previously unseen tasks when compared to a number of baseline architectures. The workings of a successfully trained model are visualised to shed some light on how the architecture functions.

Working paper
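
One minimal way to picture an "explicitly relational" representation is a layer that, given a set of inferred object vectors, emits a value for every ordered pair of objects under a bank of learned relations. The PyTorch sketch below shows only that pairwise-comparison idea; the object extraction from pixels, the attention machinery and the training curriculum described in the paper are omitted, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class PairwiseRelationLayer(nn.Module):
    """Given object feature vectors, produce a relation vector for every ordered
    pair of objects by comparing learned 1-D projections of the two objects."""
    def __init__(self, obj_dim=16, n_relations=8):
        super().__init__()
        self.project = nn.Linear(obj_dim, n_relations, bias=False)

    def forward(self, objects):                   # objects: (B, N, obj_dim)
        p = self.project(objects)                 # (B, N, n_relations)
        # Relation r holds between objects i and j to the degree p[i, r] - p[j, r].
        return p.unsqueeze(2) - p.unsqueeze(1)    # (B, N, N, n_relations)

layer = PairwiseRelationLayer()
objs = torch.randn(1, 5, 16)                      # e.g. five objects from a scene
print(layer(objs).shape)                          # torch.Size([1, 5, 5, 8])
```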

Mittal S, Lamb A, Goyal A, Voleti V, Shanahan M, Lajoie G, Mozer M, Bengio Y et al., 2020, Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules, Pages: 6928-6942

Robust perception relies on both bottom-up and top-down signals. Bottom-up signals consist of what is directly observed through sensation. Top-down signals consist of beliefs and expectations based on past experience and short-term memory, such as how the phrase "peanut butter and ..." will be completed. The optimal combination of bottom-up and top-down information remains an open question, but the manner of combination must be dynamic and both context and task dependent. To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow. We explore deep recurrent neural net architectures in which bottom-up and top-down signals are dynamically combined using attention. Modularity of the architecture further restricts the sharing and communication of information. Together, attention and modularity direct information flow, which leads to reliable performance improvements in perceptual and language tasks, and in particular improves robustness to distractions and noisy data. We demonstrate on a variety of benchmarks in language modeling, sequential image classification, video prediction and reinforcement learning that the bidirectional information flow can improve results over strong baselines.

Conference paper
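
The combination mechanism described above can be pictured as a recurrent cell whose hidden state issues a query and attends over its candidate inputs, here just one bottom-up and one top-down signal, instead of summing them. The PyTorch sketch below is a single-module, single-head simplification; the paper's architecture involves multiple modules, multi-layer bidirectional flow and sparse communication, and the dimensions used here are arbitrary.

```python
import torch
import torch.nn as nn

class AttentiveRecurrentCell(nn.Module):
    """One recurrent step that combines a bottom-up feature and a top-down
    signal by attending over them with a query derived from the hidden state."""
    def __init__(self, dim=32):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.rnn = nn.GRUCell(dim, dim)

    def forward(self, h, bottom_up, top_down):
        cands = torch.stack([bottom_up, top_down], dim=1)          # (B, 2, dim)
        q = self.query(h).unsqueeze(1)                             # (B, 1, dim)
        scores = (q * self.key(cands)).sum(-1) / cands.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=1)                        # weight per candidate
        mixed = (attn.unsqueeze(-1) * self.value(cands)).sum(1)    # attended combination
        return self.rnn(mixed, h), attn

cell = AttentiveRecurrentCell()
h = torch.zeros(4, 32)
h, attn = cell(h, torch.randn(4, 32), torch.randn(4, 32))
print(attn)   # how much each example relied on bottom-up vs top-down input
```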

Dilokthanakul N, Kaplanis C, Pawlowski N, Shanahan M et al., 2019, Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning, IEEE Transactions on Neural Networks and Learning Systems, Vol: 30, Pages: 3409-3418, ISSN: 2162-237X

Journal article

Garnelo M, Shanahan M, 2019, Reconciling deep learning with symbolic artificial intelligence: representing objects and relations, Current Opinion in Behavioral Sciences, Vol: 29, Pages: 17-23, ISSN: 2352-1546

In the history of the quest for human-level artificial intelligence, a number of rival paradigms have vied for supremacy. Symbolic artificial intelligence was dominant for much of the 20th century, but currently a connectionist paradigm is in the ascendant, namely machine learning with deep neural networks. However, both paradigms have strengths and weaknesses, and a significant challenge for the field today is to effect a reconciliation. A central tenet of the symbolic paradigm is that intelligence results from the manipulation of abstract compositional representations whose elements stand for objects and relations. If this is correct, then a key objective for deep learning is to develop architectures capable of discovering objects and relations in raw data, and learning how to represent them in ways that are useful for downstream processing. This short review highlights recent progress in this direction.

Journal article

Beyret B, Hernández-Orallo J, Cheke L, Halina M, Shanahan M, Crosby M et al., 2019, The Animal-AI Environment: Training and Testing Animal-Like Artificial Cognition

Recent advances in artificial intelligence have been strongly driven by the use of game environments for training and evaluating agents. Games are often accessible and versatile, with well-defined state-transitions and goals allowing for intensive training and experimentation. However, agents trained in a particular environment are usually tested on the same or slightly varied distributions, and solutions do not necessarily imply any understanding. If we want AI systems that can model and understand their environment, we need environments that explicitly test for this. Inspired by the extensive literature on animal cognition, we present an environment that keeps all the positive elements of standard gaming environments, but is explicitly designed for the testing of animal-like artificial cognition.

Working paper

Roseboom W, Fountas Z, Nikiforou K, Bhowmik D, Shanahan M, Seth AKet al., 2019, Activity in perceptual classification networks as a basis for human subjective time perception, Nature Communications, Vol: 10, ISSN: 2041-1723

Despite being a fundamental dimension of experience, how the human brain generates the perception of time remains unknown. Here, we provide a novel explanation for how human time perception might be accomplished, based on non-temporal perceptual classification processes. To demonstrate this proposal, we build an artificial neural system centred on a feed-forward image classification network, functionally similar to human visual processing. In this system, input videos of natural scenes drive changes in network activation, and accumulation of salient changes in activation are used to estimate duration. Estimates produced by this system match human reports made about the same videos, replicating key qualitative biases, including differentiating between scenes of walking around a busy city or sitting in a cafe or office. Our approach provides a working model of duration perception from stimulus to estimation and presents a new direction for examining the foundations of this central aspect of human experience.

Journal article
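
The core mechanism of this model, accumulating salient changes in a perceptual network's activations and reading duration off the accumulator, can be sketched in a few lines. In the sketch below the "network" is a fixed random projection rather than a trained image classifier, and the change threshold and seconds-per-event calibration are invented for illustration; the paper itself uses features from a trained classification network and a learned mapping to human duration reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one layer of a pretrained classification network: a fixed random
# projection of flattened frames (assumption made purely for a self-contained demo).
W = rng.normal(size=(64, 32 * 32)) / np.sqrt(32 * 32)
def features(frame):
    return np.tanh(W @ frame.ravel())

def estimate_duration(frames, threshold=0.9, seconds_per_event=0.25):
    """Accumulate 'salient' changes in feature space and map the event count to a
    duration estimate. Both constants are illustrative calibrations."""
    prev = features(frames[0])
    events = 0
    for frame in frames[1:]:
        cur = features(frame)
        if np.linalg.norm(cur - prev) > threshold:   # change exceeds criterion
            events += 1
            prev = cur                               # reset the reference after an event
    return events * seconds_per_event

# A rapidly changing synthetic 'video' should yield a longer estimate than a static one.
busy   = [rng.normal(size=(32, 32)) for _ in range(120)]
static = [np.ones((32, 32))] * 120
print(estimate_duration(busy), estimate_duration(static))
```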

Crosby M, Beyret B, Shanahan M, Hernández-Orallo J, Cheke L, Halina M et al., 2019, The Animal-AI Testbed and Competition, Pages: 164-176

Modern machine learning systems are still lacking in the kind of general intelligence and common sense reasoning found, not only in humans, but across the animal kingdom. Many animals are capable of solving seemingly simple tasks such as inferring object location through object persistence and spatial elimination, and navigating efficiently in out-of-distribution novel environments. Such tasks are difficult for AI, but provide a natural stepping stone towards the goal of more complex human-like general intelligence. The extensive literature on animal cognition provides methodology and experimental paradigms for testing such abilities but, so far, these experiments have not been translated en masse into an AI-friendly setting. We present a new testbed, Animal-AI, first released as part of the Animal-AI Olympics competition at NeurIPS 2019, which is a comprehensive environment and testing paradigm for tasks inspired by animal cognition. In this paper we outline the environment, the testbed, the results of the competition, and discuss the open challenges for building and testing artificial agents capable of the kind of nonverbal common sense reasoning found in many non-human animals.

Conference paper

Shanahan M, 2019, Artificial Intelligence, Routledge Handbook of the Computational Mind, Editors: Sprevak, Colombo, Publisher: Routledge, Pages: 91-100, ISBN: 978-1-138-18668-2

Book chapter

Kaplanis C, Shanahan M, Clopath C, 2019, Policy Consolidation for Continual Reinforcement Learning, International Conference on Machine Learning (ICML), Vol: 97, ISSN: 2640-3498

Journal article

Kaplanis C, Shanahan M, Clopath C, 2018, Continual Reinforcement Learning with Complex Synapses, 35th International Conference on Machine Learning (ICML), Publisher: JMLR (Journal of Machine Learning Research), ISSN: 2640-3498

Conference paper

Garnelo M, Rosenbaum D, Maddison CJ, Ramalho T, Saxton D, Shanahan M, Teh YW, Rezende DJ, Eslami SMA et al., 2018, Conditional Neural Processes, 35th International Conference on Machine Learning (ICML), Publisher: JMLR (Journal of Machine Learning Research), ISSN: 2640-3498

Conference paper

Dilokthanakul N, Shanahan M, 2018, Deep Reinforcement Learning with Risk-Seeking Exploration, 15th International Conference on the Simulation of Adaptive Behavior (SAB), Publisher: Springer International Publishing AG, Pages: 201-211, ISSN: 0302-9743

Conference paper

Fountas Z, Shanahan M, 2017, The role of cortical oscillations in a spiking neural network model of the basal ganglia., PLoS ONE, Vol: 12, ISSN: 1932-6203

Although brain oscillations involving the basal ganglia (BG) have been the target of extensive research, the main focus lies disproportionally on oscillations generated within the BG circuit rather than other sources, such as cortical areas. We remedy this here by investigating the influence of various cortical frequency bands on the intrinsic effective connectivity of the BG, as well as the role of the latter in regulating cortical behaviour. To do this, we construct a detailed neural model of the complete BG circuit based on fine-tuned spiking neurons, with both electrical and chemical synapses as well as short-term plasticity between structures. As a measure of effective connectivity, we estimate information transfer between nuclei by means of transfer entropy. Our model successfully reproduces firing and oscillatory behaviour found in both the healthy and Parkinsonian BG. We found that, indeed, effective connectivity changes dramatically for different cortical frequency bands and phase offsets, which are able to modulate (or even block) information flow in the three major BG pathways. In particular, alpha (8-12Hz) and beta (13-30Hz) oscillations activate the direct BG pathway, and favour the modulation of the indirect and hyper-direct pathways via the subthalamic nucleus-globus pallidus loop. In contrast, gamma (30-90Hz) frequencies block the information flow from the cortex completely through activation of the indirect pathway. Finally, below alpha, all pathways decay gradually and the system gives rise to spontaneous activity generated in the globus pallidus. Our results indicate the existence of a multimodal gating mechanism at the level of the BG that can be entirely controlled by cortical oscillations, and provide evidence for the hypothesis of cortically-entrained but locally-generated subthalamic beta activity. These two findings suggest new insights into the pathophysiology of specific BG disorders.

Journal article
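
Effective connectivity in this model is quantified with transfer entropy. As a rough illustration of the quantity itself (not the estimator used in the paper, which works on population spiking activity), here is a minimal histogram-based transfer entropy estimate for two scalar time series with one step of history; the bin count and the toy coupled signals are arbitrary choices.

```python
import numpy as np

def transfer_entropy(source, target, bins=8):
    """Estimate transfer entropy TE(source -> target) in bits for two 1-D time
    series, i.e. I(target_{t+1} ; source_t | target_t), using one step of
    history and simple histogram (plug-in) estimates."""
    s = np.digitize(source, np.histogram_bin_edges(source, bins))
    t = np.digitize(target, np.histogram_bin_edges(target, bins))
    x, y, z = t[1:], s[:-1], t[:-1]            # future target, past source, past target

    def H(*vars):
        # Joint entropy (bits) of discretised variables from empirical counts.
        joint = np.stack(vars, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Conditional mutual information: I(x;y|z) = H(x,z) + H(y,z) - H(x,y,z) - H(z)
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

# Toy example: y is partly driven by a delayed copy of x, so TE(x->y) > TE(y->x).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.5, size=5000)
print(transfer_entropy(x, y), transfer_entropy(y, x))
```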

Fountas Z, Shanahan M, 2017, Assessing Selectivity in the Basal Ganglia: The “Gearbox” Hypothesis

Abstract: Despite experimental evidence, the literature so far contains no systematic attempt to address the impact of cortical oscillations on the ability of the basal ganglia (BG) to select. In this study, we employed a state-of-the-art spiking neural model of the BG circuitry and investigated the effectiveness of this circuitry as an action selection device. We found that cortical frequency, phase, dopamine and the examined time scale all have a very important impact on this process. Our simulations resulted in a canonical profile of selectivity, termed selectivity portraits, which suggests that the cortex is the structure that determines whether selection will be performed in the BG and what strategy will be utilized. Some frequency ranges promote the exploitation of highly salient actions, others promote the exploration of alternative options, while the remaining frequencies halt the selection process. Based on this behaviour, we propose that the BG circuitry can be viewed as the “gearbox” of action selection. Coalitions of rhythmic cortical areas are able to switch between a repertoire of available BG modes which, in turn, change the course of information flow within the cortico-BG-thalamo-cortical loop. Dopamine, akin to “control pedals”, either stops or initiates a decision, while cortical frequencies, as a “gear lever”, determine whether a decision can be triggered and what type of decision this will be. Finally, we identified a selection cycle with a period of around 200 ms, which was used to assess the biological plausibility of popular cognitive architectures. Author summary: Our brains are continuously called to select the most appropriate action between alternative competing choices. A plethora of evidence and theoretical work indicates that a fundamental brain region called the basal ga

Working paper

Tax T, Martinez Mediano PA, Shanahan M, 2017, The Partial Information Decomposition of Generative Neural Network Models, Entropy, Vol: 19, ISSN: 1099-4300

In this work we study the distributed representations learnt by generative neural network models. In particular, we investigate the properties of redundant and synergistic information that groups of hidden neurons contain about the target variable. To this end, we use an emerging branch of information theory called partial information decomposition (PID) and track the informational properties of the neurons through training. We find two differentiated phases during the training process: a first short phase in which the neurons learn redundant information about the target, and a second phase in which neurons start specialising and each of them learns unique information about the target. We also find that in smaller networks individual neurons learn more specific information about certain features of the input, suggesting that learning pressure can encourage disentangled representations.

Journal article
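
For readers unfamiliar with partial information decomposition, the sketch below computes the original Williams-Beer decomposition (redundancy via Imin, then unique and synergistic terms) for two discrete sources and a target, given their joint distribution table. The paper applies PID to groups of hidden neurons in trained generative models; this toy shows only the arithmetic, checked on XOR, where all of the information about the target is synergistic.

```python
import numpy as np
from itertools import product

def pid_two_sources(p):
    """Williams-Beer PID for two discrete sources. p[x1, x2, y] is a joint
    probability table; returns (redundancy, unique1, unique2, synergy) in bits."""
    px1y = p.sum(axis=1)                     # p(x1, y)
    px2y = p.sum(axis=0)                     # p(x2, y)
    py = p.sum(axis=(0, 1))                  # p(y)

    def mi(pxy):
        # Mutual information of a 2-D joint table.
        px = pxy.sum(axis=1, keepdims=True)
        pym = pxy.sum(axis=0, keepdims=True)
        mask = pxy > 0
        return np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ pym)[mask]))

    def specific(pxy, y):
        # Specific information I(Y=y; X) = sum_x p(x|y) * log2( p(y|x) / p(y) ).
        px = pxy.sum(axis=1)
        total = 0.0
        for x in range(pxy.shape[0]):
            if pxy[x, y] > 0:
                total += (pxy[x, y] / py[y]) * np.log2((pxy[x, y] / px[x]) / py[y])
        return total

    redundancy = sum(py[y] * min(specific(px1y, y), specific(px2y, y))
                     for y in range(len(py)))
    i1, i2 = mi(px1y), mi(px2y)
    i_joint = mi(p.reshape(-1, p.shape[2]))  # I(X1,X2; Y)
    return redundancy, i1 - redundancy, i2 - redundancy, i_joint - i1 - i2 + redundancy

# Sanity check on XOR: all information about y is synergistic.
p = np.zeros((2, 2, 2))
for x1, x2 in product(range(2), repeat=2):
    p[x1, x2, (x1 + x2) % 2] = 0.25
print(pid_two_sources(p))   # approximately (0, 0, 0, 1)
```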

Nikiforou K, Mediano PAM, Shanahan M, 2017, An Investigation of the Dynamical Transitions in Harmonically Driven Random Networks of Firing-Rate Neurons, Cognitive Computation, Vol: 9, Pages: 351-363, ISSN: 1866-9956

Continuous-time recurrent neural networks are widely used as models of neural dynamics and also have applications in machine learning. But their dynamics are not yet well understood, especially when they are driven by external stimuli. In this article, we study the response of stable and unstable networks to different harmonically oscillating stimuli by varying a parameter ρ, the ratio between the timescale of the network and the stimulus, and use the dimensionality of the network’s attractor as an estimate of the complexity of this response. Additionally, we propose a novel technique for exploring the stationary points and locally linear dynamics of these networks in order to understand the origin of input-dependent dynamical transitions. Attractors in both stable and unstable networks show a peak in dimensionality for intermediate values of ρ, with the latter consistently showing a higher dimensionality than the former, which exhibit a resonance-like phenomenon. We explain changes in the dimensionality of a network’s dynamics in terms of changes in the underlying structure of its vector field by analysing stationary points. Furthermore, we uncover the coexistence of underlying attractors with various geometric forms in unstable networks. As ρ is increased, our visualisation technique shows the network passing through a series of phase transitions with its trajectory taking on a sequence of qualitatively distinct figure-of-eight, cylinder, and spiral shapes. These findings bring us one step closer to a comprehensive theory of this important class of neural networks by revealing the subtle structure of their dynamics under different conditions.

Journal article
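
A minimal numerical version of this setup: a randomly connected firing-rate network driven by a sinusoid whose period is ρ times the network time constant, with the complexity of the response summarised by the participation ratio of the trajectory's principal components. The network size, gain and the use of the participation ratio as the dimensionality estimate are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def simulate_ctrnn(n=100, gain=1.5, rho=1.0, tau=1.0, steps=20000, dt=0.01, seed=0):
    """Euler-integrate tau * dx/dt = -x + W tanh(x) + I(t), with a sinusoidal
    drive whose period is rho times the network time constant."""
    rng = np.random.default_rng(seed)
    W = gain * rng.normal(scale=1 / np.sqrt(n), size=(n, n))
    drive = rng.normal(size=n)                   # fixed input direction
    x = rng.normal(scale=0.1, size=n)
    traj = np.empty((steps, n))
    for t in range(steps):
        stim = np.sin(2 * np.pi * dt * t / (rho * tau))
        x = x + dt / tau * (-x + W @ np.tanh(x) + stim * drive)
        traj[t] = x
    return traj[steps // 2:]                     # discard the initial transient

def participation_ratio(traj):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared
    eigenvalues of the trajectory's covariance matrix."""
    eig = np.linalg.eigvalsh(np.cov(traj.T))
    return eig.sum() ** 2 / (eig ** 2).sum()

for rho in (0.1, 1.0, 10.0):
    print(rho, participation_ratio(simulate_ctrnn(rho=rho)))
```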

Dilokthanakul N, Mediano PAM, Garnelo M, Lee MCH, Salimbeni H, Arulkumaran K, Shanahan M et al., 2016, Deep unsupervised clustering with Gaussian mixture variational autoencoders

We study a variant of the variational autoencoder model with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the standard variational approach in these models is unsuited for unsupervised clustering, and mitigate this problem by leveraging a principled information-theoretic regularisation term known as consistency violation. Adding this term to the standard variational optimisation objective yields networks with both meaningful internal representations and well-defined clusters. We demonstrate the performance of this scheme on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct, interpretable and result in achieving higher performance on unsupervised clustering classification than previous approaches.

Working paper

Garnelo M, Arulkumaran K, Shanahan M, 2016, Towards Deep Symbolic Reinforcement Learning

Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system -- though just a prototype -- learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.

Working paper
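
The two-stage idea, a neural back end that maps raw observations to a compact symbolic state and a symbolic front end that does ordinary reinforcement learning over those symbols, can be caricatured as below. The symbol extractor here is hand-written rather than learned, and tabular Q-learning stands in for the paper's symbolic front end; the grid, action set and update rule are all illustrative assumptions.

```python
import random
from collections import defaultdict

def extract_symbols(grid):
    """Stand-in for a neural back end: map a raw grid observation to a symbolic
    state, here the relative offset from the agent ('A') to the goal ('G')."""
    pos = {cell: (r, c) for r, row in enumerate(grid)
           for c, cell in enumerate(row) if cell in "AG"}
    (ar, ac), (gr, gc) = pos["A"], pos["G"]
    return (gr - ar, gc - ac)                 # e.g. "goal is 1 down, 2 right"

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
Q = defaultdict(float)                        # symbolic front end: tabular Q-values

def choose_action(sym_state, eps=0.1):
    if random.random() < eps:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(sym_state, a)])

def update(sym_state, action, reward, next_sym_state, alpha=0.1, gamma=0.9):
    best_next = max(Q[(next_sym_state, a)] for a in ACTIONS)
    Q[(sym_state, action)] += alpha * (reward + gamma * best_next - Q[(sym_state, action)])

# Toy usage on a single hand-written observation.
grid = ["....",
        ".A..",
        "...G"]
s = extract_symbols(grid)
a = choose_action(s)
update(s, a, reward=0.0, next_sym_state=s)
print(s, a)
```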

Arulkumaran K, Dilokthanakul N, Shanahan M, Bharath AA et al., 2016, Classifying options for deep reinforcement learning, Publisher: IJCAI

Deep reinforcement learning is the learning of multiple levels of hierarchical representations for reinforcement learning. Hierarchical reinforcement learning focuses on temporal abstractions in planning and learning, allowing temporally-extended actions to be transferred between tasks. In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We show that in a domain where we have prior knowledge of the mapping between states and options, our augmented DQN achieves a policy competitive with that of a standard DQN, but with much lower sample complexity. This is achieved through a straightforward architectural adjustment to the DQN, as well as an additional supervised neural network.

Working paper
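
Architecturally, the proposal amounts to a shared torso with one Q-value head per option plus a small supervisory head that scores which option should be in control. The PyTorch sketch below shows only that structure and a greedy forward pass; the layer sizes, number of options, and the way the supervisor and heads are trained are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class OptionHeadDQN(nn.Module):
    """Shared torso with one Q-value head per option plus a supervisory head
    that scores which option head should control behaviour."""
    def __init__(self, obs_dim=16, n_actions=4, n_options=3, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.q_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_options)])
        self.supervisor = nn.Linear(hidden, n_options)

    def forward(self, obs):
        h = self.torso(obs)
        q_per_option = torch.stack([head(h) for head in self.q_heads], dim=1)
        option_logits = self.supervisor(h)
        return q_per_option, option_logits

net = OptionHeadDQN()
obs = torch.randn(2, 16)                             # batch of two observations
q, logits = net(obs)                                  # q: (2, 3, 4), logits: (2, 3)
option = logits.argmax(dim=1)                         # supervisor picks an option head
action = q[torch.arange(2), option].argmax(dim=1)     # greedy action from that head
print(option.tolist(), action.tolist())
```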

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
