
Search results

    Saputra RP, Kormushev P, 2018,

    ResQbot: A Mobile Rescue Robot with Immersive Teleperception for Casualty Extraction

    Tavakoli A, Pardo F, Kormushev P, 2018,

    Action Branching Architectures for Deep Reinforcement Learning

    Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension. This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension. To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). We evaluate the performance of our agent on a set of challenging continuous control tasks. The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches. Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).

    Wang K, Shah A, Kormushev P, 2018,

    SLIDER: A Bipedal Robot with Knee-less Legs and Vertical Hip Sliding Motion

    Wang K, Shah A, Kormushev P, 2018,

    SLIDER: A Novel Bipedal Walking Robot without Knees

    Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA et al., 2017,

    A brief survey of deep reinforcement learning

    IEEE Signal Processing Magazine, Vol: 34, Pages: 26-38, ISSN: 1053-5888

    Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.

    Chamberlain BP, Cardoso Â, Bryan Liu CH, Pagliari R, Deisenroth MP et al., 2017,

    Customer lifetime value prediction using embeddings

    Pages: 1753-1762

    © 2017 Copyright held by the owner/author(s). We describe the Customer LifeTime Value (CLTV) prediction system deployed at ASOS, a global online fashion retailer. CLTV prediction is an important problem in e-commerce where an accurate estimate of future value allows retailers to effectively allocate marketing spend, identify and nurture high-value customers and mitigate exposure to losses. The system at ASOS provides daily estimates of the future value of every customer and is one of the cornerstones of the personalised shopping experience. The state of the art in this domain uses large numbers of handcrafted features and ensemble regressors to forecast value, predict churn and evaluate customer loyalty. Recently, domains including language, vision and speech have shown dramatic advances by replacing handcrafted features with features that are learned automatically from data. We detail the system deployed at ASOS and show that learning feature representations is a promising extension to the state of the art in CLTV modelling. We propose a novel way to generate embeddings of customers, which addresses the issue of the ever-changing product catalogue, and obtain a significant improvement over an exhaustive set of handcrafted features.

    Chamberlain BP, Humby C, Deisenroth MP, 2017,

    Probabilistic Inference of Twitter Users' Age Based on What They Follow.

    Publisher: Springer, Pages: 191-203

    Eleftheriadis S, Rudovic O, Deisenroth MP, Pantic M et al., 2017,

    Variational Gaussian Process Auto-Encoder for Ordinal Prediction of Facial Action Units

    13th Asian Conference on Computer Vision (ACCV), Publisher: Springer International Publishing AG, Pages: 154-170, ISSN: 0302-9743

    Eleftheriadis S, Rudovic O, Deisenroth MP, Pantic M et al., 2017,

    Gaussian Process Domain Experts for Modeling of Facial Affect

    IEEE Transactions on Image Processing, Vol: 26, Pages: 4697-4711, ISSN: 1057-7149

    Huang R, Lattimore T, György A, Szepesvári C et al., 2017,

    Following the leader and fast rates in online linear prediction: Curved constraint sets and other regularities

    ISSN: 1532-4435

    © 2017 Ruitong Huang, Tor Lattimore, András György, and Csaba Szepesvári. Follow the leader (FTL) is a simple online learning algorithm that is known to perform well when the loss functions are convex and positively curved. In this paper we ask whether there are other settings when FTL achieves low regret. In particular, we study the fundamental problem of linear prediction over a convex, compact domain with non-empty interior. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys logarithmic regret, while for polytope domains and stochastic data it enjoys finite expected regret. The former result is also extended to strongly convex domains by establishing an equivalence between the strong convexity of sets and the minimum curvature of their boundary, which may be of independent interest. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the smaller regret of FTL when the data is 'easy'. Finally, we show that such guarantees are achievable directly (e.g., by the follow the regularized leader algorithm or by a shrinkage-based variant of FTL) when the constraint set is an ellipsoid.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
