IPALM: Interactive Perception-Action-Learning for Modelling Objects

This is a collaborative project of leading experts from Imperial College London, the University of Bordeaux (France), Aalto University (Finland), the Czech Technical University (Czechia), and the Institut de Robòtica i Informàtica Industrial (Spain).

In IPALM, we develop methods for the automatic digitization of objects and their physical properties through exploratory manipulation. These methods are used to build a large collection of object models required for realistic grasping and manipulation experiments in robotics. Household objects such as tools, kitchenware, clothes, and food items are the focus of many potential robotics applications; however, they still pose great challenges for robot object perception and manipulation in realistic scenarios. We therefore advance the state of the art by considering household objects that can be deformable, articulated, interactive, specular or transparent, or without a strict geometry, such as cloth and food items.

Our methods learn the physical properties essential for perception and grasping simultaneously from different modalities: vision, touch, and audio, as well as text documents. These properties include the 3D model, surface properties such as friction and texture, elasticity, weight and size, together with grasping techniques for the intended use. At the core of our approach is a two-level modelling scheme, in which a category-level model provides priors for capturing instance-level attributes of specific objects. We build the category-level prior models by exploiting resources available online. A perception-action-learning loop then uses the robot’s vision, audio, and tactile senses to model instance-level object properties, guided by the more general category-level model. In return, knowledge acquired from a new instance is used to improve the category-level knowledge.

This approach will allow us to efficiently create a large database of models for objects of diverse types, suitable, for example, for training neural-network-based methods or for enhancing existing simulators. We also propose a benchmark and evaluation metrics for object manipulation, to enable comparison of results obtained with different robotic platforms on our database.
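
The interplay between the two levels can be pictured as a simple loop: category-level priors seed the estimates for a new object instance, exploratory measurements refine those estimates, and the completed instance is folded back into the category model. The Python sketch below illustrates this loop under strong simplifying assumptions (each property is summarised by a single Gaussian and the exploratory actions are simulated); all names and values are illustrative and do not come from the IPALM system itself.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Gaussian:
    mean: float
    var: float

    def fuse(self, other: "Gaussian") -> "Gaussian":
        # Precision-weighted fusion of a prior belief with a new measurement.
        precision = 1.0 / self.var + 1.0 / other.var
        mean = (self.mean / self.var + other.mean / other.var) / precision
        return Gaussian(mean, 1.0 / precision)


@dataclass
class CategoryModel:
    # Category-level priors over physical properties (values are illustrative).
    priors: dict = field(default_factory=dict)

    def update(self, instance: dict, weight: float = 0.1) -> None:
        # Fold instance-level estimates back into the category-level prior.
        for name, estimate in instance.items():
            prior = self.priors[name]
            prior.mean = (1 - weight) * prior.mean + weight * estimate.mean
            prior.var = (1 - weight) * prior.var + weight * estimate.var


def measure(true_value: float, noise: float) -> Gaussian:
    # Stand-in for a real exploratory action (vision, audio or touch).
    return Gaussian(true_value + random.gauss(0.0, noise), noise ** 2)


# Category "mug": prior beliefs about weight (kg) and friction coefficient.
mug = CategoryModel({"weight": Gaussian(0.30, 0.05), "friction": Gaussian(0.6, 0.1)})

# Perception-action-learning loop for one new object instance.
instance = {}
for prop, (truth, noise) in {"weight": (0.42, 0.03), "friction": (0.5, 0.05)}.items():
    instance[prop] = mug.priors[prop].fuse(measure(truth, noise))

mug.update(instance)  # knowledge flows back to the category level
print({name: round(est.mean, 3) for name, est in instance.items()})
```

In the project itself the priors cover much richer properties (3D shape, friction, elasticity, articulation), but they play the same role as in this toy loop: seeding instance-level estimation and being refined by what the robot measures.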

More information can be found at the IPALM website.

FACER2VM: Face Matching for Automatic Identity Retrieval, Recognition, Verification and Management

FACER2VM develops unconstrained face recognition technology for a broad spectrum of applications. The adopted approach endeavours to devise novel machine learning solutions that combine deep learning with the sophisticated prior information conveyed by 3D face models. The goal of the programme is to advance the science of machine face perception and to deliver a step change in face matching technology, enabling automatic retrieval, recognition, verification and management of faces in images and video.

This will be achieved by bringing together leading experts and their research teams at the University of Surrey, Imperial College London, and the University of Stirling to address the challenging problem of face appearance variations introduced by a range of natural and image degradation phenomena, such as changes in viewpoint, illumination, expression, resolution, blur and occlusion.
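
One simple way in which a 3D face model can supply prior information to a matching pipeline is geometric normalisation: detected facial landmarks are aligned to a canonical template derived from a frontal 3D face model before a deep embedding is computed, removing part of the viewpoint variation listed above. The numpy sketch below shows only that alignment step and is an illustration under these assumptions, not the FACER2VM pipeline; the template coordinates and landmark values are made up.

```python
import numpy as np

# Canonical five-point template (eye centres, nose tip, mouth corners) in a
# 112x112 crop, as might be obtained by projecting a frontal 3D face model
# (coordinates are illustrative).
TEMPLATE = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                     [41.5, 92.4], [70.7, 92.2]])


def similarity_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmarks onto dst, following Umeyama's method."""
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])
    scale = np.trace(D @ np.diag(S)) / src_c.var(0).sum()
    R = scale * U @ D @ Vt
    t = dst_mean - R @ src_mean
    return np.hstack([R, t[:, None]])  # 2x3 affine matrix


# Landmarks detected on a non-frontal face (illustrative values).
detected = np.array([[102.0, 140.0], [165.0, 120.0], [138.0, 170.0],
                     [118.0, 210.0], [172.0, 195.0]])

M = similarity_transform(detected, TEMPLATE)
aligned = detected @ M[:, :2].T + M[:, 2]
# M could now warp the image (e.g. with cv2.warpAffine) so that a deep face
# embedding is extracted from the pose-normalised crop.
print(np.round(aligned, 1))
```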

More information can be found at the FACER2VM website.

VISEN: Visual Sense - tagging visual data with semantic descriptions

Today, a typical web document contains a mix of visual and textual content. Most traditional tools for search and retrieval can successfully handle textual content, but they are not prepared to handle such heterogeneous documents. This new type of content demands the development of new, efficient tools for search and retrieval. The ViSen project aims to automatically mine the semantic content of visual data to enable “machine reading” of images. In recent years, we have witnessed significant advances in automatic visual concept recognition (VCR). These advances have enabled systems that automatically generate keyword-based image annotations. The goal of this project is to move a step further and predict semantic image representations that can be used to generate more informative, sentence-based image annotations, thus facilitating search and browsing of large multi-modal collections. More specifically, the project targets three case studies, namely image annotation, re-ranking for image search, and automatic image illustration of articles. The ViSen project will address the following key open research challenges:

  • To develop methods that can predict a semantic representation of visual content. This representation will go beyond the detection of objects and scenes to also capture a wide range of object relations.
  • To extend state-of-the-art natural language techniques to the tasks of mining large collections of multi-modal documents and generating image captions using both semantic representations of visual content and object/scene type models derived from semantic representations of the multi-modal documents.
  • To develop learning algorithms that can exploit available multi-modal data to discover mappings between visual and textual content. These algorithms should be able to leverage ‘weakly’ annotated data and be robust to large amounts of noise (a minimal sketch of such a cross-modal mapping follows this list).
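
As an illustration of the last challenge, a common approach to discovering such mappings is a joint embedding: image and text features are projected into a shared space and trained with a ranking loss so that matched pairs score higher than mismatched ones, which tolerates weak, noisy pairings because no exact labels are required. The PyTorch sketch below is a minimal version of that idea; the feature dimensions, projections and loss are assumptions for illustration, not the ViSen models.

```python
import torch
import torch.nn as nn


class JointEmbedding(nn.Module):
    """Projects pre-computed image and text features into a shared space."""

    def __init__(self, img_dim=2048, txt_dim=300, dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)

    def forward(self, img_feats, txt_feats):
        v = nn.functional.normalize(self.img_proj(img_feats), dim=-1)
        t = nn.functional.normalize(self.txt_proj(txt_feats), dim=-1)
        return v, t


def ranking_loss(v, t, margin=0.2):
    """Hinge ranking loss over in-batch negatives: a matched image-text pair
    should score higher than mismatched pairs by at least the margin."""
    scores = v @ t.T                        # cosine similarities
    pos = scores.diag().unsqueeze(1)        # matched pairs on the diagonal
    cost_t = (margin + scores - pos).clamp(min=0)    # wrong texts per image
    cost_v = (margin + scores - pos.T).clamp(min=0)  # wrong images per text
    mask = torch.eye(len(v), dtype=torch.bool)
    return cost_t.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()


# Toy batch of weakly paired features, e.g. CNN image descriptors and averaged
# word embeddings of the surrounding text (random stand-ins here).
model = JointEmbedding()
img = torch.randn(8, 2048)
txt = torch.randn(8, 300)
v, t = model(img, txt)
loss = ranking_loss(v, t)
loss.backward()
print(float(loss))
```

At retrieval time the same projections can rank sentences for a given image, or images for a text query, by cosine similarity in the shared space, which is how such an embedding supports the image annotation, re-ranking and illustration case studies.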

More information can be found at the ViSen project website.