Imperial College London

Dr Viktoriia Sharmanska

Faculty of Engineering, Department of Computing

Honorary Lecturer
 
 
 

Contact

 

+44 (0)20 7594 8241
sharmanska.v
Website

 
 

Location

 

452 Huxley Building, South Kensington Campus



 

Publications


29 results found

Kollias D, Sharmanska V, Zafeiriou S, 2024, Distribution Matching for Multi-Task Learning of Classification Tasks: A Large-Scale Study on Faces & Beyond, Pages: 2813-2821, ISSN: 2159-5399

Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large, overlap across tasks, i.e., each input sample is annotated for all, or most, of the tasks. However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little, or non-overlapping, annotations, or when there is a large discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when a multi-task model's performance is worse than that of at least one single-task model).

Conference paper
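The distribution-matching idea described in the abstract above can be sketched in a few lines. The snippet below is an illustrative stand-in, not the authors' implementation: task A's predicted class distribution is mapped through an assumed task-relatedness matrix into task B's label space, and a KL term pulls task B's predictions toward it. The function name, the relatedness matrix, and both prediction arrays are hypothetical.

```python
import numpy as np

def distribution_matching_loss(p_a, p_b, relatedness):
    """KL(implied_b || p_b): push task B's predicted distribution toward the
    distribution implied by task A's predictions and a task-relatedness
    matrix (rows: task-A classes, cols: task-B classes). Illustrative only."""
    implied_b = p_a @ relatedness                           # map A's beliefs into B's label space
    implied_b = implied_b / implied_b.sum(axis=-1, keepdims=True)
    eps = 1e-12                                             # numerical safety for log
    kl = np.sum(implied_b * (np.log(implied_b + eps) - np.log(p_b + eps)), axis=-1)
    return float(kl.mean())
```

With an identity relatedness matrix and identical predictions the loss is zero; any disagreement yields a positive penalty that could be added to the per-task supervised losses.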

Ushenko N, Metelytsia V, Lytovchenko I, Yermolaieva M, Sharmanska V, Klopov I et al., 2023, Development of digital infrastructure and blockchain in Ukraine, Naukovyi Visnyk Natsionalnoho Hirnychoho Universytetu, Pages: 162-168, ISSN: 2071-2227

Purpose. To determine the role of digital infrastructure in the context of the digital transformation of Ukraine and to analyze the possibilities of applying blockchain technologies. Methodology. To achieve the set goal, various research methods were employed, including analysis and synthesis for illuminating the economic essence of digital infrastructure, as well as statistical methods for analyzing global trends in information and communication technology development. Inductive, deductive, and abstract-logical methods were used to support the conclusions. Findings. The research results encompass several significant findings. Firstly, various definitions of the digital economy were analyzed, leading to the proposal of an original definition that takes into account the peculiarities of the Ukrainian context and aligns with contemporary trends in digital technology development. Key sectors of economic activity were highlighted where the implementation of digital technologies holds the greatest potential within the context of digital transformation. Special attention was given to sectors where the use of digital tools can have a decisive impact on the development and competitiveness of enterprises. The authors emphasized strategic tasks and instruments that would facilitate the creation of a conducive environment for the development of the digital economy in Ukraine. Additionally, the essence of blockchain technology was studied, and potential areas of its application in Ukraine were discussed. Significant focus was placed on aspects of ensuring cybersecurity and data protection, which are critical in the context of blockchain utilization. Originality. The introduced original definition of the digital economy places a primary emphasis on the implementation and actual utilization of digital technologies across various spheres of human activity. A comprehensive set of measures for the development of digital infrastructure in Ukraine was proposed, incl

Journal article

Metzler AB, Nathvani R, Sharmanska V, Bai W, Muller E, Moulds S, Agyei-Asabere C, Adjei-Boadih D, Kyere-Gyeabour E, Tetteh JD, Owusu G, Agyei-Mensah S, Baumgartner J, Robinson BE, Arku RE, Ezzati M et al., 2023, Phenotyping urban built and natural environments with high-resolution satellite images and unsupervised deep learning, Science of the Total Environment, Vol: 893, Pages: 1-14, ISSN: 0048-9697

Cities in the developing world are expanding rapidly, and undergoing changes to their roads, buildings, vegetation, and other land use characteristics. Timely data are needed to ensure that urban change enhances health, wellbeing and sustainability. We present and evaluate a novel unsupervised deep clustering method to classify and characterise the complex and multidimensional built and natural environments of cities into interpretable clusters using high-resolution satellite images. We applied our approach to a high-resolution (0.3 m/pixel) satellite image of Accra, Ghana, one of the fastest growing cities in sub-Saharan Africa, and contextualised the results with demographic and environmental data that were not used for clustering. We show that clusters obtained solely from images capture distinct interpretable phenotypes of the urban natural (vegetation and water) and built (building count, size, density, and orientation; length and arrangement of roads) environment, and population, either as a unique defining characteristic (e.g., bodies of water or dense vegetation) or in combination (e.g., buildings surrounded by vegetation or sparsely populated areas intermixed with roads). Clusters that were based on a single defining characteristic were robust to the spatial scale of analysis and the choice of cluster number, whereas those based on a combination of characteristics changed based on scale and number of clusters. The results demonstrate that satellite data and unsupervised deep learning provide a cost-effective, interpretable and scalable approach for real-time tracking of sustainable urban development, especially where traditional environmental and demographic data are limited and infrequent.

Journal article

Doukas MC, Ververas E, Sharmanska V, Zafeiriou S et al., 2023, Free-HeadGAN: Neural Talking Head Synthesis With Explicit Gaze Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 45, Pages: 9743-9756, ISSN: 0162-8828

Journal article

Metzler AB, Nathvani R, Sharmanska V, Bai W, Muller E, Moulds S, Agyei-Asabere C, Adjei-Boadi D, Kyere-Gyeabour E, Tetteh JD, Owusu G, Agyei-Mensah S, Baumgartner J, Robinson BE, Arku RE, Ezzati M et al., 2022, Characterization of urban built and natural environments with high-resolution satellite images and unsupervised deep learning, ISEE Conference Abstracts, Vol: 2022, ISSN: 1078-0475

Journal article

Martyniuk T, Kupyn O, Kurlyak Y, Krashenyi I, Matas J, Sharmanska V et al., 2022, DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), Pages: 20910-20920, ISSN: 1063-6919

Journal article

Romiti S, Inskip C, Sharmanska V, Quadrianto N et al., 2022, RealPatch: A Statistical Matching Framework for Model Patching with Real Samples, Computer Vision - ECCV 2022, Part XXV, Vol: 13685, Pages: 146-162, ISSN: 0302-9743

Journal article

Kollias D, Sharmanska V, Zafeiriou S, 2021, Face behavior a la carte: expressions, affect and action units in a single Network, Publisher: arXiv

Automatic facial behavior analysis has a long history of studies in the intersection of computer vision, physiology and psychology. However it is only recently, with the collection of large-scale datasets and powerful machine learning methods such as deep neural networks, that automatic facial behavior analysis started to thrive. Three of its iconic tasks are automatic recognition of basic expressions (e.g. happy, sad, surprised), estimation of continuous emotions (e.g., valence and arousal), and detection of facial action units (activations of e.g. upper/inner eyebrows, nose wrinkles). Up until now these tasks have been mostly studied independently, each collecting its own dataset. We present the first and the largest study of all facial behaviour tasks learned jointly in a single multi-task, multi-domain and multi-label network, which we call FaceBehaviorNet. For this we utilize all publicly available datasets in the community (around 5M images) that study facial behaviour tasks in-the-wild. We demonstrate that training jointly an end-to-end network for all tasks has consistently better performance than training each of the single-task networks. Furthermore, we propose two simple strategies for coupling the tasks during training, co-annotation and distribution matching, and show the advantages of this approach. Finally we show that FaceBehaviorNet has learned features that encapsulate all aspects of facial behaviour, and can be successfully applied to perform tasks beyond the ones it has been trained for (compound emotion recognition) in a zero- and few-shot learning setting.

Working paper

Kollias D, Sharmanska V, Zafeiriou S, 2021, Distribution matching for heterogeneous multi-task learning: a large-scale face study, Publisher: arXiv

Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm, such as a DNN. MTL is based on the assumption that the tasks under consideration are related; therefore it exploits shared knowledge for improving performance on each individual task. Tasks are generally considered to be homogeneous, i.e., to refer to the same type of problem. Moreover, MTL is usually based on ground truth annotations with full, or partial, overlap across tasks. In this work, we deal with heterogeneous MTL, simultaneously addressing detection, classification and regression problems. We explore task-relatedness as a means for co-training, in a weakly-supervised way, tasks that contain little, or even non-overlapping, annotations. Task-relatedness is introduced in MTL either explicitly, through prior expert knowledge, or through data-driven studies. We propose a novel distribution matching approach, in which knowledge exchange is enabled between tasks via matching of their predictions' distributions. Based on this approach, we build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks. We develop case studies for: i) continuous affect estimation, action unit detection, basic emotion recognition; ii) attribute detection, face identification. We illustrate that co-training via task-relatedness alleviates negative transfer. Since FaceBehaviorNet learns features that encapsulate all aspects of facial behavior, we conduct zero-/few-shot learning to perform tasks beyond the ones that it has been trained for, such as compound emotion recognition. By conducting a very large experimental study, utilizing 10 databases, we illustrate that our approach outperforms, by large margins, the state-of-the-art in all tasks and in all databases, even in those which have not been used in its training.

Working paper

Sharmanska V, Hendricks LA, Darrell T, Quadrianto N et al., 2021, Contrastive examples for addressing the tyranny of the majority

Computer vision algorithms, e.g. for face recognition, favour groups of individuals that are better represented in the training data. This happens because of the generalization that classifiers have to make. It is simpler to fit the majority groups, as this fit is more important to overall error. We propose to create a balanced training dataset, consisting of the original dataset plus new data points in which the group memberships are intervened: minorities become majorities and vice versa. We show that current generative adversarial networks are a powerful tool for learning these data points, called contrastive examples. We experiment with the equalized odds bias measure on tabular data as well as image data (CelebA and Diversity in Faces datasets). Contrastive examples allow us to expose correlations between group membership and other seemingly neutral features. Whenever a causal graph is available, we can put those contrastive examples in the perspective of counterfactuals.

Working paper
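The balancing step behind the contrastive-examples idea above can be shown on a toy tabular case. In the paper a GAN learns realistic intervened samples; the sketch below substitutes a naive intervention (flipping a binary protected attribute while leaving features unchanged), purely to illustrate how pooling originals with intervened copies swaps minority and majority. All names are hypothetical.

```python
import numpy as np

def contrastive_balance(X, group):
    """Append, for every sample, a copy whose binary group membership is
    intervened (flipped), so minority and majority swap roles and the pooled
    dataset is balanced. A naive stand-in for GAN-learned contrastive
    examples; the feature rows themselves are left unchanged here."""
    X_cf = X.copy()                  # in the paper, a GAN would transform these rows
    group_cf = 1 - group             # intervene on the group membership
    return np.vstack([X, X_cf]), np.concatenate([group, group_cf])
```

Whatever the original group proportions, the pooled dataset contains each group exactly n times for n input samples.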

Doukas MC, Koujan MR, Sharmanska V, Roussos A, Zafeiriou S et al., 2021, Head2Head++: deep facial attributes re-targeting, IEEE Transactions on Biometrics, Behavior, and Identity Science, Vol: 3, Pages: 31-43, ISSN: 2637-6407

Facial video re-targeting is a challenging problem aiming to modify the facial attributes of a target subject in a seamless manner by a driving monocular sequence. We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment. Our method is different to purely 3D model-based approaches, or recent image-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames. We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos, with the aid of a sequential Generator and an ad-hoc Dynamics Discriminator network. We conduct a comprehensive set of quantitative and qualitative tests and demonstrate experimentally that our proposed method can successfully transfer facial expressions, head pose and eye gaze from a source video to a target subject, in a photo-realistic and faithful fashion, better than other state-of-the-art methods. Most importantly, our system performs end-to-end reenactment in nearly real-time speed (18 fps).

Journal article

Doukas MC, Zafeiriou S, Sharmanska V, 2021, HeadGAN: One-shot Neural Head Synthesis and Editing, 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), Pages: 14378-14387

Journal article

Truong AH, Sharmanska V, Limback-Stanic C, Grech-Sollars Met al., 2020, Optimisation of deep learning methods for visualisation of tumour heterogeneity and brain tumour grading through digital pathology, Neuro-Oncology Advances, Vol: 2, ISSN: 2632-2498

Background: Variations in prognosis and treatment options for gliomas are dependent on tumour grading. When tissue is available for analysis, grade is established based on histological criteria. However, histopathological diagnosis is not always reliable or straightforward due to tumour heterogeneity, sampling error and subjectivity, and hence there is great inter-observer variability in readings.

Methods: We trained convolutional neural network models to classify digital whole-slide histopathology images from The Cancer Genome Atlas. We tested a number of optimisation parameters.

Results: Data augmentation did not improve model training, while smaller batch size helped to prevent overfitting and led to improved model performance. There was no significant difference in performance between a modular 2-class model and a single 3-class model system. The best models trained achieved a mean accuracy of 73% in classifying glioblastoma from other grades, and 53% between WHO grade II and III gliomas. A visualisation method was developed to convey the model output in a clinically relevant manner by overlaying colour-coded predictions over the original whole-slide image.

Conclusions: Our developed visualisation method reflects the clinical decision-making process by highlighting the intra-tumour heterogeneity and may be used in a clinical setting to aid diagnosis. Explainable AI techniques may allow further evaluation of the model and underline areas for improvement, such as biases. Due to intra-tumour heterogeneity, data annotation for training was imprecise, and hence performance was lower than expected. The models may be further improved by employing advanced data augmentation strategies and using more precise semi-automatic or manually labelled training data.

Journal article

Quadrianto N, Sharmanska V, Thomas O, 2020, Discovering fair representations in the data domain, 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 8219-8228, ISSN: 2575-7075

Interpretability and fairness are critical in computer vision and machine learning applications, in particular when dealing with human outcomes, e.g. inviting or not inviting for a job interview based on application materials that may include photographs. One promising direction to achieve fairness is by learning data representations that remove the semantics of protected characteristics, and are therefore able to mitigate unfair outcomes. All available models, however, learn latent embeddings, which comes at the cost of being uninterpretable. We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is being enforced. Here the data domain can be images, or any tabular data representation. This task would be straightforward if we had fair target data available, but this is not the case. To overcome this, we learn a highly unconstrained mapping by exploiting statistics of residuals (the difference between input data and its translated version) and the protected characteristics. When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions. Intriguingly, on the same dataset we arrive at similar conclusions when using semantic attribute representations of images for translation. On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. In the Adult income dataset, also with protected gender attribute, our model achieves equality of opportunity by, among others, obfuscating the wife and husband relationship. Analyzing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and prediction performance.

Conference paper

Doukas MC, Sharmanska V, Zafeiriou S, 2019, Video-to-Video Translation for Visual Speech Synthesis, arXiv

Despite remarkable success in image-to-image translation that celebrates the advancements of generative adversarial networks (GANs), very limited attempts are known for video domain translation. We study the task of video-to-video translation in the context of visual speech generation, where the goal is to transform an input video of any spoken word to an output video of a different word. This is a multi-domain translation, where each word forms a domain of videos uttering this word. Adaptation of the state-of-the-art image-to-image translation model (StarGAN) to this setting falls short with a large vocabulary size. Instead we propose to use character encodings of the words and design a novel character-based GAN architecture for video-to-video translation called Visual Speech GAN (ViSpGAN). We are the first to demonstrate video-to-video translation with a vocabulary of 500 words.

Journal article

Demir I, Bazazian D, Romero A, Sharmanska V, Tchapmi LP et al., 2018, WiCV 2018: The Fourth Women In Computer Vision Workshop, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 1941-1943, ISSN: 2160-7508

We present WiCV 2018 - Women in Computer Vision Workshop to increase the visibility and inclusion of women researchers in computer vision field, organized in conjunction with CVPR 2018. Computer vision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and industry. WiCV is organized to raise visibility of female researchers, to increase the collaboration, and to provide mentorship and give opportunities to female-identifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years, summary of statistics for presenters and attendees, followed by expectations from future generations.

Conference paper

Quadrianto N, Sharmanska V, 2017, Recycling privileged learning and distribution matching for fairness, Advances in Neural Information Processing Systems (NIPS), Publisher: Neural Information Processing Systems Foundation, Inc.

Equipping machine learning models with ethical and legal constraints is a serious issue; without this, the future of machine learning is at risk. This paper takes a step forward in this direction and focuses on ensuring machine learning models deliver fair decisions. In legal scholarship, the notion of fairness itself is evolving and multi-faceted. We set an overarching goal to develop a unified machine learning framework that is able to handle any definitions of fairness, their combinations, and also new definitions that might be stipulated in the future. To achieve our goal, we recycle two well-established machine learning techniques, privileged learning and distribution matching, and harmonize them for satisfying multi-faceted fairness definitions. We consider protected characteristics such as race and gender as privileged information that is available at training but not at test time; this accelerates model training and delivers fairness through unawareness. Further, we cast demographic parity, equalized odds, and equality of opportunity as a classical two-sample problem of conditional distributions, which can be solved in a general form by using distance measures in Hilbert Space. We show several existing models are special cases of ours. Finally, we advocate returning the Pareto frontier of multi-objective minimization of error and unfairness in predictions. This will facilitate decision makers to select an operating point and to be accountable for it.

Conference paper
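The "two-sample problem of conditional distributions ... using distance measures in Hilbert Space" mentioned in the abstract above can be illustrated with a squared Maximum Mean Discrepancy between model scores for two protected groups. This is one standard instantiation of such a Hilbert-space distance, not necessarily the exact estimator used in the paper; the function names and the Gaussian kernel choice are assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Gaussian kernel matrix between two 1-D score samples
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two
    samples of classifier scores, e.g. scores conditioned on protected
    group and positive label (an equal-opportunity style comparison)."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() - 2 * rbf(x, y, gamma).mean()
```

The statistic is zero when the two conditional score distributions coincide, so it can double as a training penalty that is minimized alongside the classification loss.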

Sharmanska V, Quadrianto N, 2017, Learning Using Privileged Information, Encyclopedia of Machine Learning and Data Mining, Publisher: Springer US, Pages: 734-737

Book chapter

Sharmanska V, Quadrianto N, 2017, In the Era of Deep Convolutional Features: Are Attributes Still Useful Privileged Data?, Visual Attributes, Editors: Feris, Lampert, Parikh, Publisher: Springer International Publishing AG, Pages: 31-48, ISBN: 978-3-319-50075-1

Book chapter

2017, Visual Attributes, Publisher: Springer International Publishing, ISBN: 9783319500751

Book

Sharmanska V, Quadrianto N, 2016, Learning from the mistakes of others: matching errors in cross-dataset learning, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 3967-3975, ISSN: 1063-6919

Can we learn about object classes in images by looking at a collection of relevant 3D models? Or if we want to learn about human (inter-)actions in images, can we benefit from videos or abstract illustrations that show these actions? A common aspect of these settings is the availability of additional or privileged data that can be exploited at training time and that will not be available and not of interest at test time. We seek to generalize the learning with privileged information (LUPI) framework, which requires additional information to be defined per image, to the setting where additional information is a data collection about the task of interest. Our framework minimizes the distribution mismatch between errors made in images and in privileged data. The proposed method is tested on publicly available datasets: Image+ClipArt, Image+3Dobject, and Image+Video. Experimental results reveal that our new LUPI paradigm naturally addresses the cross-dataset learning.

Conference paper

Sharmanska V, Hernandez-Lobato D, Hernandez-Lobato JM, Quadrianto N et al., 2016, Ambiguity helps: classification with disagreements in crowdsourced annotations, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2194-2202, ISSN: 1063-6919

Imagine we show an image to a person and ask her/him to decide whether the scene in the image is warm or not warm, and whether it is easy or not to spot a squirrel in the image. For exactly the same image, the answers to those questions are likely to differ from person to person. This is because the task is inherently ambiguous. Such an ambiguous, therefore challenging, task is pushing the boundary of computer vision in showing what can and can not be learned from visual data. Crowdsourcing has been invaluable for collecting annotations. This is particularly so for a task that goes beyond a clear-cut dichotomy as multiple human judgments per image are needed to reach a consensus. This paper makes conceptual and technical contributions. On the conceptual side, we define disagreements among annotators as privileged information about the data instance. On the technical side, we propose a framework to incorporate annotation disagreements into the classifiers. The proposed framework is simple, relatively fast, and outperforms classifiers that do not take into account the disagreements, especially if tested on high confidence annotations.

Conference paper

Taylor J, Sharmanska V, Kersting K, Weir D, Quadrianto N et al., 2016, Learning using unselected features (LUFe), International Joint Conference on Artificial Intelligence, Publisher: AAAI, Pages: 2060-2066

Feature selection has been studied in machine learning and data mining for many years, and is a valuable way to improve classification accuracy while reducing model complexity. Two main classes of feature selection methods, filter and wrapper, discard those features which are not selected, and do not consider them in the predictive model. We propose that these unselected features may instead be used as an additional source of information at train time. We describe a strategy called Learning using Unselected Features (LUFe) that allows selected and unselected features to serve different functions in classification. In this framework, selected features are used directly to set the decision boundary, and unselected features are utilised in a secondary role, with no additional cost at test time. Our empirical results on 49 textual datasets show that LUFe can improve classification performance in comparison with standard wrapper and filter feature selection.

Conference paper

Sharmanska V, Quadrianto N, 2016, Learning Using Privileged Information, Encyclopedia of Machine Learning and Data Mining, Publisher: Springer US, Pages: 1-4

Book chapter

Pentina A, Sharmanska V, Lampert CH, 2015, Curriculum learning of multiple tasks, Boston, MA, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE Conference on Computer Vision and Pattern Recognition, Pages: 5492-5500, ISSN: 1063-6919

Sharing information between multiple tasks enables algorithms to achieve good generalization performance even from small amounts of training data. However, in a realistic scenario of multi-task learning not all tasks are equally related to each other, hence it could be advantageous to transfer information only between the most related tasks. In this work we propose an approach that processes multiple tasks in a sequence with sharing between subsequent tasks instead of solving all tasks jointly. Subsequently, we address the question of curriculum learning of tasks, i.e. finding the best order of tasks to be learned. Our approach is based on a generalization bound criterion for choosing the task order that optimizes the average expected classification performance over all tasks. Our experimental results show that learning multiple related tasks sequentially can be more effective than learning them jointly, the order in which tasks are being solved affects the overall performance, and that our model is able to automatically discover a favourable order of tasks.

Conference paper
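The task-ordering question raised in the abstract above can be sketched with a greedy procedure. The paper selects the order via a generalization-bound criterion; the snippet below substitutes a much simpler proxy (distance between per-task prototype vectors) purely for illustration, and both the function name and the prototype representation are hypothetical.

```python
import numpy as np

def greedy_task_order(prototypes, start=0):
    """Greedy sequential curriculum: after each task, pick the unsolved task
    whose prototype is nearest to the current one. A simplified proxy for a
    bound-based ordering criterion; `prototypes` holds one vector per task."""
    order, remaining = [start], set(range(len(prototypes))) - {start}
    while remaining:
        cur = prototypes[order[-1]]
        nxt = min(remaining, key=lambda t: float(np.linalg.norm(prototypes[t] - cur)))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

With prototypes at 0, 10 and 1 on a line, the greedy order starting from task 0 visits the nearby task before the distant one.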

Hernández-Lobato D, Sharmanska V, Kersting K, Lampert CH, Quadrianto N et al., 2014, Mind the nuisance: Gaussian process classification using privileged noise, Advances in Neural Information Processing Systems (NIPS), Publisher: Neural Information Processing Systems (NIPS)

The learning with privileged information setting has recently attracted a lot of attention within the machine learning community, as it allows the integration of additional knowledge into the training process of a classifier, even when this comes in the form of a data modality that is not available at test time. Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian Process classifier (GPC). That is, in contrast to the standard GPC setting, the latent function is not just a nuisance but a feature: it becomes a natural measure of confidence about the training data by modulating the slope of the GPC sigmoid likelihood function. Extensive experiments on public datasets show that the proposed GPC method using privileged noise, called GPC+, improves over a standard GPC without privileged knowledge, and also over the current state-of-the-art SVM-based method, SVM+. Moreover, we show that advanced neural networks and deep learning methods can be compressed as privileged information.

Conference paper

Sharmanska V, Quadrianto N, Lampert CH, 2014, Learning to rank using privileged information, IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 825-832, ISSN: 1550-5499

Many computer vision problems have an asymmetric distribution of information between training and test time. In this work, we study the case where we are given additional information about the training data, which however will not be available at test time. This situation is called learning using privileged information (LUPI). We introduce two maximum-margin techniques that are able to make use of this additional source of information, and we show that the framework is applicable to several scenarios that have been studied in computer vision before. Experiments with attributes, bounding boxes, image tags and rationales as additional information in object classification show promising results.

Conference paper

Quadrianto N, Sharmanska V, Knowles DA, Ghahramani Z et al., 2013, The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models, Uncertainty in Artificial Intelligence

We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables and their values. The latent variables preserve the neighbourhood structure of the data, in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood preserving infinite latent feature space.

Conference paper

Sharmanska V, Quadrianto N, Lampert CH, 2012, Augmented Attribute Representations, 12th European Conference on Computer Vision (ECCV), Publisher: Springer-Verlag Berlin, Pages: 242-255, ISSN: 0302-9743

We propose a new learning method to infer a mid-level feature representation that combines the advantage of semantic attribute representations with the higher expressive power of non-semantic features. The idea lies in augmenting an existing attribute-based representation with additional dimensions, for which an autoencoder model is coupled with a large-margin principle. This construction allows a smooth transition between the zero-shot regime with no training examples, the unsupervised regime with training examples but without class labels, and the supervised regime with training examples and with class labels. The resulting optimization problem can be solved efficiently, because several of the necessary steps have closed-form solutions. Through extensive experiments we show that the augmented representation achieves better results in terms of object categorization accuracy than the semantic representation alone.

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
