Publications

Koniusz P, Mikolajczyk K, 2011, Spatial coordinate coding to reduce histogram representations, dominant angle and colour pyramid match, ICIP 2011, Publisher: IEEE, Pages: 661-664, ISSN: 1522-4880

Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extremely large histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates alternative ways of introducing spatial information during the formation of histograms. Specifically, we propose to apply spatial location information at the descriptor level and refer to it as Spatial Coordinate Coding. Alternatively, x, y, radius, or angle is used to perform semi-coding. This is achieved by adding one of the spatial components at the descriptor level whilst applying Pyramid Match to another. Lastly, we demonstrate that Pyramid Match can be applied robustly to other measurements: Dominant Angle and Colour. We demonstrate state-of-the-art results on two datasets by means of Soft Assignment and Sparse Coding.

Conference paper
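
The contrast drawn in the abstract above, between spreading a histogram over a spatial pyramid and adding location at the descriptor level, can be illustrated with a minimal sketch. The grid sizes, the vocabulary size, and the `weight` parameter are assumptions chosen for illustration; they are not taken from the paper, and the helper names are hypothetical.

```python
from typing import List, Sequence


def pyramid_histogram_length(vocabulary_size: int, grids: Sequence[int] = (1, 2, 4)) -> int:
    """Length of a pyramid histogram: one visual-word histogram per grid cell.

    `grids` lists the cells per side at each pyramid level (assumed values).
    """
    return vocabulary_size * sum(g * g for g in grids)


def append_coordinates(descriptor: Sequence[float], x: float, y: float,
                       width: float, height: float, weight: float = 1.0) -> List[float]:
    """Append normalised image coordinates to a local descriptor.

    `weight` is a hypothetical knob balancing appearance against location.
    """
    return list(descriptor) + [weight * x / width, weight * y / height]


# An assumed 10,000-word vocabulary over 1x1, 2x2 and 4x4 grids.
print(pyramid_histogram_length(10000))
# The same location information carried by two extra descriptor dimensions.
print(append_coordinates([0.1, 0.2], x=50, y=25, width=100, height=100))
```

With location folded into each descriptor, the final histogram keeps the length of the vocabulary instead of growing with the number of grid cells.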

Koniusz P, Mikolajczyk K, 2011, Soft assignment of visual words as linear coordinate coding and optimisation of its reconstruction error, ICIP 2011, Publisher: IEEE, Pages: 2413-2416, ISSN: 1522-4880

Visual Word Uncertainty, also referred to as Soft Assignment, is a well-established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the object category recognition community has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results among competing methods. We show that one can take two views on Soft Assignment: an approach derived from the Gaussian Mixture Model or a special case of Linear Coordinate Coding. The latter view helps us propose how to optimise the smoothing factor of Soft Assignment in a way that minimises descriptor reconstruction error and maximises classification performance. In turn, this renders tedious cross-validation for establishing this parameter unnecessary and makes it a handy technique. We demonstrate state-of-the-art performance of such optimised assignment on two image datasets and several types of descriptors.

Conference paper
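
The idea in the abstract above, choosing the Soft Assignment smoothing factor by its descriptor reconstruction error, can be sketched roughly as follows. The Gaussian-kernel weights, the weighted-sum reconstruction, and the grid search over candidate values are assumptions made for illustration and may differ from the optimisation used in the paper; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_assign(x: Vector, codebook: Sequence[Vector], sigma: float) -> List[float]:
    """Gaussian-kernel membership weights of x over the visual words (sum to 1)."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    d_min = min(sq_dists)  # shifting by the minimum avoids underflow; it cancels on normalisation
    scores = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(scores)
    return [s / total for s in scores]


def reconstruction_error(x: Vector, codebook: Sequence[Vector], sigma: float) -> float:
    """Squared distance between x and the weight-averaged visual words."""
    w = soft_assign(x, codebook, sigma)
    x_hat = [sum(wk * c[i] for wk, c in zip(w, codebook)) for i in range(len(x))]
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def pick_sigma(descriptors: Sequence[Vector], codebook: Sequence[Vector],
               candidates: Sequence[float]) -> float:
    """Candidate smoothing factor with the lowest total reconstruction error."""
    return min(candidates,
               key=lambda s: sum(reconstruction_error(x, codebook, s) for x in descriptors))


codebook = [(0.0, 0.0), (1.0, 0.0)]
print(soft_assign((0.5, 0.0), codebook, sigma=1.0))
print(pick_sigma([(0.3, 0.0)], codebook, candidates=[0.1, 0.5, 2.0]))
```

In this toy setting a very small smoothing factor collapses to hard assignment and a very large one averages all words equally, so an intermediate value reconstructs the descriptor best.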

Yan F, Kittler J, Mikolajczyk K, 2010, Multiple Kernel Learning and Feature Space Denoising, 2010 International Conference on Machine Learning and Cybernetics (ICMLC), Publisher: IEEE, Pages: 1771-1776

We review a multiple kernel learning (MKL) technique called ℓp regularised multiple kernel Fisher discriminant analysis (MK-FDA), and investigate the effect of feature space denoising on MKL. Experiments show that with either the original or the denoised kernels, ℓp MK-FDA outperforms its fixed-norm counterparts. Experiments also show that feature space denoising boosts the performance of both single kernel FDA and ℓp MK-FDA, and that there is a positive correlation between the learnt kernel weights and the amount of variance kept by feature space denoising. Based on these observations, we argue that in the case where the base feature spaces are noisy, a linear combination of kernels cannot be optimal. An MKL objective function which can take care of feature space denoising automatically, and which can learn a truly optimal (non-linear) combination of the base kernels, is yet to be found.

Conference paper

Cai H, Yan F, Mikolajczyk K, 2010, Learning Weights for Codebook in Image Classification and Retrieval, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: IEEE, Pages: 2320-2327, ISSN: 1063-6919

This paper presents a codebook learning approach for image classification and retrieval. It corresponds to learning a weighted similarity metric such that the weighted similarity between images with the same label is larger than that between differently labeled images, with the largest possible margin. We formulate the learning problem as a convex quadratic program and adopt alternating optimization to solve it efficiently. Experiments on both synthetic and real datasets validate the approach. The codebook learning improves performance, in particular when the number of training examples is not sufficient for a large codebook.

Conference paper

Yan F, Mikolajczyk K, Kittler J, Tahir MA et al., 2010, Combining Multiple Kernels by Augmenting the Kernel Matrix, 9th International Workshop on Multiple Classifier Systems, Publisher: SPRINGER-VERLAG BERLIN, Pages: 175-184, ISSN: 0302-9743

Conference paper

Tahir MA, Kittler J, Mikolajczyk K, Yan F et al., 2010, Improving Multilabel Classification Performance by Using Ensemble of Multi-label Classifiers, 9th International Workshop on Multiple Classifier Systems, Publisher: SPRINGER-VERLAG BERLIN, Pages: 11-21, ISSN: 0302-9743

Conference paper

Yan F, Mikolajczyk K, Barnard M, Cai H, Kittler J et al., 2010, Lp Norm Multiple Kernel Fisher Discriminant Analysis for Object and Image Categorisation, IEEE Conference on Computer Vision and Pattern Recognition

Conference paper

Kalal Z, Matas J, Mikolajczyk K, 2010, P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints, 23rd IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE COMPUTER SOC, Pages: 49-56, ISSN: 1063-6919

Conference paper

Koniusz P, Mikolajczyk K, 2010, On a quest for image descriptors based on unsupervised segmentation maps, 2010 20th ICPR, Pages: 762-765, ISSN: 1051-4651

This paper investigates segmentation-based image descriptors for object category recognition. In contrast to commonly used interest points the proposed descriptors are extracted from pairs of adjacent regions given by a segmentation method. In this way we exploit semi-local structural information from the image. We propose to use the segments as spatial bins for descriptors of various image statistics based on gradient, colour and region shape. Proposed descriptors are validated on standard recognition benchmarks. Results show they outperform state-of-the-art reference descriptors with 5.6x less data and achieve comparable results to them with 8.6x less data. The proposed descriptors are complementary to SIFT and achieve state-of-the-art results when combined together within a kernel based classifier.

Conference paper

Awais M, Mikolajczyk K, 2010, Feature pairs connected by lines for object recognition, 2010 20th ICPR, Pages: 3093-3096, ISSN: 1051-4651

In this paper we exploit image edges and segmentation maps to build features for object category recognition. We build a parametric line-based image approximation to identify the dominant edge structures. Line ends are used as features described by histograms of gradient orientations. We then form descriptors based on connected line ends to incorporate weak topological constraints which improve their discriminative power. Using point pairs connected by an edge assures higher repeatability than a random pair of points or edges. The results are compared with the state of the art and show significant improvement on the challenging recognition benchmark Pascal VOC 2007. Kernel-based fusion is performed to emphasize the complementary nature of our descriptors with respect to state-of-the-art features.

Conference paper

Kalal Z, Mikolajczyk K, Matas J, 2010, Face-TLD: Tracking-learning-detection applied to faces, Proceedings - International Conference on Image Processing, ICIP, Pages: 3789-3792, ISSN: 1522-4880

A novel system for long-term tracking of a human face in unconstrained videos is built on the Tracking-Learning-Detection (TLD) approach. The system extends TLD with the concept of a generic detector and a validator, and is designed for real-time face tracking resistant to occlusions and appearance changes. The off-line trained detector localizes frontal faces and the online trained validator decides which faces correspond to the tracked subject. Several strategies for building the validator during tracking are quantitatively evaluated. The system is validated on a sitcom episode (23 min.) and a surveillance (8 min.) video. In both cases the system detects and tracks the face and automatically learns a multi-view model from a single frontal example and an unlabeled video.

Journal article

Kalal Z, Mikolajczyk K, Matas J, 2010, Forward-backward error: Automatic detection of tracking failures, 2010 20th ICPR, Pages: 2756-2759, ISSN: 1051-4651

This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error, i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects.

Conference paper
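
The Forward-Backward error described in the abstract above can be sketched as follows: a point is tracked forward through a sequence, the result is tracked backward to the first frame, and the discrepancy is measured. The sketch assumes a hypothetical single-step tracker interface (`StepFn`) and uses the distance between the starting point and the back-tracked point as one simple choice of discrepancy; the toy trackers are invented for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical tracker interface: moves a point from frame `src` to frame `dst`.
StepFn = Callable[[Point, int, int], Point]


def track(step: StepFn, start: Point, frames: Sequence[int]) -> List[Point]:
    """Chain single-step tracking through the given frame indices."""
    trajectory = [start]
    for src, dst in zip(frames, frames[1:]):
        trajectory.append(step(trajectory[-1], src, dst))
    return trajectory


def forward_backward_error(step: StepFn, start: Point, n_frames: int) -> float:
    """Track forward, then backward, and return the distance to the start."""
    frames = list(range(n_frames))
    forward = track(step, start, frames)
    backward = track(step, forward[-1], frames[::-1])
    return math.dist(start, backward[-1])


def consistent_step(p: Point, src: int, dst: int) -> Point:
    """Toy tracker whose backward motion exactly undoes its forward motion."""
    return (p[0] + (dst - src), p[1])


def drifting_step(p: Point, src: int, dst: int) -> Point:
    """Toy tracker that drifts upward by 0.5 on every step, in either direction."""
    return (p[0] + (dst - src), p[1] + 0.5)


print(forward_backward_error(consistent_step, (0.0, 0.0), n_frames=5))
print(forward_backward_error(drifting_step, (0.0, 0.0), n_frames=5))
```

A consistent tracker returns to its starting point, while the drifting one accumulates error in both passes, so a threshold on this value can flag unreliable trajectories.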

Tahir MA, Yan F, Barnard M, Awais M, Mikolajczyk K, Kittler J et al., 2010, The University of Surrey visual concept detection system at ImageCLEF@ICPR: Working notes, Lecture Notes in Computer Science: Recognising Patterns in Signals, Speech, Images and Videos, Vol: 6388, Pages: 162-170, ISSN: 1611-3349

Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task which ranked first for large-scale visual concept detection tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

Journal article

Tahir A, Yan F, Barnard M, Awais M, Mikolajczyk K, Kittler J et al., 2010, The University of Surrey Visual Concept Detection System at ImageCLEF 2010: Working Notes, ICPR 2010, Publisher: Springer

Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task which ranked first for large-scale visual concept detection tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

Conference paper

Tahir MA, Kittler J, Mikolajczyk K, Yan F, Van De Sande KEA, Gevers T et al., 2009, Visual category recognition using spectral regression and kernel discriminant analysis, Pages: 178-185

Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been successful in many classification problems. In this paper, we adapt this solution to VCR and demonstrate its advantages over existing methods both in terms of speed and accuracy. The distinctiveness of this method is assessed experimentally using an image and a video benchmark: the PASCAL VOC Challenge 08 and the Mediamill Challenge. From the experimental results, it can be concluded that SR-KDA consistently yields significant performance gains when compared with the state-of-the-art methods. The other strong point of using SR-KDA is that the time complexity scales linearly with respect to the number of concepts and the main computational complexity is independent of the number of categories.

Conference paper

Yan F, Mikolajczyk K, Kittler J, Tahir M et al., 2009, A Comparison of l(1) Norm and l(2) Norm Multiple Kernel SVMs in Image and Video Classification, International Workshop on Content-Based Multimedia Indexing, Publisher: IEEE, Pages: 7-12, ISSN: 1949-3983

Conference paper

Tahir MA, Kittler J, Yan F, Mikolajczyk K et al., 2009, Kernel Discriminant Analysis using Triangular Kernel for Semantic Scene Classification, International Workshop on Content-Based Multimedia Indexing, Publisher: IEEE, Pages: 1-6, ISSN: 1949-3983

Conference paper

Tahir MA, Kittler J, Mikolajczyk K, Yan F et al., 2009, A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling, 8th International Workshop on Multiple Classifier Systems, Publisher: SPRINGER-VERLAG BERLIN, Pages: 82-91, ISSN: 0302-9743

Conference paper

Tahir A, Kittler J, Yan F, Mikolajczyk K et al., 2009, Concept Learning for Image and Video Retrieval: the Inverse Random Under Sampling Approach, 17th European Signal Processing Conference (EUSIPCO 2009), Pages: 574-578

Conference paper

Schubert F, Schertler K, Mikolajczyk K, 2009, A hands-on approach to high-dynamic-range and superresolution fusion, 2009 9th IEEE WAVC, ISSN: 1550-5790

This paper discusses a new framework to enhance image and video quality. Recent advances in high-dynamic-range image fusion and superresolution make it possible to extend the intensity range or to increase the resolution of the image beyond the limitations of the sensor. In this paper, we propose a new way to combine both of these fusion methods in a two-stage scheme. To achieve robust image enhancement in practical application scenarios, we adapt state-of-the-art methods for automatic photometric camera calibration, controlled image acquisition, image fusion and tone mapping. With respect to high-dynamic-range reconstruction, we show that only two input images can sufficiently capture the dynamic range of the scene. The usefulness and performance of this system are demonstrated on images taken with various types of cameras.

Conference paper

Snoek CGM, van de Sande KEA, Uijlings JRR, Bugalho M, Trancoso I, Yan F, Tahir MA, Mikolajczyk K, Kittler J, Gevers T, Koelma DC, Smeulders AWM et al., 2009, Learning from video browse behavior, 2009 TREC Video Retrieval Evaluation Notebook Papers

Journal article

Mikolajczyk K, Tuytelaars T, 2009, Local Image Features, Encyclopedia of Biometrics, Editors: Li, Jain, Publisher: Springer US, Pages: 939-943, ISBN: 978-0-387-73002-8

Book chapter

Snoek CGM, van de Sande KEA, Uijlings JRR, Gevers T, Koelma DC, Smeulders AWM, Bugalho M, Trancoso I, Yan F, Tahir MA, Mikolajczyk K, Kittler J et al., 2009, Multi-frame, multi-modal, and multi-kernel concept detection in video, 2009 TREC Video Retrieval Evaluation Notebook Papers

Journal article

Kalal Z, Matas J, Mikolajczyk K, 2009, Online learning of robust object detectors during unstable tracking, 12th ICCV Workshops, Pages: 1417-1424

This work investigates the problem of robust, long-term visual tracking of unknown objects in unconstrained environments. A tracker must therefore cope with frame-cuts, fast camera movements and partial or total object occlusions and disappearances. We propose a new approach, called Tracking-Modeling-Detection (TMD), that closely integrates adaptive tracking with online learning of the object-specific detector. Starting from a single click in the first frame, TMD tracks the selected object by an adaptive tracker. The trajectory is observed by two processes (growing and pruning events) that robustly model the appearance and build an object detector on the fly. Both events make errors; the stability of the system is achieved by their cancellation. The learnt detector enables re-initialization of the tracker whenever a previously observed appearance reoccurs. We show that real-time learning and classification are achievable with random forests. The performance and the long-term stability of TMD are demonstrated and evaluated on a set of challenging video sequences with various objects such as cars, people and animals.

Conference paper

Koniusz P, Mikolajczyk K, 2009, Segmentation Based Interest Points and Evaluation of Unsupervised Image Segmentation Methods, Publisher: British Machine Vision Association, Pages: 1-11

Conference paper

Snoek C, Sande K, Rooij O, Huurnink B, Uijlings J, Liempt M, Bugalho M, Trancoso I, Yan F, Tahir M, Mikolajczyk K, Kittler J, Rijke M, Geusebroek J, Gevers T, Worring M, Koelma D, Smeulders A et al., 2009, The MediaMill TRECVID 2009 Semantic Video Search Engine

In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by including 20 audio concepts and fusion using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-word representations, a GPU implementation, and compute clusters, we scale up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and relevance feedback mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2009 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. Again a lot has been learned during this year’s TRECVID campaign; we highlight the most important lessons at the end of this paper.

Conference paper

Yan F, Kittler J, Mikolajczyk K, Tahir A et al., 2009, Non-Sparse Multiple Kernel Learning for Fisher Discriminant Analysis, ICDM ’09, Pages: 1064-1069, ISSN: 1550-4786

We consider the problem of learning a linear combination of pre-specified kernel matrices in the Fisher discriminant analysis setting. Existing methods for such a task impose an ℓ1 norm regularisation on the kernel weights, which produces sparse solutions but may lead to loss of information. In this paper, we propose to use ℓ2 norm regularisation instead. The resulting learning problem is formulated as a semi-infinite program and can be solved efficiently. Through experiments on both synthetic data and a very challenging object recognition benchmark, the relative advantages of the proposed method and its ℓ1 counterpart are demonstrated, and insights are gained as to how the choice of regularisation norm should be made.

Conference paper
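
The object being learned in the abstract above is a weighted linear combination of pre-specified kernel matrices, with the weights constrained by a norm. The sketch below only illustrates that combination step and the effect of rescaling a weight vector to unit ℓ1 or ℓ2 norm; it does not implement the semi-infinite program that learns the weights. The function names and the toy kernel matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def lp_normalise(weights: Sequence[float], p: float) -> List[float]:
    """Rescale kernel weights to unit lp norm (p = 1 or p = 2 in the abstract)."""
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    return [w / norm for w in weights]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> List[List[float]]:
    """Entry-wise weighted sum of equally sized kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Two toy 2x2 kernel matrices, invented for illustration.
k_a = [[1.0, 0.2], [0.2, 1.0]]
k_b = [[1.0, 0.8], [0.8, 1.0]]

weights = lp_normalise([3.0, 4.0], p=2)
print(weights)
print(combine_kernels([k_a, k_b], lp_normalise([1.0, 1.0], p=1)))
```

In a learned setting the weights would come from the optimisation itself; here they are fixed by hand to show how the norm constraint scales them.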

Mikolajczyk K, Uemura H, 2008, Action recognition with motion-appearance vocabulary forest, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: IEEE, Pages: 2229-2236, ISSN: 1063-6919

Conference paper

Schubert F, Mikolajczyk K, 2008, Combining High-Resolution Images With Low-Quality Videos, Publisher: British Machine Vision Association, Pages: 1-10

Conference paper

Kalal Z, Matas J, Mikolajczyk K, 2008, Weighted Sampling for Large-Scale Boosting, Publisher: British Machine Vision Association, Pages: 1-10

Conference paper

Professor Krystian Mikolajczyk
