Ramisa A, Yan F, Moreno-Noguer F, et al., 2018, BreakingNews: Article Annotation by Image and Text Processing, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 1072-1085, ISSN: 0162-8828
Balntas V, Tang L, Mikolajczyk K, 2018, Binary Online Learned Descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 555-567, ISSN: 0162-8828
Koniusz P, Yan F, Gosselin P-H, et al., 2017, Higher-Order Occurrence Pooling for Bags-of-Words: Visual Concept Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 39, Pages: 313-326, ISSN: 0162-8828
Akin O, Erdem E, Erdem A, et al., 2016, Deformable part-based tracking by coupled global and local correlation filters, Journal of Visual Communication and Image Representation, Vol: 38, Pages: 763-774, ISSN: 1047-3203
Chan CH, Yan F, Kittler J, et al., 2015, Full ranking as local descriptor for visual recognition: A comparison of distance metrics on sn, Pattern Recognition, Vol: 48, Pages: 1328-1336, ISSN: 0031-3203
Yan F, Kittler J, Windridge D, et al., 2014, Automatic annotation of tennis games: An integration of audio, vision, and learning, Image and Vision Computing, Vol: 32, Pages: 896-903, ISSN: 0262-8856
Bowden R, Collomosse J, Mikolajczyk K, 2014, Guest Editorial: Tracking, Detection and Segmentation, International Journal of Computer Vision, Vol: 110, Pages: 1-1, ISSN: 0920-5691
Gaur A, Mikolajczyk K, 2014, Ranking images based on aesthetic qualities, Pages: 3410-3415, ISSN: 1051-4651
© 2014 IEEE. We propose a novel approach for learning image representation based on qualitative assessments of visual aesthetics. It relies on a multi-node multi-state model that represents image attributes and their relations. The model is learnt from pairwise image preferences provided by annotators. To demonstrate its effectiveness, we apply our approach to fashion image rating, i.e., comparative assessment of aesthetic qualities. Bag-of-features object recognition is used for the classification of visual attributes such as clothing and body shape in an image. The attributes and their relations are then assigned learnt potentials, which are used to rate the images. Evaluation of the representation model has demonstrated a high performance rate in ranking fashion images.
Schubert F, Mikolajczyk K, 2014, Robust registration and filtering for moving object detection in aerial videos, Pages: 2808-2813, ISSN: 1051-4651
© 2014 IEEE. In this paper we present a multi-frame motion detection approach for aerial platforms with a twofold contribution. First, we propose a novel image registration method, which can robustly cope with a large variety of aerial imagery. We show that it can benefit from a hardware-accelerated implementation using graphics cards, allowing processing at a high frame rate. Second, to handle the inaccuracy of the registration and sensor noise that result in false alarms, we present an efficient filtering step to reduce incorrect motion hypotheses that arise from background subtraction. We show that the proposed filtering significantly improves the precision of the motion detection while maintaining high recall. We introduce a new dataset for evaluating aerial surveillance systems, which will be made available for comparison. We evaluate the registration performance in terms of accuracy and speed, as well as the filtering in terms of motion detection performance.
Balntas V, Tang L, Mikolajczyk K, 2014, Improving object tracking with voting from false positive detections, Pages: 1928-1933, ISSN: 1051-4651
© 2014 IEEE. Context provides additional information in detection and tracking, and several works have proposed online trained trackers that make use of the context. However, the context is usually considered during tracking as items with motion patterns significantly correlated with the target. We propose a new approach that exploits context in tracking-by-detection and makes use of persistent false positive detections. True detections as well as repeated false positives act as pointers to the location of the target. This is implemented with a generalised Hough voting and incorporated into a state-of-the-art online learning framework. The proposed method performs well in both speed and accuracy, and it improves on the current state-of-the-art results in a challenging benchmark.
Akin O, Mikolajczyk K, 2014, Online learning and detection with part-based, circulant structure, Pages: 4229-4233, ISSN: 1051-4651
© 2014 IEEE. Circulant Structure Kernel (CSK) has recently been introduced as a simple and extremely efficient tracking method. In this paper, we propose an extension of CSK that explicitly addresses the partial occlusion problems from which the original CSK suffers. Our extension is based on a part-based scheme, which improves the robustness and localisation accuracy. Furthermore, we improve the robustness of CSK for long-term tracking by incorporating it into an online learning and detection framework. We provide an extensive comparison to eight recently introduced tracking methods. Our experimental results show that the proposed approach significantly improves on the original CSK and provides state-of-the-art results when combined with an online learning approach.
Yan F, Mikolajczyk K, 2014, Leveraging High Level Visual Information for Matching Images and Captions, Asian Conference on Computer Vision
Schubert F, Mikolajczyk K, 2013, Performance evaluation of image filtering for classification and retrieval, Pages: 485-491
Much research effort in the literature is focused on improving feature extraction methods to boost the performance in various computer vision applications. This is mostly achieved by tailoring feature extraction methods to specific tasks. For instance, for the task of object detection often new features are designed that are even more robust to natural variations of a certain object class and yet discriminative enough to achieve high precision. This focus led to a vast amount of different feature extraction methods with more or less consistent performance across different applications. Instead of fine-tuning or re-designing new features to further increase performance we want to motivate the use of image filters for pre-processing. We therefore present a performance evaluation of numerous existing image enhancement techniques which help to increase performance of already well-known feature extraction methods. We investigate the impact of such image enhancement or filtering techniques on two state-of-the-art image classification and retrieval approaches. For classification we evaluate using a standard Pascal VOC dataset. For retrieval we provide a new challenging dataset. We find that gradient-based interest-point detectors and descriptors such as SIFT or HOG can benefit from enhancement methods and lead to improved performance.
Koniusz P, Yan F, Mikolajczyk K, 2013, Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection, Computer Vision and Image Understanding, Vol: 117, Pages: 479-492, ISSN: 1077-3142
In this paper, we present a novel approach to saliency detection. We define a visually salient region by the following two properties: global saliency, i.e. the spatial redundancy, and local saliency, i.e. the region complexity. The former is its probability of occurrence within the image, whereas the latter defines how much information is contained within the region, and it is quantified by the entropy. By combining the global spatial redundancy measure and local entropy, we can achieve a simple, yet robust saliency detector. We evaluate it quantitatively and compare it to Itti et al. as well as to the spectral residual approach on publicly available data, where it shows a significant improvement. © 2013 Springer-Verlag.
Schubert F, Mikolajczyk K, 2013, Benchmarking GPU-based phase correlation for homography-based registration of aerial imagery, Pages: 83-90, ISSN: 0302-9743
Many multi-image fusion applications require fast registration methods in order to allow real-time processing. Although the most popular approaches, local-feature-based methods, have proven efficient enough for registering image pairs in real time, some applications like multi-frame background subtraction, super-resolution or high-dynamic-range imaging benefit from even faster algorithms. A common trend to speed up registration is to implement the algorithms on graphics cards (GPUs). However, not all algorithms are well suited to massive parallelization via GPUs. In this paper we evaluate the speed of a well-known global registration method, i.e. phase correlation, for computing 8-DOF homographies. We propose a benchmark to compare a CPU- and a GPU-based implementation using different systems and a dataset of aerial imagery. We demonstrate that phase correlation benefits from GPU-based implementations much more than local methods, significantly increasing the processing speed. © 2013 Springer-Verlag.
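As an illustration of the phase correlation idea mentioned in the abstract above, the following is a minimal CPU sketch in NumPy. It is not the authors' implementation: it assumes a pure integer translation with circular wrap-around rather than the 8-DOF homography the paper addresses, and the function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (dy, dx) circular shift taking `reference` to `moved`.

    Assumption: the two images differ only by a translation with wrap-around.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # Normalised cross-power spectrum: keeps only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)

    # The inverse transform peaks at the translation between the images.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Map peak positions in the upper half of each axis to negative shifts.
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, correlation.shape)
    )
```

The work is dominated by three FFTs and an element-wise product, which is the kind of regular, data-parallel computation that tends to map well onto GPUs.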
Tahir M, Yan F, Koniusz P, et al., 2012, A Robust and Scalable Visual Category and Action Recognition System using Kernel Discriminant Analysis with Spectral Regression, IEEE Transactions on Multimedia, ISSN: 1520-9210
Kalal Z, Mikolajczyk K, Matas J, 2012, Tracking-Learning-Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 34, Pages: 1409-1422, ISSN: 0162-8828
, 2012, British Machine Vision Conference, BMVC 2012, Surrey, UK, September 3-7, 2012, Publisher: BMVA Press
Miksik O, Mikolajczyk K, 2012, Evaluation of local detectors and descriptors for fast feature matching, 21st International Conference on Pattern Recognition, Publisher: IEEE, Pages: 2681-2684, ISSN: 1051-4651
Local feature detectors and descriptors are widely used in many computer vision applications, and various methods have been proposed during the past decade. There have been a number of evaluations focused on various aspects of local features, matching accuracy in particular; however, there have been no comparisons considering the accuracy and speed trade-offs of recent extractors such as BRIEF, BRISK, ORB, MRRID, MROGH and LIOP. This paper provides a performance evaluation of recent feature detectors and compares their matching precision and speed in a randomized kd-trees setup, as well as an evaluation of binary descriptors with efficient computation of Hamming distance. © 2012 ICPR Org Committee.
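To make the Hamming-distance matching of binary descriptors mentioned above concrete, here is a minimal brute-force sketch in NumPy. It is an illustration under stated assumptions, not the evaluation code from the paper: descriptors are assumed to be packed into `uint8` bytes, and the names `POPCOUNT`, `hamming_distances` and `nearest_neighbours` are hypothetical.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)


def hamming_distances(queries, database):
    """Pairwise Hamming distances between packed binary descriptors.

    queries:  (Q, B) uint8 array, B bytes per descriptor
    database: (N, B) uint8 array
    returns:  (Q, N) array of bit-level distances
    """
    # XOR marks the differing bits; the table counts them byte by byte.
    xor = np.bitwise_xor(queries[:, None, :], database[None, :, :])
    return POPCOUNT[xor].sum(axis=2, dtype=np.int64)


def nearest_neighbours(queries, database):
    """Index of the closest database descriptor for each query."""
    return hamming_distances(queries, database).argmin(axis=1)
```

The distance reduces to an XOR followed by a bit count, which is why binary descriptors can be compared much faster than floating-point descriptors under Euclidean distance.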
Yan F, Kittler J, Mikolajczyk K, et al., 2011, Non-Sparse Multiple Kernel Fisher Discriminant Analysis, Journal of Machine Learning Research, Vol: 13, Pages: 607-642, ISSN: 1532-4435
Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓp-norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓp MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines.
De Campos T, Barnard M, Mikolajczyk K, et al., 2011, An evaluation of bags-of-words and spatio-temporal shapes for action recognition, Pages: 344-351
Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two very popular approaches for action recognition from video. The former (BoW) is an unstructured global representation of videos which is built using a large set of local features. The latter (STS) uses a single feature located on a region of interest (where the actor is) in the video. Despite the popularity of these methods, no comparison between them has been done. Also, given that BoW and STS differ intrinsically in terms of context inclusion and globality/locality of operation, an appropriate evaluation framework has to be designed carefully. This paper compares these two approaches using four different datasets with varying degrees of space-time specificity of the actions and varying relevance of the contextual background. We use the same local feature extraction method and the same classifier for both approaches. In addition to BoW and STS, we also evaluate novel variations of BoW constrained in time or space. We observe that the STS approach leads to better results in all datasets whose background is of little relevance to action classification. © 2010 IEEE.
Mikolajczyk K, Uemura H, 2011, Action recognition with appearance–motion features and fast search trees, Computer Vision and Image Understanding, Vol: 115, Pages: 426-438, ISSN: 1077-3142
Hongping C, Mikolajczyk K, Matas J, 2011, Learning Linear Discriminant Projections for Dimensionality Reduction of Image Descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 33, Pages: 338-352, ISSN: 0162-8828
Awais M, Yan F, Mikolajczyk K, et al., 2011, Augmented Kernel Matrix vs classifier fusion for object recognition, BMVC 2011 - Proceedings of the British Machine Vision Conference 2011
Augmented Kernel Matrix (AKM) has recently been proposed to account for the fact that a single training example may have different importance in different feature spaces, in contrast to Multiple Kernel Learning (MKL), which assigns the same weight to all examples in one feature space. However, the AKM approach is limited to small datasets due to its memory requirements. An alternative way to fuse information from different feature channels is classifier fusion (ensemble methods). There is a significant amount of work on linear programming formulations of classifier fusion (CF) in the case of binary classification. In this paper we derive the primal and dual of AKM to draw its correspondence with CF. We propose a multiclass extension of binary ν-LPBoost, which learns the contribution of each class in each feature channel. Existing approaches to CF promote sparse feature combinations, due to regularization based on the ℓ1-norm, and lead to a selection of a subset of feature channels, which is undesirable when the channels are informative. We also generalize existing CF formulations to an arbitrary ℓp-norm for binary and multiclass problems, which results in more effective use of complementary information. We carry out an extensive comparison and show that the proposed nonlinear CF schemes outperform their sparse counterparts as well as state-of-the-art MKL approaches. © 2011. The copyright of this document resides with its authors.
Yan F, Mikolajczyk K, Kittler J, 2011, Multiple Kernel Learning via Distance Metric Learning for Interactive Image Retrieval, 10th International Workshop on Multiple Classifier Systems, Publisher: Springer-Verlag Berlin, Pages: 147-156, ISSN: 0302-9743
Awais M, Yan F, Mikolajczyk K, et al., 2011, Novel fusion methods for pattern recognition, ECML PKDD 2011: Machine Learning and Knowledge Discovery in Databases, Publisher: Springer, Pages: 140-155, ISSN: 0302-9743
Over the last few years, several approaches have been proposed for information fusion, including different variants of classifier level fusion (ensemble methods), stacking and multiple kernel learning (MKL). MKL has become a preferred choice for information fusion in object recognition. However, in the case of highly discriminative and complementary feature channels, it does not significantly improve upon its trivial baseline, which averages the kernels. Alternative ways are stacking and classifier level fusion (CLF), which rely on a two-phase approach. There is a significant amount of work on linear programming formulations of ensemble methods, particularly in the case of binary classification. In this paper we propose a multiclass extension of binary ν-LPBoost, which learns the contribution of each class in each feature channel. The existing approaches to classifier fusion promote sparse feature combinations, due to regularization based on the ℓ1-norm, and lead to a selection of a subset of feature channels, which is undesirable when the channels are informative. Therefore, we generalize existing classifier fusion formulations to an arbitrary ℓp-norm for binary and multiclass problems, which results in more effective use of complementary information. We also extend stacking to both binary and multiclass datasets. We present an extensive evaluation of the fusion methods on four datasets involving kernels that are all informative and achieve state-of-the-art results on all of them.
Koniusz P, Mikolajczyk K, 2011, Spatial coordinate coding to reduce histogram representations, dominant angle and colour pyramid match, ICIP 2011, Publisher: IEEE, Pages: 661-664, ISSN: 1522-4880
Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extreme histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates alternative ways of introducing spatial information during the formation of histograms. Specifically, we propose to apply spatial location information at the descriptor level and refer to it as Spatial Coordinate Coding. Alternatively, x, y, radius, or angle is used to perform semi-coding. This is achieved by adding one of the spatial components at the descriptor level whilst applying Pyramid Match to another. Lastly, we demonstrate that Pyramid Match can be applied robustly to other measurements: Dominant Angle and Colour. We demonstrate state-of-the-art results on two datasets by means of Soft Assignment and Sparse Coding.
Awais M, Yan F, Mikolajczyk K, et al., 2011, Two-stage augmented kernel matrix for object recognition, MCS 2011: 10th International Workshop on Multiple Classifier Systems, Publisher: Springer, Pages: 137-146, ISSN: 0302-9743
Multiple Kernel Learning (MKL) has become a preferred choice for information fusion in image recognition problems. The aim of MKL is to learn an optimal combination of kernels formed from different features and thus to learn the importance of different feature spaces for classification. Augmented Kernel Matrix (AKM) has recently been proposed to account for the fact that a single training example may have different importance in different feature spaces, in contrast to MKL, which assigns the same weight to all examples in one feature space. However, the AKM approach is limited to small datasets due to its memory requirements. We propose a novel two-stage technique to make AKM applicable to large data problems. In the first stage, various kernels are combined into different groups automatically using kernel alignment. Next, the most influential training examples are identified within each group and used to construct an AKM of significantly reduced size. This reduced-size AKM leads to the same results as the original AKM. We demonstrate that the proposed two-stage approach is memory efficient, leads to better performance than the original AKM, and is robust to noise. Results are compared with other state-of-the-art MKL techniques and show improvement on challenging object recognition benchmarks.
Koniusz P, Mikolajczyk K, 2011, Soft assignment of visual words as linear coordinate coding and optimisation of its reconstruction error, ICIP 2011, Publisher: IEEE, Pages: 2413-2416, ISSN: 1522-4880
Visual Word Uncertainty, also referred to as Soft Assignment, is a well-established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the community dealing with object category recognition has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results amidst competitive methods. We show that one can take two views on Soft Assignment: an approach derived from the Gaussian Mixture Model, or a special case of Linear Coordinate Coding. The latter view helps us propose how to optimise the smoothing factor of Soft Assignment in a way that minimises descriptor reconstruction error and maximises classification performance. In turn, this renders tedious cross-validation for establishing this parameter unnecessary and makes it a handy technique. We demonstrate state-of-the-art performance of such optimised assignment on two image datasets and several types of descriptors.
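The Soft Assignment described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's method: it uses a common Gaussian-kernel formulation in which `sigma` plays the role of the smoothing factor, it pools by averaging, and the names `soft_assign` and `soft_histogram` are hypothetical. The paper's procedure for optimising the smoothing factor is not reproduced here.

```python
import numpy as np


def soft_assign(descriptors, vocabulary, sigma):
    """Soft membership of each descriptor to each visual word.

    descriptors: (N, D) array, vocabulary: (K, D) array of visual words.
    Returns an (N, K) array whose rows sum to 1.
    Assumption: Gaussian kernel weights with smoothing factor `sigma`.
    """
    diff = descriptors[:, None, :] - vocabulary[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)

    # Subtract the row minimum before exponentiating for numerical stability;
    # the shift cancels out in the normalisation below.
    shifted = sq_dist - sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-shifted / (2.0 * sigma ** 2))
    return weights / weights.sum(axis=1, keepdims=True)


def soft_histogram(descriptors, vocabulary, sigma):
    """Image-level histogram: average of the per-descriptor memberships."""
    return soft_assign(descriptors, vocabulary, sigma).mean(axis=0)
```

A small `sigma` approaches hard assignment to the nearest visual word, while a large `sigma` spreads each descriptor almost uniformly over the vocabulary, which is why the choice of this parameter matters.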