Tang D, Ye Q, Yuan S, et al., 2019, Opening the black box: hierarchical sampling optimization for hand pose estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 41, Pages: 2161-2175, ISSN: 0162-8828
Hand pose estimation, formulated as an inverse problem, is typically optimized by an energy function over pose parameters using a 'black box' image generation procedure, knowing little about either the relationships between the parameters or the form of the energy function. In this paper, we show significant improvement upon such black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function. Our new framework, hierarchical sampling optimization (HSO), consists of a sequence of discriminative predictors organized into a kinematic hierarchy. Each predictor is conditioned on its ancestors, and generates a set of samples over a subset of the pose parameters, with only one selected by the highly-efficient surrogate energy. The selected partial poses are concatenated to generate a full-pose hypothesis. Repeating the same process, several hypotheses are generated and the full energy function selects the best result. Under the same kinematic hierarchy, two methods based on decision forest and convolutional neural network are proposed to generate the samples and two optimization methods are studied when optimizing these samples. Experimental evaluations on three publicly available datasets show that our method is particularly impressive in low-compute scenarios where it significantly outperforms all other state-of-the-art methods.
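The sampling loop described in this abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation. The hierarchy levels, the Gaussian sampler standing in for the discriminative predictors, the scalar pose parameters, and both quadratic energies (which compare against a hypothetical `target` instead of a rendered image) are all assumptions made for illustration.

```python
import random

# Hypothetical kinematic hierarchy: each level owns one (scalar) pose parameter.
HIERARCHY = ["palm", "thumb", "index"]


def sample_partial_pose(level, ancestors, rng, n_samples=8):
    """Toy stand-in for a discriminative predictor conditioned on its ancestors."""
    base = sum(ancestors.values()) / len(ancestors) if ancestors else 0.0
    return [base + rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def surrogate_energy(level, value, target):
    """Cheap local energy (assumed form): squared error on one parameter."""
    return (value - target[level]) ** 2


def full_energy(pose, target):
    """Full energy (assumed form): squared error over the whole pose."""
    return sum((pose[k] - target[k]) ** 2 for k in pose)


def generate_hypothesis(target, rng):
    """Walk the hierarchy, keeping one sample per level via the surrogate energy."""
    pose = {}
    for level in HIERARCHY:
        samples = sample_partial_pose(level, pose, rng)
        pose[level] = min(samples, key=lambda s: surrogate_energy(level, s, target))
    return pose  # the selected partial poses, concatenated into a full pose


def hso(target, n_hypotheses=5, seed=0):
    """Generate several full-pose hypotheses; the full energy selects the best."""
    rng = random.Random(seed)
    hypotheses = [generate_hypothesis(target, rng) for _ in range(n_hypotheses)]
    return min(hypotheses, key=lambda p: full_energy(p, target))


if __name__ == "__main__":
    print(hso({"palm": 0.5, "thumb": -0.2, "index": 1.0}))
```

The two-stage selection is the point of the sketch: the surrogate energy is evaluated on every sample at every level, while the full energy is evaluated only once per complete hypothesis.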
Luo W, Stenger B, Zhao X, et al., 2019, Trajectories as topics: multi-object tracking by topic discovery, IEEE Transactions on Image Processing, Vol: 28, Pages: 240-252, ISSN: 1057-7149
This paper proposes a new approach to multi-object tracking by semantic topic discovery. We dynamically cluster frame-by-frame detections and treat objects as topics, allowing the application of the Dirichlet process mixture model. The tracking problem is cast as a topic-discovery task, where the video sequence is treated analogously to a document. It addresses tracking issues such as object exclusivity constraints as well as tracking management without the need for heuristic thresholds. Variation of object appearance is modeled as the dynamics of word co-occurrence and handled by updating the cluster parameters across the sequence in the dynamical clustering procedure. We develop two kinds of visual representation based on super-pixel and deformable part model and integrate them into the model of automatic topic discovery for tracking rigid and non-rigid objects, respectively. In experiments on public data sets, we demonstrate the effectiveness of the proposed algorithm.
Hodan T, Kim T-K, 2018, BOP: Benchmark for 6D Object Pose Estimation, Proc. of European Conf. on Computer Vision (ECCV)
Ye Q, Kim T-K, 2018, Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network, Proc. of European Conf. on Computer Vision (ECCV)
Pei Y, Yi Y, Ma G, et al., 2018, Spatially Consistent Supervoxel Correspondences of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, Vol: 37, Pages: 2310-2321, ISSN: 0278-0062
Liu Y, Hoai M, Shao M, et al., 2018, Latent Bi-Constraint SVM for Video-Based Object Recognition, IEEE Transactions on Circuits and Systems for Video Technology, Vol: 28, Pages: 3044-3052, ISSN: 1051-8215
Gecer B, Kim T-K, Bhattarai B, et al., 2018, Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model, Proc. of European Conf. on Computer Vision (ECCV)
Chen X, Wang G, Zhang C, et al., 2018, SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds, IEEE Access, ISSN: 2169-3536
Pei Y, Yi Y, Ma G, et al., 2018, Finding Spatially-Consistent Supervoxel Correspondence of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, ISSN: 0278-0062
Pei Y, Kim T-K, 2018, Finding Spatially-Consistent Supervoxel Correspondence of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, ISSN: 0278-0062
Serrano I, Deniz O, Bueno G, et al., 2018, Spatio-temporal elastic cuboid trajectories for efficient fight recognition using Hough forests, Machine Vision and Applications, Vol: 29, Pages: 207-217, ISSN: 0932-8092
Sock J, Kasaei SH, Lopes LS, et al., 2018, Multi-view 6D Object Pose Estimation and Camera Motion Planning using RGBD Images, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 2228-2235, ISSN: 2473-9936
Gecer B, Balntas V, Kim T-K, 2018, Learning Deep Convolutional Embeddings for Face Representation Using Joint Sample- and Set-based Supervision, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 1665-1672, ISSN: 2473-9936
Tejani A, Kouskouridas R, Doumanoglou A, et al., 2018, Latent-Class Hough Forests for 6 DoF object pose estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 119-132, ISSN: 0162-8828
In this paper we present Latent-Class Hough Forests, a method for object detection and 6 DoF pose estimation in heavily cluttered and occluded scenarios. We adapt a state of the art template matching feature into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. We train with positive samples only and we treat class distributions at the leaf nodes as latent variables. During testing we infer by iteratively updating these distributions, providing accurate estimation of background clutter and foreground occlusions and, thus, better detection rate. Furthermore, as a by-product, our Latent-Class Hough Forests can provide accurate occlusion aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected two, more challenging, datasets for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We provide extensive experiments on the various parameters of the framework such as patch size, number of trees and number of iterations to infer class distributions at test time. We also evaluate the Latent-Class Hough Forests on all datasets where we outperform state of the art methods.
Balntas V, Doumanoglou A, Sahin C, et al., 2017, Pose Guided RGBD Feature Learning for 3D Object Pose Estimation, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 3876-3884, ISSN: 1550-5499
Yuan S, Ye Q, Stenger B, et al., 2017, BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2605-2613, ISSN: 1063-6919
Shi Z, Kim T-K, 2017, Learning and Refining of Privileged Information-based RNNs for Action Recognition from Depth Sequences, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 4684-4693, ISSN: 1063-6919
Garcia-Hernando G, Kim T-K, 2017, Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 407-415, ISSN: 1063-6919
Tang D, Chang HJ, Tejani A, et al., 2017, Latent regression forest: structured estimation of 3D hand poses, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 39, Pages: 1374-1387, ISSN: 0162-8828
In this paper we present the latent regression forest (LRF), a novel framework for real-time, 3D hand pose estimation from a single depth image. Prior discriminative methods often fall into two categories: holistic and patch-based. Holistic methods are efficient but less flexible due to their nearest neighbour nature. Patch-based methods can generalise to unseen samples by considering local appearance only. However, they are complex because each pixel needs to be classified or regressed during testing. In contrast to these two baselines, our method can be considered as a structured coarse-to-fine search, starting from the centre of mass of a point cloud until locating all the skeletal joints. The searching process is guided by a learnt latent tree model which reflects the hierarchical topology of the hand. Our main contributions can be summarised as follows: (i) Learning the topology of the hand in an unsupervised, data-driven manner. (ii) A new forest-based, discriminative framework for structured search in images, as well as an error regression step to avoid error accumulation. (iii) A new multi-view hand pose dataset containing 180K annotated images from 10 different subjects. Our experiments on two datasets show that the LRF outperforms baselines and prior art in both accuracy and efficiency.
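The coarse-to-fine search described in this abstract can be illustrated with a small toy sketch. It is not the LRF itself. The tree layout, the node names, and the fixed offsets in `TOY_OFFSETS` (a stand-in for what a trained forest would regress from the depth data) are all hypothetical.

```python
# Hypothetical latent tree: internal nodes are latent groupings, leaves are joints.
LATENT_TREE = {
    "hand": ("thumb_side", "finger_side"),
    "thumb_side": ("thumb_tip", "thumb_base"),
    "finger_side": ("index_tip", "pinky_tip"),
}

# Assumed fixed parent-to-child 3D offsets, replacing a learned regressor.
TOY_OFFSETS = {
    "thumb_side": (-1.0, 0.0, 0.0),
    "finger_side": (1.0, 0.0, 0.0),
    "thumb_tip": (0.0, 1.0, 0.0),
    "thumb_base": (0.0, -1.0, 0.0),
    "index_tip": (0.0, 1.0, 0.0),
    "pinky_tip": (0.0, -1.0, 0.0),
}


def centre_of_mass(points):
    """Mean of a list of 3D points; the search starts here."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def predict_offset(child, position, points):
    """Toy predictor: ignores the data and returns a fixed offset."""
    return TOY_OFFSETS[child]


def coarse_to_fine_search(points, root="hand"):
    """Descend the latent tree from the centre of mass until every leaf joint is placed."""
    joints = {}
    stack = [(root, centre_of_mass(points))]
    while stack:
        node, position = stack.pop()
        if node not in LATENT_TREE:  # leaf node: a skeletal joint
            joints[node] = position
            continue
        for child in LATENT_TREE[node]:
            offset = predict_offset(child, position, points)
            child_position = tuple(p + o for p, o in zip(position, offset))
            stack.append((child, child_position))
    return joints


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
    print(coarse_to_fine_search(cloud))
```

Each joint is reached by one root-to-leaf path, so the number of predictions grows with the size of the tree and not with the number of pixels, which is the contrast the abstract draws with patch-based methods.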
Sahin C, Kouskouridas R, Kim T-K, 2017, A learning-based variable size part extraction architecture for 6D object pose recovery in depth images, Image and Vision Computing, Vol: 63, Pages: 38-50, ISSN: 0262-8856
Serrano I, Garcia-Hernando G, Deniz O, et al., 2017, Spatio-Temporal Elastic Cuboid Trajectories for Efficient Fight Recognition Using Hough Forests, Machine Vision and Applications, ISSN: 1432-1769
Tsiotsios A, Davison A, Kim T, 2017, Near-lighting Photometric Stereo for unknown scene distance and medium attenuation, Image and Vision Computing, ISSN: 0262-8856
Baek S, Kim KI, Kim T-K, 2017, Real-time Online Action Detection Forests using Spatio-temporal Contexts, 17th IEEE Winter Conference on Applications of Computer Vision (WACV), Publisher: IEEE, Pages: 158-167, ISSN: 2472-6737
Jang Y, Jeon I, Kim T-K, et al., 2017, Metaphoric Hand Gestures for Orientation-Aware VR Object Manipulation With an Egocentric Viewpoint, IEEE Transactions on Human-Machine Systems, Vol: 47, Pages: 113-127, ISSN: 2168-2291
Doumanoglou A, Kouskouridas R, Malassiotis S, et al., 2016, Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Pages: 3583-3592, ISSN: 1063-6919
This work presents a complete pipeline for folding a pile of clothes using a dual-armed robot. This is a challenging task both from the viewpoint of machine vision and robotic manipulation. The presented pipeline is comprised of the following parts: isolating and picking up a single garment from a pile of crumpled garments, recognizing its category, unfolding the garment using a series of manipulations performed in the air, placing the garment roughly flat on a work table, spreading it, and, finally, folding it in several steps. The pile is segmented into separate garments using color and texture information, and the ideal grasping point is selected based on the features computed from a depth map. The recognition and unfolding of the hanging garment are performed in an active manner, utilizing the framework of active random forests to detect grasp points, while optimizing the robot actions. The spreading procedure is based on the detection of deformations of the garment's contour. The perception for folding employs fitting of polygonal models to the contour of the observed garment, both spread and already partially folded. We have conducted several experiments on the full pipeline producing very promising results. To our knowledge, this is the first work addressing the complete unfolding and folding pipeline on a variety of garments, including T-shirts, towels, and shorts.
Tsiotsios C, Davison AJ, Kim T-K, 2016, Near-lighting Photometric Stereo for unknown scene distance and medium attenuation, Image and Vision Computing, Vol: 57, Pages: 44-57, ISSN: 0262-8856
Ye Q, Yuan S, Kim T, 2016, Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation, Proc. of European Conf. on Computer Vision (ECCV)
Tang D, Chang H, Tejani A, et al., 2016, Latent Regression Forest: Structural Estimation of 3D Articulated Hand Posture, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828
Chaiyasarn K, Kim T-K, Viola F, et al., 2016, Errata for “Distortion-Free Image Mosaicing for Tunnel Inspection Based on Robust Cylindrical Surface Estimation through Structure from Motion” by Krisada Chaiyasarn, Tae-Kyun Kim, Fabio Viola, Roberto Cipolla, and Kenichi Soga, Journal of Computing in Civil Engineering, Vol: 30, ISSN: 1943-5487