Publications

Tang D, Ye Q, Yuan S, Taylor J, Kohli P, Keskin C, Kim T, Shotton J et al., 2019, Opening the black box: hierarchical sampling optimization for hand pose estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 41, Pages: 2161-2175, ISSN: 0162-8828

Hand pose estimation, formulated as an inverse problem, is typically optimized by an energy function over pose parameters using a 'black box' image generation procedure, knowing little about either the relationships between the parameters or the form of the energy function. In this paper, we show significant improvement upon such black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function. Our new framework, hierarchical sampling optimization (HSO), consists of a sequence of discriminative predictors organized into a kinematic hierarchy. Each predictor is conditioned on its ancestors, and generates a set of samples over a subset of the pose parameters, with only one selected by the highly efficient surrogate energy. The selected partial poses are concatenated to generate a full-pose hypothesis. Repeating the same process, several hypotheses are generated and the full energy function selects the best result. Under the same kinematic hierarchy, two methods based on decision forest and convolutional neural network are proposed to generate the samples and two optimization methods are studied when optimizing these samples. Experimental evaluations on three publicly available datasets show that our method is particularly impressive in low-compute scenarios where it significantly outperforms all other state-of-the-art methods.

Journal article

Luo W, Stenger B, Zhao X, Kim T et al., 2019, Trajectories as topics: multi-object tracking by topic discovery, IEEE Transactions on Image Processing, Vol: 28, Pages: 240-252, ISSN: 1057-7149

This paper proposes a new approach to multi-object tracking by semantic topic discovery. We dynamically cluster frame-by-frame detections and treat objects as topics, allowing the application of the Dirichlet process mixture model. The tracking problem is cast as a topic-discovery task, where the video sequence is treated analogously to a document. It addresses tracking issues such as object exclusivity constraints as well as tracking management without the need for heuristic thresholds. Variation of object appearance is modeled as the dynamics of word co-occurrence and handled by updating the cluster parameters across the sequence in the dynamical clustering procedure. We develop two kinds of visual representation based on super-pixel and deformable part model and integrate them into the model of automatic topic discovery for tracking rigid and non-rigid objects, respectively. In experiments on public data sets, we demonstrate the effectiveness of the proposed algorithm.

Journal article

Hodan T, Kim T-K, 2018, BOP: Benchmark for 6D Object Pose Estimation, Proc. of European Conf. on Computer Vision (ECCV)

Conference paper

Ye Q, Kim T-K, 2018, Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network, Proc. of European Conf. on Computer Vision (ECCV)

Conference paper

Pei Y, Yi Y, Ma G, Kim T-K, Guo Y, Xu T, Zha H et al., 2018, Spatially Consistent Supervoxel Correspondences of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, Vol: 37, Pages: 2310-2321, ISSN: 0278-0062

Journal article

Liu Y, Hoai M, Shao M, Kim T-K et al., 2018, Latent Bi-Constraint SVM for Video-Based Object Recognition, IEEE Transactions on Circuits and Systems for Video Technology, Vol: 28, Pages: 3044-3052, ISSN: 1051-8215

Journal article

Gecer B, Kim T-K, Bhattarai B, Kittler J et al., 2018, Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model, Proc. of European Conf. on Computer Vision (ECCV)

Conference paper

Chen X, Wang G, Zhang C, Kim T, Ji X et al., 2018, SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds, IEEE Access, ISSN: 2169-3536

Journal article

Pei Y, Yi Y, Ma G, Kim T, Xu T, Zha H et al., 2018, Finding Spatially-Consistent Supervoxel Correspondence of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, ISSN: 0278-0062

Journal article

Pei Y, Kim T-K, 2018, Finding Spatially-Consistent Supervoxel Correspondence of Cone-Beam Computed Tomography Images, IEEE Transactions on Medical Imaging, ISSN: 0278-0062

Journal article

Serrano I, Deniz O, Bueno G, Garcia-Hernando G, Kim T-K et al., 2018, Spatio-temporal elastic cuboid trajectories for efficient fight recognition using Hough forests, Machine Vision and Applications, Vol: 29, Pages: 207-217, ISSN: 0932-8092

Journal article

Sock J, Kasaei SH, Lopes LS, Kim T-K et al., 2018, Multi-view 6D Object Pose Estimation and Camera Motion Planning using RGBD Images, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 2228-2235, ISSN: 2473-9936

Conference paper

Gecer B, Balntas V, Kim T-K, 2018, Learning Deep Convolutional Embeddings for Face Representation Using Joint Sample- and Set-based Supervision, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 1665-1672, ISSN: 2473-9936

Conference paper

Tejani A, Kouskouridas R, Doumanoglou A, Tang D, Kim T-K et al., 2018, Latent-Class Hough Forests for 6 DoF object pose estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 119-132, ISSN: 0162-8828

In this paper we present Latent-Class Hough Forests, a method for object detection and 6 DoF pose estimation in heavily cluttered and occluded scenarios. We adapt a state of the art template matching feature into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. We train with positive samples only and we treat class distributions at the leaf nodes as latent variables. During testing we infer by iteratively updating these distributions, providing accurate estimation of background clutter and foreground occlusions and, thus, better detection rate. Furthermore, as a by-product, our Latent-Class Hough Forests can provide accurate occlusion aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected two, more challenging, datasets for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We provide extensive experiments on the various parameters of the framework such as patch size, number of trees and number of iterations to infer class distributions at test time. We also evaluate the Latent-Class Hough Forests on all datasets where we outperform state of the art methods.

Journal article

Balntas V, Doumanoglou A, Sahin C, Sock J, Kouskouridas R, Kim T-K et al., 2017, Pose Guided RGBD Feature Learning for 3D Object Pose Estimation, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 3876-3884, ISSN: 1550-5499

Conference paper

Yuan S, Ye Q, Stenger B, Jain S, Kim T-K et al., 2017, BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2605-2613, ISSN: 1063-6919

Conference paper

Garcia-Hernando G, Kim T-K, 2017, Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 407-415, ISSN: 1063-6919

Conference paper

Shi Z, Kim T-K, 2017, Learning and Refining of Privileged Information-based RNNs for Action Recognition from Depth Sequences, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 4684-4693, ISSN: 1063-6919

Conference paper

Tang D, Chang HJ, Tejani A, Kim T-K et al., 2017, Latent regression forest: structured estimation of 3D hand poses, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 39, Pages: 1374-1387, ISSN: 0162-8828

In this paper we present the latent regression forest (LRF), a novel framework for real-time, 3D hand pose estimation from a single depth image. Prior discriminative methods often fall into two categories: holistic and patch-based. Holistic methods are efficient but less flexible due to their nearest neighbour nature. Patch-based methods can generalise to unseen samples by considering local appearance only. However, they are complex because each pixel needs to be classified or regressed during testing. In contrast to these two baselines, our method can be considered as a structured coarse-to-fine search, starting from the centre of mass of a point cloud until locating all the skeletal joints. The searching process is guided by a learnt latent tree model which reflects the hierarchical topology of the hand. Our main contributions can be summarised as follows: (i) Learning the topology of the hand in an unsupervised, data-driven manner. (ii) A new forest-based, discriminative framework for structured search in images, as well as an error regression step to avoid error accumulation. (iii) A new multi-view hand pose dataset containing 180 K annotated images from 10 different subjects. Our experiments on two datasets show that the LRF outperforms baselines and prior arts in both accuracy and efficiency.

Journal article

Sahin C, Kouskouridas R, Kim T-K, 2017, A learning-based variable size part extraction architecture for 6D object pose recovery in depth images, Image and Vision Computing, Vol: 63, Pages: 38-50, ISSN: 0262-8856

Journal article

Tsiotsios A, Davison A, Kim T, 2017, Near-lighting Photometric Stereo for unknown scene distance and medium attenuation, Image and Vision Computing, ISSN: 0262-8856

Journal article

Serrano I, Garcia-Hernando G, Deniz O, Bueno G, Kim T et al., 2017, Spatio-Temporal Elastic Cuboid Trajectories for Efficient Fight Recognition Using Hough Forests, Machine Vision and Applications, ISSN: 1432-1769

Journal article

Baek S, Kim KI, Kim T-K, 2017, Real-time Online Action Detection Forests using Spatio-temporal Contexts, 17th IEEE Winter Conference on Applications of Computer Vision (WACV), Publisher: IEEE, Pages: 158-167, ISSN: 2472-6737

Conference paper

Jang Y, Jeon I, Kim T-K, Woo W et al., 2017, Metaphoric Hand Gestures for Orientation-Aware VR Object Manipulation With an Egocentric Viewpoint, IEEE Transactions on Human-Machine Systems, Vol: 47, Pages: 113-127, ISSN: 2168-2291

Journal article

Doumanoglou A, Kouskouridas R, Malassiotis S, Kim T-K et al., 2016, Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Pages: 3583-3592, ISSN: 1063-6919

Journal article

Doumanoglou A, Stria J, Mariolis I, Kargakos A, Petrik V, Wagner L, Kim T, Hlavac V, Malassiotis S et al., 2016, Folding clothes autonomously: a complete pipeline, IEEE Transactions on Robotics, Vol: 32, Pages: 1461-1478, ISSN: 1552-3098

This work presents a complete pipeline for folding a pile of clothes using a dual-armed robot. This is a challenging task both from the viewpoint of machine vision and robotic manipulation. The presented pipeline is comprised of the following parts: isolating and picking up a single garment from a pile of crumpled garments, recognizing its category, unfolding the garment using a series of manipulations performed in the air, placing the garment roughly flat on a work table, spreading it, and, finally, folding it in several steps. The pile is segmented into separate garments using color and texture information, and the ideal grasping point is selected based on the features computed from a depth map. The recognition and unfolding of the hanging garment are performed in an active manner, utilizing the framework of active random forests to detect grasp points, while optimizing the robot actions. The spreading procedure is based on the detection of deformations of the garment's contour. The perception for folding employs fitting of polygonal models to the contour of the observed garment, both spread and already partially folded. We have conducted several experiments on the full pipeline producing very promising results. To our knowledge, this is the first work addressing the complete unfolding and folding pipeline on a variety of garments, including T-shirts, towels, and shorts.

Journal article

Tsiotsios C, Davison AJ, Kim T-K, 2016, Near-lighting Photometric Stereo for unknown scene distance and medium attenuation, Image and Vision Computing, Vol: 57, Pages: 44-57, ISSN: 0262-8856

Journal article

Ye Q, Yuan S, Kim T, 2016, Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation, Proc. of European Conf. on Computer Vision (ECCV)

Conference paper

Tang D, Chang H, Tejani A, Kim T et al., 2016, Latent Regression Forest: Structural Estimation of 3D Articulated Hand Posture, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828

Journal article

Chaiyasarn K, Kim T-K, Viola F, Cipolla R, Soga K et al., 2016, Errata for “Distortion-Free Image Mosaicing for Tunnel Inspection Based on Robust Cylindrical Surface Estimation through Structure from Motion” by Krisada Chaiyasarn, Tae-Kyun Kim, Fabio Viola, Roberto Cipolla, and Kenichi Soga, Journal of Computing in Civil Engineering, Vol: 30, ISSN: 1943-5487

Journal article

Dr Tae-Kyun Kim
