Publications
101 results found
Chang HJ, Garcia-Hernando G, Tang D, et al., 2016, Spatio-Temporal Hough Forest for efficient detection-localisation-recognition of fingerwriting in egocentric camera, Computer Vision and Image Understanding, Vol: 148, Pages: 87-96, ISSN: 1077-3142
Recognising fingerwriting in mid-air is a useful input tool for wearable egocentric cameras. In this paper we propose a novel framework for this purpose. Specifically, our method first detects a writing hand posture and locates the position of the index fingertip in each frame. From the trajectory of the fingertip, the written character is localised and recognised simultaneously. To achieve this challenging task, we first present a contour-based, view-independent hand posture descriptor extracted with a novel signature function. The proposed descriptor serves both posture recognition and fingertip detection. To recognise characters from trajectories, we propose a Spatio-Temporal Hough Forest that takes sequential data as input and performs regression in both the spatial and temporal domains. Our method can therefore perform character recognition and localisation simultaneously. To establish our contributions, a new handwriting-in-mid-air dataset with labels for postures, fingertips and character locations is introduced. We design and conduct experiments on posture estimation, fingertip detection, and character recognition and localisation. In all experiments our method demonstrates superior accuracy and robustness compared to prior art.
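At the core of a Hough-forest recogniser is additive voting: each trajectory sample casts offsets toward a hypothesised character centre, and the densest accumulator cell wins. A minimal sketch of that voting step (the function and offset names are illustrative; the paper learns the offsets with a trained forest rather than taking them as input):

```python
from collections import Counter

def hough_localise(trajectory, vote_offsets):
    """Toy spatio-temporal Hough vote: every fingertip sample (x, y, t)
    casts the centre offsets associated with it, and the character centre
    is taken as the densest cell in the accumulator."""
    acc = Counter()
    for (x, y, t), offsets in zip(trajectory, vote_offsets):
        for dx, dy, dt in offsets:
            acc[(x + dx, y + dy, t + dt)] += 1
    centre, _ = acc.most_common(1)[0]
    return centre

# Three fingertip samples whose votes agree on one centre cell:
traj = [(0, 0, 0), (1, 0, 1), (2, 1, 2)]
offs = [[(2, 3, 1)], [(1, 3, 0)], [(0, 2, -1)]]
print(hough_localise(traj, offs))  # (2, 3, 1)
```

Because votes accumulate in a joint spatial and temporal grid, the maximum yields both where and when the character was written, which is the "simultaneous localisation and recognition" idea the abstract describes.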
Tsiotsios C, Kim TK, Davison AJ, et al., 2016, Model effectiveness prediction and system adaptation for photometric stereo in murky water, Computer Vision and Image Understanding, Vol: 150, Pages: 126-138, ISSN: 1090-235X
In murky water, the interaction of light with the medium particles results in a complex image formation model that is hard to use effectively within a shape estimation framework like Photometric Stereo. All previous approaches have resorted to necessary model simplifications, but these were applied arbitrarily, without describing how their validity can be estimated in an unknown underwater situation. In this work, we evaluate the effectiveness of such simplified models and show that it varies strongly with the imaging conditions. For this reason, we propose a novel framework that can predict the effectiveness of a photometric model when the scene is unknown. To achieve this we use a dynamic lighting framework in which a robotic platform probes the scene with varying light positions, and the respective change in estimated surface normals serves as a faithful proxy of the true reconstruction error. This creates important benefits over traditional Photometric Stereo frameworks, as our system can adapt critical factors such as the camera-scene distance, the light position or the photometric model to an underwater scenario in order to minimize the reconstruction error. Our work is evaluated through both numerical simulations and real experiments for different distances, underwater visibilities and light source baselines.
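The baseline these simplified models reduce to is classical Lambertian photometric stereo: with three known light directions, per-pixel intensities determine the surface normal and albedo through a linear solve. A minimal single-pixel sketch of that baseline, assuming the clean-water Lambertian model I_j = albedo * dot(L_j, n) (function names are illustrative; this is not the paper's murky-water formulation):

```python
import math

def solve3(A, b):
    # Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def photometric_stereo(lights, intensities):
    """Recover the surface normal and albedo at one pixel from three
    measurements under known light directions, assuming a Lambertian
    surface: the solve yields g = albedo * n."""
    g = solve3(lights, intensities)
    albedo = math.sqrt(sum(v * v for v in g))
    normal = [v / albedo for v in g]
    return normal, albedo

# A flat, upward-facing patch (n = [0, 0, 1], albedo 0.8):
lights = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
intensities = [0.8, 0.8, 0.8]
normal, albedo = photometric_stereo(lights, intensities)
print(normal, albedo)  # [0.0, 0.0, 1.0] 0.8
```

The paper's contribution sits on top of this: since backscatter and attenuation perturb the measured intensities, it probes the scene with varying light positions and uses the resulting change in these estimated normals as a proxy for the reconstruction error.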
Xiong C, Liu L, Zhao X, et al., 2016, Convolutional Fusion Network for Face Verification in the Wild, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, Vol: 26, Pages: 517-528, ISSN: 1051-8215
- Citations: 24
Doumanoglou A, Kouskouridas R, Malassiotis S, et al., 2016, 6D Object Detection and Next-Best-View Prediction in the Crowd, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
Garcia-Hernando G, Chang H, Serrano I, et al., 2016, Transition Hough Forest for Trajectory-based Action Recognition, IEEE Winter Conference on Applications of Computer Vision (WACV)
Sahin C, Kouskouridas R, Kim T-K, 2016, Iterative Hough Forest with Histogram of Control Points for 6 DoF Object Registration from Depth Images, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Pages: 4113-4118
Jang Y, Jeon I, Kim T, et al., 2016, SD Gesture: Static and Dynamic Gesture Estimation for Manipulating a Function-Equipped AR Object, IEEE Trans. on Human-Machine Systems
Jang Y, Jeon I, Kim T, et al., 2016, Symbolic Hand Gesture Interface in Wearable AR, Asia-Pacific Workshop on Mixed Reality (APMR)
Tang D, Chang H, Tejani A, et al., 2016, Latent Regression Forest: Structural Estimation of 3D Articulated Hand Posture, IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI)
Lee K, Ognibene D, Chang H, et al., 2015, STARE: Spatio-Temporal Attention Relocation for multiple structured activities detection, IEEE Transactions on Image Processing, Vol: 24, Pages: 5916-5927, ISSN: 1057-7149
We present a spatio-temporal attention relocation (STARE) method, an information-theoretic approach for efficient detection of simultaneously occurring structured activities. Given multiple human activities in a scene, our method dynamically focuses on the currently most informative activity. Each activity can be detected without complete observation, as the structure of sequential actions plays an important role in making the system robust to unattended observations. For such systems, the ability to decide where and when to focus is crucial to achieving high detection performance under resource-bounded conditions. Our main contributions can be summarized as follows: 1) an information-theoretic dynamic attention relocation framework that allows multiple activities to be detected efficiently by exploiting activity structure information and 2) a new high-resolution dataset of temporally-structured concurrent activities. Our experiments show that the STARE method performs efficiently while maintaining a reasonable level of accuracy.
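One simple proxy for an information-theoretic "most informative activity" criterion is to attend to the activity whose current state belief is most uncertain, since observing it promises the largest expected reduction in entropy. A hedged sketch of that selection rule (the belief representation and names are illustrative, not the paper's model):

```python
import math

def entropy(p):
    # Shannon entropy (in bits) of a discrete belief distribution.
    return -sum(x * math.log2(x) for x in p if x > 0)

def next_focus(beliefs):
    """Pick the activity whose state belief has maximum entropy -- a
    crude stand-in for expected information gain from observing it."""
    return max(beliefs, key=lambda name: entropy(beliefs[name]))

# The "handover" belief is far less certain than "assembly", so it is
# the more informative activity to observe next:
beliefs = {"assembly": [0.9, 0.05, 0.05], "handover": [0.4, 0.3, 0.3]}
print(next_focus(beliefs))  # handover
```

As observations accumulate, each attended activity's belief sharpens, its entropy drops, and attention naturally relocates to whichever concurrent activity is currently least resolved.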
Liu Y, Kouskouridas R, Kim T, 2015, Video-based Object Recognition with Weakly Supervised Object Localisation, Asian Conf. on Pattern Recognition
Chaiyasarn K, Kim T-K, Viola F, et al., 2015, Distortion-Free Image Mosaicing for Tunnel Inspection Based on Robust Cylindrical Surface Estimation through Structure from Motion, Journal of Computing in Civil Engineering, Vol: 30, ISSN: 1943-5487
Visual inspection, although labor-intensive, costly, and inaccurate, is a common practice used in the condition assessment of underground tunnels to ensure safety and serviceability. This paper presents a system that can construct a mosaic image of a tunnel surface with little distortion, allowing a large area of tunnels to be visualized, and enabling tunnel inspection to be carried out off-line. The system removes distortion by a robust estimation of a tunnel surface through structure from motion (SFM), which can create a 3D point cloud of the tunnel surface from uncalibrated images. SFM enables the mosaicing system to cope with images with a general camera motion, in contrast to standard mosaicing software that can cope only with a strict camera motion. The estimation of the tunnel surface is further improved by support vector machine (SVM), which is used to remove noise in the point cloud. Some curvatures are observed in the mosaics when an inaccurate surface is used for mosaicing, whereas the mosaics from a surface estimated using the proposed method are almost distortion-free, preserving all physical attributes, e.g., line parallelism and straightness, which is important for tunnel inspection.
Shao M, Tang D, Liu Y, et al., 2015, A comparative study of video-based object recognition from an egocentric viewpoint, Neurocomputing, Vol: 171, Pages: 982-990, ISSN: 1872-8286
Videos tend to yield a more complete description of their content than individual images, and egocentric vision often provides a more controllable and practical perspective for capturing useful information. In this study, we present new insights into different object recognition methods for video-based rigid object instance recognition. To better exploit egocentric videos as training and query sources, diverse state-of-the-art techniques were categorised, extended and evaluated empirically on a newly collected video dataset, which consists of complex sculptures in cluttered scenes. In particular, we investigated how the geometric and temporal cues provided by egocentric video sequences can be used to improve the performance of object recognition. Based on the experimental results, we analysed the pros and cons of these methods and reached the following conclusions. For geometric cues, the 3D object structure learnt from a training video dataset improves the average video classification performance dramatically. By contrast, for temporal cues, tracking visual fixation across video sequences has little impact on accuracy, but significantly reduces memory consumption by obtaining a better signal-to-noise ratio for the feature points detected in the query frames. Furthermore, we propose a method that integrates these two important cues to exploit the advantages of both.
Serrano Gracia I, Deniz Suarez O, Bueno Garcia G, et al., 2015, Fast Fight Detection, PLOS One, Vol: 10, ISSN: 1932-6203
Action recognition has become a hot topic within computer vision. However, the action recognition community has focused mainly on relatively simple actions like clapping, walking, jogging, etc. The detection of specific events with direct practical use, such as fights or aggressive behavior in general, has been comparatively less studied. Such capability may be extremely useful in video surveillance scenarios like prisons and psychiatric centers, or even embedded in camera phones. As a consequence, there is growing interest in developing violence detection algorithms. Recent work applied the well-known Bag-of-Words framework to the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results, in which high accuracy rates were achieved, the computational cost of extracting such features is prohibitive for practical applications. This work proposes a novel method to detect violence in video sequences. Features extracted from motion blobs are used to discriminate fight from non-fight sequences. Although the method is outperformed in accuracy by the state of the art, its computation time is significantly lower, making it amenable to real-time applications.
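The speed advantage of motion-blob features comes from how cheap they are to compute: a frame difference, a threshold, and a few summary statistics over the moving pixels. A toy stand-in for that feature extraction, operating on plain 2D intensity lists (the feature set here is illustrative, not the paper's exact descriptor):

```python
def motion_blob_features(prev, curr, thresh=30):
    """Crude per-frame motion descriptor in the spirit of blob-based
    fight detection: threshold the inter-frame difference, then
    summarise the moving pixels by area and bounding-box extent."""
    moving = [(y, x)
              for y, row in enumerate(curr)
              for x, v in enumerate(row)
              if abs(v - prev[y][x]) > thresh]
    if not moving:
        return {"area": 0, "height": 0, "width": 0}
    ys = [y for y, _ in moving]
    xs = [x for _, x in moving]
    return {"area": len(moving),
            "height": max(ys) - min(ys) + 1,
            "width": max(xs) - min(xs) + 1}

# Two bright pixels appear between frames:
prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[1][1] = 255
curr[2][3] = 255
print(motion_blob_features(prev, curr))
# {'area': 2, 'height': 2, 'width': 3}
```

Large, fast-changing blobs are characteristic of fights, so per-frame descriptors like this can feed a standard classifier without the expensive spatio-temporal interest-point extraction of Bag-of-Words pipelines.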
Jang Y, Noh S-T, Chang HJ, et al., 2015, 3D Finger CAPE: Clicking Action and Position Estimation under Self-Occlusions in Egocentric Viewpoint, IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, Vol: 21, Pages: 501-510, ISSN: 1077-2626
- Citations: 60
Luo W, Stenger B, Zhao X, et al., 2015, Automatic Topic Discovery for Multi-object Tracking, Proc. of the Association for the Advancement of Artificial Intelligence (AAAI)
Tang D, Taylor J, Kohli P, et al., 2015, Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose, IEEE International Conference on Computer Vision, Publisher: IEEE, Pages: 3325-3333, ISSN: 1550-5499
- Citations: 69
Xiong C, Zhao X, Tang D, et al., 2015, Conditional Convolutional Neural Network for Modality-aware Face Recognition, IEEE International Conference on Computer Vision, Publisher: IEEE, Pages: 3667-3675, ISSN: 1550-5499
- Citations: 49
Hameed MZ, Garcia-Hernando G, Kim T-K, 2015, Novel Spatio-temporal Features for Fingertip Writing Recognition in Egocentric Viewpoint, 14th IAPR International Conference on Machine Vision Applications (MVA), Publisher: IEEE, Pages: 484-488
- Citations: 2
Xiong C, Gao G, Zha Z, et al., 2014, Adaptive Learning for Celebrity Identification With Video Context, IEEE TRANSACTIONS ON MULTIMEDIA, Vol: 16, Pages: 1473-1485, ISSN: 1520-9210
- Citations: 6
Doumanoglou A, Kargakos A, Kim T, et al., 2014, Autonomous Active Recognition and Unfolding of Clothes using Random Decision Forests and Probabilistic Planning, IEEE Int. Conf. on Robotics and Automation (ICRA)
Deniz O, Serrano I, Bueno G, et al., 2014, Fast Violence Detection in Video, The 9th International Conference on Computer Vision Theory and Applications (VISAPP)
Tsiotsios C, Angelopoulou ME, Kim T-K, et al., 2014, Backscatter Compensated Photometric Stereo with 3 Sources, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2259-2266, ISSN: 1063-6919
- Citations: 36
Tejani A, Tang D, Kouskouridas R, et al., 2014, Latent-Class Hough Forests for 3D Object Detection and Pose Estimation, 13th European Conference on Computer Vision (ECCV), Publisher: SPRINGER-VERLAG BERLIN, Pages: 462-477, ISSN: 0302-9743
- Citations: 167
Doumanoglou A, Kim T-K, Zhao X, et al., 2014, Active Random Forests: An Application to Autonomous Unfolding of Clothes, 13th European Conference on Computer Vision (ECCV), Publisher: SPRINGER INTERNATIONAL PUBLISHING AG, Pages: 644-658, ISSN: 0302-9743
- Citations: 26
Liu Y, Jang Y, Woo W, et al., 2014, Video-based Object Recognition using Novel Set-of-Sets Representations, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 533+, ISSN: 2160-7508
- Citations: 5
Hoo WL, Kim T-K, Pei Y, et al., 2014, Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding, 22nd International Conference on Pattern Recognition (ICPR), Publisher: IEEE COMPUTER SOC, Pages: 3434-3439, ISSN: 1051-4651
- Citations: 3
Zhao X, Kim T-K, Luo W, 2014, Unified Face Analysis by Iterative Multi-Output Random Forests, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 1765-1772, ISSN: 1063-6919
- Citations: 32
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.