Chang HJ, Garcia-Hernando G, Tang D, et al., 2016, Spatio-Temporal Hough Forest for efficient detection-localisation-recognition of fingerwriting in egocentric camera, Computer Vision and Image Understanding, Vol: 148, Pages: 87-96, ISSN: 1090-235X
Recognising fingerwriting in mid-air is a useful input tool for wearable egocentric cameras. In this paper we propose a novel framework for this purpose. Specifically, our method first detects a writing hand posture and locates the position of the index fingertip in each frame. From the trajectory of the fingertip, the written character is localised and recognised simultaneously. To achieve this challenging task, we first present a contour-based, view-independent hand posture descriptor extracted with a novel signature function. The proposed descriptor serves both posture recognition and fingertip detection. To recognise characters from trajectories, we propose the Spatio-Temporal Hough Forest, which takes sequential data as input and performs regression in both the spatial and temporal domains. Our method can therefore perform character recognition and localisation simultaneously. To establish our contributions, a new handwriting-in-mid-air dataset with labels for postures, fingertips and character locations is proposed. We design and conduct experiments on posture estimation, fingertip detection, character recognition and localisation. In all experiments our method demonstrates superior accuracy and robustness compared to prior art.
Tsiotsios C, Kim TK, Davison AJ, et al., 2016, Model effectiveness prediction and system adaptation for photometric stereo in murky water, Computer Vision and Image Understanding, Vol: 150, Pages: 126-138, ISSN: 1090-235X
In murky water, the light interaction with the medium particles results in a complex image formation model that is hard to use effectively with a shape estimation framework like Photometric Stereo. All previous approaches have resorted to necessary model simplifications that were, however, applied arbitrarily, without describing how their validity can be estimated in an unknown underwater situation. In this work, we evaluate the effectiveness of such simplified models and show that it varies strongly with the imaging conditions. For this reason, we propose a novel framework that can predict the effectiveness of a photometric model when the scene is unknown. To achieve this, we use a dynamic lighting framework where a robotic platform is able to probe the scene with varying light positions, and the respective change in estimated surface normals serves as a faithful proxy of the true reconstruction error. This creates important benefits over traditional Photometric Stereo frameworks, as our system can adapt some critical factors to an underwater scenario, such as the camera-scene distance and the light position or the photometric model, in order to minimize the reconstruction error. Our work is evaluated through both numerical simulations and real experiments for different distances, underwater visibilities and light source baselines.
Xiong C, Liu L, Zhao X, et al., 2016, Convolutional Fusion Network for Face Verification in the Wild, IEEE Transactions on Circuits and Systems for Video Technology, Vol: 26, Pages: 517-528, ISSN: 1051-8215
Doumanoglou A, Stria J, Mariolis I, et al., Folding Clothes Autonomously: A Complete Pipeline, IEEE Trans. on Robotics (TRO)
Jang Y, Jeon I, Kim T, et al., SD Gesture: Static and Dynamic Gesture Estimation for Manipulating a Function-Equipped AR Object, IEEE Trans. on Human-Machine Systems
Jang Y, Jeon I, Kim T, et al., Symbolic Hand Gesture Interface in Wearable AR, Asia-Pacific Workshop on Mixed Reality (APMR)
Garcia-Hernando G, Chang H, Serrano I, et al., Transition Hough Forest for Trajectory-based Action Recognition, IEEE Winter Conference on Applications of Computer Vision (WACV)
Doumanoglou A, Kouskouridas R, Malassiotis S, et al., 6D Object Detection and Next-Best-View Prediction in the Crowd, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
Sahin C, Kouskouridas R, Kim T-K, 2016, Iterative Hough Forest with Histogram of Control Points for 6 DoF Object Registration from Depth Images, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), Pages: 4113-4118
Tang D, Chang H, Tejani A, et al., Latent Regression Forest: Structural Estimation of 3D Articulated Hand Posture, IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI)
Liu Y, Kouskouridas R, Kim T, Video-based Object Recognition with Weakly Supervised Object Localisation, Asian Conf. on Pattern Recognition
Lee K, Ognibene D, Chang H, et al., 2015, STARE: Spatio-Temporal Attention Relocation for Multiple Structured Activities Detection, IEEE Transactions on Image Processing, Vol: 24, Pages: 5916-5927, ISSN: 1057-7149
We present a spatio-temporal attention relocation (STARE) method, an information-theoretic approach for efficient detection of simultaneously occurring structured activities. Given multiple human activities in a scene, our method dynamically focuses on the currently most informative activity. Each activity can be detected without complete observation, as the structure of sequential actions plays an important role in making the system robust to unattended observations. For such systems, the ability to decide where and when to focus is crucial to achieving high detection performance under resource-bounded conditions. Our main contributions can be summarized as follows: 1) an information-theoretic dynamic attention relocation framework that allows the efficient detection of multiple activities by exploiting the activity structure information and 2) a new high-resolution data set of temporally-structured concurrent activities. Our experiments on applications show that the STARE method performs efficiently while maintaining a reasonable level of accuracy.
Chaiyasarn K, Kim T-K, Viola F, et al., 2015, Distortion-Free Image Mosaicing for Tunnel Inspection Based on Robust Cylindrical Surface Estimation through Structure from Motion, Journal of Computing in Civil Engineering, Vol: 30, ISSN: 1943-5487
Visual inspection, although labor-intensive, costly, and inaccurate, is a common practice used in the condition assessment of underground tunnels to ensure safety and serviceability. This paper presents a system that can construct a mosaic image of a tunnel surface with little distortion, allowing a large area of tunnels to be visualized, and enabling tunnel inspection to be carried out off-line. The system removes distortion by a robust estimation of a tunnel surface through structure from motion (SFM), which can create a 3D point cloud of the tunnel surface from uncalibrated images. SFM enables the mosaicing system to cope with images with a general camera motion, in contrast to standard mosaicing software that can cope only with a strict camera motion. The estimation of the tunnel surface is further improved by a support vector machine (SVM), which is used to remove noise in the point cloud. Some curvature is observed in the mosaics when an inaccurate surface is used for mosaicing, whereas the mosaics from a surface estimated using the proposed method are almost distortion-free, preserving all physical attributes, e.g., line parallelism and straightness, which is important for tunnel inspection.
Shao M, Tang D, Liu Y, et al., 2015, A comparative study of video-based object recognition from an egocentric viewpoint, Neurocomputing, Vol: 171, Pages: 982-990, ISSN: 1872-8286
Videos tend to yield a more complete description of their content than individual images, and egocentric vision often provides a more controllable and practical perspective for capturing useful information. In this study, we present new insights into different object recognition methods for video-based rigid object instance recognition. In order to better exploit egocentric videos as training and query sources, diverse state-of-the-art techniques were categorised, extended and evaluated empirically using a newly collected video dataset, which consists of complex sculptures in cluttered scenes. In particular, we investigated how to utilise the geometric and temporal cues provided by egocentric video sequences to improve the performance of object recognition. Based on the experimental results, we analysed the pros and cons of these methods and reached the following conclusions. For geometric cues, the 3D object structure learnt from a training video dataset improves the average video classification performance dramatically. By contrast, for temporal cues, tracking visual fixation among video sequences has little impact on the accuracy, but significantly reduces the memory consumption by obtaining a better signal-to-noise ratio for the feature points detected in the query frames. Furthermore, we proposed a method that integrated these two important cues to exploit the advantages of both.
Action recognition has become a hot topic within computer vision. However, the action recognition community has focused mainly on relatively simple actions like clapping, walking, jogging, etc. The detection of specific events with direct practical use, such as fights or aggressive behavior in general, has been comparatively less studied. Such a capability may be extremely useful in some video surveillance scenarios like prisons and psychiatric centers, or even embedded in camera phones. As a consequence, there is growing interest in developing violence detection algorithms. Recent work considered the well-known Bag-of-Words framework for the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results in which high accuracy rates were achieved, the computational cost of extracting such features is prohibitive for practical applications. This work proposes a novel method to detect violent sequences. Features extracted from motion blobs are used to discriminate fight and non-fight sequences. Although the method is outperformed in accuracy by the state of the art, it has a significantly faster computation time, making it amenable to real-time applications.
Jang Y, Noh S-T, Chang HJ, et al., 2015, 3D Finger CAPE: Clicking Action and Position Estimation under Self-Occlusions in Egocentric Viewpoint, IEEE Transactions on Visualization and Computer Graphics, Vol: 21, Pages: 501-510, ISSN: 1077-2626
Tang D, Taylor J, Kohli P, et al., 2015, Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose, IEEE International Conference on Computer Vision, Publisher: IEEE, Pages: 3325-3333, ISSN: 1550-5499
Hameed MZ, Garcia-Hernando G, Kim T-K, 2015, Novel Spatio-temporal Features for Fingertip Writing Recognition in Egocentric Viewpoint, 14th IAPR International Conference on Machine Vision Applications (MVA), Publisher: IEEE, Pages: 484-488
Luo W, Stenger B, Zhao X, et al., Automatic Topic Discovery for Multi-object Tracking, Proc. of the Association for the Advancement of Artificial Intelligence (AAAI)
Xiong C, Zhao X, Tang D, et al., 2015, Conditional Convolutional Neural Network for Modality-aware Face Recognition, IEEE International Conference on Computer Vision, Publisher: IEEE, Pages: 3667-3675, ISSN: 1550-5499
Xiong C, Gao G, Zha Z, et al., 2014, Adaptive Learning for Celebrity Identification With Video Context, IEEE Transactions on Multimedia, Vol: 16, Pages: 1473-1485, ISSN: 1520-9210
Doumanoglou A, Kargakos A, Kim T, et al., Autonomous Active Recognition and Unfolding of Clothes using Random Decision Forests and Probabilistic Planning, IEEE Int. Conf. on Robotics and Automation (ICRA)
Deniz O, Serrano I, Bueno G, et al., 2014, Fast Violence Detection in Video, The 9th International Conference on Computer Vision Theory and Applications (VISAPP)
Tsiotsios C, Angelopoulou ME, Kim T-K, et al., 2014, Backscatter Compensated Photometric Stereo with 3 Sources, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2259-2266, ISSN: 1063-6919
Tejani A, Tang D, Kouskouridas R, et al., 2014, Latent-Class Hough Forests for 3D Object Detection and Pose Estimation, 13th European Conference on Computer Vision (ECCV), Publisher: Springer-Verlag Berlin, Pages: 462-477, ISSN: 0302-9743
Doumanoglou A, Kim T-K, Zhao X, et al., 2014, Active Random Forests: An Application to Autonomous Unfolding of Clothes, 13th European Conference on Computer Vision (ECCV), Publisher: Springer International Publishing AG, Pages: 644-658, ISSN: 0302-9743
Liu Y, Jang Y, Woo W, et al., 2014, Video-based Object Recognition using Novel Set-of-Sets Representations, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 533-+, ISSN: 2160-7508
Hoo WL, Kim T-K, Pei Y, et al., 2014, Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding, 22nd International Conference on Pattern Recognition (ICPR), Publisher: IEEE Computer Soc, Pages: 3434-3439, ISSN: 1051-4651
Zhao X, Kim T-K, Luo W, 2014, Unified Face Analysis by Iterative Multi-Output Random Forests, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 1765-1772, ISSN: 1063-6919