A real-time visual SLAM system capable of semantically annotating a dense 3D scene.
In this project, we address the challenge of semantic mapping by combining Convolutional Neural Networks (CNNs) with a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map; we also show on the NYUv2 dataset that fusing multiple predictions improves even the 2D semantic labelling over baseline single-frame predictions. We further show that on a smaller reconstruction dataset with greater variation in prediction viewpoint, the improvement over single-frame segmentation increases. Our system is efficient enough for real-time interactive use at frame rates of ≈25 Hz.
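The probabilistic fusion step can be sketched as a recursive Bayesian update: each map element keeps a discrete probability distribution over semantic classes, and every CNN prediction that projects onto it multiplies into that distribution before renormalisation. The sketch below is illustrative only (the class count and example numbers are hypothetical, not taken from the paper):

```python
import numpy as np

NUM_CLASSES = 13  # illustrative choice, e.g. an NYUv2-style 13-class task

def init_surfel_probs():
    # Start each map element from a uniform prior over classes.
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)

def fuse_prediction(surfel_probs, cnn_probs):
    # Recursive Bayesian update: multiply the stored distribution by the
    # new per-pixel CNN softmax output, then renormalise.
    fused = surfel_probs * cnn_probs
    return fused / fused.sum()

# Two noisy single-frame predictions that agree on class 3 fuse into a
# more confident distribution than either frame alone.
p = init_surfel_probs()
frame1 = np.full(NUM_CLASSES, 0.05); frame1[3] = 0.4
frame2 = np.full(NUM_CLASSES, 0.06); frame2[3] = 0.28
p = fuse_prediction(p, frame1)
p = fuse_prediction(p, frame2)
assert p.argmax() == 3 and p[3] > frame1[3]
```

This is why multi-view fusion can outperform single-frame segmentation: independent agreeing observations reinforce the correct class while uncorrelated errors wash out.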
John McCormac, Ankur Handa, Andrew J. Davison, Stefan Leutenegger. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. IEEE International Conference on Robotics and Automation (ICRA), 2017.
The SemanticFusion software is available through the link on the right and is free to use for non-commercial purposes. The full terms and conditions governing its use are detailed here.