Randomly Connected Neural Network for Self-Supervised Monocular Depth Estimation

15 November 2021

Randomly Connected Neural Networks for Self-Supervised Monocular Depth Estimation

A Hamlyn research team that introduced a novel deep learning method has won the Outstanding Paper Award at AECAI-CARE-OR2.0 workshop in MICCAI 2021.

In computing vision, depth estimation is one of the fundamental problems in terms of predicting the distance of each point in a captured scene relative to the camera. Depth is essential in a divergent range of applications including robotic vision, surgical navigation and mixed reality.

Recently, deep learning models have become very attractive in the depth perception workflow as they provide end-to-end solutions for depth and disparity estimation. In monocular depth estimation, most self-supervised methods adopt the U-Net-based encoder-decoder architecture DispNet, which generates a high-resolution photometric output.

Lead author of this award paper Sam Tukra said: "typically a U-Net model architecture is always used for self-supervised depth estimation. However this architecture may be suboptimal since other studies (that focused on replacing methods) have shown, replacing the encoder part of U-Net with more complex architectures improves results. Typically these improvements come due to the innovative ways the network is wired."

Novel Randomly Connected Neural Networks for Self-Supervised Monocular Depth Estimation

In the light of this, our research team at the Hamlyn Centre proposed a novel randomly connected encoder-decoder architecture for self-supervised monocular depth estimation.

The proposed encoder-decoder architecture. The solid and the dotted lines denote forward propagation and skip connections, respectively. The purple lines signify the output left and right disparity maps generated at 4 scales, each increasing hierarchically with a scale factor of 2. — The proposed encoder-decoder architecture.

"In order to improve depth estimation we simply challenged the notion I mentioned above by experimenting with randomly connecting neural networks (i.e. no limitation in how these networks are wired). This way we can attain a unique powerful architectures for the specific task (i.e. depth estimation in surgery)" Sam Tukra said.

Inspired by graph theory, our researchers created randomly connected neural networks by modelling them as graphs, where each node is essentially a convolution layer.

The process of generating a randomly connected deep-learning architecture. The red and the grey nodes are the input and the output nodes of the block, respectively. The black nodes are the convolution operations. The purple module is a multi-head self-attention module. — The process of generating a randomly connected deep-learning architecture.

Our researchers simply connected these nodes by using a random graph generator algorithm. Once the graph is created, they converted them into a neural network in a deep learning library (i.e. PyTorch).

To enable efficient search in the connection space, the ‘cascaded random search’ approach is introduced for the generation of random network architectures.

Moreover, our researchers also introduced a new variant of the U-Net topology to improve the expressive power of skip connection feature maps, both spatially and semantically.

"We hope that the proposed methodology guides researchers in understanding how to improve monocular depth estimation without reliance on capturing arduous ground truth." Sam Tukra

Unlike standard U-Net, this new concept included convolution (learnable layers) in the skip connections itself. Our researchers therefore can improve the usage of deep semantic features in the encoder feature maps, which are usually stored in the channels space but not explicitly used.

Last but not least, as our researchers realised that multi-scale loss functions are vital for improvement in image reconstruction, they created a new loss function to improve on the image reconstruction quality.

The new loss function efficiently extends deep feature adversarial and perceptual loss to multiple scales for high fidelity view synthesis and error calculation.

Discriminator model. The solid black lines and the dotted lines denote forward propagation and skip connections, respectively. The skip connection inputs are concatenated with the forward propagating feature map prior to being processed by the next layer. The pink lines show the extraction of multi-scale feature maps. — Discriminator model.

Our research team conducted performance evaluation on two surgical datasets, including comparisons to state-of-the-art self-supervised depth estimation methods.

Comparison of depth maps generated by our model with Godard et al. (2017) and Pilzer et al. (2018) on the Hamlyn test split.

In conclusion, the research results verify that even a randomly connected network with standard convolution operations but innovative inter-connections can learn the task well. Additionally, this research also showed that multi-scale penalty in loss functions is vital for generating finer details.

Our researchers believe that this study will aid in further research when it comes to neural network architecture design. it might also be helpful for those studies that aim to shift from the standard U-Net as well as manual trial and error based approaches to more automated design methods.

FIND OUT MORE >>

Samyakh Tukra and Stamatia Giannarou, "Randomly connected neural networks for self-supervised monocular depth estimation", Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, November 2021.