A self‐supervised monocular depth estimation model with scale recovery and transfer learning for construction scene analysis
Estimating the depth of a construction scene from a single red‐green‐blue image is a crucial prerequisite for various applications, including work zone safety, localization, productivity analysis, activity recognition, and scene understanding. Recently, self‐supervised representation learning methods have made significant progress and demonstrated state‐of‐the‐art performance on monocular depth estimation. However, the two leading open challenges are the ambiguity of estimated depth up to an unknown scale and representation transferability for a downstream task, which severely hinder the practical deployment of self‐supervised methods. We propose a prior information‐based method, not depending on additional sensors, to recover the unknown scale in monocular vision and predict per‐pixel absolute depth. Moreover, a new learning paradigm for a self‐supervised monocular depth estimation model is constructed to transfer the pre‐trained self‐supervised model to other downstream construction scene analysis tasks. Meanwhile, we also propose a novel depth loss to enforce depth consistency when transferring to a new downstream task and two new metrics to measure transfer performance. Finally, we verify the effectiveness of scale recovery and representation transferability in isolation. The new learning paradigm with our new metrics and depth loss is expected to estimate the monocular depth of a construction scene without depth ground truth such as light detection and ranging (LiDAR) data. Our models will serve as a good foundation for further construction scene analysis tasks.
Shen, Jie (author) / Yan, Wenjie (author) / Qin, Shengxian (author) / Zheng, Xiaoyu (author)
Computer‐Aided Civil and Infrastructure Engineering ; 38 ; 1142-1161
2023-06-01
20 pages
Article (Journal)
Electronic Resource
English
Monocular 3D scene reconstruction at absolute scale
Online Contents | 2009
An Efficient Approach to Monocular Depth Estimation for Autonomous Vehicle Perception Systems
DOAJ | 2023