A self‐supervised monocular depth estimation model with scale recovery and transfer learning for construction scene analysis
Estimating the depth of a construction scene from a single red‐green‐blue image is a crucial prerequisite for various applications, including work zone safety, localization, productivity analysis, activity recognition, and scene understanding. Recently, self‐supervised representation learning methods have made significant progress and demonstrated state‐of‐the‐art performance on monocular depth estimation. However, the two leading open challenges are the ambiguity of estimated depth up to an unknown scale and representation transferability for a downstream task, which severely hinder the practical deployment of self‐supervised methods. We propose a prior information‐based method, not depending on additional sensors, to recover the unknown scale in monocular vision and predict per‐pixel absolute depth. Moreover, a new learning paradigm for a self‐supervised monocular depth estimation model is constructed to transfer the pre‐trained self‐supervised model to other downstream construction scene analysis tasks. Meanwhile, we also propose a novel depth loss to enforce depth consistency when transferring to a new downstream task and two new metrics to measure transfer performance. Finally, we verify the effectiveness of scale recovery and representation transferability in isolation. The new learning paradigm with our new metrics and depth loss is expected to estimate the monocular depth of a construction scene without depth ground truth such as light detection and ranging (LiDAR) data. Our models will serve as a good foundation for further construction scene analysis tasks.
Shen, Jie (author) / Yan, Wenjie (author) / Qin, Shengxian (author) / Zheng, Xiaoyu (author)
Computer‐Aided Civil and Infrastructure Engineering ; 38 ; 1142-1161
2023-06-01
20 pages
Article (Journal)
Electronic Resource
English
Monocular 3D scene reconstruction at absolute scale
Online Contents | 2009
An Efficient Approach to Monocular Depth Estimation for Autonomous Vehicle Perception Systems
DOAJ | 2023