A platform for science: civil engineering, architecture and urbanism
Self-supervised contrastive video representation learning for construction equipment activity recognition on limited dataset
Abstract Monitoring the activities of construction equipment is important for evaluating its productivity, which has led to the development of many vision-based automated monitoring methods. State-of-the-art construction equipment activity recognition methods rely on supervised learning, which requires a large labeled dataset for each type of equipment and each activity. Recently, many self-supervised deep learning methods have been proposed that exploit abundant unlabeled data and reduce annotation cost by creating labels from the input data itself. However, the assumption that abundant unlabeled data are available limits the applicability of self-supervised methods to construction equipment activity recognition. To address these problems, we propose CVRLoLD, which stands for Contrastive Video Representation Learning on Limited Dataset. CVRLoLD is a self-supervised contrastive learning approach that can learn to recognize construction equipment activities on a limited dataset when only a portion of the dataset is annotated. The objectives of this work are: (1) proposing a novel self-supervised method for excavator activity recognition, and (2) improving the applicability of self-supervised learning to the relatively small datasets available for construction equipment activity recognition. The proposed method first trains a backbone network using contrastive learning on the unlabeled data. The labeled data are then used to fine-tune the pretrained backbone. The proposed method achieved an activity recognition accuracy of 81.7% while using only 30% of the labels in the dataset. The results demonstrate the potential of the proposed method to reduce the time and effort required for data labeling while achieving high performance on the relatively limited datasets available in the construction domain.
Highlights
Proposing self-supervised contrastive video representation learning (CVRLoLD)
CVRLoLD requires only a portion of the dataset to be annotated
CVRLoLD can recognize construction equipment activities on a small dataset
Achieving 81.7% accuracy using only 30% labeled data from the dataset
First attempt to apply self-supervised learning in the construction domain
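The abstract says the backbone is pretrained with contrastive learning on unlabeled data, but it does not state which loss is used. As an illustration only, the sketch below shows a generic InfoNCE-style objective of the kind commonly used in contrastive learning. It is not the authors' implementation: the function names, the cosine similarity measure, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    view_a[i] and view_b[i] are assumed to be embeddings of two augmented
    clips taken from the same unlabeled video (a positive pair); every
    view_b[j] with j != i serves as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_similarity(view_a[i], view_b[j]) / temperature
            for j in range(n)
        ]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings (assumed values): matching pairs point in the same direction.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same embeddings with the pairing swapped, so positives look like negatives.
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

The loss is small when each clip embedding is closest to its own positive and large when it is closer to a negative, which is the signal that lets a backbone learn from unlabeled video before the fine-tuning step on the labeled portion.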
Ghelmani, Ali (author) / Hammad, Amin (author)
2023-06-15
Article (journal)
Electronic resource
English
Elsevier | 2024