Automatic Construction Safety Report Using Visual Question Answering and Segmentation Model
Construction safety reports are crucial for reducing the accident rate on construction sites, which are complex and dangerous working environments. Deep learning-based computer vision techniques such as object detection, image segmentation, and semantic segmentation are widely used to recognize hazards. However, little research identifies hazards from a combined natural language processing and computer vision perspective. Visual question answering (VQA) combined with a segmentation model can fill this gap. VQA is a vision-and-language model that can infer violated safety rules from an input image. In addition, segmentation models can automatically segment the objects relevant to the violated rules. Together, these components form an end-to-end automatic safety report system. In this research, we propose a novel approach that combines VQA and a segmentation model in a construction safety report system. First, a large “scenario-questions” dataset with 200,000 images and 16 questions is created from a public segmentation dataset. Then a vision-and-language transformer model is trained and validated. Next, the segmentation model is used for postprocessing to explain the reason for each hazard inference and to increase the baseline model's accuracy. Both the VQA and segmentation models showed robust validation accuracy, and the combined approach can be considered a robust option for safety management and monitoring on construction sites.
Lecture Notes in Civil Engineering
Francis, Adel (Editor) / Miresco, Edmond (Editor) / Melhado, Silvio (Editor) / Tran, Dai Quoc (Author) / Jeon, Yuntae (Author) / Son, Seongwoo (Author) / Kulinan, Almo Senja (Author) / Lee, Changjun (Author) / Park, Seunghee (Author)
International Conference on Computing in Civil and Building Engineering ; 2024 ; Montreal, QC, Canada
Advances in Information Technology in Civil and Building Engineering ; Chapter: 25 ; 307-317
2025-03-04
11 pages
Article/Chapter (Book)
Electronic Resource
English
Visual Question Answering Bahasa Indonesia Berbasis Deep Learning untuk Pembelajaran Visual Anak TK
DOAJ | 2024
Features - AEHP - Answering the Central Question
Online Contents | 2000
X-ray is answering question "what is glass,"
Engineering Index Backfile | 1935
A question answering system for project management applications
Tema Archiv | 2002