Explainable Image Captioning to Identify Ergonomic Problems and Solutions for Construction Workers
The high occurrence of work-related musculoskeletal disorders (WMSDs) in construction remains a pressing concern, causing numerous nonfatal injuries. Preventing WMSDs requires implementing an ergonomic process, which includes identifying ergonomic problems and their corresponding solutions. Finding ergonomic problems and solutions on active construction sites requires significant effort from personnel with ergonomics expertise. However, ergonomic experts and training programs are often lacking in construction. To address this issue, the authors applied deep learning (DL)–based explainable image captioning to identify ergonomic problems and their corresponding solutions from images, which are readily available on construction sites. To this end, the authors proposed a vision-language model (VLM) capable of identifying ergonomic problems and their solutions, aided by data augmentation. The bilingual evaluation understudy (BLEU) score was used to measure the similarity between the ergonomic problems and solutions identified by the proposed VLM and those specified in an ergonomic guideline. In tests on 222 real-site images, the proposed VLM achieved the highest BLEU-4 score, 0.796, outperforming both a traditional convolutional neural network-long short-term memory (CNN-LSTM) model and a state-of-the-art VLM, bootstrapping language-image pretraining (BLIP). In addition, the authors developed an explainability module that visualizes which areas of an image the proposed VLM focuses on when identifying ergonomic problems and which words are important for identifying ergonomic solutions. The highest BLEU score and the visual explanations demonstrate the potential and credibility of the proposed VLM in identifying ergonomic problems and their solutions. The proposed VLM and explainability module contribute to implementing the ergonomic process in construction by identifying ergonomic problems and their solutions from site images alone.
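For readers unfamiliar with the metric, the following is a minimal sketch of how a BLEU-4 score compares a generated caption with a guideline sentence. It is not the authors' evaluation code: the abstract does not state the tokenization, smoothing, or whether scores were computed per sentence or over the whole test set, so this sketch assumes whitespace tokenization, a single reference, and no smoothing. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Assumptions (not taken from the paper): lowercase whitespace
    tokenization and uniform weights over 1- to 4-gram precisions.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: clip each candidate n-gram count by its
        # count in the reference so repeated words are not over-rewarded.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            # Unsmoothed BLEU is zero when any n-gram precision is zero.
            return 0.0
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the four precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / 4)


# Hypothetical guideline sentence and model output, for illustration only.
guideline = "worker bends forward to lift heavy materials from the floor"
caption = "worker bends forward to lift heavy boxes from the floor"
score = bleu4(caption, guideline)
print(f"BLEU-4: {score:.3f}")
```

A score of 1.0 means the caption matches the guideline sentence exactly, while a single substituted word already lowers the score noticeably because it breaks several overlapping 3-grams and 4-grams.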
To prevent WMSDs, the National Institute for Occupational Safety and Health (NIOSH) recommends implementing an ergonomic process, which encompasses ergonomic problem identification, ergonomic risk assessment, and ergonomic solution identification. Current practice on sites relies on the intermittent implementation of manual ergonomic processes and thus often falls short in protecting workers against WMSDs, owing to rapidly changing site conditions and the lack of on-site ergonomic expertise. To address this, many automated tools have been developed for ergonomic risk assessment, but none for ergonomic problem and solution identification. Therefore, we aim to complement these assessment tools and streamline the recommended ergonomic process in an automated manner. To this end, we propose a deep learning–based explainable image captioning model for automated ergonomic problem and solution identification. Using an ordinary camera (e.g., a smartphone or site surveillance camera), safety managers can easily identify ergonomic problems, assess risk levels, and identify corresponding solutions. Additionally, our model justifies its output by visualizing the reasoning behind the identified ergonomic problems and solutions. With such an easily accessible and trustworthy model, the on-site ergonomic process can be streamlined, potentially reducing workers’ WMSDs.
J. Comput. Civ. Eng.
Yong, Gunwoo (author) / Liu, Meiyin (author) / Lee, SangHyun (author)
2024-07-01
Article (Journal)
Electronic Resource
English
Construction workers' falls from heights: fatal versus serious injuries - ergonomic approach
British Library Conference Proceedings | 2004
KNOWLEDGE OF ERGONOMICS AND ERGONOMIC RISK FACTORS FOR WORKERS IN THE CONSTRUCTION INDUSTRY
BASE | 2018
Real-Time Ergonomic Risk Assessment Approach for Construction Workers Based on Computer Vision
Springer Verlag | 2024