Dual attention transformer network for pixel-level concrete crack segmentation considering camera placement
Abstract
Pixel-level crack segmentation remains a challenging task because of the trade-off between computational cost and accuracy, and because real-world cracks are small, typically submillimeter in width, which leaves few pixels for analysis. To address these challenges, this paper proposes a Pixel Crack Transformer Network (PCTNet) and uses it to investigate the impact of different camera placements on network performance. PCTNet adopts a hierarchical structure with a Cross-Scale PatchEmbedding Layer and a Dual Attention Transformer Block, enabling the generation of multi-scale feature maps and the fusion of global and local features. PCTNet reduces computational cost by up to 64% compared to transformer networks while outperforming both convolutional and transformer networks, achieving 95.89% precision, 93.77% recall, 94.8% F1-score, and 90.53% mIoU. Furthermore, this work introduces the Crack-R dataset, which contains crack images captured at varying distances, facilitating the evaluation of segmentation accuracy in real-world scenarios with different crack-to-pixel ratios.
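The precision, recall, F1-score, and mIoU figures quoted in the abstract can be illustrated with a minimal sketch for binary crack masks. The function names `segmentation_metrics` and `f1_from` are hypothetical, and the mIoU here is assumed to be the mean of the crack-class and background-class IoU; the record does not state how the paper defines or averages its metrics.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Pixel-wise metrics for binary masks (1 = crack, 0 = background).

    Assumption: mIoU is the unweighted mean of crack IoU and background IoU.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = int(np.sum(pred & target))    # crack pixels predicted as crack
    fp = int(np.sum(pred & ~target))   # background predicted as crack
    fn = int(np.sum(~pred & target))   # crack predicted as background
    tn = int(np.sum(~pred & ~target))  # background predicted as background

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on empty classes.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miou": (iou_crack + iou_background) / 2,
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the abstract's figures: the harmonic mean of
# 95.89 and 93.77 rounds to the reported 94.8.
reported_f1 = round(f1_from(95.89, 93.77), 1)
```

The last line only checks that the reported F1-score agrees with the reported precision and recall; it says nothing about how the underlying pixel counts were obtained.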
Highlights
A Dual Attention Transformer network for pixel-level crack segmentation is proposed.
A Dual Attention Transformer Block for extracting local and global feature information is proposed.
The effect of different camera placements on network performance is studied.
Two new metrics, Pixel IoU and relative error rate, are proposed to evaluate pixel-level crack segmentation performance.
PCTNet outperforms other state-of-the-art segmentation networks in terms of robustness and generalization.
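Why camera placement changes the crack-to-pixel ratio can be sketched with a simple pinhole-camera approximation. This is an illustrative assumption, not the paper's procedure: the function name `pixels_across_crack` and all camera parameters below are hypothetical, and the model ignores lens distortion, blur, and oblique viewing angles.

```python
def pixels_across_crack(crack_width_mm, distance_mm, focal_length_mm, pixel_pitch_mm):
    """Approximate number of pixels spanning a crack's width.

    Assumes a pinhole camera facing the surface head-on. The footprint of one
    pixel on the surface (mm per pixel) grows linearly with distance.
    """
    mm_per_pixel = distance_mm * pixel_pitch_mm / focal_length_mm
    return crack_width_mm / mm_per_pixel


# Hypothetical setup: 0.5 mm crack, 50 mm lens, 0.005 mm (5 micron) pixel pitch.
near = pixels_across_crack(0.5, distance_mm=500, focal_length_mm=50, pixel_pitch_mm=0.005)
far = pixels_across_crack(0.5, distance_mm=1000, focal_length_mm=50, pixel_pitch_mm=0.005)
```

Under these assumed values, doubling the camera distance halves the number of pixels across the crack, which is the kind of variation a dataset captured at several distances would expose.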
Wu, Yingjie (author) / Li, Shaoqi (author) / Zhang, Jinge (author) / Li, Yancheng (author) / Li, Yang (author) / Zhang, Yingqiao (author)
2023-10-31
Article (Journal)
Electronic Resource
English
Pixel-wise crack defect segmentation with dual-encoder fusion network
Elsevier | 2024
Automatic concrete crack segmentation model based on transformer
Elsevier | 2022