Vision transformer-based autonomous crack detection on asphalt and concrete surfaces
Abstract: Previous research has shown that convolutional neural networks (CNNs) detect asphalt and concrete cracks with high accuracy under controlled conditions. Yet human-like generalisation remains a significant challenge for industrial applications, where conditions vary widely. Given the intrinsic biases of CNNs, this paper proposes a vision transformer (ViT)-based framework for crack detection on asphalt and concrete surfaces. With transfer learning and a differentiable intersection over union (IoU) loss function, the ViT-equipped encoder-decoder network achieved improved real-world crack segmentation performance. Compared to the CNN-based models (DeepLabv3+ and U-Net), TransUNet with a CNN-ViT backbone achieved up to ~61% and ~3.8% better mean IoU on the original images of the respective datasets with very small and multi-scale crack semantics. Moreover, ViT helped the encoder-decoder network remain robust against various noisy signals, where the mean Dice score attained by the CNN-based models dropped significantly (<10%).
Highlights: The first framework for asphalt and concrete crack detection with ViT is proposed. Real-world applications of data-driven crack detection require adaptive generalisation. Inter-image multi-range dependencies are shown to significantly improve crack recognition. A hybrid approach (ViT/CNN) can improve generalisation adaptability. The ViT-equipped network has higher robustness to noisy signals compared to CNN-based models.
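The abstract names a differentiable IoU loss and reports results as mean IoU and Dice score, but does not give formulas. The sketch below shows one common way these quantities are defined for a binary crack mask; it is an assumption for illustration, not the authors' implementation. The function names (`soft_iou_loss`, `iou`, `dice`) and the smoothing term `eps` are hypothetical.

```python
from typing import Sequence


def soft_iou_loss(probs: Sequence[float], target: Sequence[float],
                  eps: float = 1e-6) -> float:
    """Return 1 - soft IoU for flattened per-pixel crack probabilities.

    Products and sums replace the set intersection and union, so the
    value varies smoothly with the predicted probabilities. This is a
    common formulation, assumed here; the paper's exact loss may differ.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    intersection = sum(p * t for p, t in zip(probs, target))
    union = sum(p + t - p * t for p, t in zip(probs, target))
    return 1.0 - (intersection + eps) / (union + eps)


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two binary masks (1 = crack pixel)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # both masks empty: perfect match


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice score of two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 2 * inter / total if total else 1.0


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    mask = [1, 0, 1, 0]
    print(iou(pred, mask), dice(pred, mask))
    print(soft_iou_loss([0.5, 0.5], [1, 0]))
```

A mean IoU, as reported in the abstract, would average `iou` over classes or images; which averaging the paper uses is not stated in this record.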
Asadi Shamsabadi, Elyas (author) / Xu, Chang (author) / Rao, Aravinda S. (author) / Nguyen, Tuan (author) / Ngo, Tuan (author) / Dias-da-Costa, Daniel (author)
29 April 2022
Article (journal)
Electronic resource
English
Vision-Based Crack Detection of Asphalt Pavement Using Deep Convolutional Neural Network
Springer Verlag | 2021