Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification: Fid-Bau Portal

Fachinformationsdienst BAUdigital

A platform for research: civil engineering, architecture and urbanism

Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification

Haurum, Joakim Bruslund / Madadi, Meysam / Escalera, Sergio / Moeslund, Thomas B.

Highlights
• The Multi-Scale Hybrid Vision Transformer is proposed for sewer defect classification.
• The Sinkhorn tokenizer is proposed for non-local feature aggregation.
• MSHViT outperforms baseline methods on the Sewer-ML sewer defect dataset.
• The MSHViT architecture is analyzed in terms of accuracy and efficiency.
• Visual verification of the non-local interactions, useful for informing sewer inspectors.

Abstract

A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated non-locally at different scales by a lightweight vision transformer, and a smaller set of tokens is produced by a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
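
The abstract describes reducing many feature vectors to a smaller set of tokens through Sinkhorn-based clustering against cluster centers. The sketch below illustrates that general idea with a generic Sinkhorn-Knopp balanced soft assignment in plain Python. It is not the authors' implementation: the function names `sinkhorn` and `tokenize`, the dot-product similarity, the temperature `eps`, the iteration count, and the equal-mass constraint on clusters are all assumptions made for illustration.

```python
import math


def sinkhorn(scores, n_iters=50, eps=0.5):
    """Balance an N x K score matrix into a soft assignment (Sinkhorn-Knopp).

    Every row of the result sums to 1/N. With enough iterations the
    columns each sum to approximately 1/K (equal mass per cluster).
    """
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract the max for numerical stability
    q = [[math.exp((s - top) / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every cluster holds total mass 1/K.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Row step: rescale so every feature holds total mass 1/N.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    return q


def tokenize(features, centers, n_iters=50, eps=0.5):
    """Pool N feature vectors into K tokens, one per (hypothetical) cluster center."""
    # Assumed similarity: plain dot product between feature and center.
    scores = [[sum(f * c for f, c in zip(x, ctr)) for ctr in centers]
              for x in features]
    q = sinkhorn(scores, n_iters, eps)
    n, dim = len(features), len(features[0])
    tokens = []
    for j in range(len(centers)):
        mass = sum(q[i][j] for i in range(n))
        # Each token is the assignment-weighted mean of the features.
        tokens.append([sum(q[i][j] * features[i][d] for i in range(n)) / mass
                       for d in range(dim)])
    return q, tokens


# Toy input: four 2-D "patch features" and two illustrative centers.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = [[1.0, 0.0], [0.0, 1.0]]
assignment, tokens = tokenize(features, centers)
```

In this toy case the four features are reduced to two tokens, each dominated by the features most similar to its center. A real tokenizer would operate on CNN feature maps and learned centers; how the paper chooses and updates its centers is not stated in the abstract.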

Document information

Title:

Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification

Contributors:

Haurum, Joakim Bruslund (author) / Madadi, Meysam (author) / Escalera, Sergio (author) / Moeslund, Thomas B. (author)

Published in:

Automation in Construction ; 144

Publication date:

3 October 2022

ISSN:

DOI:

https://doi.org/10.1016/j.autcon.2022.104614

Media type:

Article (journal)

Format:

Electronic resource

Language:

English

Keywords:

Sewer Defect Classification, Vision Transformers, Sinkhorn-Knopp, Convolutional Neural Networks, Closed-Circuit Television, Sewer Inspection

Similar titles

Vision transformer based classification of sewer defects weighted loss model

Ji, Chunhou / Xie, Zhiqiang / Li, Rong et al. | Elsevier | 2025

Towards Trustworthy Multi-label Sewer Defect Classification via Evidential Deep Learning

Zhao, Chenyang / Hu, Chuanfei / Shao, Hang et al. | ArXiv | 2022

Open access

Smart and Automated Sewer Pipeline Defect Detection and Classification

Kaddoura, Khalid / Atherton, Jeff | TIBKAT | 2021

Multiple Defect Classification Method for Green Plum Surfaces Based on Vision Transformer

Weihao Su / Yutu Yang / Chenxin Zhou et al. | DOAJ | 2023

Open access

Sewer defect detection from 3D point clouds using a transformer-based deep learning model

Zhou, Yunxiang / Ji, Ankang / Zhang, Limao | Elsevier | 2022