Adversarial Approaches to Tackle Imbalanced Data in Machine Learning: Fid-Bau Portal

Fachinformationsdienst BAUdigital

A platform for research: civil engineering, architecture and urbanism

Adversarial Approaches to Tackle Imbalanced Data in Machine Learning

Shahnawaz Ayoub / Yonis Gulzar / Jaloliddin Rustamov / Abdoh Jabbari / Faheem Ahmad Reegu / Sherzod Turaev

Real-world applications often involve imbalanced datasets, in which examples are unevenly distributed across classes. When a system must achieve high accuracy, classifier performance is crucial. However, imbalanced datasets can lead to poor classification performance, a problem that conventional techniques such as the synthetic minority oversampling technique (SMOTE) attempt to address. This study proposes an alternative: balancing the datasets with adversarial learning methods such as generative adversarial networks (GANs). The study evaluated the effect of data augmentation on both balanced and imbalanced datasets, measuring classification performance on three different datasets and applying data augmentation techniques to generate synthetic data for the minority class. Before augmentation, a decision tree classifier achieved accuracies of 79.9%, 94.1%, and 72.6% on the three datasets. A decision tree was also used to evaluate the augmented data, and the results showed that the proposed model achieved accuracies of 82.7%, 95.7%, and 76%, respectively, on the imbalanced datasets. This study demonstrates the potential of data augmentation to improve classification performance on imbalanced datasets.
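The abstract names SMOTE as the conventional oversampling technique that the GAN-based approach is positioned against. As a rough illustration of what that baseline does (this is SMOTE, not the study's GAN method), the sketch below computes how many synthetic minority samples would balance the classes and then generates them by interpolating between a minority sample and one of its nearest minority neighbours. The function names `samples_needed`, `k_nearest_indices`, and `smote`, the default `k=5`, and the toy data are hypothetical choices for illustration only.

```python
import math
import random
from collections import Counter


def samples_needed(labels, minority_label):
    """Synthetic minority samples required to match the majority class count."""
    counts = Counter(labels)
    return max(counts.values()) - counts[minority_label]


def k_nearest_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    distances = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in distances[:k]]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (illustrative baseline, NOT the paper's GAN generator).

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))                    # random minority sample
        j = rng.choice(k_nearest_indices(minority, i, k))   # one of its nearest neighbours
        gap = rng.random()                                  # position along segment i -> j
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 10 majority labels, 4 minority points in the unit square.
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    labels = ["majority"] * 10 + ["minority"] * len(minority)
    n = samples_needed(labels, "minority")  # 10 - 4 = 6
    for point in smote(minority, n, k=3):
        print(point)
```

In a GAN-based variant like the one the study proposes, a trained generator network would take the place of the fixed interpolation rule inside the loop, while the balancing count from `samples_needed` would stay the same.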

Document information

Title:

Adversarial Approaches to Tackle Imbalanced Data in Machine Learning

Contributors:

Shahnawaz Ayoub (author) / Yonis Gulzar (author) / Jaloliddin Rustamov (author) / Abdoh Jabbari (author) / Faheem Ahmad Reegu (author) / Sherzod Turaev (author)

Published in:

Sustainability, Vol 15, Iss 9, p 7097 (2023)

Publication date:

2023

ISSN:

DOI:

https://doi.org/10.3390/su15097097

Type of media:

Article (Journal)

Type of material:

Electronic Resource

Language:

Unknown

Keywords:

computer vision, machine learning, deep learning, imbalanced dataset, Environmental effects of industries and plants, TD194-195, Renewable energy sources, TJ807-830, Environmental sciences, GE1-350

Metadata by DOAJ is licensed under CC BY-SA 1.0

Similar titles

Balanced semisupervised generative adversarial network for damage assessment from low‐data imbalanced‐class regime

Gao, Yuqing / Zhai, Pengyuan / Mosalam, Khalid M. | Wiley | 2021

Handling Imbalanced Data in Road Crash Severity Prediction by Machine Learning Algorithms

Nicholas Fiorentini / Massimo Losa | DOAJ | 2020

Machine learning based novel cost-sensitive seizure detection classifier for imbalanced EEG data sets

Siddiqui, Mohammad Khubeb / Huang, Xiaodi / Morales-Menendez, Ruben et al. | Springer Verlag | 2020

Machine learning-based sensitivity of steel frames with highly imbalanced and high-dimensional data

Koh, Hyeyoung / Blum, Hannah B. | Elsevier | 2022