Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering

Ansari, Zahid / Afzal, Asif / Sardar, Tanvir Habib

Abstract The volume of datasets is increasing at a very fast rate due to the expanding digitalization of every field of work. Traditional clustering algorithms become ineffective on such large datasets because they take too long to cluster them. Parallel and distributed architectures are designed to process such large datasets, so to cluster efficiently, traditional clustering algorithms must be redesigned for these architectures. The few parallel clustering algorithms designed for efficient clustering work on datasets that are loaded into and accessed from main memory, which limits their use on large datasets whose millions of data objects cannot be loaded into memory at once. In this work, we propose a parallel version of traditional K-means that executes on the Hadoop distributed framework. The experimental results show that the proposed K-means algorithm outperforms traditional K-means when clustering large datasets.
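The abstract does not give the authors' implementation, so the following is only a minimal sketch of how K-means is commonly expressed in map and reduce steps, simulated in plain Python rather than run on Hadoop. All names (nearest, map_split, reduce_pairs, kmeans) and the parameters max_iter and tol are assumptions made for illustration, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_split(
    split: Iterable[Point], centroids: Sequence[Point]
) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Mapper: emit (cluster_id, (point, 1)) for every point in one input split."""
    for point in split:
        yield nearest(point, centroids), (point, 1)


def reduce_pairs(
    pairs: Iterable[Tuple[int, Tuple[Point, int]]], centroids: Sequence[Point]
) -> List[Point]:
    """Reducer: average the points of each cluster to get the new centroids."""
    dim = len(centroids[0])
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0] * dim)
    counts: Dict[int, int] = defaultdict(int)
    for cid, (point, n) in pairs:
        sums[cid] = [s + x for s, x in zip(sums[cid], point)]
        counts[cid] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(
    splits: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    max_iter: int = 20,   # hypothetical iteration cap
    tol: float = 1e-9,    # hypothetical convergence threshold
) -> List[Point]:
    """Repeat one map/reduce round per iteration until centroids stop moving."""
    current = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        pairs = []
        for split in splits:  # on a cluster, each split would go to its own mapper
            pairs.extend(map_split(split, current))
        updated = reduce_pairs(pairs, current)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(current, updated)
        )
        current = updated
        if shift <= tol:
            break
    return current
```

For example, kmeans([[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]], [(0.0, 0.0), (10.0, 10.0)]) treats each inner list as one input split. In this formulation only the current centroids need to be shared with every mapper, while the data points are read split by split, which is one way to avoid holding the whole dataset in a single machine's memory.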

Document information

Title: Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering
Contributors: Ansari, Zahid (author) / Afzal, Asif (author) / Sardar, Tanvir Habib (author)
Published in: Journal of The Institution of Engineers (India): Series B; 100; 95-103
Publication date: 2019-02-25
Size: 9 pages
ISSN: 2250-2114, 2250-2106
DOI: https://doi.org/10.1007/s40031-019-00388-x
Type of media: Article (Journal)
Type of material: Electronic Resource
Language: English
Keywords: Data mining, Clustering, Parallel K-means, Hadoop, MapReduce, Engineering, Communications Engineering, Networks

Similar titles

MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering

Sardar, Tanvir H. / Ansari, Zahid | Springer Verlag | 2022

An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Sardar, Tanvir Habib / Ansari, Zahid | Springer Verlag | 2020

Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids

Sardar, Tanvir H. / Ansari, Zahid | Springer Verlag | 2022

Urban Point Cloud Mining Based on Density Clustering and MapReduce

Aljumaily, Harith / Laefer, Debra F. / Cuadra, Dolores | British Library Online Contents | 2017