Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering
Abstract The volume of datasets is increasing at a very fast rate due to the expanding digitalization of every field of work. Traditional clustering algorithms become ineffective for analyzing such large datasets because they take too long to cluster them. Parallel and distributed architectures are designed to process such large datasets. To cluster efficiently, traditional clustering algorithms need to be redesigned for these parallel and distributed architectures. A few parallel clustering algorithms have been designed for efficiency, but they work on datasets that are loaded into and accessed from main memory, which limits them when millions of data objects cannot be loaded into memory at once. In this work, we propose a parallel version of traditional K-means that executes over the Hadoop distributed framework. The experimental results show that the proposed K-means algorithm outperforms traditional K-means when clustering large datasets.
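The abstract does not describe the algorithm's internals. As a rough illustration only, one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster). The sketch below simulates that split in plain Python rather than on Hadoop; all function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and this is not necessarily the authors' design.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Mapper: for each point in one input split, emit (cluster index, point)."""
    for point in split:
        yield nearest(point, centroids), point


def reduce_phase(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, Point]:
    """Reducer: sum the points of each cluster and divide by their count."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for key, point in pairs:
        if key not in sums:
            sums[key] = [0.0] * len(point)
            counts[key] = 0
        for d, value in enumerate(point):
            sums[key][d] += value
        counts[key] += 1
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def kmeans_iteration(splits: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """Run one map/reduce round over all splits and return updated centroids."""
    # Each split is processed independently, as separate mappers would do.
    pairs = (pair for split in splits for pair in map_phase(split, centroids))
    updated = reduce_phase(pairs)
    # Assumption: a cluster that received no points keeps its old centroid.
    return [updated.get(i, c) for i, c in enumerate(centroids)]
```

In a real deployment the splits would be blocks read from distributed storage, so no single machine needs the whole dataset in memory; the iteration would be repeated until the centroids stop moving.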
Ansari, Zahid (author) / Afzal, Asif (author) / Sardar, Tanvir Habib (author)
Journal of The Institution of Engineers (India): Series B ; 100 ; 95-103
2019-02-25
9 pages
Article (Journal)
Electronic Resource
English
MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
Springer Verlag | 2022
An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm
Springer Verlag | 2020
Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
Springer Verlag | 2022
Urban Point Cloud Mining Based on Density Clustering and MapReduce
British Library Online Contents | 2017
Urban Point Cloud Mining Based on Density Clustering and MapReduce
Online Contents | 2017