An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm
Clustering is one of the important data mining techniques, and document clustering is one of its many applications. Traditional clustering algorithms have proven inefficient for clustering large, rapidly generated real-world datasets. As a solution, traditional clustering algorithms are modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for the Hadoop distributed framework. This paper demonstrates a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes. The experiments also show that MapReduce clustering works more efficiently when the dataset and the Hadoop cluster are large.
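This record does not reproduce the paper's implementation. As a rough illustration of the general idea only, the sketch below simulates one common way to phrase K-Means as map and reduce steps in plain Python, without Hadoop. It assumes documents are already encoded as numeric vectors (for example term weights); the function names `map_phase`, `reduce_phase` and `kmeans_mapreduce` are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available from Python 3.8


def map_phase(docs, centroids):
    """Mapper: emit (index of nearest centroid, document vector) for each document."""
    for vec in docs:
        nearest = min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))
        yield nearest, vec


def reduce_phase(pairs, centroids):
    """Reducer: group vectors by centroid index and average each group."""
    groups = defaultdict(list)
    for key, vec in pairs:
        groups[key].append(vec)
    # Assumption: a centroid whose cluster is empty is left unchanged.
    new_centroids = list(centroids)
    for key, vecs in groups.items():
        new_centroids[key] = tuple(sum(col) / len(vecs) for col in zip(*vecs))
    return new_centroids


def kmeans_mapreduce(docs, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(docs, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Toy two-dimensional "document vectors" forming two obvious groups.
    docs = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_mapreduce(docs, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real Hadoop job the map calls would run in parallel over splits of the dataset and the framework would group the emitted pairs by key before the reduce step; here both steps run sequentially in one process.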
J. Inst. Eng. India Ser. B
Sardar, Tanvir Habib (author) / Ansari, Zahid (author)
Journal of The Institution of Engineers (India): Series B ; 101 ; 641-650
2020-12-01
10 pages
Article (Journal)
Electronic Resource
English
MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
Springer Verlag | 2022
Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering
Springer Verlag | 2019
Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
Springer Verlag | 2022
Urban Point Cloud Mining Based on Density Clustering and MapReduce
Online Contents | 2017
Urban Point Cloud Mining Based on Density Clustering and MapReduce
British Library Online Contents | 2017