ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.077
An Improved Clustering Algorithm for Big Data Based on K-Means with Optimized Clusters’ Number
Abstract— To improve the processing of big data, a new clustering algorithm based on K-means
is proposed. The algorithm uses the silhouette coefficient to evaluate the quality of a
clustering result. The number of clusters that yields the best silhouette coefficient is
selected, and K-means is then run with that number of clusters. The algorithm is tested on a
real production big data set and compared with classical K-means. The experimental results
show that the improved algorithm produces a more reasonable clustering at little extra
computational cost.
Index Terms— Big data, Silhouette Coefficient, clustering, optimized clusters’ number.
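The selection procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`kmeans`, `silhouette`, `choose_k`), the farthest-point initialization, the Euclidean distance, the candidate range of k, and the toy data are all hypothetical. The score uses the standard silhouette definition s(i) = (b - a) / max(a, b), which may differ in detail from the definition given in the paper.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialization.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (standard definition)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst case here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Score each candidate k and return the best one with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

best_k, scores = choose_k(points, range(2, 6))
labels = kmeans(points, best_k)  # final run with the selected number of clusters
print(best_k, scores)
```

This sketch computes every pairwise distance when scoring, so its cost grows quadratically with the number of points. On a large data set one would likely score a sample instead; that is a design note for this sketch, not a claim about the paper's method.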
Lianjiang Zhu, Shouning Qu
Information network centre, University of Jinan, CHINA
Tao Du, Kai Wang
School of information science and engineering, University of Jinan, CHINA
Yong Zhang
School of electrical engineering, University of Jinan, CHINA
Cite: Lianjiang Zhu, Tao Du, Shouning Qu, Kai Wang, Yong Zhang, "An Improved Clustering Algorithm for Big Data Based on K-Means with Optimized Clusters’ Number," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 467-471, Moscow, Russia, April 15-17, 2015.