ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.062
GBTM: A Short Text Clustering Model Based on Word Pairing
Abstract— With the rapid development of the Internet and of mobile applications, the volume of
short text has been growing rapidly. Because of the semantic sparsity problem and the context dependency
problem, traditional semantic mining in social networks is inefficient. At present, semantic mining of
short text mainly considers the correlation between individual words, without considering the correlation of
word pairs. To mine the semantics of short text in more depth, the GBTM model for short text clustering
based on word pairing is proposed. First, the text-topic probability distribution is obtained by mining the
correlation of word pairs. On this basis, the topic correlation between texts is calculated using the K-means
clustering algorithm combined with the Jensen-Shannon (JS) distance. The experimental results show that the
proposed GBTM model improves the clustering measures Purity (accuracy) and F-measure (which combines
precision and recall) compared with the LDA model and the BTM model. Therefore, mining the correlation of
word pairs can help to improve the efficiency of short text topic clustering.
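The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`word_pairs`, `js_divergence`, `assign`, `purity`) are hypothetical, word pairs are assumed to be all unordered co-occurrences within one short text, and "JS distance" is assumed to mean the base-2 Jensen-Shannon divergence (some formulations use its square root instead). The Gibbs sampling step that would produce the text-topic distributions is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return all unordered word pairs co-occurring in one short text.

    Assumption: the whole text is treated as a single context window.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def assign(doc_topics, centroids):
    """K-means assignment step: index of the JS-nearest centroid per text."""
    return [
        min(range(len(centroids)), key=lambda k: js_divergence(d, centroids[k]))
        for d in doc_topics
    ]


def purity(clusters, labels):
    """Fraction of texts that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)
```

In a full K-means loop, `assign` would alternate with a centroid update (for example, the mean of the text-topic distributions in each cluster) until the assignments stop changing; `purity` would then be computed against ground-truth labels.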
Index Terms— topic modeling, GBTM model, Gibbs sampling, cluster description, word pairs’ correlation
Mengmin Tian, Ping Lu, Jincai Chen, Min Wu
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CHINA
Key Laboratory of Information Storage System (School of Computer Science and Technology), CHINA
Cite: Mengmin Tian, Ping Lu, Jincai Chen, Min Wu, "GBTM: A Short Text Clustering Model Based on Word Pairing," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 360-365, Beijing, 25-27 June, 2017.