GBTM: A Short Text Clustering Model Based on Word Pairing

WCSE 2017
ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.062

Mengmin Tian, Ping Lu, Jincai Chen, Min Wu

Abstract— With the rapid development of the Internet and of many kinds of mobile applications, the volume of short text has been growing rapidly. Because of the semantic sparsity problem and the context dependency problem, traditional semantic mining in social networks is inefficient. At present, semantic mining of short text mainly considers the correlation between words, without considering the correlation between word pairs. To mine the semantics of short text in greater depth, the GBTM model for short text clustering based on word pairing is proposed. First, the text-topic probability distribution is obtained by mining the correlation of word pairs. On this basis, the topic correlation between texts is calculated using the K-means clustering algorithm combined with the JS distance. The experimental results show that the proposed GBTM model achieves some improvement in the clustering measures Purity (accuracy) and F-measure (a combined measure of precision and recall) compared with the LDA model and the BTM model. Therefore, mining the correlation of word pairs can help to improve the efficiency of short text topic clustering.

Index Terms— topic modeling, GBTM model, Gibbs sampling, cluster description, word pairs’ correlation
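
The clustering step named in the abstract (K-means over text-topic distributions, compared with the JS distance, and scored by Purity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy topic distributions, and the use of the plain Jensen-Shannon divergence as the dissimilarity are assumptions made for illustration, and the GBTM Gibbs sampler that would produce the distributions is not shown.

```python
import math
import random
from collections import Counter

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def kmeans_js(docs, k, iters=20, seed=0):
    """K-means-style clustering of topic distributions (hypothetical helper).

    Each document is assigned to the centroid with the smallest JS
    divergence; each centroid is then reset to the mean of its members,
    which is again a probability distribution.
    """
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: js_divergence(d, centroids[c]))
            for d in docs
        ]
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of documents that fall in the majority class of their cluster."""
    total = 0
    for c in set(labels):
        counts = Counter(t for lab, t in zip(labels, truth) if lab == c)
        total += max(counts.values())
    return total / len(labels)

# Toy text-topic distributions over three topics (made-up values).
docs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
]
labels = kmeans_js(docs, k=2)
score = purity(labels, ["a", "a", "b", "b"])
```

The JS divergence is used here rather than the Euclidean distance of standard K-means because it compares probability distributions symmetrically and stays finite even when a topic has zero probability in one of the two texts.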

Mengmin Tian, Ping Lu, Jincai Chen, Min Wu
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CHINA
Key Laboratory of Information Storage System (School of Computer Science and Technology), CHINA


Cite: Mengmin Tian, Ping Lu, Jincai Chen, Min Wu, "GBTM: A Short Text Clustering Model Based on Word Pairing," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 360-365, Beijing, 25-27 June, 2017.
