DOI: 10.18178/wcse.2019.06.106
Research on Network Public Opinion Detection Based on Improved TF-IDF Algorithm
Abstract— TF-IDF is a widely used text feature weighting technique. Its core idea is as follows:
if a term appears frequently in one text of a corpus but rarely in the other texts, then the term
is a good feature for characterizing that text. Although this idea is simple, it faces problems in
practical applications: it blindly increases the importance of uncommon words in a text, and this
blindness also appears in the field of public opinion monitoring. To solve this problem, this paper
makes the following two changes:
Introduce a lexical weight coefficient for feature words into TF-IDF;
Introduce a word position weight (span weight) coefficient into TF-IDF.
Experiments show that the improved TF-IDF method highlights the importance of text feature
words and facilitates classification. Furthermore, the improved method was applied to a public
opinion analysis system and achieved good results.
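As an illustration of the idea summarized above, the following is a minimal Python sketch of baseline TF-IDF with two extra multiplicative coefficients. It is not the formulation used in this paper, whose exact coefficients are not given in the abstract. The function names, the multiplicative combination, the example lexical weights, and the rule that a term occurring in the first token of a text receives a higher position weight are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Baseline TF-IDF. docs is a list of token lists; returns one {term: weight} dict per text."""
    n_docs = len(docs)
    # Document frequency: number of texts that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def weighted_tf_idf(docs, lexical_weight, position_weight):
    """Scale each baseline weight by two coefficients (assumed multiplicative, hypothetical)."""
    base = tf_idf(docs)
    return [
        {
            term: w * lexical_weight(term) * position_weight(term, doc)
            for term, w in doc_weights.items()
        }
        for doc, doc_weights in zip(docs, base)
    ]


# Hypothetical coefficients, chosen only to show the mechanism.
LEXICAL = {"flood": 1.2}


def lexical_weight(term):
    # Terms without an explicit entry keep a neutral coefficient of 1.0.
    return LEXICAL.get(term, 1.0)


def position_weight(term, doc):
    # Assumption: a term that occurs as the first token of the text is boosted.
    return 1.5 if term in doc[:1] else 1.0


docs = [
    ["flood", "city", "flood"],
    ["city", "traffic"],
    ["city", "news"],
]
baseline = tf_idf(docs)
improved = weighted_tf_idf(docs, lexical_weight, position_weight)
```

In this toy corpus, "city" occurs in every text, so its baseline weight is zero, while "flood" is frequent in the first text only and receives a positive weight that the two coefficients then scale up.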
Index Terms— Network Public Opinion; Cosine Similarity; TF-IDF; Emotional Analysis.
Lu Peng, Zongfeng Qin
City College, Wuhan University of Science and Technology, CHINA
Cite: Lu Peng, Zongfeng Qin, "Research on Network Public Opinion Detection Based on Improved TF-IDF Algorithm," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 715-720, Hong Kong, 15-17 June, 2019.