DOI: 10.18178/wcse.2019.06.107
Keyphrase Generation with a Seq2seq Model
Abstract— Keyphrases capture the core information of a source text and are thus useful in many applications.
Previous work focuses on extracting keyphrases from the original document and therefore misses absent
keyphrases that do not appear in the source. We propose a generative model based on a seq2seq RNN that can
generate both present and absent keyphrases by capturing the semantic information of the source. We adopt
the large vocabulary trick to construct the target vocabulary and improve training efficiency. We also
introduce a feature-rich encoder to leverage linguistic and statistical information in the source.
Additionally, we incorporate a switching generator-pointer mechanism to copy out-of-vocabulary
words from the original document. To evaluate our model, we conduct two tasks, i.e., predicting present
keyphrases and generating absent keyphrases, on real-life datasets. The results demonstrate the effectiveness of our
model, which consistently and significantly outperforms state-of-the-art models.
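To make the decoder-side mechanism concrete, the following is a minimal PyTorch sketch of a single decoding step with a switching generator-pointer: a sigmoid switch decides between generating from the shortlist vocabulary (the large vocabulary trick) and copying a source token via the attention distribution. The class name, dimensions, and additive-style attention are illustrative assumptions, not the paper's exact architecture, and the feature-rich encoder inputs are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchingPointerDecoderStep(nn.Module):
    """One decoding step with a switching generator-pointer mechanism
    (simplified sketch; the paper's exact architecture may differ)."""

    def __init__(self, emb_dim, hidden_dim, vocab_size):
        super().__init__()
        self.gru = nn.GRUCell(emb_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim * 2, 1)                # score over [h_dec; h_enc]
        self.generate = nn.Linear(hidden_dim * 2, vocab_size)   # softmax over shortlist vocabulary
        self.switch = nn.Linear(hidden_dim * 2, 1)              # P(generate) vs. P(copy)

    def forward(self, y_prev_emb, h_prev, enc_states):
        # y_prev_emb: (batch, emb_dim); h_prev: (batch, hidden_dim)
        # enc_states: (batch, src_len, hidden_dim), encoder hidden size == decoder hidden size here
        h = self.gru(y_prev_emb, h_prev)

        # Attention over source positions; doubles as the copy distribution.
        h_exp = h.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        scores = self.attn(torch.cat([h_exp, enc_states], dim=-1)).squeeze(-1)
        copy_dist = F.softmax(scores, dim=-1)
        context = torch.bmm(copy_dist.unsqueeze(1), enc_states).squeeze(1)

        # Generation distribution over the (large-vocabulary-trick) shortlist.
        gen_dist = F.softmax(self.generate(torch.cat([h, context], dim=-1)), dim=-1)

        # Switch: probability of generating from the vocabulary vs. copying from the source.
        p_gen = torch.sigmoid(self.switch(torch.cat([h, context], dim=-1)))
        return h, p_gen, gen_dist, copy_dist


# Toy usage with random tensors (shapes only; no trained weights).
if __name__ == "__main__":
    step = SwitchingPointerDecoderStep(emb_dim=32, hidden_dim=64, vocab_size=500)
    y_emb, h0, enc = torch.randn(2, 32), torch.zeros(2, 64), torch.randn(2, 10, 64)
    h, p_gen, gen_dist, copy_dist = step(y_emb, h0, enc)
    print(p_gen.shape, gen_dist.shape, copy_dist.shape)  # (2, 1) (2, 500) (2, 10)
```

At training and inference time, the final word distribution would mix `gen_dist` and `copy_dist` weighted by `p_gen`, so out-of-vocabulary source words remain reachable through the pointer branch.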
Index Terms— Keyphrase Generation; Seq2seq Model; Recurrent Neural Network
Pengfei Zhang, Dan Li, Yuheng Wang, Yang Fang
National University of Defense Technology, CHINA
Cite: Pengfei Zhang, Dan Li, Yuheng Wang, Yang Fang, "Keyphrase Generation with a Seq2seq Model," Proceedings of the 2019 9th International Workshop on Computer Science and Engineering, pp. 721-727, Hong Kong, 15-17 June, 2019.