Sentence-Ranking-Enhanced Keywords Extraction from Chinese Patents

Patent keywords, a high-level topical representation of patents, play an important role in many patent-oriented mining tasks, such as classification, retrieval, and translation. However, few studies to date have focused on keyword extraction for patents, and human-annotated gold-standard datasets are lacking, especially for Chinese patents. This paper introduces a new human-annotated Chinese patent dataset and proposes a sentence-ranking-based Term Frequency-Inverse Document Frequency (SR-based TF-IDF) algorithm for patent keyword extraction, motivated by the idea that "the keywords are in the key sentences". In the algorithm, a sentence-ranking model built on a sentence semantic graph and heuristic rules selects the top K_S percent of sentences from each patent. Finally, the proposed algorithm is compared with TF-IDF, TextRank, word2vec-weighted TextRank, and the Patent Keyword Extraction Algorithm (PKEA) on the self-built Chinese patent dataset and several standard benchmark datasets. The experimental results show that the proposed algorithm effectively improves keyword extraction from Chinese patents.

Keywords

Chinese patents; key sentences; sentence-ranking model; keywords extraction; human-annotated dataset
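
The pipeline summarized in the abstract (rank sentences, keep the top K_S percent, then score terms) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bag-of-words cosine graph in place of the paper's sentence semantic graph, omits the heuristic rules, assumes TF-IDF is computed over the retained sentences only, and uses a simple regex tokenizer where Chinese text would need a word segmenter. All function names and parameters (`rank_sentences`, `key_sentences`, `sr_tfidf_keywords`, `k_s_percent`, `top_n`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: regex word tokens; Chinese patents would need a segmenter.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    # PageRank-style scores on a weighted sentence similarity graph
    # (a stand-in for the paper's semantic graph; heuristic rules omitted).
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def key_sentences(sentences, k_s_percent):
    # Keep the top k_s_percent of sentences by score, in original order.
    if not sentences:
        return []
    scores = rank_sentences(sentences)
    keep = max(1, math.ceil(len(sentences) * k_s_percent / 100))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


def sr_tfidf_keywords(doc_sentences, corpus, k_s_percent=50, top_n=5):
    # corpus: list of documents (each a list of sentences), used only for IDF.
    kept = key_sentences(doc_sentences, k_s_percent)
    tf = Counter(t for s in kept for t in tokenize(s))
    df = Counter()
    for doc in corpus:
        df.update({t for s in doc for t in tokenize(s)})
    n_docs = len(corpus)
    # Smoothed IDF so that unseen or ubiquitous terms do not yield zero or errors.
    scored = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:top_n]
```

Calling `sr_tfidf_keywords(sentences_of_one_patent, corpus)` returns candidate keywords drawn only from the retained sentences, which is the filtering effect the abstract attributes to the sentence-ranking step.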
