
應用文字探勘技術建置專利自動分類系統

Using Text Mining Technology to Construct the Automatic Patent Classification System

Advisor: 陳灯能
The full text will be available for download from 2024/08/06.

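Abstract


The number of patent applications is growing rapidly; every day, many inventors file applications with patent offices in various countries and regions. To make patent research more efficient, each patent is assigned to a category, so patent classification has long been one of the most important topics in research and practice. Many studies focus on automatic classification systems for English patents and overlook the importance of Chinese patents. The number of Chinese patents has been growing steadily, and Chinese patents also offer a view of current technology in Asian countries. An automatic patent classification system can quickly compare and identify possible conflicts with existing patents, saving inventors and patent examiners much of the cost and time of manual comparison, which makes it a valuable research topic. In recent years, using the International Patent Classification (IPC) to classify patent documents has become increasingly common, and the IPC is a complex hierarchical classification system. This study therefore combines Chinese word segmentation with text mining techniques, namely support vector machines, random forest, XGBoost, and naive Bayes classification, to classify Chinese patents by their IPC codes. On Chinese patents in Section H from the Taiwan patent office, XGBoost achieved a precision of up to 93.52% in the experiments.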
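To make the hierarchy concrete, the following minimal sketch splits a full IPC symbol into its levels (section, class, subclass, main group, subgroup). It only illustrates the public IPC notation; the names `IPCSymbol` and `parse_ipc` are hypothetical and do not come from the thesis, and the pattern assumes a symbol written in the common form `H04L 9/32`.

```python
import re
from typing import NamedTuple


class IPCSymbol(NamedTuple):
    """Levels of one IPC symbol, from coarsest to finest (hypothetical helper type)."""
    section: str      # one letter, A to H
    ipc_class: str    # section + two digits, e.g. "H04"
    subclass: str     # class + one letter, e.g. "H04L"
    main_group: str   # digits before the slash
    subgroup: str     # digits after the slash


# Assumed input form: section letter, two digits, subclass letter,
# optional whitespace, main group, slash, subgroup (e.g. "H04L 9/32").
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IPCSymbol:
    """Split a full IPC symbol into its hierarchical levels."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main_group, subgroup = match.groups()
    ipc_class = section + class_digits
    return IPCSymbol(section, ipc_class, ipc_class + subclass_letter,
                     main_group, subgroup)


example = parse_ipc("H04L 9/32")
print(example.section, example.ipc_class, example.subclass)
```

Classifying at the class level, as the abstract reports for Section H, would mean predicting the `ipc_class` field (such as `H04`) and ignoring the finer levels.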

Parallel Abstract


The number of patent applications has grown rapidly; every day, many inventors file applications with patent offices in different countries and regions. To make patent research more efficient, every patent is assigned a classification, so patent classification has long been one of the most important topics in both research and practice. Many studies have focused on automatic classification of English patents and have overlooked the importance of Chinese patents, which allow us to better understand current technology in Asian countries. An automatic patent classification system can quickly identify possible conflicts with existing patents, saving inventors and patent examiners labor and time, which makes it a valuable research topic. In recent years, the International Patent Classification (IPC) has become increasingly common for classifying patent documents, but the IPC is a complex hierarchical classification system. In this study, we combine Chinese word segmentation with four text-mining classifiers, namely SVM, Random Forest, XGBoost, and Naive Bayes, to classify Chinese patent documents in Section H. These techniques also make it possible to identify similar patents within the same category. In our experiments, XGBoost achieved a precision of up to 93.52% at the class level on Section H Chinese patent documents.
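The general shape of such a pipeline (tokenize Chinese text, train a classifier, measure precision) can be sketched with the standard library alone. This is not the thesis implementation: overlapping character bigrams stand in for a real Chinese word segmenter, only a multinomial Naive Bayes classifier with add-one smoothing is shown (SVM, Random Forest, and XGBoost would need third-party libraries), and the patent titles and their labels are invented toy data. All function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Stand-in tokenizer: overlapping character bigrams, not real word segmentation."""
    text = "".join(text.split())
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(char_bigrams(doc))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in char_bigrams(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_precision(y_true, y_pred):
    """Average of per-class precision; a class that is never predicted scores 0."""
    per_class = []
    for label in sorted(set(y_true)):
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        hits = sum(t == label for t in truths)
        per_class.append(hits / len(truths) if truths else 0.0)
    return sum(per_class) / len(per_class)


# Invented toy titles labeled with IPC classes in Section H (illustration only).
train = [
    ("半導體元件的電極結構", "H01"),
    ("電池電極材料及其製造方法", "H01"),
    ("發光二極體封裝結構", "H01"),
    ("無線通訊網路的資料傳輸方法", "H04"),
    ("行動通訊系統的訊號傳輸裝置", "H04"),
    ("網路封包加密傳輸方法", "H04"),
]
test = [("半導體封裝結構", "H01"), ("無線網路訊號傳輸", "H04")]

model = NaiveBayes().fit([d for d, _ in train], [l for _, l in train])
predictions = [model.predict(d) for d, _ in test]
print(predictions, macro_precision([l for _, l in test], predictions))
```

With a real corpus, the bigram tokenizer would be replaced by a proper word segmenter and the evaluation would use a held-out split far larger than two documents; the toy result says nothing about the accuracy reported above.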

Keywords

Patent; IPC; XGBoost; SVM; Naive Bayes; Random Forest

