Combining Keyword Extraction and Classifiers for Patent Document Classification

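Enterprises that use effective tools to analyze the patents of their own industry before developing a product can grasp key technical information and thereby strengthen their competitiveness. However, current patent analysis tools are incomplete, so patent analysis still relies on manual work and consumes considerable time and labor. In addition, patent dimensions lack objective definitions, so technical or industry classifications vary with each analyst's subjective judgment, which affects the analysis results. This thesis builds an automated classification system that classifies patent documents as bicycle or non-bicycle, providing computer assistance for patent analysis and product development. For classification, we use three models: the General Regression Neural Network (GRNN) classifier, the K-Nearest Neighbors (KNN) classifier, and the Vector Space Model (VSM) classifier. Each classifier is trained on two keyword-document weight matrices built from the training data, one using all features and one using the features chosen by Sequential Forward Selection (SFS). The F-measure and the Az value of the Receiver Operating Characteristic (ROC) curve serve as the performance criteria, and the classification performance of every classifier and feature-set combination is analyzed. The experimental results show that the GRNN classifier built on SFS-selected data gives the best classification performance. In terms of features, the GRNN classifier built on the 353 features selected by SFS is the best combination (F-measure=0.933 and Az=0.9859), followed by the KNN classifier built on the 51 features selected by SFS (F-measure=0.805).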
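The keyword-document weight matrix and the VSM classifier mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis's implementation: the TF-IDF variant (length-normalized term frequency times natural-log IDF), the decision rule (cosine similarity to each class centroid), the function names, and the toy token lists.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency for every term seen in the training documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / doc_freq[term]) for term in sorted(doc_freq)}

def tfidf_vector(doc, idf):
    """One row of the keyword-document weight matrix (one weight per vocabulary term)."""
    counts = Counter(doc)
    length = len(doc) or 1  # avoid dividing by zero for an empty document
    return [counts[term] / length * weight for term, weight in idf.items()]

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroids(rows, labels):
    """Mean TF-IDF vector of each class (assumed VSM class prototype)."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()}

def classify(doc, idf, class_centroids):
    """Assign the class whose centroid is most similar to the document vector."""
    vec = tfidf_vector(doc, idf)
    return max(class_centroids, key=lambda label: cosine(vec, class_centroids[label]))

# Toy, made-up token lists standing in for preprocessed patent documents.
train_docs = [
    ["bicycle", "frame", "pedal", "chain"],
    ["bicycle", "wheel", "brake", "pedal"],
    ["circuit", "memory", "chip", "signal"],
    ["engine", "circuit", "sensor", "signal"],
]
train_labels = ["bicycle", "bicycle", "non-bicycle", "non-bicycle"]

idf = fit_idf(train_docs)
matrix = [tfidf_vector(doc, idf) for doc in train_docs]
model = centroids(matrix, train_labels)

print(classify(["pedal", "brake", "frame"], idf, model))
print(classify(["chip", "signal"], idf, model))
```

Terms that never occur in the training documents are ignored when a new document is vectorized, which keeps every vector aligned with the columns of the training matrix.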

Keywords

Term Frequency and Inverse Document Frequency; Sequential Forward Selection; classifier; F-measure; Receiver Operating Characteristic curve

Parallel Abstract

To enhance their competitiveness, enterprises should effectively analyze the patents related to critical technologies before developing new products. However, most patent searches are still conducted manually because reliable computerized tools are lacking. In addition, the lack of objective definitions affects the quality of patent search results. In this paper, an automatic classification system is developed to classify patent documents as bicycle or non-bicycle. The purpose of the proposed system is to assist patent analysis and product development. We apply the General Regression Neural Network (GRNN), K-Nearest Neighbors (KNN), and Vector Space Model (VSM) techniques for classification. Two types of input data sets are compared: the features selected by Sequential Forward Selection (SFS) and the full feature set without SFS selection. The performance comparison covers every combination of the two types of input features and the three types of classifiers. The F-measure and the Az value of the Receiver Operating Characteristic (ROC) curve are used as the performance indices. Experimental results show that the data set processed by SFS combined with the GRNN classifier is the best combination: the 353 features selected by SFS with the GRNN classifier provide the best classification performance (F-measure=0.9333 and Az=0.9859). The second best is the KNN classifier with 51 features selected by SFS (F-measure=0.8046).
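As a rough illustration of the two performance indices named above, the sketch below computes the F-measure from hard predictions and an ROC area from classifier scores. It assumes that the area under the empirical ROC curve (computed through the equivalent pairwise ranking statistic) stands in for Az; the abstract does not say how Az was estimated, and a fitted ROC curve would give a somewhat different value. The labels, scores, and 0.5 threshold are made up for the example.

```python
def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_area(y_true, scores, positive=1):
    """Area under the empirical ROC curve.

    Equals the fraction of (positive, negative) pairs in which the positive
    document receives the higher score, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = bicycle patent) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print(f_measure(y_true, y_pred))
print(roc_area(y_true, scores))
```

The F-measure depends on the chosen decision threshold, while the ROC area summarizes ranking quality across all thresholds, which is one reason the two indices are reported together.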

Parallel Keywords

Term Frequency-Inverse Document Frequency (TF-IDF); Sequential Forward Selection (SFS); classifier; F-measure; Receiver Operating Characteristic (ROC) curve
