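In the era of the knowledge economy, enterprises universally promote the concept of "knowledge management," and patent information, which has recently drawn wide attention, is what most fully reflects the distinctiveness and competitiveness of a technology and the development trends of an industry. Used well, patent information can shorten R&D time by 60% and save at least 40% of R&D expenditure, which makes patents a powerful tool for enterprises to stay competitive in the global economy. Many earlier patent-analysis studies applied various clustering algorithms to classify and cluster patents, but most of them focused on comparing the performance of the different classification methods. Few went on to read the patent text itself to establish how good the classification was, because reading patents is difficult, all the more so for large numbers of patent documents. The alternative, asking domain experts to interpret the patent clusters, is both time-consuming and labor-intensive. This study therefore proposes the design of an automatic naming system for patent domains. For the patent data source, it removes the dependence on commercial software and uses open-source tools to collect patent web-page data. With the aid of information retrieval techniques, it explores the nature of the patent document collection, extracts representative key phrases according to the domain of each patent cluster, and then measures the accuracy of those key phrases with an appropriate evaluation framework. In the experiments of this study, nearly 80% of the two-word and three-word key phrases produced matched the original classification definitions of the patent documents, and the first key phrase produced by the system had an accuracy of over 50%. The study also found that the key-term network (TOPO) phrase-composition method is better suited to keyword extraction from patent information.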
This study develops an automatic labeling system that derives suitable labels for patent documents belonging to the same classification. The algorithm used by the system is based on kernel functions and on the mutual information calculated from adjacent words. The system extracts representative key phrases from patent documents of the same classification that were collected from the United States Patent and Trademark Office (USPTO). The accuracy of the labels is evaluated with several benchmark indicators. The results show that the accuracy of the key phrases reaches approximately eighty percent, and that the top-ranked key phrase reaches a matching accuracy of approximately fifty percent. The key phrases derived by the system therefore agree with the USPTO classification scheme.
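The abstract does not give the exact formulation of the mutual-information step, so the following is only a minimal sketch of one common reading: scoring adjacent word pairs by pointwise mutual information (PMI) and ranking them as candidate two-word key phrases. The function name `rank_bigrams`, the `min_count` threshold, and the whitespace tokenizer are assumptions made for illustration, and the kernel-function component and the TOPO phrase-composition method of the study are not reproduced here.

```python
import math
from collections import Counter


def rank_bigrams(documents, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper: illustrates mutual information between adjacent
    words in general, not the exact algorithm used in the study.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs within one document

    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())

    ranked = []
    for (left, right), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_pair = count / total_pairs
        p_left = unigrams[left] / total_words
        p_right = unigrams[right] / total_words
        pmi = math.log2(p_pair / (p_left * p_right))
        ranked.append((f"{left} {right}", pmi))

    # Highest PMI first: pairs that co-occur more often than chance predicts
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Calling `rank_bigrams` on the texts of one patent class returns `(phrase, score)` tuples in descending score order. A practical system would likely also filter stop words and extend the idea to three-word phrases, which this sketch leaves out.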