本體論為基之智慧型專利文件自動摘要方法論研究

A Novel Methodology for Automated Ontology-Based Patent Document Summarization

Advisor: 張瑞芬

Abstract


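According to the World Intellectual Property Organization (WIPO, 1996), patent information contains 90% to 95% of the world's commercialized R&D results, whereas other technical reports and journal articles contain only 5% to 10% of core technologies. This makes patent documents the only knowledge documents that fully disclose core technologies. A WIPO survey shows that a company that makes good use of patent information can save 40% of its R&D costs and shorten its R&D schedule by 60%, so patent documents play an extremely important role in the era of the knowledge economy. However, the number of patent documents is growing rapidly, and people cannot effectively read, organize, and fully understand them. Patent documents also contain many technical and legal terms, which makes them even harder to read. How to organize and understand patent documents effectively and extract the important information from them has therefore become an important topic in knowledge management. In this thesis, we propose an ontology-based intelligent automatic summarization system for patent documents and test its performance on knowledge documents from the power hand tool and chemical mechanical polishing domains. First, the system uses predefined ontology trees for power hand tools and chemical mechanical polishing, together with a TF-IDF-based technique, to extract domain keywords and high-frequency terms from a patent document. Building on these extracted keywords, it mines the important vocabulary in the content, applies a recursive algorithm to extract important multi-word phrases, and merges duplicated information. Next, the K-means clustering algorithm clusters the paragraphs so that paragraphs sharing the same concept or topic are grouped together. The system then uses all previously extracted key terms to measure the information importance of each paragraph cluster and selects the candidate summary paragraphs. Finally, the candidate summary is combined with a pre-built template to produce a text summary. In addition to the text summary, the system marks and annotates the terms in the document that map to nodes of the ontology tree and produces a tree-shaped summary in visual, graphical form.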
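The keyword-extraction step described above (ontology lookup combined with TF-IDF) can be illustrated with a minimal sketch. It is not the thesis implementation: the ontology is reduced to a flat, hypothetical set of terms (`ONTOLOGY_TERMS`), each paragraph is treated as one "document" for the IDF computation, and the function names and the `top_k` parameter are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the domain ontology; the thesis uses
# tree-structured ontologies, which are flattened to a set here.
ONTOLOGY_TERMS = {"motor", "housing", "trigger"}

def tokenize(text):
    """Lowercase a paragraph and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(paragraphs):
    """Return one {term: TF-IDF weight} dict per paragraph."""
    token_lists = [tokenize(p) for p in paragraphs]
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per paragraph
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def key_terms(paragraphs, top_k=3):
    """Union of ontology terms found in the text and top-k TF-IDF terms."""
    found = set()
    for para, w in zip(paragraphs, tf_idf(paragraphs)):
        found |= set(tokenize(para)) & ONTOLOGY_TERMS
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(w, key=lambda t: (-w[t], t))
        found |= set(ranked[:top_k])
    return found
```

Terms that occur in every paragraph receive a weight of zero under this IDF definition, so common function words drop out of the ranking without a separate stop-word list. The recursive multi-word phrase extraction mentioned in the abstract is not covered by this sketch.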

Parallel Abstract


According to the World Intellectual Property Organization (WIPO), patent documents are the only type of document that fully discloses core techniques: they cover 90% to 95% of commercialized R&D achievements, compared with a 5% to 10% disclosure rate for other types of documents (e.g., technical reports and journal articles). A WIPO investigation indicates that a company that makes the best use of patent information can cut its R&D costs by 40% and shorten its R&D time by 60%. Patent information therefore plays an important role in the era of the knowledge-based economy. However, the number of patent documents is increasing dramatically, and most researchers cannot process, organize, and understand them effectively. The many technical and legal terms in patent documents make them even harder to understand fully. In this thesis, we propose an ontology-based key-phrase recognition technology for the construction of an automated summarization system. Patents on power hand tools and chemical mechanical polishing are used to verify the effectiveness of the proposed summarization system. First, the system extracts domain keywords using a predefined ontology and uses the TF-IDF method to extract high-frequency terms. Second, the K-means clustering algorithm is applied so that content with similar concepts is grouped together. Third, candidate paragraphs are picked from each cluster by using the keywords and key phrases to measure the importance of every paragraph in that cluster. Finally, the candidate paragraphs are combined with a predefined template to generate the text summary. In addition, the system marks, annotates, and highlights the nodes of the ontology tree that correspond to words in the document and produces a visualized form of the summary.
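The clustering and candidate-selection steps can be sketched in a similar spirit. This is an illustrative example under stated assumptions, not the system described in the thesis: paragraphs are represented as raw term-count vectors, K-means is seeded with the first `k` vectors so that the result is deterministic, and a paragraph's importance is simply the number of key-term occurrences it contains. All function names are hypothetical.

```python
import re
from collections import Counter

def vectorize(paragraphs):
    """Turn paragraphs into term-count vectors over a shared vocabulary."""
    token_lists = [re.findall(r"[a-z]+", p.lower()) for p in paragraphs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=20):
    """Plain K-means; seeding with the first k vectors is an assumption."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def pick_candidates(paragraphs, labels, key_terms):
    """Per cluster, keep the paragraph with the most key-term occurrences."""
    best = {}
    for para, lab in zip(paragraphs, labels):
        tokens = re.findall(r"[a-z]+", para.lower())
        score = sum(1 for t in tokens if t in key_terms)
        if lab not in best or score > best[lab][0]:
            best[lab] = (score, para)
    return [best[lab][1] for lab in sorted(best)]
```

The selected paragraphs would then be slotted into a summary template; the template format is not given in the abstract, so that step is left out.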

Cited By


方品軒 (2014). 針對離岸風力發電之產業進行專利分析以及其關鍵廠商之發展趨勢 [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2014.00282
黃翊軒 (2007). 本體論為基之智慧型專利文件分類方法論研究 [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2007.00024
葉榕真 (2008). 在資料對映中運用機器學習的初探 [Master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/CYCU.2008.00391
黃培婷 (2010). 結合蜜蜂交配演算法與支持向量機應用於專利分類 [Master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-2507201017293100
彭馨儀 (2013). 結合專利與臨床研究之整合分析 探討植牙領域之發展趨勢 [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2511201310442420

Further Reading