A Study of Knowledge Discovery Techniques in Intelligent Information Retrieval Systems

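Abstract

With the rapid growth of the Internet, online information retrieval systems supply users with large volumes of data, yet users must spend more time selecting what they need, and when the data spans several domains it is hard to account for what a user actually requires. In an information retrieval system, index terms are used to index the key content of a document. In many such systems, however, the relationships among index terms are ignored. These relationships can be semantic or quantitative. Different index terms may carry different meanings or the same meaning, and these relationships can also be exploited in information retrieval. This thesis proposes automated information retrieval architectures intended to make personalization more feasible and more efficient. Using a quantitative approach, the degree of similarity among index terms is derived automatically from collected Internet documents in order to build an ontology of domain knowledge. This capability is realized through conceptual hierarchical clustering algorithms. The thesis takes the articles of the Government Procurement Act as a case study. Legal provisions are drafted by domain experts, so in theory their wording should be rigorous; the experimental results nevertheless show that some terms are highly similar, being either the same term used with the same meaning or related terms with different meanings. The case study demonstrates the correctness of the proposed method. The method can be used to check the accuracy of wording across a document so that terminology is used consistently, and because the architecture is built automatically, it can be reused to build legal knowledge bases for different domains.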

Keywords

Information Retrieval; Ontology; Clustering Algorithms; Knowledge Base

Parallel Abstract

As the Internet has developed rapidly, web information retrieval systems have come to provide users with a large volume of information, but users must spend considerable time screening it. When the information spans different fields, it is difficult to account for users' real needs. In an information retrieval system, index terms are used to index the key content of articles. However, in many information retrieval systems the relationships between index terms are overlooked. These relationships can be semantic or quantitative. Different index terms may have different meanings or the same meaning, and these relationships can also be applied in information retrieval systems. This thesis presents automated information retrieval architectures that make personalization more workable and more efficient. A quantitative method is used to derive the degree of similarity between index terms automatically from collected Internet documents and so build an ontology of domain knowledge. This capability is realized through conceptual hierarchical clustering algorithms. The thesis uses the articles of the Government Procurement Act as a case study. Legal provisions are drafted by experts in the field, so in theory the wording should be rigorous. The experimental results show that some terms are highly similar: some are the same term with the same meaning, while others are related terms with different meanings. The case study demonstrates the correctness of the proposed method. The method can be used to check the accuracy of wording throughout a document so that terms are used consistently. Because the ontology is built automatically, the approach can be reused to build legal knowledge bases in different domains.
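
The idea summarized above, quantifying how similar index terms are and then grouping them hierarchically, can be illustrated with a minimal sketch. The abstract does not state which similarity measure, linkage rule, or cut-off the thesis uses, so everything here is an assumption made for illustration: cosine similarity over per-document term counts, average-link agglomerative merging, a threshold of 0.8, and four invented English clauses standing in for the procurement articles. The function names `term_vectors`, `cosine`, and `cluster_terms` are hypothetical.

```python
import math
from itertools import combinations


def term_vectors(documents, terms):
    """Map each term to its vector of per-document occurrence counts."""
    tokenized = [doc.lower().split() for doc in documents]
    return {t: [tokens.count(t) for tokens in tokenized] for t in terms}


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_terms(documents, terms, threshold=0.8):
    """Average-link agglomerative clustering of terms (assumed variant).

    Repeatedly merges the two most similar clusters until the best
    available similarity falls below `threshold`. Returns the final
    clusters and the merge history, which records the hierarchy.
    """
    vecs = term_vectors(documents, terms)
    clusters = [(t,) for t in terms]
    merges = []

    def similarity(a, b):
        # Average pairwise cosine similarity between members of a and b.
        pairs = [(x, y) for x in a for y in b]
        return sum(cosine(vecs[x], vecs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: similarity(*p))
        score = similarity(a, b)
        if score < threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, score))
    return clusters, merges


# Invented toy clauses; they are not text from the Government Procurement Act.
documents = [
    "the agency shall notify the supplier and the vendor",
    "the supplier or vendor may file a protest",
    "the agency shall publish the tender notice",
    "a tender notice shall state the deadline",
]
terms = ["supplier", "vendor", "tender", "notice", "agency"]

clusters, merges = cluster_terms(documents, terms)
for cluster in clusters:
    print(cluster)
```

In this toy data, terms that always occur in the same clauses end up grouped, which is the kind of signal the abstract describes: a group may hold interchangeable wording that should be unified, or related terms that merely appear together. Deciding which case applies still requires reading the text.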

Parallel Keywords

Information Retrieval; Ontology; Clustering Algorithms; Knowledge Base


