Concept-Based Query Expansion for Compound Keywords, Based on Similarity Thesaurus Construction

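Searching the Internet has become an indispensable aid when people look for information. The most common approach today is to use a search engine site that applies full-text retrieval to return the documents matching the keywords a user enters. However, information on the Internet is vast and heterogeneous, and returning only the documents that match the keywords misses a great deal of relevant information, so this approach no longer satisfies most users: besides documents that match a keyword, documents matching other keywords highly related to it may also be what the user needs. Achieving this requires a thesaurus that records the relationships between terms, which serves as an intermediary so that a query can also retrieve and return documents sharing the same concept. Well-known thesauri of this kind include WordNet and HowNet. Both define terms and the relationships between them manually for use during retrieval, but they have the drawback of requiring substantial manpower and time to maintain and to keep up with newly coined terms. Building the thesaurus and its term-to-term relationships automatically, in a way that applies to any domain, would effectively address these drawbacks.

Once the similarity thesaurus is built, keyword queries can be expanded with it to return the information users need. Often, however, users need more than a single-keyword search, because such a search may return too many results, many of which are not what they actually want. We therefore provide query expansion for compound keywords: users enter a sentence, and the system retrieves and returns information matching the concept of that sentence, presented hierarchically. The contribution of this research is a similarity thesaurus that is constructed automatically and can be applied to different domains without changing the system framework. In addition, a consistent query interface, query input by sentence rather than by keyword alone, and query expansion that is both automatic and interactive give users more flexible ways to query and receive results.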

Keywords

Similarity Thesaurus; Query Expansion; Compound Keywords; Vector Space

Parallel Abstract

Searching the network has become an important way for people to acquire information. The most common method used by search engines is full-text retrieval on the exact keyword entered. However, much information that would be useful to users cannot be found this way; it could instead be found through keywords that are highly related to the one the user entered. To offer this kind of information, a thesaurus that records the similarity between words is a prerequisite. It helps search engines search with related keywords for information sharing the same concept. In thesauri such as WordNet and HowNet, the similarity between words has so far been built manually, with the drawback that maintaining it takes too much manpower and time. An automatic way to build the thesaurus and its similarities would be a clear improvement. With a similarity thesaurus, we can perform query expansion for keyword searches and offer the information that users need. However, users sometimes query with more than one word, so we also perform query expansion for multiplex-keyword searches: we accept sentences as queries and show the search results in a hierarchical presentation. The objective of this research is to build a similarity thesaurus automatically and to make it suitable for different domains without changing the system framework. It also aims to offer a more flexible way of querying and responding through a consistent interface, sentence-based search, and query expansion that has both automatic and interactive functions.
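The idea of an automatically built similarity thesaurus in a vector space can be illustrated with a minimal sketch. The abstract does not state the term weighting, the similarity measure, or the tokenizer, so everything here is an assumption made for illustration: raw term frequencies, cosine similarity between term-document vectors, whitespace tokenization, and the hypothetical function names `build_similarity_thesaurus` and `expand_query`. Candidate expansion terms are ranked by their summed similarity to all terms in the query, which is one simple way to favor terms related to the query as a whole rather than to a single keyword.

```python
import math
from collections import defaultdict


def build_similarity_thesaurus(docs):
    """Return {term: {other_term: cosine similarity}} from term-document vectors.

    Assumption: raw term frequency as the weight and whitespace tokenization.
    """
    # term -> {document index: term frequency}
    vectors = defaultdict(dict)
    for i, doc in enumerate(docs):
        for term in doc.lower().split():
            vectors[term][i] = vectors[term].get(i, 0) + 1

    norms = {t: math.sqrt(sum(w * w for w in v.values()))
             for t, v in vectors.items()}

    thesaurus = defaultdict(dict)
    terms = list(vectors)
    for idx, a in enumerate(terms):
        for b in terms[idx + 1:]:
            # Dot product over the documents in which term a occurs.
            dot = sum(w * vectors[b].get(d, 0) for d, w in vectors[a].items())
            if dot:
                sim = dot / (norms[a] * norms[b])
                thesaurus[a][b] = sim
                thesaurus[b][a] = sim
    return thesaurus


def expand_query(query, thesaurus, top_k=3):
    """Expand a multi-term query with the top_k terms most similar to it.

    Each candidate is scored by the sum of its similarities to every
    query term (an illustrative choice, not the thesis's stated method).
    """
    query_terms = query.lower().split()
    scores = defaultdict(float)
    for q in query_terms:
        for term, sim in thesaurus.get(q, {}).items():
            if term not in query_terms:
                scores[term] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [t for t, _ in ranked[:top_k]]
```

With a toy corpus such as `["car engine repair", "automobile engine repair", "car automobile dealer", "fruit apple banana"]`, calling `expand_query("car engine", thesaurus, top_k=2)` adds `repair` and `automobile` to the query, while unrelated terms such as `apple` never appear because they share no documents with the query terms.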

Parallel Keywords

Similarity Thesaurus; Query Expansion; Multiplex Keywords; Vector Space

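Cited By

蔡元豪 (2012). Building a TRIZ Evolution-Trend Patent Classification System: An Empirical Study of Pneumatic Nailers [Master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2012.00604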
