A Sequence-Based Document Retrieval System Supporting Boolean Queries

Most text retrieval models judge relevance by keywords. The sequence model instead judges relevance by the similarity between character sequences. Because sequences preserve positional information, the model avoids the Chinese word segmentation problem when applied to Chinese text retrieval. Because the query itself is represented as a sequence, the model can also serve long natural-language queries about specific terms. The model can be enhanced by allowing Boolean queries, which describe a user's information needs more precisely, especially when the user is highly trained. This study proposes a method based on fuzzy set theory that supports Boolean queries in the sequence model. It also introduces two algorithms that transform Boolean queries into disjunctive normal form (DNF) or conjunctive normal form (CNF); for efficiency, these algorithms are designed to return approximate results. All three algorithms are incorporated into a new implementation in C/C++. This version also improves the efficiency of query processing, which has been a persistent concern for the SIR system, an implementation of the sequence model.
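To make the fuzzy-set idea concrete, the following is a minimal sketch of how a Boolean query could be scored against a document when each operand yields a graded similarity rather than a yes/no match. It assumes the standard fuzzy operators (AND as minimum, OR as maximum, NOT as complement); the abstract does not say which operators the thesis uses. The `similarity` function is a hypothetical stand-in based on character bigram overlap, not the sequence similarity of the SIR system, and the names `Term`, `Not`, `And`, `Or`, and `score` are illustrative only.

```python
from dataclasses import dataclass
from typing import Union


def bigrams(text: str) -> set[str]:
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(query: str, doc: str) -> float:
    """Stand-in score in [0, 1]: share of query bigrams that occur in doc.

    Assumption: this is NOT the thesis's sequence similarity, only a
    simple character-level placeholder that needs no word segmentation.
    """
    q = bigrams(query)
    if not q:
        return 0.0
    return len(q & bigrams(doc)) / len(q)


@dataclass
class Term:
    text: str


@dataclass
class Not:
    child: "Node"


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Term, Not, And, Or]


def score(node: Node, doc: str) -> float:
    """Evaluate a Boolean query tree with standard fuzzy operators."""
    if isinstance(node, Term):
        return similarity(node.text, doc)
    if isinstance(node, Not):
        return 1.0 - score(node.child, doc)              # fuzzy complement
    if isinstance(node, And):
        return min(score(node.left, doc), score(node.right, doc))
    if isinstance(node, Or):
        return max(score(node.left, doc), score(node.right, doc))
    raise TypeError(f"unsupported query node: {node!r}")


if __name__ == "__main__":
    doc = "sequence model for text retrieval"
    query = And(Term("sequence"), Not(Term("keyword")))
    print(score(query, doc))
```

Under these assumptions every document receives a score between 0 and 1, so results can be ranked rather than merely filtered, which is what distinguishes a fuzzy Boolean query from a strict one.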

Keywords

Boolean Operators; Boolean Queries; Information Retrieval; Text Retrieval; Sequence Model; Sequence Similarity

