
Improving the Effectiveness and Scalability of a Sequence-Based Text Retrieval System

Advisor: 蔡益坤

Abstract


The purpose of a text retrieval system is to locate the documents in a large textual collection that meet a user’s needs. The SIR system is one such system, based on the sequence model. Because it was designed and implemented as a sequential rather than a parallel application, it becomes less efficient as the data collection grows. Another drawback of the SIR system is that the index must be rebuilt entirely whenever the data collection is modified. In addition, query evaluation under the sequence model is time-consuming compared with other models. In this thesis, we seek improvements that address these problems. To facilitate parallel query processing, we implement three index partitioning schemes in the system and evaluate their load-balancing characteristics. To improve the scalability of index building, we design and implement a mechanism that allows the SIR system to support incremental index updates. We also make the system more flexible through other improvements, such as support for queries with homophones and for more token types.

