透過您的圖書館登入
IP:3.136.22.50
  • 學位論文

版本文件中對時間查詢最佳化之索引架構

Index Framework for Efficient Time-Travel Phrase Queries on Versioned Documents

指導教授 : 韓永楷

摘要


近來版本文件增長有飛躍上的速度。搜索這類文件通常有時間範圍的限制。如何進行索引文件這些種類和回答time-travel phrase query在資料檢索的社群裡被廣泛討論。在本文中,我們提出了基於後綴樹索引文件新的框架來幫這些文件做索引。這個框架可以在保證空間複雜度O(nlogn + nlogT) bits和查詢的複雜度O(plogn + k)來存儲版本文件的索引,並支持任何pattern的查詢。在實作中,我們討論了一些實際問題,在此框架,並用簡潔的數據結構,減少我們的空間到接近為O(nlogT) bits。同時,我們也做了實驗來證明我們的概念。最後,我們還簡要提出了關於這個框架的延伸.

並列摘要


The volume of versioned documents is growing very quickly nowadays. How to exploit the redundancy among the documents to index the documents compactly, while supporting effi- cient time-constrained keyword queries has been a hot topic recently [Anand et al., SIGIR’11, SIGIR’12; He et al., CIKM’09, CIKM’10, SIGIR’11, SIGIR’12]. In this paper, we propose a new framework to index versioned documents, and extend the keyword queries into the more general phrase queries. Our index is based on suffix tree, taking O(n) space and answering a one-sided time-constrained phrase query for any phrase P in O((|P | + k) log n) time, where n is the dataset size and k is the output size. We discuss how to tune our framework with realistic assumptions; our experiments shows that under similar space budgets, our index supports queries five times faster than the baseline inverted lists when the query length |P| is at least four.

參考文獻


[4] Peter G Anick and Rex A Flynn. Versioning a full-text information retrieval system. In
[12] Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2):378–407, 2005.
[13] Antonin Guttman. R-trees: A dynamic index structure for spatial searching, volume 14. ACM, 1984.
[18] Hitwise. Hitwise: Search queries are getting longer.
[19] Wing-Kai Hon, Rahul Shah, and Jeffrey Scott Vitter. Space-efficient framework for

延伸閱讀