透過您的圖書館登入
IP:216.73.216.60
  • 學位論文

A Multi-level Hierarchical Index Structure for Supporting Efficient Similarity Search of Tagsets

A Multi-level Hierarchical Index Structure for Supporting Efficient Similarity Search of Tagsets

指導教授 : Jia-Ling Koh Chung-Wen Cho
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


In this thesis, we propose a multi-level hierarchical index structure to support efficient similarity search for tagsets. The proposed method is designed based on a previous method which supports similarity search in transaction databases with a two-level bounding mechanism. Similar to the previous method, the tagsets are incrementally grouped into clusters. However, a cluster may have sub-clusters in our approach. The tagsets in a leaf-cluster are grouped into batches. Three different thresholds are used to control the degree of similarity at each level of the index structure. Furthermore, we require the tagsets in the same cluster containing at least one common tag to prevent from grouping unrelated tagsets into a cluster. The experimental results show that the proposed multi-level hierarchical index structure provides better performance on execution time of searching than both the proposed method and the naïve method significantly. Besides, with the assistant of an inverted list of clusters, the execution time of the proposed method for deletion and updating is also much better than the other two methods.

並列摘要


In this thesis, we propose a multi-level hierarchical index structure to support efficient similarity search for tagsets. The proposed method is designed based on a previous method which supports similarity search in transaction databases with a two-level bounding mechanism. Similar to the previous method, the tagsets are incrementally grouped into clusters. However, a cluster may have sub-clusters in our approach. The tagsets in a leaf-cluster are grouped into batches. Three different thresholds are used to control the degree of similarity at each level of the index structure. Furthermore, we require the tagsets in the same cluster containing at least one common tag to prevent from grouping unrelated tagsets into a cluster. The experimental results show that the proposed multi-level hierarchical index structure provides better performance on execution time of searching than both the proposed method and the naïve method significantly. Besides, with the assistant of an inverted list of clusters, the execution time of the proposed method for deletion and updating is also much better than the other two methods.

參考文獻


[8]. C. Ordonez, E. Omiecinski, N. Ezquerra, “A fast Algorithm to Cluster High Dimensional Basket Data” in Proceedings in Proceedings of 17th International Conference on Data Engineering (IEEE), 2001.
[10]. C. Xiao, W. Wang, X. Lin, H. Shang, “Top-k Set similarity Joins”, in Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2009.
[11]. C. Xiao, W. Wang, X. Lin, “Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints” in Proceeding of the 17th International Journal on Very Large Data Bases (VLDB), 2008.
[16]. J. Chuang, C. Cho, A. Chen, “Similarity Search in Transaction Databases with a Two Level Bounding Mechanism” in Proceeding of the 11th International Conference of Database Systems for Advanced Applications (DASFAA), 2006.
[17]. N. Mamoulis, D. Cheung, W. Lian, “Similarity Search in Sets and Categorical Data Using the Signature Tree” in Proceedings of 19th International Conference on Data Engineering (IEEE), 2003.

延伸閱讀