透過您的圖書館登入
IP:3.144.250.223
  • 學位論文

樹狀結構編輯距離在可擴展標記語言文件上之分析研究

Analysis of Tree Edit Distance on XML Data

指導教授 : 顏嗣鈞

摘要


在資訊科學研究中,樹狀結構比較問題廣泛地出現在各種領域,其中也包括了可擴展標記語言文件的處理。而在解決樹狀結構比較問題時,樹狀結構編輯距離是一個常見且重要,並可用來量化樹狀結構差異的方法。因此,在比較大量串流可擴展標記語言文件時,有效率的樹狀結構嵌入演算法就相當重要。 在本篇論文中,我們提出一個新的樹狀距離嵌入演算法,用來比較從串流可擴展標記語言文件中取得的無標記有序樹狀結構。與最近的研究相較,我們的貢獻是在不增加時間及空間複雜度的情況下,簡化得到樹狀距離的過程。同時我們也分析了這個演算法在樹狀結構編輯距離上造成失真的上下界以及誤差的機率。

並列摘要


The problem of comparing tree structures occurs in various areas in computer science and engineering, including the application to XML data processing. To solve this problem, tree edit distance is a common and significant measurement defining the difference between two tree structures quantitatively. Efficient tree edit distance embedding algorithms are therefore of significant importance in comparing large streaming XML document trees. In this thesis, we propose a new algorithm to obtain edit distance between unlabeled ordered trees derived from streaming XML data. In comparison with the previous work, our contribution lies in simplifying the procedure of obtaining the tree edit distance without increasing the time and space complexities. The upper and lower bounds as well as the error probability of our algorithm are also analyzed in this thesis.

參考文獻


[1] P. Bille. A Survey on Tree Edit Distance and Related Problems. Theoretical Computer Science, 337(1-3): 217-239, 2005.
[2] Kaizhong Zhang and Dennis Shasha. Simple Fast Algorithms for the Editing Distance between Trees and Related Problems. SIAM Journal of Computing, 18: 1245–1262, 1989.
[4] A. Micheli and D. Rossin. Edit Distance Between Unlabeled Ordered Trees. RAIRO - Theoretical Informatics and Applications, 40: 593-609, 2006,
[6] M. Garofalakis and A. Kumar. XML Stream Processing Using Tree-Edit Distance Embeddings. ACM Transactions on Database Systems, 30(1): 279-332, 2005.
[8] T. Grust and M. van Keulen. Tree Awareness for Relational DBMS Kernels: Staircase Join. Intelligent Search on XML Data, 231-245, 2003.

延伸閱讀