透過您的圖書館登入
IP:3.137.160.137
  • 學位論文

混合樹之間的複合距離之研究

A Study on Compound Distance Problems between Two Mixture Trees

指導教授 : 阮夙姿

摘要


演化樹是一種描述物種演化關係的樹狀圖,在圖上我們可以看出物種演化的過程及物種和物種間的祖先是哪一個物種。這樣的樹可以經由物種資訊計算得到,由電腦的高速計算我們可以得到大量計算出來的演化樹,樹狀圖的比較是測量兩樹狀圖的相似程度,這是在統計學及生物資訊上重要的議題。 混合樹是在2006年由Chen和Lindsay提出的重要演化樹。混合樹帶有比傳統的演化樹更多的資訊,包含突變時間、突變位置集合等。於2008年Lin及Juan提出了一個演算法去計算二棵混合樹之距離,但是由於這個演算法所計算的距離只考慮突變時間所造成的距離,因此我們在此提出演算法去考慮突變位置集合所造成的距離。這樣一來我們便可利用我們的方法及Lin及Juan的方法去計算二棵混合樹的突變時間及突變位置集合的完整距離- 複合距離。 在本論文中,我們共定義了三種以突變位置集合所造成的距離,分別稱之為突變距離、簡單突變距離、總合突變距離。首先定義突變距離,並發展一個O(n2MaxHeight(T1, T2))演算法,其中MaxHeight(T1, T2)代表T1與T2兩棵樹中高度較高的那棵樹的高度,但為要整合Lin及Juan所提出的演算法,因此另外用Lin及Juan所提出的演算法概念發展了一個O(n2MaxHeight(T1, T2))的演算法。接著改進這個演算法使其時間複雜度縮減為O(n2)。其次,為了可以更快速的計算出二棵混合樹的突變距離,我們定義另一個測量混合樹之距離方式-簡單突變距離,發展了一個O(nlogn)的演算法,並改進時間複雜度,另外提出一個O(n)的演算法。由於突變距離及簡單突變距離計算方式相似,並且我們認為每條葉節點到根節點之路徑上的每個點在每個位置發生突變的總次數,亦可以用來當作測量距離的資料,因此定義了第三種距離定義-總合突變距離,並發展一個O(n)演算法。

並列摘要


The evolutionary tree is an important topic in bioinformation. In 2006, Chen and Lindsay proposed a new method to build the mixture tree from DNA sequences. Mixture tree is an evolutionary tree, and it has two information. One of the information is time parameter, and the other is the set of mutated sites. In 2008, Lin and Juan proposed an algorithm to compute the distance between two mixture trees. Their algorithm computes the distance with only considering the time parameter between two mixture trees. In this work, we proposes three methods to measure the similarity of two mixture trees with considering the set of mutated sites and develop algorithms to compute the distance between two mixture trees. In this thesis, we give three new definitions of distance, called mutated distance, simple mutated distance, and total mutated distance between two mixture trees by considering the set of mutated sites. For mutated distance, we give three algorithms to compute the mutated distance between two mixture trees. The time complexity are O(n2MaxHeight(T1, T2)), O(n2MaxHeight(T1, T2)) and O(n2), respectively. Where MaxHeight(T1, T2) is the maximum height of tree T1 and T2. For simple mutated distance, we design two algorithms. The time complexity are O(nlogn) and O(n), respectively. Finally, we think that the total mutated number of each site in every path between root and leaf can be used to compute the distance between two mixture trees, so we define the total mutated distance, and design an algorithm for total mutated distance in O(n).

參考文獻


[1] M. A. Bender and M. Farach-Colton, “The lca problem revisited,” Latin American Theoretical Informatics, pp. 88–94, 2000.
[2] J. Bluis, D. Shin, and J. Bluis, “Nodal distance algorithm: calculating a phylogenetic tree comparison metric,” Proc. Third IEEE Symposium on Bioinformatics and
Bioengineering (D. Shin, ed.), pp. 87–94, 2003.
[3] G. S. Brodal, R. Fagerberg, and C. N. S. Pedersen, “Computing the quartet distance between evolutionary trees in time O(nlogn),” Algorithmica, Vol. 38, No. 2, pp. 377–395,

延伸閱讀