Biological taxonomists traditionally classified species by morphological and physiological characters, used anatomy or fossils to verify the relationships among species, and drew evolutionary trees from those relationships. After biological molecular sequences were discovered, scientists began using them to analyze the relationships among taxa and developed a variety of methods for constructing phylogenetic trees. Using phylogenetic trees to analyze the relationships among genes also plays an important role in biotechnology research. This thesis studies the basic theory of five tree construction methods, namely the unweighted pair group method with arithmetic mean (UPGMA), the Fitch-Margoliash (FM) method, neighbor joining (NJ), maximum parsimony (MP), and maximum likelihood (ML), together with several evaluation methods. Existing software is then used to infer phylogenetic trees from different molecular sequence data sets, the execution times of the construction methods are compared, and bootstrapping is used to estimate the probability that each inferred tree appears. In addition, sequence data for operational taxonomic units (OTUs) are simulated from a model tree; each method reconstructs a tree from the simulated data, and the result is compared with the model (correct) tree to assess accuracy and thereby the differences among the methods. The experiments were performed with the 'seqgen' program and the PHYLIP package. In the execution-time experiments with varying numbers of OTUs, MP was the fastest of the three exhaustive search methods, dnaml (ML), dnapars (MP), and fitch (FM); FM was faster than ML for small numbers of OTUs but slower than ML beyond 30 OTUs. Of the two stepwise clustering methods, NJ and UPGMA, UPGMA was faster. In the accuracy analysis, each of the five methods had its own advantages and disadvantages. In general, ML performed better than the other methods when the OTU sequences were shorter, whereas FM and NJ performed better when the sequences were longer.
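The bootstrapping evaluation mentioned above can be illustrated with a minimal Python sketch. It is not the PHYLIP implementation used in the thesis. It assumes uncorrected p-distances, a naive UPGMA as the tree builder, and a tree summarized by its set of clades. The function names (`p_distance`, `upgma_clades`, `bootstrap_support`) and the toy alignment are hypothetical, introduced only for illustration. Alignment columns are resampled with replacement, a tree is rebuilt for each replicate, and the fraction of replicates that reproduce the original clade set is reported.

```python
import random
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma_clades(seqs):
    """Naive UPGMA on p-distances; returns the tree as a frozenset of clades.

    Each clade is a frozenset of OTU names. The distance between two
    clusters is the arithmetic mean of all leaf-to-leaf distances.
    """
    names = sorted(seqs)
    leaf = {(a, b): p_distance(seqs[a], seqs[b]) for a in names for b in names}

    def avg(c1, c2):
        return sum(leaf[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties resolved by iteration order).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return frozenset(clades)

def bootstrap_support(seqs, replicates=100, seed=0):
    """Fraction of bootstrap replicates whose tree matches the original tree."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    reference = upgma_clades(seqs)
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(s[i] for i in cols) for n, s in seqs.items()}
        hits += upgma_clades(resampled) == reference
    return hits / replicates

if __name__ == "__main__":
    # Hypothetical toy alignment of four OTUs, not data from the thesis.
    toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT",
           "C": "TTTTTTTTAA", "D": "TTTTTTTTTA"}
    print(bootstrap_support(toy, replicates=200))
```

The same clade-set comparison could also serve as a crude check of a reconstructed tree against a model tree, although an actual study would more likely use the dedicated resampling and consensus programs of the chosen package.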