RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity

Phylogenetic analyses of nucleotide sequences widely use 'homogeneous' substitution models, which assume that base composition is stationary across the tree, even though individual sequences may have distinctive base frequencies. In the worst case, an analysis based on a homogeneous model can artifactually unite two distantly related sequences that acquired similar base frequencies in parallel. Two approaches can counter this difficulty: 'RY-coding' and 'non-homogeneous' models. RY-coding recodes the four bases as purine (R) and pyrimidine (Y) to normalize base frequencies across the tree, whereas non-homogeneous models explicitly incorporate the heterogeneity in base frequency. Both approaches have been applied to real-world sequence data; however, earlier simulation studies have not fully examined their basic properties. Here, we assessed the performance of maximum-likelihood analyses incorporating RY-coding or a non-homogeneous model (RY-coding and non-homogeneous analyses, respectively) on data simulated with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses outperformed analyses based on homogeneous models. Curiously, the performance of the RY-coding analysis appeared to be affected considerably more than that of the non-homogeneous analysis by the settings of the substitution process used for sequence simulation. The performance of the non-homogeneous analysis was also validated on a real-world sequence data set with significant heterogeneity in base composition.
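The recoding step described above can be illustrated with a minimal Python sketch. It is not taken from the study: the function names `ry_code` and `state_frequencies` and the two toy sequences are hypothetical, chosen only to show how an AT-rich and a GC-rich sequence can end up with the same purine/pyrimidine frequencies once A and G are merged into R and C and T into Y.

```python
from collections import Counter

# Purines (A, G) map to R; pyrimidines (C, T, and U in RNA) map to Y.
# Any other character (gaps, N, ambiguity codes) is left unchanged.
RY_MAP = str.maketrans("AGCTUagctu", "RRYYYRRYYY")


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence as purine (R) / pyrimidine (Y)."""
    return seq.translate(RY_MAP)


def state_frequencies(seq: str, alphabet: str) -> dict:
    """Relative frequency of each state in `alphabet`, ignoring other characters."""
    counts = Counter(ch for ch in seq.upper() if ch in alphabet)
    total = sum(counts.values())
    return {state: counts[state] / total for state in alphabet}


# Hypothetical toy sequences (assumption: for illustration only).
at_rich = "ATTATAATGCAT"
gc_rich = "GCCGCGGCATGC"

for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    print(name, "bases:", state_frequencies(seq, "ACGT"))
    print(name, "RY   :", state_frequencies(ry_code(seq), "RY"))
```

In this toy case the four-state frequencies differ sharply between the two sequences, while the two-state frequencies are identical. Real data need not normalize this cleanly, and the recoding discards the information carried by transitions (A↔G, C↔T).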

Keywords

RY-coding; non-homogeneous model; model misspecification; long-branch attraction; compositional heterogeneity

