
Using machine learning methods to predict the pathogenicity of human mitochondrial ribosomal DNA mutations

Advisors: 賴飛羆, 李妮鍾

Abstract

This study presents a comprehensive analysis for predicting the pathogenicity of human mitochondrial ribosomal DNA (mt-rDNA) variants. We introduce a new approach based on extreme gradient boosting (XGBoost) with feature integration. The model combines several kinds of features: homoplasmy and heteroplasmy, allele frequency, heteroplasmy level, the rate of benign or pathogenic change induced by a variant, the variability and complexity of nucleotide mutations as measured by nucleotide mutation entropy, and changes in sequence information caused by nucleotide mutations (such as structural changes and the presence of keto or amino bases). We also use SHAP (SHapley Additive exPlanations) to identify which features matter most to the model's pathogenicity predictions. No prediction method specifically targeting human mt-rDNA variants has been published to date; ours is the first, and it achieves an F1 score of 0.9886 on the evaluation dataset. By taking the distinctive characteristics of mt-rDNA into account, this approach offers a useful tool for accurately predicting the pathogenicity of mt-rDNA variants.
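The abstract names nucleotide mutation entropy, keto/amino base information, and the F1 score without giving formulas. The sketch below shows one plausible reading of each, using only the Python standard library. It is an illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, the entropy is assumed to be Shannon entropy in bits over the bases observed at a position, and the keto/amino feature is assumed to follow the IUPAC classes (keto: G, T; amino: A, C).

```python
import math
from collections import Counter

# IUPAC chemical classes: keto bases are G and T, amino bases are A and C.
KETO = {"G", "T"}
AMINO = {"A", "C"}


def shannon_entropy(bases):
    """Shannon entropy in bits of the nucleotides observed at one position.

    Assumption: the thesis's "nucleotide mutation entropy" may be defined
    differently; this is the textbook form H = -sum(p * log2(p)).
    """
    counts = Counter(bases)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def keto_amino_switch(ref, alt):
    """Return True when a substitution moves between the keto and amino classes."""
    for base in (ref, alt):
        if base not in KETO | AMINO:
            raise ValueError(f"unexpected base: {base!r}")
    return (ref in KETO) != (alt in KETO)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A position where all four bases are equally common has the maximum entropy of 2 bits, and a fully conserved position has entropy 0. Under this class definition an A to G substitution counts as a switch (amino to keto), while A to C does not.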

