This study presents a comprehensive analysis for predicting the pathogenicity of variants in human mitochondrial ribosomal DNA (mt-rDNA). We introduce a new approach based on extreme gradient boosting (XGBoost) with feature integration. The model combines multiple factors: homoplasmy, heteroplasmy, allele frequency, heteroplasmy level, the rate of benign or pathogenic change caused by a variant, the variability and complexity of nucleotide mutations as measured by nucleotide mutation entropy, and changes in sequence information caused by nucleotide mutations (such as structural changes and the presence of keto or amino groups). We also use SHAP (SHapley Additive exPlanations) to identify which features matter most to the model's pathogenicity predictions. To date, no prediction method specifically targeting human mt-rDNA variants has been published; ours is the first, and it achieves an F1 score of 0.9886 on the evaluation dataset. By applying machine learning while accounting for the unique characteristics of mt-rDNA, our approach offers a useful tool for accurately predicting the pathogenicity of mt-rDNA variants.
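To make two of the feature types and the reported metric concrete, the following is a minimal sketch rather than the study's implementation. It assumes that nucleotide mutation entropy can be approximated by the Shannon entropy of the bases observed at a position, and that the keto/amino feature can be reduced to a flag for whether a substitution switches chemical class; the abstract does not specify either definition. The function names `shannon_entropy`, `keto_amino_switch`, and `f1_score` are hypothetical.

```python
import math
from collections import Counter

# IUPAC chemical classes: G and T carry a keto group, A and C an amino group.
KETO = {"G", "T"}
AMINO = {"A", "C"}


def shannon_entropy(bases):
    """Shannon entropy (in bits) of the nucleotides observed at one position.

    Assumption: a stand-in for the abstract's "nucleotide mutation entropy".
    """
    counts = Counter(bases)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def keto_amino_switch(ref, alt):
    """Return True if a substitution swaps a keto base for an amino base or vice versa."""
    for base in (ref, alt):
        if base not in KETO | AMINO:
            raise ValueError(f"unexpected base: {base!r}")
    return (ref in KETO) != (alt in KETO)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), with 1 = pathogenic and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a pipeline of the kind described, values such as these would be assembled into one feature vector per variant and passed to the gradient-boosted classifier. The F1 function shows how a score like the reported 0.9886 is computed from true positives, false positives, and false negatives.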