  • Thesis

Predicting Deleterious Variants in Protein-Coding Regions of the Genome by Combining Structure-Based Functional Site Prediction

Incorporating structure-based functional site prediction into the prediction of deleterious protein-coding region variation in the human genome.

Advisor: 楊安綏

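Abstract


With the development of high-throughput technologies and the growing number of sequence variants produced by various sequencing projects, how to use computational methods to help interpret these variants has become a research topic of wide interest. Most existing methods use sequence-based or structure-based information to assess the impact of sequence variants and try to explain their effect on the disruption of protein function or on disease pathogenicity. In this study, we integrate sequence-based information with the ISMBLab functional site prediction method to identify deleterious amino acid substitutions that disrupt protein function. From the training set of the VIPUR prediction tool, we collected 8,884 protein variants, each with clear experimental evidence of whether it disrupts protein function, to build an SVM classifier. The results show that our classifier can be extended to prediction in other species and achieves better values for the area under both the ROC and PR curves. In comparison with other methods, our classifier obtains a Matthews correlation coefficient of 0.405 on the human variant test set. In conclusion, we propose a method that incorporates structure-based functional site prediction to predict the effect of amino acid substitutions on protein function, and we show that features derived from ISMBLab functional site prediction are helpful for predicting the effects of protein variants.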

Parallel Abstract


As high-throughput techniques advance and sequencing projects generate massive amounts of sequence variation data, applying computational methods to annotate these variations has become an issue of concern. Existing methods exploit sequence-based or structure-based information to interpret the effects of variations, and most of them relate these effects to the functional disruption of a protein or to disease pathogenicity. Here we present a method that integrates sequence-based information with ISMBLab functional site prediction to identify deleterious amino acid substitutions that disrupt protein function. We collected 8,884 protein variants from the VIPUR training set, each with clear experimental evidence on whether it disrupts protein function, to train an SVM classifier. The results show that our classifier generalizes to other organisms, with better values of the area under the ROC and PR curves. In a comparison with other methods, our classifier achieves a Matthews correlation coefficient of 0.405 on the human variant test set. In summary, we provide a method that incorporates structure-based functional site prediction to predict the effects of amino acid substitutions on protein function, and we demonstrate that features derived from ISMBLab functional site prediction are useful for predicting the effects of protein variants.
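
The abstract reports performance as a Matthews correlation coefficient and as areas under the ROC and PR curves. As a minimal sketch of how the first two of these metrics are defined for a binary deleterious/neutral classifier, the following standard-library Python may help. The function names and the toy labels and scores are illustrative assumptions, not data or code from the thesis, and the PR curve is omitted for brevity.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = deleterious, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): classifier scores thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(matthews_corrcoef(y_true, y_pred))  # 0.5
print(roc_auc(y_true, scores))            # 0.9375
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why it is often preferred over accuracy when the deleterious and neutral classes are imbalanced.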

