
CONSTRUCTING FEATURE MODELS FOR CITRULLINATION SITES PREDICTION

Predicting Citrullination Sites Using Machine-Learning Methods

Abstract


Protein citrullination is catalyzed by peptidylarginine deiminase (PAD), which converts positively charged arginine into neutral citrulline. Several human diseases, including rheumatoid arthritis, autoimmune diseases, and Alzheimer's disease, are known to be associated with PAD enzymes and citrullinated proteins. However, none of the existing prediction tools for citrullination performs well. This study evaluated the catalytic rules of PADs that have been described in previous studies. Machine-learning approaches were used to construct a prediction model for citrullination sites from eight features derived from previous studies: catalytic rules, sequence similarity, evolutionary information, physicochemical and biochemical properties of amino acids, secondary structure, disorder, and surface accessibility. We then designed a modeling procedure for small data sets and proposed a feature model selection (FMS) procedure to build a model that can predict unknown citrullination candidates. The final prediction model achieved an accuracy of up to 0.90 and a Matthews correlation coefficient (MCC) of 0.80, and the selected features were largely consistent with those identified in previous biological analyses.
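The two reported metrics, accuracy and MCC, can both be computed from a binary confusion matrix. The following is a minimal sketch in Python. The function names and the counts (90 true positives, 90 true negatives, 10 false positives, 10 false negatives) are hypothetical, chosen only to illustrate the formulas on a balanced set. They are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not the study's data).
tp, tn, fp, fn = 90, 90, 10, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when citrullinated and non-citrullinated sites are unevenly represented.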

Parallel Abstract


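Protein citrullination is the conversion, by peptidylarginine deiminase (PAD), of arginine residues on substrate proteins into uncharged citrulline. Citrullination is associated with diseases such as rheumatoid arthritis, multiple sclerosis, and Alzheimer's disease. No dedicated tool for citrullination is currently available. This study re-evaluates the usability of the possible catalytic rules of PADs reported in the literature. Building on these rules, eight types of features were derived (catalytic rules, sequence similarity, evolutionary conservation information, physicochemical and biochemical properties of amino acids, secondary structure, and protein disordered structure) and used to build a prediction model with machine learning. A modeling method designed for small data sets and a feature model selection procedure were then proposed to predict unknown citrullination candidate proteins. The final model reaches an accuracy of 0.90 and a Matthews correlation coefficient of 0.80. The features used by the prediction model also echo the PAD catalytic rules described in earlier literature.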

