深度圖類神經網路學習蛋白質之動力耦合功能預測

Dynamics-informed Protein Function Prediction through Deep Graph Neural Network

Advisor: 張書瑋

Abstract


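Exploring the relationship between protein molecular structure and function is highly important for the development of emerging biomaterials, drugs, and antibodies, with impact across the biotechnology and pharmaceutical manufacturing industries. Many recent studies use graph neural networks (GNNs) to learn the structural and topological properties of proteins and thereby predict protein function. The latest end-to-end deep learning model, PersGNN, shows a significant improvement in Gene Ontology function classification over deep learning models that contain only structural features. However, proteins in living organisms are dynamic rather than static molecules: they constantly change conformation as they interact with their environment, and even form quaternary complexes with other protein monomers. This study shows how to use the robustness and expressiveness of GNNs to embed information obtained between amino acids on spatial proximity, persistent homology, and normal mode analysis. The model aggregates the features of each node from the local to the global scale; amino acids couple their information with one another through their dynamical properties, and the result is embedded and expressed as a graph that serves as the protein representation. We perform multi-label classification on 7,765 proteins obtained from the Protein Data Bank and 155 function labels obtained from the Gene Ontology database, and the results confirm that representations containing dynamics information substantially improve classification performance. We also use residue-level activation maps to detect the functionally important amino acids in a protein. By comparing activation maps with and without dynamics information, we further identify dynamically-activated residues (DARs). The proposed method can draw strong inferences from the dynamical behavior of molecules and can be extended to a broad range of crystalline or amorphous materials that can be represented as graphs.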

Parallel Abstract


Understanding the protein structure-function relationship is essential across biology, with important applications in the design of new functional biomaterials, drugs, and antibiotics for the biotechnology and pharmaceutical industries. Recent advances in protein function prediction use graph-based deep learning to correlate protein 3D structure and topological features with molecular function. The latest end-to-end model, PersGNN, achieved a boost in Gene Ontology classification compared with baseline graph neural networks (GNNs). However, proteins in vivo are not static but dynamic molecules that interact with their environment, constantly change conformation, and even assemble into quaternary complexes. This work shows how the expressiveness and robustness of GNNs can be used to encode information from spatial proximity, persistent homology, and normal mode analysis across protein residues. The learned protein representation aggregates node-level features from the local to the global level and provides a graph-level embedding with inter-residue dynamical couplings for downstream function prediction. We perform a multi-label classification task on 7,765 proteins from the Protein Data Bank with 155 different function annotations from Gene Ontology. Our results demonstrate a remarkable gain in discriminatory power from the dynamics-informed representation. We also exploit the residue-level activation map to detect the important functional residues in the protein. By comparing the activation maps with and without dynamics information, we further identify the dynamically-activated residues (DARs). The proposed method draws strong inferences from the dynamics of molecules and can be readily extended to a wide range of crystalline and amorphous materials that can naturally be represented as graph-structured data.
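Two of the steps named in the abstract can be illustrated with a minimal sketch: building a residue graph from spatial proximity, and picking out DARs by comparing two activation maps. This is not the thesis's implementation. The function names `contact_edges` and `dynamically_activated`, the 8 Å cutoff on C-alpha distances, and the fixed `margin` used to compare activations are all assumptions made for illustration; the abstract does not state the criteria actually used.

```python
import math


def contact_edges(coords, cutoff=8.0):
    """Return undirected edges (i, j) for residue pairs within `cutoff` angstroms.

    `coords` holds one (x, y, z) position per residue, e.g. its C-alpha atom.
    The 8.0 default is an illustrative assumption, not a value from the thesis.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def dynamically_activated(with_dyn, without_dyn, margin=0.3):
    """Return indices of residues whose activation rises by more than `margin`
    when dynamics features are included.

    Both inputs are per-residue activation scores on the same scale.
    The simple difference-and-threshold rule is a hypothetical criterion.
    """
    if len(with_dyn) != len(without_dyn):
        raise ValueError("activation maps must cover the same residues")
    return [
        i
        for i, (a, b) in enumerate(zip(with_dyn, without_dyn))
        if a - b > margin
    ]


# Toy data: four residues along a line, the last one far from the rest.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(coords)

# Toy activation maps from a model with and without dynamics features.
act_with = [0.9, 0.2, 0.7, 0.1]
act_without = [0.8, 0.1, 0.2, 0.1]
dars = dynamically_activated(act_with, act_without)
```

In this toy example the first three residues are mutually connected and only the third residue is flagged, because its activation increases by 0.5 once dynamics information is added. A real pipeline would attach persistent-homology and normal-mode features to the nodes and edges before training the GNN.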

