
以變分自編碼器輔助平均力勢能之計算

Variational Autoencoder Enhanced Calculation of Potential of Mean Force

Advisor: 張書瑋

Abstract


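In recent years, improvements in computing hardware have made it possible to solve increasingly complex physical problems by numerical simulation, from the finite element method for continuum mechanics and molecular dynamics for particle mechanics to density functional theory for quantum mechanics. Molecular dynamics in particular lets us study the behavior of biomolecules, such as the folding and unfolding of proteins and the binding and unbinding of ligands and their substrates, at a spatiotemporal resolution that experiments cannot reach.
Yet however rapidly hardware advances, molecular dynamics simulations remain limited in model size and in simulated time span. These two limits respectively cause discrepancies between the atomistic model and the real world and insufficient sampling of the observed physical quantities, which in turn lead to bias and uncertainty in the results.
Machine learning is another field that has benefited from hardware advances. Built on the concept of neural networks, a machine learning model learns the features and trends of the data by adjusting its own parameters, finding an optimal mapping between input and output. This ability allows us to understand massive, high-dimensional data in a simpler form, and thereby to grasp its patterns, reconstruct the past, and predict the future. Some models can even become "creative" after sufficient training, such as generative adversarial networks and variational autoencoders. Many studies have begun to incorporate machine learning into molecular dynamics simulation to gain computational efficiency without losing accuracy, for example in predicting molecular structures and properties and in post-simulation data processing.
In this work, we propose using machine learning to assist the calculation of a system's free energy and so reduce the computational burden of molecular dynamics. Because molecular dynamics is founded on statistical thermodynamics, which is in essence the study of the probability distributions of physical properties, we choose the variational autoencoder, which likewise performs inference on distributions, as our machine learning model. We train this model on simulation results and attempt to link the distribution of the real physical properties to the distribution of the model's latent variables, and then generate the data needed for subsequent calculations.
In this thesis we propose two strategies: reconstructing the distribution of free energy differences at the same speed, and extrapolating the free energy difference of slow simulations. A simple numerical model and a small protein model serve as validation cases. The results show that both strategies perform as expected on the simple numerical model, whereas for the small protein model only the first strategy succeeds.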
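To make the link between a data distribution and a latent-variable distribution more concrete, the following is a minimal sketch of the two ingredients that distinguish a variational autoencoder from a plain autoencoder: the reparameterized latent sample and the KL regularizer toward a standard normal prior. It is not the thesis's implementation. The one-dimensional latent variable, the unit-variance Gaussian decoder (which gives a squared-error reconstruction term), and all function names are assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * rng.gauss(0.0, 1.0)


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a 1-D Gaussian."""
    return 0.5 * (mu * mu + math.exp(log_var) - 1.0 - log_var)


def negative_elbo(x, x_reconstructed, mu, log_var):
    """Per-sample loss: reconstruction error plus KL regularizer.

    Assumes a Gaussian decoder with unit variance, so the reconstruction
    term is half the squared error (constants dropped).
    """
    reconstruction = 0.5 * (x - x_reconstructed) ** 2
    return reconstruction + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder output for one data point.
    mu, log_var = 0.3, math.log(0.25)
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a trained model, `mu` and `log_var` would come from an encoder network and `x_reconstructed` from a decoder network; once trained, new data can be generated by decoding samples drawn from the prior.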

Parallel Abstract


Scientific computing has recently advanced in many fields thanks to improvements in computational capacity. Numerical methods such as the finite element method (FEM), molecular dynamics (MD), and density functional theory (DFT) have been developed for increasingly complex physical systems. Among these methods, MD allows us to probe complex processes in biophysical systems, such as ligand binding and unbinding and protein folding and unfolding, with high spatiotemporal resolution [1]. Even with today's computational power, there remain limits on the reachable size of simulation models and on feasible computational time, which bound the complexity and time scale of the systems that can be studied. These limits lead to noticeable differences between simulation and reality and to insufficient sampling of the underlying free energy surface and kinetics [1]. Another emerging tool that benefits from growing computational resources is machine learning (ML). Based on the concept of neural networks, ML models find an optimal mapping between input and output by learning the features and regularities of the data during training. This ability can be used to identify patterns and then to predict the future [2] or reconstruct the past [3]. The flexibility and simplicity of ML models also make them a powerful tool for revealing the characteristics of a deluge of data and transforming it into a human-comprehensible form [1][4]. Moreover, some ML models become generative once they have learned the data, for instance the generative adversarial network (GAN) and the variational autoencoder (VAE) [3][5]. Researchers have proposed new ML-based methods that aim to improve efficiency in both accuracy and computing time. Applications include molecular structure prediction [6][7], property prediction [8][9][10][11][12][13], and post-simulation data analysis [1][14].
In this work, we propose two ML-based methods to reduce the computing power required for solving free energy problems with MD simulations. Since MD is based on statistical thermodynamics, a field that studies the properties, distribution, and probability of microscopic states [3], we choose the VAE as our ML model because it is built on the variational Bayesian method, a technique that allows us to reformulate statistical inference problems [15][16]. We use this unsupervised ML model to learn from the MD results and to build a connection between the distribution of microscopic states and that of the latent variable. We then exploit this relationship to augment the data required by the subsequent statistical computation. Two schemes are proposed to increase the efficiency of free energy calculation: one predicts the free energy curve of a slow simulation, and the other reconstructs the distribution of the free energy difference. These schemes are tested on two models: a simple numerical model and an atomistic MD model. The results show that both schemes work for the first model, but for the second model only the extrapolating method works.
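The abstract does not name the estimator that turns sampled data into a free energy difference. Purely as an illustration of why augmenting the sample set can help the statistical step, the sketch below assumes that each simulation yields a nonequilibrium work value W and that the free energy difference is estimated with the exponential average of Jarzynski's equality, ΔF = -kT ln⟨exp(-W/kT)⟩. Both the choice of estimator and the function name `free_energy_from_work` are assumptions, not details taken from the thesis.

```python
import math
import random


def free_energy_from_work(work_samples, kT):
    """Estimate dF = -kT * ln< exp(-W / kT) > from a list of work values.

    Assumption: the samples are nonequilibrium work values, so the
    exponential (Jarzynski) average applies. A log-sum-exp shift keeps
    the exponentials from overflowing or underflowing.
    """
    if not work_samples:
        raise ValueError("at least one work sample is required")
    scaled = [-w / kT for w in work_samples]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kT * (shift + math.log(mean_exp))


if __name__ == "__main__":
    rng = random.Random(1)
    kT = 1.0
    # Synthetic Gaussian work values (mean 5, standard deviation 1), for which
    # the exact result is mean - variance / (2 kT) = 4.5.
    for n in (10, 1000, 100000):
        works = [rng.gauss(5.0, 1.0) for _ in range(n)]
        print(n, free_energy_from_work(works, kT))
```

The exponential average is dominated by rare low-work samples, so small sample sets tend to give biased estimates. This is one reason a generative model that supplies additional samples from a learned distribution could reduce the number of simulations needed, provided the learned distribution is accurate in that tail.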

References


[1] Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Yihang Wang, João Marcelo Lamim Ribeiro, and Pratyush Tiwary (2020).
[2] Predicting solar generation from weather forecasts using machine learning. Navin Sharma, Pranshu Sharma, David Irwin, and Prashant Shenoy (2011).
[3] Variational Autoencoder Reconstruction of Complex Many-Body Physics. Luchnikov, A. Ryzhov, P.-J. C. Stas, S. N. Filippov and H. Ouerdane (2019).
[4] Modeling and Optimization for Big Data Analytics. Konstantinos Slavakis, Georgios B. Giannakis, and Gonzalo Mateos (2014).
[5] Learning from Imperfections: Predicting Structure and Thermodynamics from Atomic Imaging of Fluctuations. Lukas Vlcek, Maxim Ziatdinov, Artem Maksov, Alexander Tselev, Arthur P. Baddorf, Sergei V. Kalinin, and Rama K. Vasudevan (2019).
