分子動態模擬之再現性研究

A Study of the Reproducibility of Molecular Dynamics Simulation

Advisor: 林勇欣

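Abstract


Molecular dynamics simulation has long been regarded as a useful tool for studying biomolecular mechanisms and the dynamics and changes of atomic interactions. However, many recent studies may have overlooked the basic question of whether molecular dynamics simulations are reproducible, so this question needs to be rigorously re-examined.

In this study we use four lysozyme proteins: wild-type human lysozyme (HLW, PDB ID 1LZ1); wild-type Bornean orangutan lysozyme (PLW); a point mutant of human lysozyme in which the glycine at position 37 is replaced by glutamine (HLMG37Q); and another point mutant of human lysozyme, the amyloidogenic variant, in which the aspartic acid at position 67 is replaced by histidine (HLMD67H, PDB ID 1LYY). For each of the four proteins we ran multiple independent molecular dynamics simulations with different initial velocities. At 300 K we ran a total of 15 independent simulations with a trajectory length of 500 ns (eight for HLW, two for PLW, three for HLMG37Q, and two for HLMD67H), and we extended four of the HLW simulations and all three HLMG37Q simulations to 1000 ns.

We examine the reproducibility and reliability of three common analysis methods for molecular simulations. First, we calculate the RMSD between each simulated structure and the initial structure and plot it as a function of time. Second, we extract one structure per 1 ns from each trajectory, calculate the distance (RMSD) between every pair of structures both within and between trajectories, present these distances in a matrix, and use them as the basis for clustering. Third, we construct the distribution of the distances in this matrix.

As previous studies have suggested, long simulations (1000 ns or longer) appear to be necessary. In our simulations, one run had a stable RMSD curve before 500 ns but showed a degree of fluctuation afterwards. We also found that the same protein can produce different RMSD curves when simulated with different initial velocities. Moreover, even when two simulations have similar RMSD curves, the distance (RMSD) matrix can show that their structures are quite different (the two simulations are separated in the matrix plot and do not cluster together). When comparing wild-type human lysozyme with the human mutant HLMG37Q, some HLMG37Q simulations have structures similar to those of some HLW simulations (they cluster together in the distance (RMSD) matrix), whereas some HLW simulations have completely different structures from one another.

We therefore conclude that the apparently stable RMSD curves obtained from short simulations in some previous studies are not a reliable result. When the simulation is extended, the RMSD curve may start to fluctuate, and when different initial velocities are used, the curve may look completely different. Two sets of simulations with similar RMSD curves do not necessarily have the same structures. Assessing structural similarity with the distance (RMSD) matrix is a more appropriate approach. Finally, even when the distance (RMSD) matrix is used to cluster and compare simulated structures, for example to compare a wild-type protein with a point mutant, multiple simulations with different initial velocities are still required to ensure that the results are reproducible.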

Parallel Abstract


Molecular dynamics simulation has long been considered a powerful tool for studying biomolecular mechanisms and for revealing the dynamics and fluctuations of atomic interactions. However, the fundamental issue of the reproducibility of molecular dynamics simulation may have been overlooked in many recent studies and should be seriously re-examined.

In this study, four lysozyme proteins were used: wild-type human lysozyme (HLW, PDB ID 1LZ1); wild-type lysozyme from the Bornean orangutan, Pongo pygmaeus (PLW); a point mutant of human lysozyme in which glycine at position 37 is replaced by glutamine (HLMG37Q); and the amyloidogenic variant of human lysozyme, in which aspartic acid at position 67 is replaced by histidine (HLMD67H, PDB ID 1LYY). For each protein, we performed multiple independent molecular dynamics simulations with different initial atomic velocities. In total, we generated fifteen independent 500 ns trajectories at 300 K (eight for HLW, two for PLW, three for HLMG37Q, and two for HLMD67H). We then extended four of the HLW simulations and all three HLMG37Q simulations to 1000 ns.

We applied three frequently used analysis methods to investigate their reproducibility and reliability. First, we calculated the root-mean-square deviation (RMSD) over time for each simulation, using the initial structure as the reference. Second, we took a snapshot structure every 1 ns along each trajectory, calculated the distance (RMSD) between all pairs of snapshots from all simulations, and represented these distances in a matrix for clustering. Third, we constructed the distribution of the distances in the matrix.

As previous studies have suggested, long simulations (1000 ns or more) appear to be necessary. We found that a simulation can have a stable RMSD profile (as a function of time) before 500 ns but begin to fluctuate afterward. Our results also indicate that the same protein can produce different RMSD profiles when simulated with different initial atomic velocities. Moreover, the distance (RMSD) matrix shows that two simulations with quite similar RMSD profiles can have very different simulated structures (that is, the snapshots from the two simulations are separated and do not cluster together). When comparing wild-type human lysozyme with the mutant HLMG37Q, some HLMG37Q simulations have structures quite similar to those of some HLW simulations (their snapshots cluster together), whereas some pairs of wild-type human lysozyme simulations have very different structures, as described above.

Our results suggest that a single stable RMSD profile from a short simulation should not be treated as a reliable result, as it was in some previous studies. The RMSD profile may begin to fluctuate when the simulation is extended, or may show a completely different pattern when other initial atomic velocities are used. Likewise, similar RMSD profiles in two simulations do not imply similar simulated structures. Clustering based on the distance (RMSD) matrix is a better strategy. Finally, even when clustering is used to compare simulated structures, for example a wild type and a mutant, multiple simulations with different initial atomic velocities are still necessary to ensure that the results are reproducible.
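
The three analyses described above (RMSD against the initial structure, a pairwise RMSD matrix used for clustering, and the distribution of the matrix distances) can be illustrated with a toy sketch. It is not the thesis's actual workflow. It assumes the snapshots are already superposed, so no least-squares fitting is done, and it uses a simple fixed-cutoff single-linkage grouping as a stand-in, because the abstract does not say which clustering method was used. All function names, the four-atom coordinates, the cutoff, and the bin width are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    Assumption: the conformations are already superposed; no fitting is done.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def rmsd_profile(trajectory):
    """Analysis 1: RMSD of every frame against the first (initial) frame."""
    reference = trajectory[0]
    return [rmsd(reference, frame) for frame in trajectory]


def distance_matrix(snapshots):
    """Analysis 2: symmetric matrix of RMSD between every pair of snapshots."""
    n = len(snapshots)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(snapshots[i], snapshots[j])
    return matrix


def cluster_by_cutoff(matrix, cutoff):
    """Group snapshots linked by a chain of distances <= cutoff.

    This is single-linkage grouping at a fixed cutoff, chosen here only
    for brevity; it is a stand-in, not the method used in the thesis.
    """
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def distance_histogram(matrix, bin_width):
    """Analysis 3: counts of pairwise distances per bin (upper triangle only)."""
    counts = {}
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(matrix[i][j] // bin_width)
            counts[k] = counts.get(k, 0) + 1
    return counts


# Hypothetical four-atom "protein": two runs start from the same structure
# but the last atom drifts in opposite directions.
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
state_a = initial[:3] + [(3.0, 2.0, 0.0)]
state_b = initial[:3] + [(3.0, -2.0, 0.0)]
run_a = [initial, state_a, state_a]
run_b = [initial, state_b, state_b]

profile_a = rmsd_profile(run_a)
profile_b = rmsd_profile(run_b)

# Pool the later snapshots of both runs, as in the cross-trajectory matrix.
pooled = run_a[1:] + run_b[1:]
matrix = distance_matrix(pooled)
labels = cluster_by_cutoff(matrix, cutoff=1.0)
histogram = distance_histogram(matrix, bin_width=0.5)

print("RMSD profile, run A:", profile_a)
print("RMSD profile, run B:", profile_b)
print("cluster labels for pooled snapshots:", labels)
print("distance histogram (bin index -> count):", histogram)
```

In this toy case the two runs have identical RMSD profiles, yet their snapshots fall into separate groups in the pairwise matrix. This is the situation the abstract warns about when RMSD profiles alone are used to compare simulations.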

