
Directly reconstructing principal components of 3D protein structure density map from 2D cryo-EM projection images by EM-2SDR method

Advisor: 杜憶萍
Co-advisor: 黃俊郎 (Jiun-Lang Huang)

Abstract


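Cryo-electron microscopy (cryo-EM) has become increasingly important in recent years for studying the structure of protein molecules. X-ray crystallography requires the protein to be crystallized before its structure can be solved, whereas cryo-EM images protein molecules that move freely in solution and infers their 3D structure from those images, which helps capture the structural heterogeneity of the protein. That same heterogeneity in solution, however, is one of the challenges in recovering a 3D structure from 2D images. Pawel A. Penczek et al. proposed a codimensional PCA pipeline to address this problem. The protein images are first grouped by projection angle. Hypergeometric stratified resampling (HGSR) then draws images from the different angular groups to reconstruct 3D protein structures. A covariance matrix is estimated from the many reconstructed 3D structures, and its eigenvalues and eigenvectors are computed and used to classify the heterogeneity of the protein. In other words, how well this method clusters heterogeneous structures depends on the quality of the estimated covariance matrix. Because proteins in solution have preferred orientations and the different conformations are not necessarily present in equal proportions, the quality of the covariance matrix cannot be guaranteed when the data and the sampled angles are limited, whether it is estimated with HGSR or with the Fourier slice theorem for covariance. To solve this problem, the group of Hemant D. Tagare proposed optimizing a probabilistic model to compute the PCA solution of the 3D structure directly from 2D cryo-EM images. By combining a probabilistic model with the EM algorithm, this method avoids computing the covariance matrix and finds the maximum-likelihood principal components directly through expectation-maximization. In addition, Prof. Tu's group proposed the 2SDR model in 2020; when 2SDR replaced PCA as the dimension-reduction step in the codimensional PCA pipeline, the protein clustering results were clearly better than those obtained with PCA. We therefore conjecture that 2SDR can likewise be applied to the EM-PCA algorithm of Tagare's group: the probabilistic model of the protein is changed from PCA to MPCA, and PCA is then used for further dimension reduction in the second stage, following the 2SDR procedure, to improve the clustering performance of the whole algorithm. The code of EM-PCA, however, has not been released as open source. The first part of this thesis therefore implements the EM-PCA algorithm, and the second part presents the mathematical derivation and implementation of the EM-2SDR algorithm. In clustering experiments on a synthetic dataset containing two protein conformations, both EM-PCA and EM-2SDR separate the images of the two conformations completely, with 100% accuracy; even with added noise at an SNR of 0.8 or higher, the v-measure scores remain above 0.95. On a synthetic dataset containing five protein conformations, however, EM-PCA clusters less well, with the v-measure falling to about 0.5216, whereas EM-2SDR still clusters most of the images correctly, with a v-measure of 0.9871.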
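The two-stage idea behind 2SDR (an MPCA-type reduction of each image, then ordinary PCA on the reduced cores) can be illustrated with a short NumPy sketch. This is only a plain, non-probabilistic illustration under stated assumptions, not the EM-2SDR algorithm derived in the thesis: it works on fully observed 2D arrays, ignores projection geometry and noise modeling, and fits the first stage with alternating eigen-decompositions in the style of generalized low rank approximations of matrices. The function names `mpca_stage`, `pca_stage`, and `two_stage_reduce` and the rank parameters `p`, `q`, `r` are hypothetical.

```python
import numpy as np


def mpca_stage(images, p, q, n_iter=10):
    """Stage 1 (assumed form): fit X_i ~ mean + U Z_i V^T with orthonormal U, V."""
    mean = images.mean(axis=0)
    centered = images - mean
    _, _, w = centered.shape
    V = np.eye(w)[:, :q]  # simple initial guess for the column basis
    for _ in range(n_iter):
        # Fix V; U = top-p eigenvectors of sum_i (X_i V)(X_i V)^T.
        XV = centered @ V
        M_u = np.einsum("nij,nkj->ik", XV, XV)
        U = np.linalg.eigh(M_u)[1][:, ::-1][:, :p]
        # Fix U; V = top-q eigenvectors of sum_i (X_i^T U)(X_i^T U)^T.
        XtU = np.einsum("nij,ik->njk", centered, U)
        M_v = np.einsum("nij,nkj->ik", XtU, XtU)
        V = np.linalg.eigh(M_v)[1][:, ::-1][:, :q]
    # Core matrices Z_i = U^T (X_i - mean) V, shape (n, p, q).
    cores = np.einsum("ip,nij,jq->npq", U, centered, V)
    return mean, U, V, cores


def pca_stage(cores, r):
    """Stage 2: ordinary PCA on the vectorized core matrices."""
    flat = cores.reshape(len(cores), -1)
    flat_mean = flat.mean(axis=0)
    _, _, Vt = np.linalg.svd(flat - flat_mean, full_matrices=False)
    components = Vt[:r]
    scores = (flat - flat_mean) @ components.T
    return flat_mean, components, scores


def two_stage_reduce(images, p, q, r):
    """Return r-dimensional scores per image plus the fitted bases."""
    mean, U, V, cores = mpca_stage(images, p, q)
    flat_mean, components, scores = pca_stage(cores, r)
    return scores, (mean, U, V, cores, flat_mean, components)
```

The scores returned by `two_stage_reduce` are the kind of low-dimensional coefficients that a clustering algorithm would then group. Splitting the reduction into two stages keeps the first-stage eigenproblems at the size of the image height and width rather than the full pixel count.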

Parallel Abstract


Cryogenic electron microscopy (cryo-EM) has become increasingly important in recent years. Cryo-EM image data usually comprise structures from co-existing conformations or mixed compositions. To investigate the structural heterogeneity of particles through cryo-EM images, one can solve for the particles' three-dimensional (3D) principal components. Pawel A. Penczek et al. proposed a method called codimensional PCA for this problem. First, cryo-EM images are stratified by projection angle. Hypergeometric stratified resampling (HGSR) is then applied repeatedly to draw samples from the different strata and reconstruct 3D structures. Finally, principal component analysis (PCA) finds the eigenvolumes of the resampled structures, and a clustering algorithm groups the particle images by their coefficient vectors with respect to the eigenvolumes. However, the quality of the covariance matrix obtained through codimensional PCA or the Fourier slice theorem cannot be guaranteed under any of the following circumstances: (1) the proteins adopt preferred orientations in solution; (2) the proportions of the conformations vary widely across projection directions; (3) the dataset contains only a finite number of images and projection directions. To handle this problem, Hemant D. Tagare et al. proposed a maximum-likelihood method called EM-PCA, which recovers the 3D principal components directly from cryo-EM images without computing the covariance matrix. In addition, earlier work in our group (Chung et al., 2020) demonstrated that replacing PCA with two-stage dimension reduction (2SDR) in codimensional PCA greatly improves clustering performance. We therefore presumed that 2SDR can also be used in the maximum-likelihood method to improve performance. In the first part of this thesis, we completed an implementation of the EM-PCA method.
In the second part, we derived the mathematical formulas for EM-2SDR and implemented them in Python. Our in silico experiments on simulated images showed that, on a noise-free dataset consisting of two conformations, both EM-PCA and EM-2SDR clustered the images with 100% accuracy. Even with the image SNR decreased to 0.8, the v-measure scores of both algorithms remained above 0.95. We further tested both algorithms on a noise-free dataset consisting of five conformations. The v-measure score of EM-2SDR remained close to 1 (0.9871), whereas that of EM-PCA dropped to 0.5216, demonstrating the superiority of EM-2SDR.
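For readers unfamiliar with the v-measure scores quoted above, the following is a minimal pure-Python sketch of the standard definition: the harmonic mean of homogeneity and completeness, each computed from conditional entropies of the true conformation labels and the cluster labels. It assumes equal weighting of the two terms; the function names are hypothetical and this is not the evaluation code used in the thesis.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weights assumed)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    h_true = entropy(true_labels)
    h_clust = entropy(cluster_labels)
    # Homogeneity: each cluster contains images of a single conformation.
    homogeneity = 1.0 if h_true == 0 else max(
        0.0, 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true)
    # Completeness: each conformation is assigned to a single cluster.
    completeness = 1.0 if h_clust == 0 else max(
        0.0, 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clust)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score does not depend on how the clusters are numbered, so a clustering that separates two conformations perfectly scores 1 even if its cluster indices are swapped relative to the true labels.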

Keywords

Cryo-EM, PCA, 2SDR

References


Hemant D Tagare, Alp Kucukelbir, Fred J Sigworth, Hongwei Wang, and Murali Rao. Directly reconstructing principal components of heterogeneous particles from cryo-EM images. Journal of Structural Biology, 191(2):245–262, 2015.
Pawel A Penczek, Marek Kimmel, and Christian MT Spahn. Identifying conformational states of macromolecules by eigen-analysis of resampled cryo-EM images. Structure, 19(11):1582–1590, 2011.
Szu-Chi Chung, Shao-Hsuan Wang, Po-Yao Niu, Su-Yun Huang, Wei-Hau Chang, I Tu, et al. Two-stage dimension reduction for noisy high-dimensional images and application to cryogenic electron microscopy. Annals of Mathematical Sciences and Applications, 5(2):283–316, 2020.
Jieping Ye. Generalized low rank approximations of matrices. Machine Learning, 61(1):167–191, 2005.
Gabor T Herman and Miroslaw Kalinowski. Classification of heterogeneous electron microscopic projections into homogeneous subsets. Ultramicroscopy, 108(4):327–338, 2008.
