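In this thesis, we propose a sequence-alignment-based method for detecting circular permutation in proteins. According to previous studies, a pair of circularly permuted proteins share a similar three-dimensional structure, but their sequences are circular rearrangements of each other. Most current methods for detecting this phenomenon require tertiary structure information; however, the number of known protein tertiary structures (about 70,000) is far smaller than the number of known sequences (about 15,000,000). Information on this phenomenon remains scarce, and existing sequence-based methods are not accurate enough. For these reasons, we modified the conventional local sequence alignment method so that it uses only the protein sequence combined with secondary structure element information (secondary structure predicted from the primary structure), which makes it possible to detect whether a pair of protein sequences exhibits circular permutation without any tertiary structure. We applied different substitution matrices in the alignment and observed how the accuracy changed. Finally, we examined the detection accuracy at different levels of sequence similarity, and we give examples showing that the proposed method can help biologists quickly and accurately find proteins that may have circularly permuted structures. In the future, the method could be used to scan protein databases rapidly and to locate possible circular permutation sites, providing more information on circularly permuted proteins.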
Two proteins are said to be related by circular permutation (CP) if they have similar sequence composition and most likely share the same fold, but the N-terminal and C-terminal regions of the sequence are interchanged. CP is useful in protein engineering; however, the biological function and origin of naturally occurring CP are not clear. Most CP detection methods, such as GANGSTA+ and CPSARST, are based on structural comparison. However, the number of known 3D protein structures (about 70,000) is far smaller than the number of known protein sequences (about 15,000,000). To our knowledge, there are only two sequence-based CP detection methods, developed by Uliel et al. [1] and Weiner et al. [2], and both lack clear criteria for distinguishing CPs from non-CPs. Here, we propose an efficient and accurate sequence-based detection method that combines local sequence alignment with secondary structure element information. In this thesis, we evaluated the method with different substitution matrices and with data sets of varying sequence identity. The results show that prediction accuracy is strongly and positively correlated with the sequence identity of the protein pair. In the future, the method may be helpful for determining possible CP sites and for large-scale protein screening.
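To make the idea of detecting CP by local alignment concrete, the following is a minimal sketch and not necessarily the procedure used in this thesis. It assumes a common device in sequence-based CP detection: align one sequence against a doubled copy of the other, so that a circularly permuted match becomes one contiguous alignment spanning the junction. The scoring is a plain match/mismatch/gap scheme rather than a substitution matrix, no secondary structure information is used, and the names `smith_waterman`, `cp_candidate`, and the threshold `min_gain` are hypothetical and chosen for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with linear gap penalties.

    Returns (score, a_start, a_end, b_start, b_end) using half-open indices.
    The scoring values are illustrative assumptions, not the thesis settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                    score[i - 1][j] + gap,       # gap in b
                    score[i][j - 1] + gap)       # gap in a
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_pos[0], j, best_pos[1]


def cp_candidate(query, target, min_gain=1.5):
    """Flag a possible CP by comparing plain and doubled-target alignments.

    min_gain is a hypothetical threshold: the doubled alignment must score
    at least min_gain times the plain one and must span the junction.
    """
    plain = smith_waterman(query, target)[0]
    doubled, _, _, t_start, t_end = smith_waterman(query, target + target)
    crosses_junction = t_start < len(target) < t_end
    return {
        "plain": plain,
        "doubled": doubled,
        "cut_site": t_start % len(target),  # where the query start maps in target
        "is_candidate": crosses_junction and doubled >= min_gain * plain,
    }


if __name__ == "__main__":
    target = "MKTAYIAKQR"              # toy sequence, not real data
    query = target[5:] + target[:5]    # the same sequence, circularly permuted
    print(cp_candidate(query, target))
    print(cp_candidate(target, target))  # control: no permutation
```

In this toy case the plain alignment can only cover one half of the permuted sequence, while the doubled alignment covers all of it, which is the signal the sketch looks for. A realistic implementation would replace the match/mismatch scores with a substitution matrix and would need a principled criterion in place of `min_gain`.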