Improving Scoring Function Model for Predicting Protein-Ligand Binding Affinity

Thesis advisor: 黃乾綱

Abstract

This study proposes MIXScore, a new scoring function that improves the accuracy of protein-ligand binding affinity prediction. Predicting binding affinity with scoring functions is an important problem in structure-based drug discovery and design. Scoring functions are generally classified into three groups: force-field-based, knowledge-based, and empirical.

Conventional validation methods such as 5-fold cross-validation and leave-one-out cross-validation (LOO) have not been found to suffer from over-fitting, but their assessments may be overly optimistic: complexes from the same protein family can appear in the training set and the test set at the same time, which makes prediction easier. Kramer and Gedeck therefore proposed leave-cluster-out cross-validation (LCO) and recommended it as a way to avoid this optimistic bias.

MIXScore combines knowledge-based hybridized orbital atom-type pair descriptors with empirical X-CSCORE descriptors into a single feature vector of 210 descriptors, and uses random forest regression to build the prediction model. Its performance is evaluated on the PDBbind07 and PDBbind09 benchmarks and compared with previously published scoring functions: PDBbind07 is used for an independent test, and PDBbind09 is used for LCO cross-validation.

In the independent test, MIXScore outperforms RF-Score (published in 2010), with RMSE = 1.98 kcal/mol and R² = 0.691. Under LCO cross-validation, where the similarity between training and test complexes is excluded, MIXScore still predicts stably and outperforms both RF-Score and the scoring function proposed by Kramer and Gedeck in 2011. These results indicate that MIXScore performs better than the existing scoring functions it was compared with. Its modified R² (Rm²) in the independent test is greater than 0.5 (0.530), which suggests good external predictive ability.

Beyond improving binding affinity prediction, this study shows that the homogeneity of proteins in the PDBbind dataset causes an overoptimistic bias, identifies the complexes in PDBbind09 that are hardest to predict accurately (including the strongest outlier), and reports how much adding the X-CSCORE descriptors to the hybridized orbital atom-type pair descriptors improves performance, together with the importance of each X-CSCORE descriptor.
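The difference between random k-fold validation and leave-cluster-out (LCO) validation can be sketched in plain Python. This is a minimal illustration, not the thesis's implementation: it assumes every complex already carries a cluster label (for example a protein-family ID from sequence clustering), and the labels and helper names below are hypothetical. In practice a random forest regressor would be fit on each training split; scikit-learn's `LeaveOneGroupOut` splitter produces the same one-cluster-per-fold partition.

```python
import random
from collections import defaultdict


def leave_cluster_out_folds(cluster_ids):
    """Leave-cluster-out (LCO) splitting.

    Yields (cluster, train_idx, test_idx) so that each fold holds out every
    complex belonging to one cluster. `cluster_ids` is a hypothetical list of
    per-complex labels (e.g. protein-family IDs); labels must be sortable.
    """
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)
    for c in sorted(members):
        test = members[c]
        held_out = set(test)
        train = [i for i in range(len(cluster_ids)) if i not in held_out]
        yield c, train, test


def random_kfold(n, k, seed=0):
    """Ordinary k-fold splitting that ignores cluster membership."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        yield train, test


def shared_clusters(cluster_ids, train, test):
    """Clusters present on both sides of a split, i.e. related complexes that leak."""
    return {cluster_ids[i] for i in train} & {cluster_ids[i] for i in test}


if __name__ == "__main__":
    # Hypothetical labels: eight complexes from three protein families.
    clusters = ["famA", "famA", "famA", "famB", "famB", "famC", "famC", "famC"]
    for train, test in random_kfold(len(clusters), k=4):
        print("k-fold shared:", shared_clusters(clusters, train, test))
    for name, train, test in leave_cluster_out_folds(clusters):
        # A model (e.g. a random forest) would be trained on `train`, scored on `test`.
        print("LCO", name, "shared:", shared_clusters(clusters, train, test))
```

By construction every LCO fold reports an empty set of shared clusters, whereas random k-fold folds will typically place members of the same family on both sides of the split, which is the source of the optimistic bias described above.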
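The three statistics quoted in the abstract (RMSE, R², and Rm²) can be computed as in the sketch below. It rests on assumptions the abstract does not confirm: R² is taken as the squared Pearson correlation between observed and predicted affinities, and Rm² uses the commonly cited form r²·(1 - sqrt(|r² - r0²|)), where r0² comes from a least-squares fit of observed on predicted values forced through the origin. Other conventions exist, so values from this sketch may not match the thesis exactly.

```python
import math


def rmse(y_obs, y_pred):
    """Root-mean-square error, in the same units as the affinities (e.g. kcal/mol)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_obs, y_pred)) / len(y_obs))


def pearson_r2(y_obs, y_pred):
    """Squared Pearson correlation (undefined if either series is constant)."""
    n = len(y_obs)
    mo, mp = sum(y_obs) / n, sum(y_pred) / n
    sxy = sum((a - mo) * (b - mp) for a, b in zip(y_obs, y_pred))
    sxx = sum((a - mo) ** 2 for a in y_obs)
    syy = sum((b - mp) ** 2 for b in y_pred)
    return sxy * sxy / (sxx * syy)


def r0_squared(y_obs, y_pred):
    """R^2 of the least-squares fit y_obs ~ k * y_pred forced through the origin."""
    k = sum(a * b for a, b in zip(y_obs, y_pred)) / sum(b * b for b in y_pred)
    mo = sum(y_obs) / len(y_obs)
    ss_res = sum((a - k * b) ** 2 for a, b in zip(y_obs, y_pred))
    ss_tot = sum((a - mo) ** 2 for a in y_obs)
    return 1.0 - ss_res / ss_tot


def rm2(y_obs, y_pred):
    """Modified r^2 in an assumed form: r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    r2 = pearson_r2(y_obs, y_pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_squared(y_obs, y_pred))))
```

As a usage example, observed values `[1.0, 2.0, 3.0, 4.0]` with every prediction shifted by +1 give a squared correlation of 1.0 but an RMSE of 1.0 and an Rm² of about 0.73. Rm² penalizes such systematic offsets that R² alone ignores, which is why it serves as a stricter check of external predictivity.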
