Thesis

Predicting Drug Human Intestinal Absorption Using Machine Learning

Advisor: 林志侯

Abstract


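We began with 180 drug molecules with different human intestinal absorption (HIA) values, obtained from the NCBI website. Because any molecular calculation requires an optimized structure, we optimized the drug molecules with Gaussian09 at the National Center for High-performance Computing (NCHC), using density functional theory (DFT) with the B3LYP functional and the 6-31G basis set. The MOL files produced by the Gaussian09 calculations store only atomic coordinates, so they were converted into true 3D structures with Discovery Studio 3.1. PaDEL-Descriptor, developed at the National University of Singapore (NUS), is software for computing molecular descriptors and properties; it can calculate 1875 2D and 3D molecular descriptors. WEKA, developed at the University of Waikato in New Zealand, is software for machine learning, data mining, and feature selection. Feature selection was based mainly on the correlation-based subset evaluator CfsSubsetEval combined with particle swarm optimization (PSO) and an evolutionary algorithm (EA), with five other algorithms used as supplements. The support vector machine (SVM) is the classification tool in this study; it has three main parameters: C (cost), gamma (γ), and ε. Poorly chosen parameters lead to poor classification or overfitting, so the parameters were chosen using the Pearson correlation coefficient as the criterion. Classification and feature selection were applied alternately until the selected features could no longer be narrowed down. Feature selection serves to improve the classification, and the features selected in the last few rounds are the most representative. Alternating classification and selection produced successively smaller descriptor sets that are increasingly related to absorption, containing 1104, 625, 280, 177, 98, 50, and 37 features. Using non-validated statistics (R^2) and 10-fold cross-validation statistics (Q^2), the set with number of selected features (NFS) = 98 had the best predictive ability; comparing its predictions with the observed intestinal absorption gave linear correlation coefficients of R^2 = 0.887 and 0.5431. These 98 descriptors are therefore the most predictive molecular descriptors selected. Finally, 13 additional drug molecules were predicted with the resulting model, giving q^2 = 0.729 and a correlation coefficient R^2 = 0.7536.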
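CfsSubsetEval, the evaluator named above, scores a candidate descriptor subset with Hall's correlation-based feature selection (CFS) merit: a subset is good when its members correlate strongly with the target but weakly with each other. The sketch below computes that merit in plain Python. It is a minimal illustration, not WEKA's code; it assumes absolute Pearson correlation as the correlation measure for numeric descriptors and a numeric HIA target, and it omits the search method (PSO, EA) that proposes the subsets being scored.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def cfs_merit(features, target):
    """Hall's CFS merit of a feature subset.

    features: list of columns, each a list of descriptor values
    target:   list of HIA values
    Merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)
    (assumption: absolute Pearson correlation is the correlation measure)
    """
    k = len(features)
    # Average feature-to-target correlation.
    r_cf = mean(abs(pearson(f, target)) for f in features)
    if k == 1:
        # The formula reduces to r_cf when there are no feature pairs.
        return r_cf
    # Average feature-to-feature correlation (redundancy penalty).
    r_ff = mean(abs(pearson(f, g)) for f, g in combinations(features, 2))
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Adding an exact copy of a descriptor leaves the merit unchanged, while adding a descriptor uncorrelated with HIA lowers it, which is why CFS tends to discard both redundant and irrelevant descriptors.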

Parallel Abstract


Initially, 180 drug compounds with different human intestinal absorption (HIA) values were obtained from the literature, and their 3D structures were obtained from NCBI. Because any calculation on chemical compounds requires optimized structures, we optimized the compounds with Gaussian09 at the NCHC (National Center for High-performance Computing) using DFT at the B3LYP/6-31G level. Discovery Studio was used to convert the coordinates into true 3D structures. PaDEL-Descriptor (PaDEL: Pharmaceutical Data Exploration Laboratory), developed at the National University of Singapore (NUS), is a powerful program that provides 1875 molecular descriptors ranging from 2D to 3D; we used it to calculate the 2D and 3D descriptors. WEKA is a machine learning and data mining package developed at the University of Waikato, New Zealand, and it provides classification and attribute selection functions. Feature selection was based mainly on WEKA's attribute selection function. The main selection methods combine the CfsSubsetEval evaluator with PSO (particle swarm optimization) and EA (evolutionary algorithm) search; five other algorithms were used to recover HIA-related features that these methods might have missed. SVM was used for classification and validation, with its parameters chosen carefully. Three parameters are important to classification: cost (C), gamma (γ), and epsilon (ε). Poorly chosen parameters lead to incorrect classification or overfitting. In this research we tried many series of parameter values and kept the best set, as validated by the Pearson correlation coefficient and cross-validation statistics. Classification and feature selection were applied alternately until the selected feature set could no longer be narrowed. Feature selection progressively improved the classification: step by step, the number of selected features (NFS) decreased from 1875 to 1104, 625, 280, 177, 98, 50, and 37.
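The roles of the three SVM parameters C, γ, and ε can be made concrete with the standard ε-support vector regression (ε-SVR) formulation, as implemented for example in LIBSVM. The sketch below is illustrative only: it assumes an RBF kernel (the abstract names γ but not the kernel), writes the primal objective for a linear model for readability, and only evaluates the objective rather than training a model. The function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma sets the kernel width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y, f, eps):
    """Errors within eps of the target cost nothing; larger errors cost |y - f| - eps."""
    return max(0.0, abs(y - f) - eps)


def linear_svr_objective(w, b, X, y, C, eps):
    """Primal epsilon-SVR objective for a linear model f(x) = w.x + b.

    0.5 * ||w||^2 favours a flat (simple) model; C weights the total
    epsilon-insensitive training error against that simplicity.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = sum(
        eps_insensitive_loss(yi, sum(wj * xj for wj, xj in zip(w, xi)) + b, eps)
        for xi, yi in zip(X, y)
    )
    return reg + C * loss
```

A large C makes training errors expensive and pushes the model toward fitting every compound, which is one route to the overfitting mentioned above; a large γ makes the kernel very local; a large ε widens the tube of errors that are ignored.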
Based on non-validated and 10-fold cross-validation statistics, NFS = 98 is the best predictive feature set, with a correlation coefficient R^2 of 0.88. These 98 molecular descriptors are therefore highly correlated with intestinal absorption. Finally, the model was tested by external validation on 13 drug compounds with different HIA values, giving q^2 = 0.729 and R^2 = 0.7536.
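The statistics quoted above (non-validated R^2, 10-fold Q^2, external q^2) have several definitions in the QSAR literature, and the abstract does not say which was used. The sketch below follows one common convention: R^2 is the squared Pearson correlation between observed and predicted HIA, and Q^2 is 1 - PRESS/SS_tot computed from out-of-fold predictions. `fit_line` is a hypothetical stand-in for the SVM model, and the round-robin fold assignment is an assumption.

```python
import math
from statistics import mean


def r_squared(obs, pred):
    """Square of the Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return (cov / (so * sp)) ** 2


def q_squared(obs, pred):
    """1 - PRESS / total sum of squares, for out-of-sample predictions."""
    mo = mean(obs)
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - press / ss_tot


def kfold_predictions(X, y, fit, k=10):
    """Out-of-fold prediction for every sample; sample i is held out in fold i % k."""
    preds = [None] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    return preds


def fit_line(X, y):
    """Hypothetical stand-in model: least-squares line on the first descriptor."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda row: slope * row[0] + intercept
```

R^2 computed on the training compounds themselves tends to be optimistic, which is consistent with the gap between the non-validated and cross-validated figures. For the 13 external compounds, the same two functions could be applied to predictions from the final model, although external q^2 is sometimes normalized by the training-set mean instead.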

