
以深度學習模型及詞嵌入方法建立通用數位化學空間預測純物質性質

Using Deep Learning Model and Word Embedding Method to Establish Universal Digital Chemical Space to Predict Pure Component Properties

Advisor: 汪上曉

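Abstract


Predicting molecular properties has long been an important topic in molecular design. In the past, group contribution methods such as the Joback method and the UNIFAC method were used. In recent years, with innovations in algorithms and advances in computer hardware, deep learning has performed well in many fields. This thesis proposes a deep learning model that uses supervised learning to project a natural-language-like molecular representation, the Simplified Molecular Input Line Entry System (SMILES), into a high-dimensional space. Molecular features are embedded in a high-dimensional vector called the universal digital chemical space (UDCS), and a second neural network converts this vector into several different molecular fingerprints. With the UDCS as the input layer, several deep learning models with different architectures can be built to predict a range of molecular properties: the properties in the GDB-9 database, which were obtained by computer-aided calculation; the sigma profile, a quantum mechanical property that previously required highly time-consuming and complex quantum mechanical calculations; and several fixed-temperature thermodynamic properties obtained from experiments. These models greatly simplify the calculation and acquisition of molecular properties. Sigma profiles computed by the deep learning model can also be used in COSMO-SAC calculations, where they yield accurate activity coefficients.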

Parallel Abstract


The ability to predict molecular properties has long been an important issue in the field of molecular design. In the past, scientists proposed group contribution methods such as the Joback method and the UNIFAC method. In recent years, thanks to advances in machine learning algorithms and improvements in computer hardware, deep learning has progressed rapidly. In this work, a supervised learning method is used to embed molecular features into a high-dimensional space, called the universal digital chemical space (UDCS), starting from a natural-language-like description, the Simplified Molecular Input Line Entry System (SMILES). This high-dimensional space is then decoded into several representations of molecular features, called molecular fingerprints. Using the UDCS, a number of models can be built to predict different molecular properties. These properties include those in GDB-9, a database obtained by computer-aided calculation; the sigma profile, a complex quantum mechanical property; and several fixed-point thermodynamic properties. These models significantly accelerate the calculation of these properties.
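
The first step described above, turning a SMILES string into a fixed-length vector, can be illustrated with a minimal Python sketch. It is not the thesis's model: the abstract does not specify the tokenizer, the network architecture, or the embedding size, so the regex tokenizer, the randomly initialized (untrained) embedding table, the mean pooling, and all names (`tokenize`, `build_vocab`, `init_embeddings`, `embed`) are assumptions made only for illustration. In the thesis, the embedding is learned by supervised training rather than drawn at random.

```python
import random
import re

# Hypothetical tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the data set to an integer index."""
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def init_embeddings(vocab_size, dim, seed=0):
    """Create a random embedding table (a stand-in for trained weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(smiles, vocab, table):
    """Mean-pool token embeddings into one fixed-length vector.

    Mean pooling is an assumption; a trained sequence model would
    normally take its place.
    """
    ids = [vocab[t] for t in tokenize(smiles)]
    dim = len(table[0])
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


molecules = ["CCO", "c1ccccc1", "CC(=O)O", "ClCCl", "[NH4+]"]
vocab = build_vocab(molecules)
table = init_embeddings(len(vocab), dim=8)
vector = embed("CCO", vocab, table)  # one 8-dimensional vector for ethanol
```

Every molecule, whatever the length of its SMILES string, maps to a vector of the same dimension. That property is what allows a single shared representation such as the UDCS to feed several downstream property-prediction models.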

