An Application Example of Predicting Molecular Melting Points with Machine Learning

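We obtained a dataset of tens of thousands of organic molecules with melting-point data from the Reaxys database. Using PaDEL molecular descriptors as the basis, we carried out clustering and machine learning analyses to predict the melting points of organic molecules. In this study we present a systematic workflow that uses clustering to partition the samples and reduce the dimensionality of the descriptors, making it suitable for analyzing highly scattered datasets. Finally, we applied an XGBoost nonlinear regression model to sample subgroups and obtained statistically significant mathematical models that successfully predict molecular melting points. The model identifies polarization forces, dispersion forces, and the symmetry of the molecular configuration as structural factors related to a molecule's melting point.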
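The abstract does not say how the clustering reduces descriptor dimensionality. The sketch below shows one common approach, offered only as an assumed illustration. It groups descriptors whose values are strongly correlated across molecules, keeps one representative per group, and drops constant descriptors. The function names `pearson` and `cluster_descriptors`, the 0.95 threshold, and the toy descriptor names are hypothetical. They are not PaDEL outputs or the authors' settings.

```python
# Hypothetical sketch: correlation-based descriptor grouping, assumed for illustration only.
import math
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cluster_descriptors(columns, threshold=0.95):
    """Greedy leader clustering of descriptor columns by |Pearson r|.

    columns maps a descriptor name to its values (one per molecule).
    Returns {representative name: [member names]}; the representatives
    form the reduced descriptor set.
    """
    groups = {}
    for name, values in columns.items():
        if pvariance(values) == 0:
            continue  # a constant descriptor carries no information, so drop it
        for rep in groups:
            if abs(pearson(values, columns[rep])) >= threshold:
                groups[rep].append(name)  # redundant with an existing representative
                break
        else:
            groups[name] = [name]  # not correlated with any group: start a new one
    return groups


# Toy data: 5 molecules x 4 hypothetical descriptors.
columns = {
    "size_1": [1, 2, 3, 4, 5],
    "size_2": [2, 4, 6, 8, 10],  # perfectly correlated with size_1
    "hbond": [0, 1, 0, 1, 0],
    "flat": [1, 1, 1, 1, 1],  # zero variance
}
groups = cluster_descriptors(columns)
print(groups)  # {'size_1': ['size_1', 'size_2'], 'hbond': ['hbond']}
print(list(groups))  # reduced descriptor set: ['size_1', 'hbond']
```

The greedy pass depends on column order. A real pipeline might sort descriptors first, for example by variance, or use hierarchical clustering on 1 − |r|.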

Keywords

Quantitative structure-activity relationship; machine learning; molecular interactions

Parallel Abstract

We imported tens of thousands of organic molecules from the Reaxys database as the sample dataset for predicting molecular melting points. We used the PaDEL package to generate molecular descriptors, then applied a machine learning approach to find a mathematical relationship that describes the melting point in terms of structural characteristics. In this study, we show a systematic process that uses clustering to reduce the descriptor dimensions and to categorize a highly diverse dataset. Finally, we applied XGBoost nonlinear regression to the subgroup dataset and obtained a statistically significant model. The model agrees with the established chemical understanding of molecular melting points. It attributes contributions to polarization forces, dispersion forces, and the symmetry of the molecular configuration.
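The abstract gives no algorithmic details, so this is a hedged sketch of the general cluster-then-regress idea. It partitions samples in descriptor space with k-means and then fits a separate boosted-tree regressor for each subgroup. To stay dependency-free, it replaces XGBoost with a hand-written gradient-boosting loop over depth-1 trees (decision stumps) under squared-error loss. XGBoost itself adds deeper regularized trees, second-order gradient information, and much more. The choice of k-means, the function names, the hyperparameters (`k`, `n_rounds`, `lr`), and the toy data are all assumptions, not the authors' method or settings.

```python
# Hypothetical sketch: k-means + stump boosting stand in for the paper's
# unspecified clustering step and for XGBoost; not the authors' implementation.
from statistics import mean


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


def fit_stump(X, r):
    """Best single-feature threshold split for residuals r (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]


def fit_boosted(X, y, n_rounds=100, lr=0.1):
    """Gradient boosting of decision stumps under squared-error loss."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no usable split (e.g. a single-sample cluster)
        j, t, lv, rv = stump
        stumps.append(stump)
        pred = [p + lr * (lv if x[j] <= t else rv) for p, x in zip(pred, X)]
    return base, stumps, lr


def predict_boosted(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)


def fit_per_cluster(X, y, k):
    """Cluster the samples, then fit one boosted model per cluster."""
    labels, centers = kmeans(X, k)
    models = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_boosted([X[i] for i in idx], [y[i] for i in idx])
    return labels, centers, models


def predict_per_cluster(centers, models, x):
    """Route x to its nearest cluster center and use that cluster's model."""
    c = min(models, key=lambda c: sq_dist(x, centers[c]))
    return predict_boosted(models[c], x)


# Toy data: two well-separated groups with opposite structure-property trends.
X = [[i * 0.1, 0.0] for i in range(20)] + [[i * 0.1, 100.0] for i in range(20)]
y = [300 + 20 * x0 for x0, _ in X[:20]] + [400 - 20 * x0 for x0, _ in X[20:]]
labels, centers, models = fit_per_cluster(X, y, k=2)
print(labels)  # first 20 samples in cluster 0, last 20 in cluster 1
print(predict_per_cluster(centers, models, [0.0, 100.0]))
```

A single global model would have to reconcile the two opposite trends. Fitting per cluster lets each subgroup keep its own relationship. With the real library, `fit_boosted` would typically be replaced by an `xgboost.XGBRegressor` fitted on each cluster's rows.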

Parallel Keywords

QSAR; Machine learning; Molecular interactions
