
PolyName2Structure System: Converting Polymer Names to Structures by Predicting the Reaction Pathways of Polymer Monomers

PolyName2Structure - A Polymer Name to Structure System of Structure-Based and Source-Based Names by Predicting the Reaction Pathways of Monomers

Advisor: 曾宇鳳

Abstract


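Applying text-mining techniques to the polymer science literature is very challenging because of the complex ways in which polymers are named. Existing tools can handle only names based on polymer structure as defined by IUPAC (structure-based names). In this study, a new system, PolyName2Structure, was developed that converts both IUPAC structure-based polymer names and source-based polymer names (names built from the monomers) into polymer structures. Structure-based names are processed with the latest version of OPSIN (Open Parser for Systematic IUPAC Nomenclature). For source-based names, the monomers are first parsed out of the polymer name and converted into monomer structures through OPSIN and PubChem PUG (Power User Gateway) REST (Representational State Transfer); prediction models learned from the PoLyInfo database then predict the reaction pathways of the monomers, and the polymer structure is finally generated from the predicted pathways. In the course of this work, new algorithms were also developed to generate the variables used by the machine-learning models, simplify polymer structures, generate all shortest repeating structures, simulate various polymer reactions, and identify reactive groups in polymer and monomer structures. To evaluate PolyName2Structure, the Sigma-Aldrich polymer product catalog was adopted as an external test dataset. Nearly all of the models for predicting reaction pathways reached an accuracy above 95%. PolyName2Structure achieved 98.1% accuracy on the training data and 92.1% on the external test data. With such an accurate system, polymer names in journals, textbooks, and patent documents can be converted into structures more reliably, strengthening text mining in the polymer domain. The methods, machine-learning variables, and prediction models developed in this study can be reused and applied in future research in polymer informatics.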
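The source-based branch described above (parse the monomers out of the polymer name, then resolve each monomer name to a structure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's parser: the connective list, the hypothetical helpers `monomer_names` and `pubchem_inchi_url`, and the choice of requesting the `InChI` property are all assumptions; block and graft names are not handled, and the OPSIN step is omitted. The URL follows PubChem's PUG REST pattern `/compound/name/{name}/property/{property}/TXT`.

```python
import re
from urllib.parse import quote
from urllib.request import urlopen

# Assumed subset of IUPAC source-based copolymer connectives. Block and graft
# names (e.g. "polystyrene-block-polybutadiene") repeat "poly" per segment and
# are deliberately not handled in this sketch.
CONNECTIVES = ("co", "alt", "stat", "ran", "per")
_SPLIT = re.compile(r"-(?:%s)-" % "|".join(CONNECTIVES))
_CLOSE = {"(": ")", "[": "]", "{": "}"}

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def _strip_enclosing(text: str) -> str:
    """Remove one pair of enclosing marks only if they wrap the whole string."""
    if not text or text[0] not in _CLOSE:
        return text
    depth = 0
    for i, ch in enumerate(text):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
            if depth == 0:
                # The first mark closes here; strip it only if this is the end.
                return text[1:-1] if i == len(text) - 1 else text
    return text


def monomer_names(polymer_name: str) -> list[str]:
    """Hypothetical helper: split a source-based name into monomer names."""
    name = polymer_name.strip()
    if not name.lower().startswith("poly"):
        raise ValueError(f"not a source-based 'poly...' name: {polymer_name!r}")
    body = _strip_enclosing(name[4:])
    return [part.strip() for part in _SPLIT.split(body) if part.strip()]


def pubchem_inchi_url(compound_name: str) -> str:
    """Build a PUG REST URL asking PubChem for the InChI of a named compound."""
    return f"{PUG_REST}/compound/name/{quote(compound_name, safe='')}/property/InChI/TXT"


def fetch_inchi(compound_name: str, timeout: float = 10.0) -> str:
    """Query PubChem (network access required); keeps the first hit only."""
    with urlopen(pubchem_inchi_url(compound_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8").splitlines()[0].strip()


if __name__ == "__main__":
    for monomer in monomer_names("poly(ethylene-co-vinyl acetate)"):
        print(monomer, "->", pubchem_inchi_url(monomer))
```

Matching the enclosing bracket before stripping it keeps names such as `poly[(R)-3-hydroxybutyrate]` intact; the network call is isolated in `fetch_inchi` so the parsing step can be tested offline.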

Parallel Abstract


Text mining of polymer science documents is currently highly challenging, mainly because of the complex names used in polymer science. Existing tools can handle only systematic IUPAC structure-based polymer names. In this study, PolyName2Structure, a system that automatically converts both structure-based and source-based polymer names into polymer structures, was developed. Structure-based names are processed with the latest version of OPSIN (Open Parser for Systematic IUPAC Nomenclature). Source-based names are first parsed to extract the monomers, whose structures are obtained using OPSIN and PubChem PUG (Power User Gateway) REST (Representational State Transfer). Prediction models trained on data from the PoLyInfo database then predict the reaction pathways of the monomers, and the polymer structure is generated from the predicted pathways. Several new algorithms were designed to generate the descriptor sets used by each prediction model, simplify polymer structures, generate all shortest repeating units, simulate polymer reaction types, and identify functional groups in given sets of monomer and polymer structures. To validate the performance of PolyName2Structure, the Sigma-Aldrich polymer product catalog was used as an external test dataset. The prediction models of polymer reaction pathways perform very well, most with above 95% accuracy. PolyName2Structure achieves 98.1% accuracy on the training dataset and 92.1% on the external test dataset. Given this accuracy, the system can be used to convert polymer names in journal papers, textbooks, patents, and other documents into polymer structures. The methods, descriptor sets, and models designed in this study can also be reused in future polymer informatics research.
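Generating all shortest repeating units can be illustrated on a simplified model in which a polymer backbone is a linear sequence of main-chain groups. The sketch below is an assumption, not the thesis's algorithm: it reduces a written repeat unit to its shortest period with the Knuth-Morris-Pratt prefix function and then enumerates every cyclic rotation, each of which is an equally valid shortest repeating unit. The hypothetical `same_backbone` helper shows one plausible use, recognizing that two differently written units describe the same chain. Side groups, reading direction, and the IUPAC seniority rules for choosing a preferred unit are ignored.

```python
from typing import Sequence


def minimal_period(units: Sequence[str]) -> int:
    """Length of the shortest block whose repetition reproduces `units`,
    computed with the Knuth-Morris-Pratt prefix function."""
    n = len(units)
    if n == 0:
        raise ValueError("empty repeating unit")
    prefix = [0] * n  # prefix[i]: longest proper border of units[: i + 1]
    for i in range(1, n):
        k = prefix[i - 1]
        while k and units[i] != units[k]:
            k = prefix[k - 1]
        if units[i] == units[k]:
            k += 1
        prefix[i] = k
    p = n - prefix[-1]
    return p if n % p == 0 else n


def all_shortest_repeating_units(units: Sequence[str]) -> list[tuple[str, ...]]:
    """Every cyclic rotation of the shortest repeating block. The block is
    primitive, so its rotations are pairwise distinct."""
    p = minimal_period(units)
    core = tuple(units[:p])
    return [core[s:] + core[:s] for s in range(p)]


def same_backbone(a: Sequence[str], b: Sequence[str]) -> bool:
    """Hypothetical check: do two written repeat units describe the same chain?"""
    return set(all_shortest_repeating_units(a)) == set(all_shortest_repeating_units(b))


if __name__ == "__main__":
    # Poly(ethylene oxide) written with a doubled, shifted repeat unit.
    written = ["O", "CH2", "CH2", "O", "CH2", "CH2"]
    print(all_shortest_repeating_units(written))
    print(same_backbone(written, ["CH2", "CH2", "O"]))
```

For poly(ethylene oxide) written as `-O-CH2-CH2-O-CH2-CH2-`, the shortest unit has three groups, and the function returns its three rotations, starting with `O-CH2-CH2`.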

