Extracting Protein-Protein Interactions from Biomedical Literature Abstracts

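With the rapid development of biomedicine and of many analysis methods, using text mining tools to find protein-protein interactions has become increasingly important. Researchers today obtain important information by reading the biomedical literature, but the volume of that literature is growing at a remarkable rate. Extracting information manually would consume a great deal of labor and time, so the demand for automatically extracting important information from documents has increased. Using a shallow parser and taking sentence structure into account, we developed an information system that automatically extracts protein-protein interactions from the literature. Unlike traditional approaches, our system matches the grammatical patterns of sentences. We design an efficient algorithm and, taking sentence semantics into account, define a set of rules for extracting protein interaction relations; each relation distinguishes the acting protein from the protein acted upon. The system consists of the following steps: preprocessing of the medical literature, sentence splitting, tokenization, part-of-speech tagging, protein name recognition, tagging of interaction keywords, prepositions, and conjunctions, and extraction of protein-protein interactions. Finally, we evaluate the system on two test sets: the LLL05 challenge and BioCreAtIvE-PPI.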

Keywords

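protein-protein interaction; text mining; semantic patterns; sentence splitting; tokenization; part-of-speech tagging; biomedical named entity recognition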

Parallel Abstract

With the rapid progress of biomedical science and the large number of analysis methods now available, many researchers obtain knowledge about protein-protein interactions from PubMed abstracts, but the biomedical literature is enormous and continues to grow at an exponential rate. The demand for automatic extraction of information from text has therefore been increasing, and text mining tools that find knowledge such as protein-protein interactions, which is useful for specific analysis tasks, have become critical. We develop a system that automatically extracts protein-protein interactions from free text using a shallow parser and sentence structure analysis. Our system matches sentences against syntax patterns that typically describe protein-protein interactions. We design an efficient algorithm and develop a set of rules that extract protein-protein interactions from the syntactic roles of the proteins. Each interaction distinguishes an ACTOR (the donor of the action) from an OBJECT (the receiver of the action). The system consists of the following steps: preprocessing, sentence splitting, tokenization, part-of-speech tagging, protein name recognition, tagging of interaction keywords, prepositions, and conjunctions, and extraction of protein-protein interactions. Finally, we evaluate our system on two data sets, one derived from the LLL05 challenge and the other from BioCreAtIvE-PPI.
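To make the pipeline described above more concrete, the following is a minimal sketch of pattern-based extraction that assigns ACTOR and OBJECT roles. It is not the system's actual implementation: the lexicons (`PROTEINS`, `ACTIVE_VERBS`, `PASSIVE_PARTICIPLES`), the tag names, and the two patterns (active "A verb ... B" and passive "B ... participle by A") are all assumptions chosen for illustration. Dictionary lookup stands in for the part-of-speech tagger, protein name recognizer, and shallow parser.

```python
import re
from typing import List, NamedTuple

# Hypothetical lexicons, for illustration only. A real system would rely on
# a protein name recognizer and a curated interaction keyword list.
PROTEINS = {"GerE", "sigK", "SpoIIID", "sigE"}
ACTIVE_VERBS = {"activates", "inhibits", "represses", "binds", "regulates"}
PASSIVE_PARTICIPLES = {"activated", "inhibited", "repressed", "bound", "regulated"}


class Interaction(NamedTuple):
    actor: str    # protein performing the action
    keyword: str  # interaction keyword found in the sentence
    obj: str      # protein receiving the action


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Keep alphanumeric tokens and drop punctuation."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def tag(tokens: List[str]) -> List[str]:
    """Assign one coarse tag per token using the lexicons above."""
    tags = []
    for tok in tokens:
        low = tok.lower()
        if tok in PROTEINS:
            tags.append("PROT")
        elif low in ACTIVE_VERBS:
            tags.append("IVERB")
        elif low in PASSIVE_PARTICIPLES:
            tags.append("IPART")
        elif low == "by":
            tags.append("BY")
        else:
            tags.append("O")
    return tags


def extract(text: str) -> List[Interaction]:
    """Match two simple patterns around each interaction keyword."""
    found = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        tags = tag(tokens)
        for k, t in enumerate(tags):
            if t not in ("IVERB", "IPART"):
                continue
            # Nearest protein on each side of the keyword, if any.
            left = next((tokens[i] for i in range(k - 1, -1, -1)
                         if tags[i] == "PROT"), None)
            right = next((tokens[i] for i in range(k + 1, len(tokens))
                          if tags[i] == "PROT"), None)
            if left is None or right is None:
                continue
            if t == "IVERB":
                # Active voice: the subject on the left is the ACTOR.
                found.append(Interaction(left, tokens[k], right))
            elif k + 1 < len(tags) and tags[k + 1] == "BY":
                # Passive voice with "by": the roles are swapped.
                found.append(Interaction(right, tokens[k], left))
    return found


if __name__ == "__main__":
    sample = "GerE inhibits transcription of sigK. sigK is activated by SpoIIID."
    for interaction in extract(sample):
        print(interaction)
```

The sketch handles the passive case explicitly because role assignment, not just co-occurrence of two protein names, is what separates the ACTOR from the OBJECT. A passive participle that is not followed by "by" is skipped rather than guessed at.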

Parallel Keywords

protein-protein interaction; text mining; syntax patterns; sentence splitting; tokenization; part-of-speech tagging; protein name recognition

