
Beyond Extracting One-to-One Relations: Joint Biomedical Entity-Relation Extraction Method Using Machine Reading Comprehension

Advisor: 魏志平

Abstract


Named entity recognition (NER) and relation extraction (RE) are two fundamental tasks in information extraction (IE) research. Entity-relation extraction (ERE), which has attracted considerable research attention in recent years, combines NER and RE with the aim of establishing an integrated technique that solves the two subtasks concurrently. However, most existing studies focus on extracting one-to-one relations, and very few examine the extraction of one-to-many relations. In this research, we transform the ERE task into a machine reading comprehension (MRC) problem and propose an MRC-based joint model that extracts entities and relations simultaneously, covering both one-to-one and one-to-many relations. We also compare the effectiveness of the proposed joint model with that of a pipeline model under various training-data sizes and explore the extensibility of the joint model to few-shot learning. Our experiments show that the joint model performs as well as the pipeline model on larger datasets and outperforms it on smaller ones. Moreover, the joint model achieves good performance on few-shot learning of one-to-many relations.
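
The MRC framing described above can be pictured as turning each extraction step into a question over the input passage: one question locates a head entity, and a relation-specific question, instantiated with that entity, retrieves its tail entities; letting a question return multiple answer spans is what accommodates one-to-many relations. The sketch below illustrates this framing only. It uses an off-the-shelf extractive QA checkpoint (deepset/roberta-base-squad2) and hand-written question templates as stand-ins; the templates, relation label, example sentence, and score threshold are illustrative assumptions, not the configuration of the joint model proposed in this thesis.

```python
# Minimal sketch of casting entity-relation extraction as MRC.
# Assumptions: an off-the-shelf extractive QA model stands in for a trained
# model; the question templates, relation label, passage, and threshold are
# illustrative only.
from transformers import pipeline

# Any extractive question-answering checkpoint can serve as the reader.
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

passage = (
    "Bromocriptine, a dopamine agonist, increases growth hormone secretion "
    "in a patient with acromegaly."
)

# Stage 1: an entity question locates the head entity span.
head = reader(question="Which drug is mentioned in the text?", context=passage)

# Stage 2: a relation-specific question, filled with the head entity, extracts
# tail entities; top_k > 1 lets a single question return several answer spans,
# which is how one-to-many relations fit into the MRC framing.
question = f"What does {head['answer']} increase the secretion of?"
tails = reader(question=question, context=passage, top_k=3)

# Keep only reasonably confident spans (the 0.1 cutoff is arbitrary) and emit
# (head, relation, tail) triples.
for tail in tails:
    if tail["score"] > 0.1:
        print((head["answer"], "increases_secretion_of", tail["answer"]))
```

The joint model proposed in the thesis extracts entities and relations simultaneously rather than in two separate stages as in this sketch, but the underlying idea is the same: questions over the passage whose answers may contain more than one span.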

