A Study on Question Intention Detection for Medical Community Question Answering Systems

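This thesis builds a system that detects the intention types of user question texts and proposes three types of feature data. The first is correlation features between vector dimensions, generated from word embedding vectors. The second is similarity features between each word and a set of medical concept keywords. The third is part-of-speech embedding vector features. The thesis also proposes two learning networks based on convolutional neural networks (CNNs). The first, CNN Joint Model, uses the feature vectors of multiple feature types to learn to predict the intention types of a question text. The second, Ensemble CNN Model, first predicts the intention type degree values from each feature type independently and uses Ensemble parameters to learn a weight for each feature; each feature type's prediction is multiplied by its weight and the products are summed to adjust the model's prediction. Experimental results show that using the medical concept keyword features together with the word vector dimension correlation features as input predicts the intention types of question texts more effectively, and that additionally feeding in the traditional word embedding vectors or part-of-speech embedding vectors further improves classification. In a comprehensive evaluation across the experiments, the best intention type prediction is achieved when the system recommends the intention types whose degree value exceeds the threshold 0.3, with the F1 measure reaching 0.75.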
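The second feature type can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses toy two-dimensional vectors; the abstract does not state which measure or embedding size the thesis uses, and the function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keyword_similarity_features(word_vectors, keyword_vectors):
    """Return one row per word and one column per medical concept keyword.

    Assumption: cosine similarity stands in for the (unspecified) similarity
    measure used in the thesis.
    """
    return [[cosine(w, k) for k in keyword_vectors] for w in word_vectors]


# Toy data: three words in a question, two medical concept keywords.
word_vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
keyword_vectors = [[1.0, 0.0], [0.0, 1.0]]

features = keyword_similarity_features(word_vectors, keyword_vectors)
for row in features:
    print([round(x, 3) for x in row])
```

The resulting matrix (words by keywords) is the kind of input a CNN could convolve over in place of, or alongside, the raw word embeddings.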

Keywords

intention type classification; medical concept keyword features; CNN-based learning networks

Parallel Abstract

This paper aims to establish a system that detects the intention types of user questions. We propose three types of feature data. The first uses word embedding vectors to generate correlation features between the vector dimensions. The second consists of similarity features between each word and a set of pre-defined medical concept keywords. The third is the part-of-speech embedding vector of each word. Two CNN-based learning frameworks are then proposed. The first, CNN Joint Model, concatenates the CNN outputs of the various feature types to learn the intention types. The second, Ensemble CNN Model, uses each feature type to predict the intention type degree values independently; Ensemble parameters then learn the weight of each feature in order to combine the predictions of the various feature types. The experimental results show that when the medical concept keyword features and the word vector dimension correlation features are combined as input features, the intention type of the question text can be predicted with a high F1 measure. Combining them with the traditional word embedding vectors or part-of-speech embedding vectors as additional input features further improves the prediction results. A comprehensive evaluation of the experiments shows that the best intention type prediction is achieved when the intention types whose predicted degree value is greater than the threshold 0.3 are selected, with an F1 measure of at least 0.75.
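The combination step of the Ensemble CNN Model, followed by the 0.3 threshold, can be sketched as below. This is only an illustration under stated assumptions: the per-feature scores stand in for the outputs of separately trained CNNs, the weights are fixed by hand rather than learned, and the intention labels and function names are hypothetical.

```python
def ensemble_scores(per_feature_scores, weights):
    """Weighted sum of per-feature predictions, one combined score per intention type."""
    num_types = len(per_feature_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_feature_scores))
        for i in range(num_types)
    ]


def select_intents(scores, labels, threshold=0.3):
    """Keep every intention type whose combined score is greater than the threshold."""
    return [label for label, s in zip(labels, scores) if s > threshold]


# Hypothetical intention types; the abstract does not list the actual label set.
labels = ["symptom", "treatment", "medication"]

# Placeholder outputs of three independent CNNs, one per feature type.
per_feature_scores = [
    [0.8, 0.2, 0.6],  # medical concept keyword similarity features
    [0.6, 0.1, 0.3],  # word vector dimension correlation features
    [0.4, 0.3, 0.1],  # part-of-speech embedding features
]

# Assumption: fixed weights standing in for the learned Ensemble parameters.
weights = [0.5, 0.3, 0.2]

combined = ensemble_scores(per_feature_scores, weights)
predicted = select_intents(combined, labels, threshold=0.3)
print([round(s, 2) for s in combined], predicted)
```

Because every type above the threshold is kept, a single question can receive more than one intention type, which matches the thresholded selection described in the abstract.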

Parallel Keywords

intention types classification; medical concept keyword feature; learning network based on CNN
