A question answering system processes the question entered by the user and returns a reply; the words and expressions the user types are therefore among the factors that affect response accuracy. This study replaces words using a Chinese near-synonym corpus to mitigate problems such as typing errors and differing ways of expression, which prevent the system from accurately understanding the user's intent and thus lower response accuracy. The question answering models are built with three deep-learning architectures: Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). For each model, the accuracy achieved with the near-synonym corpus is compared against that achieved without it, and with versus without pre-trained word vectors. The near-synonym corpora are applied to four domains: food, beauty, film, and travel. The results show that using a near-synonym corpus always yields higher accuracy than not using one, and that pre-trained word vectors further improve accuracy. The context-based near-synonym corpus performs best for the food domain, the mixed corpus for beauty, and the character-based corpus for both film and travel. Among the models, RNN performs worst; GRU is preferable for small datasets and LSTM for large datasets.
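The core preprocessing idea described above can be sketched as a simple lookup-and-replace pass over the tokenized query before it reaches the model. This is a minimal illustration only: the dictionary below is a toy stand-in, and the actual near-synonym corpora in the study are built separately for context-based, character-based, and mixed variants.

```python
# Hypothetical near-synonym table mapping input variants to one canonical
# form; a real corpus would be domain-specific (food, beauty, film, travel).
NEAR_SYNONYMS = {
    "影片": "電影",   # "video/film" -> canonical "movie"
    "好吃": "美味",   # "tasty" -> canonical "delicious"
    "戲院": "電影院",  # "theater" -> canonical "cinema"
}

def normalize(tokens):
    """Replace each token with its canonical near-synonym, if one exists,
    so that differently worded queries map to the same model input."""
    return [NEAR_SYNONYMS.get(tok, tok) for tok in tokens]

# Two differently worded queries collapse to the same token sequence.
print(normalize(["這部", "影片", "好看"]))  # -> ['這部', '電影', '好看']
print(normalize(["這家", "餐廳", "好吃"]))  # -> ['這家', '餐廳', '美味']
```

The normalized token sequence would then be converted to (optionally pre-trained) word vectors and fed to the RNN, LSTM, or GRU classifier.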