A question answering system processes the question entered by the user and returns a reply; the words and expressions the user types are therefore among the factors that affect response accuracy. This study replaces words using a Chinese near-synonym corpus to mitigate problems such as typing errors and differing ways of expression, which prevent the system from accurately understanding the user's intent and thus lower response accuracy. The question answering models are built with three deep-learning architectures: Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). For each model, the accuracy achieved with the near-synonym corpus is compared against that achieved without it, and with versus without pre-trained word vectors. The near-synonym corpora are applied to four domains: food, beauty, film, and travel. The results show that using a near-synonym corpus always yields higher accuracy than not using one, and that pre-trained word vectors further improve accuracy. The context-based near-synonym corpus performs best for the food domain, the mixed corpus for beauty, and the character-based corpus for both film and travel. Among the models, RNN performs worst; GRU is preferable for small datasets and LSTM for large datasets.
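The core preprocessing idea described above can be sketched as a simple lookup-and-replace pass over the tokenized query before it reaches the model. This is a minimal illustration only: the dictionary below is a toy stand-in, and the actual near-synonym corpora in the study are built separately for context-based, character-based, and mixed variants.

```python
# Hypothetical near-synonym table mapping input variants to one canonical
# form; a real corpus would be domain-specific (food, beauty, film, travel).
NEAR_SYNONYMS = {
    "影片": "電影",   # "video/film" -> canonical "movie"
    "好吃": "美味",   # "tasty" -> canonical "delicious"
    "戲院": "電影院",  # "theater" -> canonical "cinema"
}

def normalize(tokens):
    """Replace each token with its canonical near-synonym, if one exists,
    so that differently worded queries map to the same model input."""
    return [NEAR_SYNONYMS.get(tok, tok) for tok in tokens]

# Two differently worded queries collapse to the same token sequence.
print(normalize(["這部", "影片", "好看"]))  # -> ['這部', '電影', '好看']
print(normalize(["這家", "餐廳", "好吃"]))  # -> ['這家', '餐廳', '美味']
```

The normalized token sequence would then be converted to (optionally pre-trained) word vectors and fed to the RNN, LSTM, or GRU classifier.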