Using Character Roots and Vocabulary to Improve Intelligent Question-Answering Systems

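With the rapid growth of social apps and personal mobile devices, communication increasingly no longer requires face-to-face conversation: people can reach one another anytime and anywhere through social apps on their mobile devices. At the same time, many stores have begun to advertise through social apps, and some have even converted their physical storefronts into online stores. Most online stores use intelligent question-answering systems to serve customers, and most of these systems are retrieval-based. A retrieval-based system whose database is not updated or expanded is prone to incorrect responses; however, simply adding data without organizing it lengthens response time. This thesis therefore aims to improve question-answering systems by combining data classification with clustering, together with character-root (radical) conversion, which previous research has rarely applied to question answering. In this study, conversation data were collected from the Dcard forum to build a dialogue dataset. The data were first processed with 5W1H classification, the words were then converted into radicals, and the result was clustered with Dynamic K-means. Accuracy was evaluated by checking whether the answer returned by retrieval belongs to the same cluster as the original answer. In addition, the organized data were used to train an LSTM model for a generative question-answering system, which was evaluated by model perplexity and by human judgment through a questionnaire. The results show that classifying by 5W1H and then applying Dynamic K-means clustering after radical conversion reduces the time spent by the retrieval-based question-answering system, and that accuracy is also better than with traditional clustering.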

Keywords

Question-answering system; K-means; 5W1H; LSTM; Radical

Parallel Abstract

In modern society, social apps and personal mobile devices are booming. People increasingly communicate without face-to-face conversation, since social apps on mobile devices let them reach others anytime and anywhere. At the same time, many stores have started to promote themselves through social apps and have even converted physical storefronts into online stores. Most online stores use intelligent question-answering systems to serve customers, and most of these are retrieval-based. Such a system is prone to response errors if its database is not updated or expanded. However, if data are only added and never organized, response time grows. The purpose of this thesis is therefore to improve question-answering systems by combining data classification with clustering, and by adding radical conversion, which past research has seldom applied to question answering. In this study, dialogue data are obtained from the Dcard forum to build a dialogue dataset. The data are first processed by 5W1H classification, the words are then converted into radicals, and Dynamic K-means clustering is applied. Finally, accuracy is evaluated by judging whether the answer produced by retrieval falls in the same cluster as the original answer. In addition, the organized data are used to train an LSTM model that is applied to a generative question-answering system, which is evaluated by model perplexity and manually through a questionnaire. The results show that 5W1H classification followed by radical conversion and Dynamic K-means clustering reduces the time spent by the retrieval-based question-answering system, and that accuracy is better than with traditional clustering.

Parallel Keywords

Question-answering system; K-means; 5W1H; LSTM; Radical
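
The pipeline outlined in the abstract (5W1H classification, radical conversion, clustering, and a same-cluster accuracy check) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the classification rules, the radical dictionary, or how Dynamic K-means chooses its clusters, so the sketch uses hypothetical question-word cues, a tiny hand-written radical table, and plain k-means with a fixed k as a stand-in.

```python
from collections import Counter

# Hypothetical mini radical table (illustration only); a real system would
# load a full character-to-radical dictionary.
RADICALS = {"河": "水", "湖": "水", "海": "水",
            "樹": "木", "林": "木", "森": "木",
            "吃": "口", "喝": "口"}

# Hypothetical question-word cues. Order matters: 為什麼 (why) and
# 什麼時候 (when) both contain 什麼 (what), so they are checked first.
CUES = [
    ("why", ["為什麼", "為何"]),
    ("when", ["什麼時候", "何時"]),
    ("where", ["哪裡", "在哪"]),
    ("who", ["誰"]),
    ("how", ["如何", "怎麼"]),
    ("what", ["什麼"]),
]

def classify_5w1h(question):
    """Return the first 5W1H label whose cue appears in the question."""
    for label, cues in CUES:
        if any(cue in question for cue in cues):
            return label
    return "other"

def to_radicals(text):
    """Replace each character by its radical; unknown characters pass through."""
    return [RADICALS.get(ch, ch) for ch in text]

def vectorize(tokens, vocab):
    """Bag-of-radicals count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with fixed k (a stand-in, not the thesis's Dynamic K-means)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_match_accuracy(retrieved_labels, original_labels):
    """Fraction of retrieved answers that share a cluster with the original answer."""
    hits = sum(r == o for r, o in zip(retrieved_labels, original_labels))
    return hits / len(original_labels)

# Toy usage: four short texts that reduce to two radical groups.
texts = ["河湖海", "海河", "樹林森", "森林"]
tokens = [to_radicals(t) for t in texts]
vocab = sorted({tok for toks in tokens for tok in toks})
labels = kmeans([vectorize(toks, vocab) for toks in tokens], k=2)
```

In this toy setup, characters that share a radical collapse to the same token, so the feature space shrinks before clustering; that is one plausible reading of why radical conversion could shorten retrieval time, not a claim about the thesis's actual implementation.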

