A Comparison of Answer Generation Methods for Question Answering Systems: A Case Study of a Customer Service System

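This study applies a question answering system to a chatbot on the Line platform and provides four data retrieval methods. In all four, the user's question is first segmented into words, and sentence vectors are then used to find matching or similar questions in a prepared knowledge base to return to the user. The first method finds similar question-answer pairs in the customer service frequently asked questions collection and returns the top few by similarity for the user to choose from; the second searches a knowledge base that has been classified by an LDA topic model and condensed by text summarization; the third searches a knowledge base organized by the categories that past users filled in, likewise condensed by text summarization; and the last searches all work orders without any classification. The study consists of two main parts: preprocessing of the text data into a knowledge base, and the question answering system, which is further divided into question analysis, data retrieval, and answer extraction. The text data come mainly from the database of work orders handled over the years by the computing center of Chung Yuan Christian University. The database has sixty fields in total, of which this study analyzes only the problem description, the handling response, and the category selected by the user. Knowledge-base preprocessing removes unnecessary data such as unresolved problems and the computing center's routine reports, then trains models on the data and classifies it; word segmentation, a word vector model, a topic model, and text summarization together produce the knowledge base needed for data retrieval in the question answering system. The question answering system likewise uses word segmentation, sentence vector computation, and the topic model to determine the topic, and searches the prepared knowledge base for questions similar to the input. The question answering system can not only reduce the workload of administrative units but also, by narrowing the search scope, shorten the time users wait for their problems to be solved.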
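The retrieval step described above (segment the question, build a sentence vector, rank knowledge-base questions by similarity, return the top few) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the knowledge-base entries, the function names, and the `category` filter are hypothetical, whitespace splitting stands in for a real word segmenter, and bag-of-words counts stand in for the word-vector-based sentence vectors the study uses.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of (category, question, answer) triples.
KNOWLEDGE_BASE = [
    ("network", "cannot connect to campus wifi", "Re-register the device MAC address."),
    ("network", "wifi password reset", "Reset the password in the account portal."),
    ("email", "cannot log in to email", "Check that the account has not expired."),
    ("email", "email quota is full", "Delete old messages or request more quota."),
]


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def sentence_vector(tokens):
    # Assumption: token counts stand in for averaged word embeddings.
    return Counter(tokens)


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, top_k=3, category=None):
    # Optionally narrow the candidates to one category before ranking,
    # then return the top_k (score, question, answer) tuples.
    query_vec = sentence_vector(tokenize(query))
    candidates = [
        entry for entry in KNOWLEDGE_BASE
        if category is None or entry[0] == category
    ]
    scored = [
        (cosine(query_vec, sentence_vector(tokenize(question))), question, answer)
        for _, question, answer in candidates
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, question, answer in retrieve("wifi cannot connect", top_k=2):
        print(f"{score:.3f}  {question}  ->  {answer}")
```

Passing `category` mirrors the idea of searching only within a topic or user-selected class, which shrinks the candidate set; leaving it as `None` corresponds to searching every entry without classification.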

Keywords

Question answering system; word segmentation; word embedding; topic model; text summarization

Parallel Abstract

This research applies a question answering (QA) system to a chatbot on the Line platform and provides four data retrieval methods. Before retrieval, each user question is tokenized, converted to a sentence vector, and assigned a topic by the LDA topic model; similar questions are then found in the knowledge bases by comparing sentence vectors. The first method finds similar questions and answers in the existing customer service question-and-answer manual and returns the items with the highest similarity for the user to choose from. The second searches a knowledge base that has been classified by LDA topic and summarized. The third searches a knowledge base organized by the categories that users filled in in the past, which has also been summarized. The fourth searches all work orders without classification for corresponding or similar questions and replies to the user. The work is divided into text data pre-processing and the QA system, and the QA system is further divided into question analysis, data retrieval, and answer extraction. The text data come mainly from the database of work process sheets accumulated over the years by the Office of Information Technology of Chung Yuan Christian University. The database has 60 fields in total, of which this research analyzes only the problem recorded on each sheet, the processing response, and the category selected by the user. Text data pre-processing removes unnecessary data, such as unprocessed problems and the office's routine reports, and then trains models on the data and classifies it. Word segmentation, a word vector model, a topic model, and text summarization together produce the knowledge base that the QA system needs for data retrieval. The QA system likewise classifies question types with the topic model and uses the word vector model to search for similar questions during data retrieval. The QA system can not only reduce the workload of administrative units but also narrow the scope of the search and shorten the time users wait for their problems to be solved.

Parallel Keywords

Question Answering System; Tokenization; Word Embedding; Topic Model

