  • Degree thesis

Comparison of Answer Generation Methods in Question Answering Systems: Taking a Customer Service System as an Example

Advisors: 賀嘉生, 鄭憲永

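Abstract


This study applies a question answering system to a chatbot on the Line platform and provides four data retrieval methods. In all four, the user's question is first segmented into words, and sentence vectors are then used to find the corresponding or similar questions in a prepared knowledge base, which are returned to the user. The first method finds similar question-answer pairs in a set of frequently asked customer service questions and returns the few with the highest similarity for the user to choose from. The second searches a knowledge base that has been classified by an LDA topic model and condensed by text summarization. The third searches a knowledge base that is organized by the categories users filled in previously and has likewise been condensed by text summarization. The last searches all work orders without any classification.

The study has two main parts: preprocessing of the text data into a knowledge base, and the question answering system, which is further divided into question analysis, data retrieval, and answer extraction. The text data comes mainly from the database of work processing orders accumulated over the years by the Office of Information Technology of Chung Yuan Christian University. The database has sixty fields in total, of which this study analyzes only three: the problem described in the order, the reply given in handling it, and the category selected by the user. Preprocessing removes unnecessary data, such as problems not yet handled and the office's routine reports, and trains models on the data to classify it. Word segmentation, a word vector model, a topic model, and text summarization together produce the knowledge base that the question answering system needs for data retrieval.

The question answering system likewise uses word segmentation, sentence vector computation, and the topic model to determine the topic, and then looks in the prepared knowledge base for questions similar to the input. The system can reduce the workload of administrative units and, by narrowing the search scope, shorten the time users wait for their problems to be resolved.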

Parallel Abstract


This research applies a question answering (QA) system to a chatbot on the Line platform and provides four methods of data retrieval. In each method, the user's question is first tokenized, converted to a sentence vector, and assigned a topic by an LDA topic model; similar questions are then found in the knowledge bases by comparing sentence vectors. The first method finds similar questions and answers in the existing customer service FAQ and returns the items with the highest similarity for the user to choose from. The second searches a knowledge base that has been classified by LDA topic and summarized. The third searches a knowledge base that is organized by the categories users filled in previously and has also been summarized. The fourth searches all work orders without any classification.

The work is divided into text data pre-processing and the QA system, and the QA system is further divided into question analysis, data retrieval, and answer extraction. The text data comes mainly from the database of work processing sheets kept over the years by the Office of Information Technology of Chung Yuan Christian University. The database has 60 fields in total, of which this research analyzes only three: the problem recorded on the sheet, the response given in handling it, and the category selected by the user. Text data pre-processing removes unnecessary data, such as unprocessed problems and the office's routine reports, and uses models to train on and classify the data. Word segmentation, a word vector model, a topic model, and text summarization then produce the knowledge base that the QA system needs for data retrieval.

The QA system also classifies question types with the topic model and uses the word vector model to search for similar questions during data retrieval. The QA system can reduce the workload of the administrative units and, by narrowing the scope of the search, shorten the time users wait for their problems to be solved.
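The retrieval step described in the abstract (segment the question, build a sentence vector, return the most similar knowledge-base questions, optionally within one topic) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy `WORD_VECTORS` table stands in for a trained word vector model, whitespace splitting stands in for Chinese word segmentation, averaging word vectors is one common way to form a sentence vector, and the `topic` labels are hypothetical stand-ins for LDA or user-selected categories.

```python
import math

# Hypothetical toy word vectors; a real system would load a trained model.
WORD_VECTORS = {
    "reset": [1.0, 0.0, 0.0],
    "password": [0.9, 0.1, 0.0],
    "wifi": [0.0, 1.0, 0.0],
    "connect": [0.0, 0.9, 0.1],
    "email": [0.0, 0.0, 1.0],
    "quota": [0.1, 0.0, 0.9],
}

# Hypothetical knowledge base; "topic" stands in for an LDA or user category.
KNOWLEDGE_BASE = [
    {"topic": "account", "question": "reset password",
     "answer": "Use the account portal."},
    {"topic": "network", "question": "connect wifi",
     "answer": "Select the campus network."},
    {"topic": "mail", "question": "email quota",
     "answer": "Contact the help desk."},
]


def tokenize(text):
    # Stand-in for word segmentation: lowercase and split on whitespace.
    return text.lower().split()


def sentence_vector(text):
    # Average the vectors of the known tokens; None if no token is known.
    vectors = [WORD_VECTORS[t] for t in tokenize(text) if t in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    # Cosine similarity of two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query, knowledge_base, k=3, topic=None):
    # Return up to k (score, question, answer) tuples, best match first.
    # When topic is given, only entries with that topic are searched.
    query_vec = sentence_vector(query)
    if query_vec is None:
        return []
    scored = []
    for entry in knowledge_base:
        if topic is not None and entry["topic"] != topic:
            continue
        entry_vec = sentence_vector(entry["question"])
        if entry_vec is not None:
            score = cosine(query_vec, entry_vec)
            scored.append((score, entry["question"], entry["answer"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

Calling `top_k_similar("how to reset my password", KNOWLEDGE_BASE, k=2)` ranks the "reset password" entry first, and passing `topic="network"` restricts the search to that category, which is the sense in which classifying the knowledge base narrows the search scope.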

