Frequently asked questions (FAQ) are the questions customers most often ask in business settings; this thesis builds a chatbot that answers such FAQs effectively. First, because the answers to questions often change over time, and to keep the corpus stable and the model accurate, we recast FAQ answering as retrieving the best-matching standard question from a set of candidates. We initially used term frequency-inverse document frequency (TF-IDF) as the chatbot's retrieval criterion, but found that TF-IDF cannot recognize the different queries that customers produce for the same standard question. We therefore proposed using BERT to improve the model's grasp of question semantics, and explored fine-tuning BERT under different comparison modes; our results surpass the conventional approach of using BERT to classify queries directly. We also compared text classification with BERT, cross-encoder BERT, and Siamese BERT: on a small dataset such as the company FAQ, accuracy rose from 74.20% (text classification with BERT) and 74.50% (Siamese BERT) to 81.00% (cross-encoder BERT), whereas on a large dataset such as Yahoo! Answers, text classification with BERT achieved the highest accuracy. Finally, we applied different data augmentation methods; both reverse pair and Traditional-to-Simplified Chinese conversion improved the accuracy of cross-encoder BERT.
Frequently asked questions (FAQ) are the questions customers ask most frequently in business scenarios, and this paper builds a chatbot that can answer them effectively. First of all, the answers to questions often change over time; for the stability of the corpus and the accuracy of model prediction, we framed FAQ answering as retrieving the best-matching standard question from a set of candidates. We first used term frequency-inverse document frequency (TF-IDF) as the basis for the chatbot to retrieve matching candidates, but found that TF-IDF cannot identify the different test questions (queries) that customers generate for the same standard question. We therefore proposed using BERT to improve the model's ability to capture question semantics, and explored fine-tuning BERT under different comparison modes; the results surpass the conventional approach of using BERT for query text classification. Comparing text classification with BERT, cross-encoder BERT, and Siamese BERT, on a small dataset such as the company's FAQ set the accuracy increased from 74.20% for text classification with BERT and 74.50% for Siamese BERT to 81.00% for cross-encoder BERT, whereas on a large dataset such as Yahoo! Answers, text classification with BERT achieved the highest accuracy. In addition, we applied different data augmentation methods; both reverse pair and Traditional-to-Simplified Chinese conversion improved the accuracy of cross-encoder BERT.
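The TF-IDF retrieval step described above can be sketched as follows: each candidate FAQ question is turned into a TF-IDF vector, the query is vectorised with the same IDF weights, and the candidate with the highest cosine similarity is returned. This is a minimal pure-Python sketch, not the thesis implementation; the tokenisation (whitespace, lowercase), the smoothed IDF formula, and the example FAQ strings are all illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (sparse dicts) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter()                        # document frequency per term
    for toks in tokenised:
        df.update(set(toks))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed IDF (assumed variant)
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenised], idf


def cosine(u, v):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, faq_questions):
    """Return the index of the FAQ question most similar to the query."""
    vectors, idf = tfidf_vectors(faq_questions)
    tf = Counter(query.lower().split())
    qvec = {t: tf[t] * idf[t] for t in tf if t in idf}  # terms unseen in the corpus are dropped
    scores = [cosine(qvec, v) for v in vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative candidate set, not the company FAQ corpus.
faqs = [
    "how do i reset my password",
    "what are the shipping fees",
    "how can i cancel my order",
]
print(retrieve("i forgot my password how to reset it", faqs))  # → 0
```

The sketch also shows the weakness the abstract reports: a paraphrase sharing no terms with the standard question (e.g. "my login code no longer works") scores zero against every candidate, which is what motivates the move to BERT-based semantic matching.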
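The reverse-pair augmentation mentioned above can be sketched as: for each labelled (query, standard question) sentence pair fed to cross-encoder BERT, the same pair in swapped order is added as an extra training example, doubling the pair data. A minimal sketch; the helper name, tuple layout, and example pairs are illustrative assumptions, not the thesis code.

```python
def augment_reverse_pairs(pairs):
    """Reverse-pair augmentation: for each (sentence_a, sentence_b, label)
    training example, also emit (sentence_b, sentence_a, label)."""
    return pairs + [(b, a, y) for (a, b, y) in pairs]


# Illustrative training pairs: 1 = query matches the standard question, 0 = it does not.
train = [
    ("forgot my password", "how do i reset my password", 1),
    ("forgot my password", "what are the shipping fees", 0),
]
for example in augment_reverse_pairs(train):
    print(example)
```

Because a cross-encoder reads both sentences in one input sequence, segment order matters to the model, so the swapped copies expose it to both orderings of the same semantic pair.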