
建立和應用具有幽默風格的生成對話系統

Building and applying a generative dialogue system with humorous styles

Advisor: 曾元顯

Abstract


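This study aims to build and apply a dialogue system with a humorous style. Starting from the 1.7 million dialogues used in the 2019 Chinese Emotional Conversation Generation (CECG) evaluation task, we combine GPT-2 and BERT to build and apply a system capable of emotional conversation. We then add the 6.8 million dialogues of the Large-scale Cleaned Chinese Conversation base version (LCCC-base) so that the system has richer conversational content. Finally, we fine-tune the model on a small corpus of 156 humorous flirting sentences and control text generation by adjusting the leading sentences (prefix-tuning).

System effectiveness is evaluated as follows: (1) Two dialogue systems are built: one is trained on the CECG and LCCC-base corpora and fine-tuned on the flirting corpus; the other is trained on the CECG and LCCC-base corpora only. (2) In the first round, a custom sentence with a flirtatious tone opens the conversation, and this is tested 50 times. (3) Each conversation is judged for coherence and fluency, and the closing response of the final round is judged for a flirt-like humorous style. (4) Each conversation runs for at most 3 rounds. Four human judges carried out the evaluation. For the system without flirting fine-tuning, 29% of the generated responses were judged to have a flirting effect; for the system fine-tuned on the flirting corpus, 62% were.

The main contributions of this study are: (1) integrating emotion into the post string as the condition for computing probability, so that GPT-2 can be trained and used in its original way; (2) using BERT to predict the coherence of response sentences as the basis for ranking; (3) fine-tuning a pretrained model on a small corpus to change the style of the generated text; (4) implementing a multi-turn dialogue system with a humorous style by adjusting the leading sentences.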
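As an illustration of contribution (1), the following is a minimal sketch of how an emotion label might be folded into the post string so that an unmodified language model learns the response conditioned on both. The tag format, the separator token, the emotion names, and the function names are all assumptions for illustration; the abstract does not specify the format the thesis actually uses.

```python
# Hypothetical emotion labels and separator; not taken from the thesis.
EMOTIONS = {"like", "sad", "disgust", "angry", "happy", "other"}


def build_conditioned_input(post: str, emotion: str, sep: str = "[SEP]") -> str:
    """Prepend the emotion tag to the post so it becomes part of the condition."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    return f"[{emotion}] {post} {sep}"


def build_training_text(post: str, emotion: str, response: str, sep: str = "[SEP]") -> str:
    """Concatenate the conditioned input and the response into one training string.

    Training a causal language model on such strings with the usual
    next-token objective leaves the training procedure itself unchanged.
    """
    return f"{build_conditioned_input(post, emotion, sep)} {response}"


if __name__ == "__main__":
    print(build_training_text("今天天氣真好", "happy", "一起出去走走吧"))
```

Because the emotion lives in the input text, the same string builder can be reused at generation time: the model is given only the conditioned input and asked to continue it.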

Parallel Abstract


The purpose of this study is to build and apply a generative dialogue system with humorous styles. An emotional conversation system is implemented with GPT-2 and BERT, using three corpora: the corpus provided by the 2019 Chinese Emotional Conversation Generation (CECG) evaluation task, the Large-scale Cleaned Chinese Conversation base version (LCCC-base), and flirting conversations retrieved from the Internet. The responses generated by the system are further controlled via prefix-tuning.

The effectiveness of the system is evaluated as follows: (1) Build two dialogue systems: one is trained on the CECG and LCCC-base corpora and fine-tuned on the flirting corpus; the other is trained on the CECG and LCCC-base corpora only. (2) Open each conversation with a customized sentence containing flirting words, and test this kind of conversation 50 times. (3) Evaluate whether each conversation is coherent and fluent, and whether the closing response of the final round has a flirt-like humorous style. (4) Converse with the system for at most 3 rounds in each conversation. Following these steps, four human annotators conversed with the system. For the system trained only on the CECG and LCCC-base corpora, 29% of the generated responses were judged to have a flirting effect; for the system additionally fine-tuned on the flirting corpus, 62% were.

The main contributions of this study are: (1) integrating emotions into the post string as a condition for computing probability, without changing the way GPT-2 is trained and applied; (2) applying BERT to predict the coherence of response sentences as a basis for response ranking; (3) fine-tuning a language model on a small (few-shot) corpus to change the style of the responses generated by a dialogue system; (4) implementing a multi-turn dialogue system with humorous styles via prefix-tuning.
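Contribution (2), ranking candidate responses by predicted coherence, can be sketched as below. This is only an illustration under stated assumptions: `rank_responses` and `toy_overlap_scorer` are hypothetical names, and the toy scorer (character-set overlap) is a stand-in for the BERT-based coherence predictor, whose actual form the abstract does not describe.

```python
from typing import Callable, List, Tuple


def rank_responses(
    post: str,
    candidates: List[str],
    scorer: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate against the post and sort from most to least coherent."""
    scored = [(candidate, scorer(post, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_overlap_scorer(post: str, response: str) -> float:
    """Stand-in scorer: Jaccard overlap of character sets (NOT a BERT model)."""
    post_chars, response_chars = set(post), set(response)
    if not post_chars or not response_chars:
        return 0.0
    return len(post_chars & response_chars) / len(post_chars | response_chars)


if __name__ == "__main__":
    ranked = rank_responses("abc", ["xyz", "abd", "abc"], toy_overlap_scorer)
    for text, score in ranked:
        print(f"{score:.2f}  {text}")
```

Keeping the scorer as a plain callable means a learned model can replace the toy function without touching the ranking logic.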

