
使用集成學習方法與深度遞迴類神經網路建立網路論壇對話預測模型

Using Ensemble Learning and Deep Recurrent Neural Network to Construct an Internet Forum Conversation Prediction Model

Advisor: 林斯寅

Abstract


Natural language conversation involves language understanding, reasoning, and the use of basic common sense, making it one of the most challenging problems in artificial intelligence; designing a general-purpose conversation model system is even more complex and difficult. Earlier research on natural language conversation focused mainly on rule-based or machine-learning methods. Although these methods can solve some conversation problems within their specific domains and scopes, they have inherent learning bottlenecks; it was not until the recurrent neural network (RNN) and the sequence-to-sequence model architecture were proposed that research in this field achieved a further breakthrough. However, while data-driven deep learning can automatically extract features from large amounts of language data, it places high demands on the quantity and quality of the dataset and often suffers from overfitting. How to extract as many useful features as possible from limited training data, and to achieve model generalization across different contexts, is therefore the difficulty deep learning faces in natural language conversation modeling.

This thesis proposes "Using Ensemble Learning and Deep Recurrent Neural Network to Construct an Internet Forum Conversation Prediction Model". The advantage of ensemble learning is that it improves a model's generalization ability to strengthen prediction, making the model applicable to prediction problems in various contexts and domains. This study applies ensemble learning to the conversation prediction problem in complex contexts. The method is a conversation prediction model based on deep recurrent neural networks: ensemble learning is used to train multiple sub-prediction models (recurrent neural networks of different types, with different parameters and different training datasets), and their outputs are then combined by a specifically designed ensemble strategy. Through the joint prediction and judgment of multiple sub-models, a more general conversation prediction model with better generalization ability is obtained.

Parallel Abstract


The study of natural language dialogue or conversation involves language understanding, reasoning, and basic common sense; it is therefore one of the most challenging artificial intelligence problems, and designing a common, general conversation model is even more complicated and difficult. In the past, studies on natural language processing and dialogue mainly focused on rule-based and machine-learning methods. Although these methods can solve part of the dialogue problem within their specific fields, they have their own learning bottlenecks; it was not until recurrent neural networks (RNN) and the sequence-to-sequence model were proposed that research in this field achieved a further breakthrough. However, although deep learning can automatically extract features from a large amount of dialogue data, it places high demands on the quantity and quality of the datasets and is prone to overfitting. Therefore, how to extract useful features from a limited training dataset, and to achieve model generalization in different situations, is the challenge deep learning faces in the natural language dialogue problem. This thesis is titled "Using Ensemble Learning and Deep Recurrent Neural Network to Construct an Internet Forum Conversation Prediction Model". The advantage of ensemble learning is that it enhances the generalization ability of the model to reinforce prediction, making the model suitable for prediction in various contexts and scenarios. In this study, ensemble learning is applied to the natural language conversation model in varied and complex contexts. The method is a conversation prediction model based on deep recurrent neural networks: ensemble learning is used to train multiple sub-prediction models of different types, with different parameters and different training datasets, and the prediction results are then combined by a specifically designed ensemble strategy. Through the joint prediction and judgment of multiple sub-models, a generalized conversation prediction model is obtained.
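The combining step described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's actual implementation: the sub-models here are stand-ins for RNN variants trained with different architectures, hyper-parameters, and data subsets, each assumed to output a probability distribution over candidate responses, and the ensemble strategy shown is simple probability averaging (one of several possible strategies).

```python
def ensemble_predict(sub_model_outputs):
    """Average the per-model probability vectors over candidate
    responses and return the index of the jointly preferred one."""
    n_models = len(sub_model_outputs)
    n_classes = len(sub_model_outputs[0])
    avg = [sum(model[i] for model in sub_model_outputs) / n_models
           for i in range(n_classes)]
    return max(range(n_classes), key=lambda i: avg[i])

# Three hypothetical sub-models scoring three candidate replies.
outputs = [
    [0.2, 0.5, 0.3],  # e.g. an LSTM-based sub-model
    [0.1, 0.6, 0.3],  # e.g. a GRU-based sub-model
    [0.4, 0.3, 0.3],  # e.g. a sub-model trained on a different data split
]
print(ensemble_predict(outputs))  # prints 1: the second reply wins jointly
```

Because each sub-model sees different parameters and training data, their individual errors tend to be uncorrelated, which is what lets the averaged prediction generalize better than any single model.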

