
Building a Language Model Based on Deep-Learning Recurrent Neural Networks Combined with an Attention Mechanism

Recurrent Neural Network with Attention Mechanism for Language Model

Advisor: 陳牧言
Co-advisor: 姜琇森

Abstract


With the rapid growth of the Internet, the volume of textual data has grown as well, and people draw on this data to obtain the information they need for decision making. The latent information extracted through text-analysis tasks may include public opinion trends, user feedback on products, or signals related to market movements, and all of this useful information points to the same underlying question: how to extract features from text. Models that apply neural network methods to extracting textual features are called neural network language models (NNLM); they model the co-occurrence relationships between words, with features grounded in the n-gram modeling concept. Preserving information in word vectors is especially important, because sentence vectors and document vectors must, in principle, still capture the relationships between words; on this basis, this study investigates word vectors. This study assumes that a word carries both "the word's intrinsic meaning" and "the word's alignment relationships within a sentence", and builds a language model using an RNN (recurrent neural network) combined with an attention mechanism. Experimental results on the English datasets Penn Treebank (PTB) and WikiText-2 (WT2) and the Chinese dataset NLPCC2017 confirm that adding the attention mechanism lets the model converge earlier to a low perplexity (PPL), achieving better performance.
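The abstract names the architecture only at a high level, so the following is a minimal PyTorch sketch of the general idea rather than the thesis's actual model: a GRU language model whose next-token prediction combines the recurrent state with a Bahdanau-style additive attention context over the causal history. All layer sizes, names, and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn

class AttentiveRNNLM(nn.Module):
    """Sketch: RNN language model with additive attention over past states."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Additive attention parameters in the style of Bahdanau et al. (2014).
        self.W_q = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_k = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        h, _ = self.rnn(self.embed(tokens))               # (B, T, H)
        B, T, H = h.shape
        # Additive score between every query position and every key position.
        scores = self.v(torch.tanh(
            self.W_q(h).unsqueeze(2) + self.W_k(h).unsqueeze(1)
        )).squeeze(-1)                                    # (B, T, T)
        # Causal mask: position t may only attend to positions <= t.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=h.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))
        context = torch.softmax(scores, dim=-1) @ h       # (B, T, H)
        # Predict the next token from the RNN state plus attention context.
        return self.out(torch.cat([h, context], dim=-1))  # (B, T, vocab)

Trained with next-token cross-entropy, the exponential of the average loss is exactly the perplexity the abstract reports, for example:

vocab_size = 10000
model = AttentiveRNNLM(vocab_size)
tokens = torch.randint(0, vocab_size, (4, 35))            # toy batch
logits = model(tokens[:, :-1])                            # predict token t+1
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
print(torch.exp(loss))                                    # perplexity estimate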

Parallel Abstract


The rapid growth of the Internet has driven the growth of textual data, and people draw the information they need from this data to solve problems. Textual data may contain potential information such as the opinions of the crowd, opinions about products, or market-relevant signals; obtaining it requires solving the underlying problem of "how to get features from the text". A model that extracts text features using neural network methods is called a neural network language model (NNLM). Its features are based on the n-gram model concept, namely the co-occurrence relationships between words. Word vectors are important because sentence vectors and document vectors still have to capture the relationships between words; on this basis, this study investigates word vectors. This study assumes that a word carries both "its meaning in the sentence" and "its grammatical position", and uses an RNN (recurrent neural network) with an attention mechanism to build a language model. Experiments use the Penn Treebank (PTB), WikiText-2 (WT2), and NLPCC2017 text datasets; on these datasets, the proposed models achieve better performance as measured by perplexity (PPL).
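For reference, perplexity is the standard intrinsic metric for language models (this is the textbook definition, not anything specific to this thesis): the exponentiated average negative log-likelihood the model assigns to each token, so lower is better, and an earlier drop indicates faster convergence:

\mathrm{PPL}(w_{1:N}) = \exp\!\left(-\frac{1}{N}\sum_{t=1}^{N}\log p\!\left(w_t \mid w_{1:t-1}\right)\right)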
