Document representation is at the core of many NLP tasks in machine understanding. A general representation learned in an unsupervised manner preserves generality and can be used across a variety of applications. In practice, sentiment analysis (SA) is a challenging task that is regarded as deeply semantic, and it is therefore often used to assess the quality of general representations. Existing methods for unsupervised document representation learning fall into two families: sequential methods, which explicitly take word order into account, and non-sequential methods, which do not. Each family, however, has its own unresolved weaknesses. In this paper, we propose a model that overcomes the difficulties encountered by both families. Experiments show that our model outperforms state-of-the-art methods by a large margin, both on popular SA datasets and on a fine-grained, aspect-based SA task.