
語意理解與語句生成之對偶性的有效利用

Exploiting the Duality between Language Understanding and Generation and Beyond

Advisor: Yun-Nung Chen (陳縕儂)

Abstract


Many real-world artificial intelligence problems carry a dual property; that is, we can directly swap the input and the target of one task to form another task. Machine translation is a classic example: translating from English to Chinese has a dual task of translating from Chinese to English. Speech recognition and speech synthesis also exhibit structural duality. Given a piece of informative text, answering questions and generating questions are likewise in dual form. Recent studies that exploit the duality between tasks to boost performance have further highlighted its importance. Natural language understanding and natural language generation are both important research topics in natural language processing and dialogue; the goal of natural language understanding is to extract the core semantics of a given utterance, while natural language generation is the opposite, aiming to construct the corresponding sentence from a given semantic representation. However, the duality between language understanding and language generation has not yet been explored.

This dissertation aims to investigate the structural duality between natural language understanding and natural language generation. We present five consecutive studies, each focusing on a different aspect of the learning and data settings. First, we exploit the duality between natural language understanding and natural language generation and introduce it into the learning objective as a regularization term; in addition, we incorporate expert knowledge to design suitable approaches for estimating the data distributions. Second, we further propose a joint learning framework that provides the flexibility to incorporate not only supervised but also unsupervised learning algorithms, and allows gradients to flow smoothly between the two models. Third, we study how to enhance the joint learning framework by maximizing mutual information. The studies above all exploit the duality in the training stage, so finally we take a step forward and leverage the duality in the inference stage and in the fine-tuning stage after pretraining. Each study presents a new model or learning framework that exploits the duality in a different manner; together, this dissertation explores a new research direction of exploiting the structural duality between natural language understanding and natural language generation.

Parallel Abstract (English)


Many real-world artificial intelligence tasks come in a dual form; that is, we could directly swap the input and the target of a task to formulate another task. Machine translation is a classic example: translating from English to Chinese has a dual task of translating from Chinese to English. Automatic speech recognition (ASR) and text-to-speech (TTS) also exhibit structural duality. Given a piece of informative context, question answering and question generation are in dual form. Recent studies have magnified the importance of duality by boosting the performance of both tasks through its exploitation. Natural language understanding (NLU) and natural language generation (NLG) are both critical research topics in the NLP and dialogue fields. The goal of natural language understanding is to extract the core semantic meaning from given utterances, while natural language generation is the opposite: its goal is to construct corresponding sentences based on given semantics. However, the dual property between understanding and generation has rarely been explored. The main goal of this dissertation is to investigate the structural duality between NLU and NLG. In this dissertation, we present five consecutive studies, each focusing on different aspects of learning and data settings. First, we exploit the duality between NLU and NLG and introduce it into the learning objective as a regularization term. Moreover, expert knowledge is incorporated to design suitable approaches for estimating the data distribution. Second, we further propose a joint learning framework, which provides the flexibility to incorporate not only supervised but also unsupervised learning algorithms and enables gradients to propagate through the two modules seamlessly. Third, we study how to enhance the joint framework via mutual information maximization.
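The duality-based regularization term mentioned in the first study can be sketched in the spirit of dual supervised learning. This is a minimal illustration, not the dissertation's implementation: all numeric log-probabilities and the weight `lam` are hypothetical, and in practice the scores would come from trained NLU/NLG models and separately estimated marginals.

```python
def duality_regularizer(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared gap between the two factorizations of log p(x, y).

    Probabilistic duality requires
        log p(x) + log p(y|x) == log p(y) + log p(x|y),
    since both sides equal log p(x, y). The squared difference is
    added to each task loss as a regularization term.
    """
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap ** 2

# Hypothetical scores for one (utterance x, semantic frame y) pair:
log_p_x = -12.0          # estimated marginal of the utterance
log_p_y = -5.0           # estimated marginal of the semantics
log_p_y_given_x = -1.2   # NLU model score
log_p_x_given_y = -7.9   # NLG model score

lam = 0.01  # regularization weight (hypothetical value)
reg = lam * duality_regularizer(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y)
# The term is shared by both directions:
# total_nlu_loss = nlu_task_loss + reg
# total_nlg_loss = nlg_task_loss + reg
```

When the two factorizations agree exactly, the regularizer vanishes, so it only penalizes models whose joint estimates of p(x, y) are inconsistent with each other.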
The above works exploit the duality in the training stage; hence, lastly, we take a step forward to leverage the duality in the inference stage and in the fine-tuning stage after pretraining. Each work presents a new model or learning framework that exploits the duality in a different manner. Together, this dissertation explores a new research direction of exploiting the duality between language understanding and generation.
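The inference-stage use of duality can be sketched as a reranking rule: the primal decision score log p(y|x) is interpolated with the Bayes rewrite log p(x|y) + log p(y) obtained from the dual model. This is a toy sketch under assumed numbers, not the dissertation's exact procedure; the candidate semantic frames, their scores, and the interpolation weight `alpha` are all hypothetical.

```python
def dual_inference_score(log_p_y_given_x, log_p_x_given_y, log_p_y, alpha=0.5):
    """Combine primal (NLU) and dual (NLG) scores for a candidate y.

    With alpha = 1 this reduces to the standard primal decision;
    smaller alpha gives more weight to the dual direction.
    """
    return alpha * log_p_y_given_x + (1 - alpha) * (log_p_x_given_y + log_p_y)

# Toy candidates for one utterance x: (log p(y|x), log p(x|y), log p(y)).
candidates = {
    "inform(food=italian)": (-0.4, -9.0, -3.0),
    "request(food)":        (-0.6, -6.5, -2.5),
}
best = max(candidates, key=lambda y: dual_inference_score(*candidates[y]))
```

In this toy setting the primal model alone would pick "inform(food=italian)" (log p(y|x) = -0.4), but the dual direction reverses the decision, illustrating how the two models can correct each other at inference time without any retraining.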

