This work validates the effectiveness of the deep learning methods now in widespread use, namely machine learning models built from artificial neural networks, in the field of natural language processing. We also carry out a series of robustness analyses on these models, chiefly by observing their resistance to adversarial input perturbations. Specifically, our experiments cover the recently prominent Transformer model, a neural network built on the self-attention mechanism, as well as the commonly used recurrent neural networks based on long short-term memory (LSTM) cells, and we compare the results and differences among these architectures when applied to natural language processing. The experiments span many of the most common tasks in the field, such as text classification, word segmentation and part-of-speech tagging, sentiment classification, entailment analysis, document summarization, and machine translation. We find that the self-attention-based Transformer architecture performs better on most tasks. Beyond evaluating the effectiveness of the different architectures, we also apply adversarial perturbations to the input data to test differences in reliability across models. In addition, we propose several novel methods for generating effective adversarial input perturbations. Most importantly, building on these experimental results, we offer theoretical analysis and explanations that explore possible sources of the robustness differences between neural network architectures.
In this work, we investigate the effectiveness of current deep learning methods, i.e., neural network-based models, in the field of natural language processing. In addition, we conduct robustness analyses of various neural architectures, evaluating each network's resistance to adversarial input perturbations, which in essence replace input words so that the model produces incorrect results or predictions. We compare various network architectures, including the Transformer, built on the self-attention mechanism, and the commonly employed recurrent neural networks using long short-term memory (LSTM) cells. Our extensive experiments cover the most common tasks in natural language processing: sentence classification, word segmentation and part-of-speech tagging, sentiment classification, entailment analysis, abstractive document summarization, and machine translation. In the process, we evaluate the models' effectiveness against other state-of-the-art approaches. We then estimate the robustness of the different models against adversarial examples through five attack methods. Most importantly, we propose a series of novel methods for generating adversarial input perturbations and develop theoretical analyses from our observations. Finally, we interpret the differences in robustness between neural network models.
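The attack methods themselves are developed later in the thesis; purely as an illustration of the word-replacement idea described above, a minimal greedy substitution attack could be sketched as follows. The synonym table, scoring function, and greedy strategy here are all hypothetical stand-ins, not the methods actually proposed in this work:

```python
from typing import Callable, Dict, List

# Hypothetical synonym table; a real attack might draw candidates
# from a thesaurus or from nearest neighbors in embedding space.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "movie": ["film", "picture"],
    "great": ["superb", "strong"],
}

def greedy_word_substitution(
    tokens: List[str],
    score: Callable[[List[str]], float],  # model confidence in the correct label
) -> List[str]:
    """Greedily swap each word for the synonym that lowers the score most."""
    adv = list(tokens)
    for i, word in enumerate(tokens):
        best, best_score = adv[i], score(adv)
        for cand in SYNONYMS.get(word.lower(), []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            s = score(trial)
            if s < best_score:
                best, best_score = cand, s
        adv[i] = best
    return adv

# Toy "model": confidence grows with the fraction of words
# found in a small positive-sentiment lexicon.
POSITIVE = {"good", "great", "movie"}

def toy_score(tokens: List[str]) -> float:
    return sum(t.lower() in POSITIVE for t in tokens) / len(tokens)

sentence = "a good movie with great acting".split()
adversarial = greedy_word_substitution(sentence, toy_score)
```

Because each substitution is kept only when it strictly lowers the model's confidence, the perturbed sentence never scores higher than the original while staying the same length and roughly preserving meaning.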