

Finding Adversarial Examples for Text Classification: A Reinforcement Learning Approach

Advisor: 林守德 (Shou-De Lin)


Text classification is a natural language processing task in which a model learns to understand the meaning of sentences and assign them to categories. Although deep neural networks are increasingly popular and widely applied in many domains, including natural language processing, a number of studies have examined the vulnerability of deep models. Adversarial examples, a kind of synthetic data, expose some of the weaknesses of a machine learning model. In this work, we aim to improve the efficiency of finding adversarial examples in text classification tasks. We further use the adversarial examples we find for adversarial training and observe that doing so may improve the model's generalization to unseen data. We hope this finding will help in training more robust models, particularly when the dataset is small.


