
利用增強式學習法來學習漢語片語結構的剖析

Using Reinforcement Learning to Learn Phrase Structure Parsing in Mandarin Chinese

Advisor: Von-Wun Soo (蘇豐文)

Abstract


In the field of natural language learning, how to parse a sentence correctly has long been a very challenging problem. Traditional supervised parser-learning methods usually make very strong grammatical and rule-based assumptions about the correct training data. Under such assumptions, obtaining a large amount of correctly annotated training corpora is a heavy burden on the trainers who label the correct answers, and the acquisition of a language parser therefore becomes quite difficult. Reinforcement learning is a powerful learning method because the trainer only needs to assign positive rewards along the correct sequence of actions; compared with traditional supervised learning, this way of giving rewards is clearly less demanding. However, this rewarding feature has rarely been applied in natural language learning methods. In this thesis, we show that reinforcement learning is a suitable learning method if proper data structures are adopted. The effectiveness and robustness of using reinforcement learning to learn a parser are also research foci of this thesis. In particular, we emphasize the reward-giving schemes in reinforcement learning and compare the performance of the learned parsers under different reward-giving settings. We propose two different rewarding schemes and compare their advantages and disadvantages through experiments on learning the phrase structures of Chinese sentences. The first is called intermediate-route rewarding; the second is delayed partial rewarding. Under intermediate-route rewarding, the environment rewards the parser whenever an action leads it to a state that would be traversed if the correct actions were performed; in all other states the parser is punished. Under delayed partial rewarding, the environment gives the parser a positive reward only when it completes a correct partial parse (i.e., a phrase); apart from a few nonsensical states, the parser is not punished elsewhere. Comparing the two rewarding schemes, intermediate-route rewarding outperforms delayed partial rewarding in F-score, while delayed partial rewarding performs better in coverage. A detailed discussion is given in the experiments chapter.
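To make the two rewarding schemes concrete, the sketch below expresses them as reward functions for a hypothetical parser environment, written in Python. The state and phrase representations, the gold_states and gold_phrases sets, and the names irr_reward and dpr_reward are illustrative assumptions; the thesis's actual data structures are not specified in the abstract.

    # A minimal sketch (not the thesis's implementation) of the two rewarding
    # schemes described above; states and phrases are assumed to be hashable.

    def irr_reward(state, gold_states, reward=1.0, penalty=-1.0):
        """Intermediate-route rewarding (IRR): reward any state that lies on
        the route traversed by the correct action sequence; punish all other
        states."""
        return reward if state in gold_states else penalty

    def dpr_reward(completed_phrase, gold_phrases, state_is_invalid=False,
                   reward=1.0, penalty=-1.0):
        """Delayed partial rewarding (DPR): reward only when a correct
        sub-parse (phrase) has just been completed; apart from clearly
        invalid states, give neither reward nor punishment."""
        if state_is_invalid:
            return penalty
        if completed_phrase is not None and completed_phrase in gold_phrases:
            return reward
        return 0.0

Under this framing, IRR provides dense feedback at every step, while DPR's feedback is sparse and delayed, which is consistent with the F-score versus coverage trade-off reported above.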

Parallel Abstract


Learning how to parse a sentence has long been a challenging problem in natural language acquisition. Traditional supervised parser-learning methods typically make very strong assumptions about the preparation of correctly parsed training data, which makes acquiring a large set of well-annotated corpora a heavy burden on trainers and thus often renders the parser-learning problem infeasible. Reinforcement learning (RL) is a powerful learning technique in that rewards need only be given along a successful sequence of actions, so it places fewer demands on trainers than traditional supervised learning does. However, this feature has rarely been exploited in natural language parser learning. In this thesis we show that RL is well suited to the task if proper data structures are adopted. The effectiveness and robustness of learning a parser with RL are also research foci of this thesis. In particular, we emphasize the reward-giving schemes in RL and discuss the performance of parsers trained under different schemes. We propose two rewarding schemes and compare their advantages and disadvantages through experiments on learning Chinese sentence phrase structures. The first is called intermediate-route rewarding (IRR), and the second is called delayed partial rewarding (DPR). The IRR scheme rewards the parser when it reaches a state that would be traversed if the correct actions were conducted, and punishes it in any other state. The DPR scheme rewards the parser only when it finishes a correct sub-parse (i.e., a phrase); in other states the parser is neither rewarded nor punished. Comparing the two schemes, IRR outperforms DPR in F-score, whereas DPR achieves better coverage. We discuss these results in detail in Chapter 4.
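As a rough illustration of how either rewarding scheme would drive learning, the sketch below shows a generic tabular Q-learning loop over an assumed parser environment. The env interface (reset, step, legal_actions), the epsilon-greedy policy, and all hyperparameters are assumptions made for illustration; the abstract does not state which RL algorithm or state representation the thesis actually uses.

    # Generic tabular Q-learning sketch; `env` is an assumed parser environment
    # whose step() returns rewards computed by a scheme such as IRR or DPR.
    import random
    from collections import defaultdict

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
        q = defaultdict(float)                      # Q[(state, action)] -> value
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                actions = env.legal_actions(state)  # e.g. shift/reduce-style moves
                if random.random() < epsilon:       # epsilon-greedy exploration
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max((q[(next_state, a)]
                                 for a in env.legal_actions(next_state)),
                                default=0.0)
                # One-step Q-learning update toward reward + discounted best next value
                q[(state, action)] += alpha * (reward + gamma * best_next
                                               - q[(state, action)])
                state = next_state
        return q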

Parallel Keywords

None provided.

