中文文法之正規近似演算法

Context-free grammar的計算向來需要耗費相當大的時間和空間成本，因此將原本想要計算的context-free grammar轉換成一個運算需求比較小的regular language聽起來會是一個聰明的選擇。在過去所提出的演算法中，以RTN為基底的演算法目前來看是最適合的。此演算法在成本以及效能的考量中，達到一個最佳的平衡。這種演算法還有一種變形，稱之為參數-RTN。參數-RTN會紀錄之前已經走過的路徑，因此，他會花比較多的空間成本，參數為3的效果比2來的好，只是所要付出的空間成本太高了。本文提出一個能有效降低空間需求的演算法，利用這樣的一個演算法，就可以讓參數3甚至是更高的參數變成可以實際應用的。我們主要是利用語言的特性來達到這個目標的，我們可以把中文的每一個詞組之前的詞作分類，例如:S->TN NP，因為TN會被分類成主詞，所以NP可以看成是一個接在主詞之後的句子，當其他的詞也可以被分類成主詞時，若該詞的後面也接一個NP，那這個NP就可以和TN之後的NP做結合如此，當實驗的文法越大時，這樣的結合就會越多，而所能節省的空間自然也就越多。在本文的最後，我們也利用中央研究院所研發的中文解析器(CKIP)來進行我們的實驗，最後分析不同的逼近演算法以及原本的演算法間的差異性，同時也分析這些逼近所需要的空間成本是否符合效益。

關鍵字

中文文法；自動機；正規語言

並列摘要

As parsing context-free grammars is time-consuming, converting the original grammars into approximated regular languages can reduce the complexity. Based on the approaches, the so-called recursive transition network (RTN, which is an -NFA) is the most adequate approximated algorithm. The refinement is parameter RTN, which is more adequate than the original RTN. The suggested parameter is 2. However, the parameter RTN requires large memories, which limits its role in practical applications. In this thesis, we propose a refinement of the parameter RTN which can alleviate the memory requirement effectively. The refined RTN (called group RTN) utilizes the prosperities of Chinese. It classifies the terms into different groups. Based on the different groups of terms, we integrate similar states and transitions of RTN into one. Using this scheme, the memory requirement of the RTN method can be reduced considerably.

並列關鍵字

regular language approximation ； context-free grammars ； Chinese grammar ； automata

參考文獻

[2].T. P. Baker, Extending lookahead for LR parsers, Journal of Computer and System Sciences, 22:243-259, 1981.

[3].M. E. Bermudez and K. M. Schimpf, Practical arbitrary lookahead LR parsing, Journal of Computer and System Sciences, 41:230-250, 1990.

[5].N. Chomsky, A note on phrase structure grammars, Information and Control, 2:393-395, 1959.

[6].N. Chomsky, On certain formal properties of grammars, Information and Control, 2:137-167, 1959.

[8].H. Meng, P. Luk, K. Xu, and F. Weng, GLR parsing with multiple grammars for natural language queries, ACM Trans. Asian Lang. Inf. Process, 1(2): 123-144, 2002.

國際替代計量

中文文法之正規近似演算法

全文下載

主題瀏覽