Toward A Syntactic Parser for Taiwan Southern Min

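This study investigates how a syntactic parser can be used to automatically build syntactic structures for natural language sentences. The data come from the National Chung Cheng University Taiwanese Spoken Corpus (CCU Taiwanese Spoken Corpus). In computational linguistics, how to make a computer program automatically derive the syntactic structure of natural language is an important issue, and many programs implement syntactic parsers. However, most syntactic parsers focus on English and Mandarin Chinese, and few address Taiwanese Southern Min. This study therefore focuses on building a syntactic parser for Taiwanese Southern Min. I use 3,750 sentences in this study. Of these, 3,000 form the observing corpus, in which I observe sentence patterns and build appropriate structures. The other 750 sentences form the experimental corpus, on which I apply all the rules built from the observing corpus to measure the program's accuracy, that is, how many sentences receive the correct structure. Finally, once the parser has successfully derived the structure of a sentence, that structure is stored in a machine-readable form.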


This thesis demonstrates a method for automatically constructing syntactic structures from natural language data in the CCU Taiwanese Spoken Corpus. Automatically deriving syntactic representations from natural language expressions is a major issue in computational linguistics. Although many projects have developed syntactic parsers that automatically build the syntactic representation of a sentence, most existing research focuses on English and Mandarin Chinese, and little attention has been paid to Taiwanese Southern Min. This study aims to fill that research gap. The syntactic parser is built using a hand-coded, rule-based strategy. The data used in this study are 3,750 sentences from the corpus. In the observing part, the parser's rules are revised until the 3,000 observed sentences receive the correct structures. The experimental part uses the remaining 750 sentences to test the accuracy of the parser. The grammar turns natural language sentences into their syntactic structures, which are collected and stored as a syntactic knowledge base.
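To make the workflow concrete, the following is a minimal sketch of a hand-coded, rule-based parser together with the accuracy measure described above. It is not the parser built in this thesis: the chart-parsing (CKY) algorithm, the part-of-speech tags (N, V, ADV), the phrase labels, and every rule shown are assumptions chosen only for illustration, and the input is a sequence of tags rather than real corpus sentences.

```python
from itertools import product

# Hypothetical hand-written binary rules: (left label, right label) -> parent label.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("ADV", "VP"): "VP",
}
# Hypothetical unary promotions from part-of-speech tags to phrase labels.
UNARY_RULES = {"N": "NP", "V": "VP"}


def parse(tags):
    """Return a bracketed tree for a tag sequence, or None if the rules do not cover it."""
    n = len(tags)
    # chart[i][j] maps a label to one bracketed tree covering tags[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tag in enumerate(tags):
        chart[i][i + 1][tag] = tag
        if tag in UNARY_RULES:
            phrase = UNARY_RULES[tag]
            chart[i][i + 1][phrase] = f"({phrase} {tag})"
    # Combine adjacent spans bottom-up, from short spans to the whole sentence.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                pairs = product(chart[i][k].items(), chart[k][j].items())
                for (left, left_tree), (right, right_tree) in pairs:
                    parent = BINARY_RULES.get((left, right))
                    # Keep only the first tree found for each label.
                    if parent and parent not in chart[i][j]:
                        chart[i][j][parent] = f"({parent} {left_tree} {right_tree})"
    return chart[0][n].get("S")


def accuracy(test_set):
    """Fraction of (tags, gold_tree) pairs whose parse equals the gold tree exactly."""
    correct = sum(parse(tags) == gold for tags, gold in test_set)
    return correct / len(test_set)
```

In this sketch, the observing phase corresponds to editing `BINARY_RULES` and `UNARY_RULES` until the observed sentences parse as intended, and the experimental phase corresponds to calling `accuracy` on held-out sentences paired with hand-checked trees. The bracketed strings returned by `parse` stand in for the machine-readable form in which structures are stored.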


