Building a Bracketed Corpus Using Φ² Statistics

Parallel Abstract


Research based on treebanks is ongoing for many natural language applications. However, building a large-scale treebank is laborious and time-consuming, so speeding up its construction has become an important task. This paper proposes two versions of a probabilistic chunker to aid the development of a bracketed corpus. The basic version partitions part-of-speech sequences into chunk sequences, which form a partially bracketed corpus. The recursive version applies the chunking action repeatedly to generate a fully bracketed corpus. Rather than a treebank, the training corpus is one tagged with part-of-speech information only. Experimental results show that the probabilistic chunker achieves a correct rate of more than 94% in producing a partially bracketed corpus and also gives very encouraging results in generating a fully bracketed corpus. Both versions are simple but effective and can also be applied to many natural language applications.
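The abstract does not spell out how Φ² enters the chunker, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes φ² is computed from a 2×2 contingency table of adjacent part-of-speech tags, φ² = (ad − bc)² / ((a+b)(c+d)(a+c)(b+d)); that the basic chunker opens a new chunk wherever the association between neighboring tags falls below a threshold; and that the recursive version repeatedly merges the most strongly associated adjacent pair into a binary bracket. The names `BigramAssociation`, `chunk`, `bracket`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple, Union

# A bracketing is either a single POS tag or a binary pair of sub-bracketings.
Tree = Union[str, Tuple["Tree", "Tree"]]


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """Phi-squared of the 2x2 contingency table [[a, b], [c, d]]."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a * d - b * c) ** 2 / denom


class BigramAssociation:
    """Hypothetical helper: phi^2 between adjacent tags in a POS-tagged corpus."""

    def __init__(self, sentences: Sequence[Sequence[str]]) -> None:
        self.pairs: Counter = Counter()  # tag x immediately followed by tag y
        self.left: Counter = Counter()   # x in the left slot of any bigram
        self.right: Counter = Counter()  # y in the right slot of any bigram
        self.total = 0                   # total number of adjacent tag pairs
        for tags in sentences:
            for x, y in zip(tags, tags[1:]):
                self.pairs[(x, y)] += 1
                self.left[x] += 1
                self.right[y] += 1
                self.total += 1

    def phi2(self, x: str, y: str) -> float:
        a = self.pairs[(x, y)]        # x followed by y
        b = self.left[x] - a          # x followed by some other tag
        c = self.right[y] - a         # y preceded by some other tag
        d = self.total - a - b - c    # bigrams with neither x on the left nor y on the right
        return phi_squared(a, b, c, d)


def chunk(tags: Sequence[str], assoc: BigramAssociation, threshold: float) -> List[List[str]]:
    """Basic version (assumed): start a new chunk where adjacent tags associate weakly."""
    if not tags:
        return []
    chunks = [[tags[0]]]
    for prev, cur in zip(tags, tags[1:]):
        if assoc.phi2(prev, cur) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks


def _edge(tree: Tree, side: int) -> str:
    """Leftmost (side=0) or rightmost (side=-1) tag of a bracketing."""
    while not isinstance(tree, str):
        tree = tree[side]
    return tree


def bracket(tags: Sequence[str], assoc: BigramAssociation) -> Tree:
    """Recursive version (assumed): greedily merge the most associated adjacent pair."""
    if not tags:
        raise ValueError("cannot bracket an empty sentence")
    nodes: List[Tree] = list(tags)
    while len(nodes) > 1:
        # Score each boundary by phi^2 of the two tags that meet across it.
        scores = [assoc.phi2(_edge(left, -1), _edge(right, 0))
                  for left, right in zip(nodes, nodes[1:])]
        i = max(range(len(scores)), key=scores.__getitem__)
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
    return nodes[0]
```

Both functions only need a corpus tagged with part-of-speech information, which matches the training setup the abstract describes. Scoring boundaries by adjacent-tag φ² alone ignores wider context, so this sketch is best read as a toy baseline, not a reproduction of the reported results.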
