An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing

Abstract

This paper introduces a Corpus-Based Statistics-Oriented (CBSO) methodology, which attempts to avoid the drawbacks of both traditional rule-based approaches and purely statistical approaches. Rule-based approaches, with rules induced by human experts, have been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious knowledge-acquisition difficulties in terms of both cost and consistency, which makes such systems very difficult to scale up. Statistical methods, which can acquire knowledge automatically from corpora, are becoming increasingly popular, partly because they remedy the shortcomings of rule-based approaches. However, most simple statistical models adopt almost nothing from existing linguistic knowledge. As a result, they often have a large parameter space and require an unaffordably large training corpus even for well-justified linguistic phenomena.

The CBSO approach is a compromise between these two extremes of the knowledge-acquisition spectrum. It emphasizes the use of well-justified linguistic knowledge in developing the underlying language model, and it applies statistical optimization techniques to high-level constructs, such as annotated syntax trees, rather than to surface strings. Consequently, only a training corpus of reasonable size is needed, and long-distance dependencies between constituents can be handled.

In this paper, corpus-based statistics-oriented techniques are reviewed, and general techniques applicable to CBSO approaches are introduced. In particular, we address the following issues: (1) the general tasks in developing an NLP system; (2) why CBSO is the preferred choice among the different strategies; (3) how to achieve good performance systematically using a CBSO approach; and (4) frequently used CBSO techniques. Several examples are also reviewed.
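The parameter-space argument above can be made concrete with a rough count. The sketch below is only an illustration, not the paper's method: it compares an unsmoothed word trigram model, which uses no linguistic knowledge, with a model that routes the same prediction through part-of-speech tags (a tag trigram plus word-given-tag probabilities). The vocabulary size, tagset size, and function names are hypothetical assumptions chosen for illustration.

```python
# Illustrative arithmetic only: VOCAB_SIZE and TAGSET_SIZE are hypothetical
# assumptions, not figures from the paper.

VOCAB_SIZE = 50_000   # hypothetical word vocabulary size
TAGSET_SIZE = 100     # hypothetical part-of-speech tagset size


def ngram_parameter_count(num_symbols: int, n: int) -> int:
    """Free parameters of an unsmoothed n-gram model P(x_n | x_1..x_{n-1}).

    There are num_symbols ** (n - 1) histories, and each history has a
    probability distribution with num_symbols - 1 free parameters.
    """
    return num_symbols ** (n - 1) * (num_symbols - 1)


def emission_parameter_count(num_tags: int, vocab_size: int) -> int:
    """Free parameters of P(word | tag): one distribution over the vocabulary per tag."""
    return num_tags * (vocab_size - 1)


# Surface-string model: condition each word directly on the two previous words.
word_trigram = ngram_parameter_count(VOCAB_SIZE, 3)

# Model using linguistic categories: a tag trigram plus word-given-tag probabilities.
tag_model = (ngram_parameter_count(TAGSET_SIZE, 3)
             + emission_parameter_count(TAGSET_SIZE, VOCAB_SIZE))

print(f"word trigram parameters: {word_trigram:,}")
print(f"tag-based parameters:    {tag_model:,}")
print(f"ratio: {word_trigram / tag_model:.1e}")
```

With these assumed sizes, the word trigram model has about 1.25 × 10^14 free parameters, while the tag-based model has about 6.0 × 10^6. Estimating the first reliably would need a far larger corpus, which is the kind of gap the abstract attributes to models that ignore existing linguistic knowledge.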
