
An Integrated Approach to Functional Corpus Construction

Abstract


In this paper, we present our recent experience in constructing a first-of-its-kind functional corpus based on the theoretical framework of Systemic Functional Linguistics. The corpus consists of texts selected from the Penn Treebank and was annotated by a collaborative team on a web-based annotation platform with several advanced features. After discussing the background and motivation of the project, we present our solutions to some of the challenges encountered in the collaborative annotation process. With fine-grained, high-quality annotations of an initial corpus now available, the corpus can serve as a valuable linguistic resource that complements existing semantically annotated corpora and lays the groundwork for a larger-scale resource crucial to automated systems for analyzing linguistic function.
