
Reliable and Cost-Effective Pos-Tagging

Abstract


To achieve fast, high-quality part-of-speech (POS) tagging, algorithms should reach high accuracy while requiring less manual proofreading. This study pursues both goals by defining a new criterion of tagging reliability: the estimated final accuracy of the tagging under a fixed amount of proofreading. The criterion is used to judge how cost-effective a tagging algorithm is. We also propose a new tagging algorithm, called the context-rule model, to achieve cost-effective tagging. The context-rule model uses broad context information to improve tagging accuracy. In experiments, we compared the tagging accuracy and reliability of the context-rule model, the Markov bi-gram model, and the word-dependent Markov bi-gram model. The results showed that the context-rule model outperformed both Markov models. In terms of tagging accuracy, the context-rule model reduced the number of errors by 20% more than either Markov model did. For the most cost-effective tagging algorithm to reach 99% tagging accuracy, an estimated 20% of the samples of ambiguous words, on average, needed to be rechecked. We also compared the tradeoff between the amount of proofreading needed and the final accuracy for the different algorithms. It turns out that the algorithm with the highest accuracy is not always the most reliable one.
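The reliability criterion can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `final_accuracy`, the per-token confidence scores, the policy of rechecking the lowest-confidence tags first, the assumption that proofreading corrects every rechecked tag, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

# Each sample is (confidence, is_correct) for one tagged ambiguous word.
# The confidence score is an assumed output of the tagger (hypothetical).
Sample = Tuple[float, bool]


def final_accuracy(samples: List[Sample], proofread_fraction: float) -> float:
    """Accuracy after a human rechecks a fixed fraction of the tags.

    Assumptions (not from the paper): the lowest-confidence tags are
    rechecked first, and every rechecked tag ends up correct.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 <= proofread_fraction <= 1.0:
        raise ValueError("proofread_fraction must be in [0, 1]")

    ranked = sorted(samples, key=lambda s: s[0])  # least confident first
    n_checked = int(round(proofread_fraction * len(ranked)))
    unchecked = ranked[n_checked:]
    n_correct = n_checked + sum(1 for _, ok in unchecked if ok)
    return n_correct / len(ranked)


# Toy data: tagger A is more accurate, but its one error is made with
# high confidence; tagger B makes two errors, both with low confidence.
tagger_a = [(0.95, False)] + [(0.50 + 0.04 * i, True) for i in range(9)]
tagger_b = [(0.10, False), (0.20, False)] + [(0.90, True)] * 8

for name, tagger in (("A", tagger_a), ("B", tagger_b)):
    raw = final_accuracy(tagger, 0.0)
    checked = final_accuracy(tagger, 0.2)
    print(f"tagger {name}: raw accuracy {raw:.2f}, after 20% proofreading {checked:.2f}")
```

In this toy setup tagger A starts with the higher raw accuracy, yet tagger B ends up more accurate after the same proofreading budget because its errors fall among the tags that get rechecked. That is the kind of gap between accuracy and reliability the abstract describes.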

