Chinese Word Segmentation and Zhuyin Annotation

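Word segmentation is a basic and important task in Chinese natural language processing. A recently developed segmentation method, the Specialized Hidden Markov Model based on character-position tags, is reasonably simple in both theory and implementation, and it outperforms the traditional longest-word-first method (Maximum Matching Algorithm, MM). The aim of this thesis is to use position-tag segmentation to raise the accuracy of Chinese-to-Zhuyin conversion. The hypothesis is that converting a segmented word sequence to Zhuyin is more accurate than converting the unsegmented character sequence. In the first stage, the text is segmented with several segmentation methods. In the second stage, the resulting word sequences are converted to Zhuyin. Experiments show that this is more accurate than converting character by character. In the third stage, starting from the second-stage output of M-HMM segmentation and Zhuyin conversion, we look for specific Zhuyin conversion rules that raise accuracy further. Taking the second-stage word-sequence accuracy as the baseline, the experiments confirm that Zhuyin accuracy can indeed be improved again.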
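The two-stage idea above (segment first, then convert words rather than single characters) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it uses forward maximum matching as the segmenter instead of M-HMM, and the tables `WORD_ZHUYIN` and `CHAR_ZHUYIN` and the functions `max_match`, `chars_to_zhuyin`, and `words_to_zhuyin` are hypothetical names with toy entries chosen for this example.

```python
# Toy tables; every entry is an illustrative assumption, not data from the thesis.
WORD_ZHUYIN = {
    "銀行": ["ㄧㄣˊ", "ㄏㄤˊ"],  # word-level reading fixes the polyphonic 行
}
CHAR_ZHUYIN = {
    "我": "ㄨㄛˇ",
    "去": "ㄑㄩˋ",
    "銀": "ㄧㄣˊ",
    "行": "ㄒㄧㄥˊ",  # default single-character reading
}


def max_match(sentence, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def chars_to_zhuyin(sentence):
    """Character-by-character conversion using each character's default reading."""
    return [CHAR_ZHUYIN.get(ch, ch) for ch in sentence]


def words_to_zhuyin(words):
    """Word-level conversion, falling back to character readings for unknown words."""
    result = []
    for word in words:
        if word in WORD_ZHUYIN:
            result.extend(WORD_ZHUYIN[word])
        else:
            result.extend(CHAR_ZHUYIN.get(ch, ch) for ch in word)
    return result


sentence = "我去銀行"
words = max_match(sentence, WORD_ZHUYIN)
print(words)
print(chars_to_zhuyin(sentence))
print(words_to_zhuyin(words))
```

In this toy case the character-level pass assigns 行 its default reading, while the word-level pass picks the reading stored for 銀行. That is the kind of error the segmentation step is meant to remove.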

Keywords

Word segmentation; Zhuyin

Parallel Abstract

Chinese word segmentation is an important and fundamental task. A recent advance in Chinese word segmentation is a specialized Hidden Markov Model, called M-HMM, based on BIES labels, which mark the position of each constituent character within a word. The main purpose of this thesis is to test whether M-HMM improves pronunciation annotation. First, a character sequence (a sentence with no spaces marking word boundaries) is segmented into a word sequence. Second, the words are converted into pronunciation annotations. Our experiments show that M-HMM does help. In a third stage, we apply transformation rules to further improve the accuracy of the pronunciation annotation.
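To make the BIES labels concrete, the sketch below converts between a segmented word sequence and per-character tags, assuming the usual reading of the scheme: B for the first character of a multi-character word, I for an inner character, E for the last character, and S for a single-character word. The function names `words_to_bies` and `bies_to_words` are hypothetical, and the HMM that would predict the tags is not shown.

```python
def words_to_bies(words):
    """Turn a segmented word list into one BIES tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            # First character B, inner characters I, last character E.
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def bies_to_words(chars, tags):
    """Rebuild words from characters and their BIES tags."""
    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


tags = words_to_bies(["我", "去", "銀行"])
print(tags)
print(bies_to_words("我去銀行", tags))
```

Under this framing, segmentation becomes a sequence-labeling problem: a tagger assigns one label per character, and word boundaries fall after every E or S.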

Parallel Keywords

Segmentation; Pronunciation; Annotation


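Cited by

張問賢 (2008). 以音斷詞與注音轉漢字 [Sound-based word segmentation and Zhuyin-to-Chinese-character conversion] [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2008.00585

羅郁仁 (2011). 中文專利指標及文字探勘之研究 [A study of Chinese patent indicators and text mining] [Master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2011.00684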
