
Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models

Parallel Abstract


Extracting plausible transliterations from historical literature is a key problem in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary with the era of translation or with the language preferences of individual translators. To assist historical linguists and digital humanities researchers, this paper proposes a transliteration extraction method based on conditional random fields (CRFs), with features derived from language models and from the characteristics of the Chinese characters used in transliterations. To evaluate the method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline that combines suffix-array-based extraction with a phonetic similarity measure. Our method significantly outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that the method is effective for extracting transliterations from classical Chinese texts.
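
The abstract frames extraction as sequence labeling with language-model and character-based features. As a rough illustration only, the sketch below shows how per-character features and B/I/O labels of that general kind could be prepared for a CRF trainer. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the feature names, the add-one smoothed bigram model, the toy corpus, and the small `translit_chars` set are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count character unigrams and bigrams over an iterable of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, cur, unigrams, bigrams):
    """Add-one smoothed log P(cur | prev) under the toy bigram model."""
    vocab = len(unigrams) + 1  # +1 reserves mass for unseen characters
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))


def char_features(sentence, i, unigrams, bigrams, translit_chars):
    """Build a feature dict for the character at position i (hypothetical feature set)."""
    ch = sentence[i]
    feats = {
        "char": ch,
        "prev_char": sentence[i - 1] if i > 0 else "<BOS>",
        "next_char": sentence[i + 1] if i + 1 < len(sentence) else "<EOS>",
        # Assumed lexical cue: is this character common in transliterations?
        "is_translit_char": ch in translit_chars,
    }
    if i > 0:
        # Language-model cue: how expected is this character after the previous one?
        feats["lm_logprob"] = bigram_logprob(sentence[i - 1], ch, unigrams, bigrams)
    return feats


def bio_labels(sentence, spans):
    """Convert (start, end) transliteration spans, end exclusive, to B/I/O labels."""
    labels = ["O"] * len(sentence)
    for start, end in spans:
        labels[start] = "B"
        for j in range(start + 1, end):
            labels[j] = "I"
    return labels


# Toy data, invented for this sketch.
corpus = ["佛告阿難", "阿難白佛"]
translit_chars = set("阿難陀羅")
unigrams, bigrams = train_bigram_lm(corpus)

sentence = "佛告阿難"
features = [
    char_features(sentence, i, unigrams, bigrams, translit_chars)
    for i in range(len(sentence))
]
labels = bio_labels(sentence, [(2, 4)])  # marks "阿難" as one transliteration
```

Feature dicts and label sequences in this shape are what typical CRF toolkits consume; the features actually used in the paper are not specified in this abstract.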

References


Goldberg, Y., Elhadad, M. (2008). Identification of transliterated foreign words in Hebrew script. Computational Linguistics and Intelligent Text Processing.
Kuo, J.-S., Li, H., Yang, Y.-K. (2007). A phonetic similarity model for automatic extraction of transliteration pairs. ACM Trans. Asian Language Information Processing, 6(2).
Lafferty, J., McCallum, A., Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning (ICML).
Manning, C. D., Raghavan, P., Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Manzini, G., Ferragina, P. (2004). Engineering a lightweight suffix array construction algorithm. Algorithmica, 40(1), 33-50.

Cited By


李啟維 (2017). Chinese Error Correction Based on Hidden Markov Models [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU201701112

Further Reading