An Approach to Using the Web as a Live Corpus for Spoken Transliteration Name Access

Abstract


Recognizing transliteration names is challenging because their formation is flexible and lexicons cover them poorly. In our approach, we employ the Web as a giant corpus: patterns extracted from the Web serve as a live dictionary for correcting speech recognition errors. The plausible character strings recognized by an automatic speech recognition (ASR) system are treated as query terms and submitted to Google. The top N snippets are inserted into PAT trees, and the terms with the highest scores are selected. Our experiments show that the ASR model with this recovery mechanism achieves a 21.54% performance improvement over the ASR-only model at the character level. The recall rate improves from 0.20 to 0.42, and the mean reciprocal rank (MRR) from 0.07 to 0.31. To collect transliteration names, we propose a named entity (NE) ontology generation engine, called the XNE-Tree engine, which produces related named entities from a given seed. The engine incrementally extracts named entities that co-occur frequently with the seed. Starting from 100 seeds, it generated an ontology containing a total of 7,642 named entities. When the bi-character language model is combined with the NE ontology, the ASR recall rate and MRR improve to 0.48 and 0.38, respectively.
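The abstract reports recall and MRR but does not spell out how they are computed. The sketch below assumes the standard definitions: a query counts toward recall if the correct name appears anywhere in the ranked candidate list, and MRR averages 1/rank of the correct name (0 when it is absent). The function names and the sample data are hypothetical, and the paper's own evaluation protocol may differ.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of the first exact match, or 0.0 if the target is absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / rank
    return 0.0


def recall_and_mrr(runs: Sequence[tuple[Sequence[str], str]]) -> tuple[float, float]:
    """Compute recall and MRR over (ranked candidates, correct name) pairs."""
    if not runs:
        raise ValueError("at least one query is required")
    rr = [reciprocal_rank(ranked, target) for ranked, target in runs]
    recall = sum(1 for r in rr if r > 0) / len(rr)  # share of queries with a hit
    mrr = sum(rr) / len(rr)                         # mean of the reciprocal ranks
    return recall, mrr


# Hypothetical candidate lists; placeholder strings stand in for recognized names.
runs = [
    (["name_a", "name_b", "name_c"], "name_b"),            # hit at rank 2
    (["name_d", "name_e"], "name_d"),                      # hit at rank 1
    (["name_f", "name_g"], "name_h"),                      # miss
    (["name_i", "name_j", "name_k", "name_l"], "name_l"),  # hit at rank 4
]
recall, mrr = recall_and_mrr(runs)
print(f"recall={recall:.2f}, MRR={mrr:.4f}")
```

Under these definitions MRR can never exceed recall, which is consistent with every recall/MRR pair reported in the abstract.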

