MATBN: A Mandarin Chinese Broadcast News Corpus

The MATBN Mandarin Chinese broadcast news corpus contains a total of 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary purpose of this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast news domain. In this paper, we briefly introduce the speech corpus and report on preliminary statistical analyses and speech recognition evaluation results.

Keywords

broadcast news; corpus; speech recognition; Mandarin Chinese; transcription; annotation


