Toward Constructing a Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin

The Formosa speech database (ForSDat) is a multilingual speech corpus being collected at Chang Gung University under the sponsorship of the National Science Council of Taiwan. It covers the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. The goal of this 3-year project is to collect a phonetically rich speech corpus from more than 1,800 speakers, totaling hundreds of hours of speech. The first version of the corpus, containing speech from 600 speakers of Taiwanese and Mandarin, has recently been completed and is ready for release. It contains about 49 hours of speech and 247,000 utterances.

Keywords

Phonetic Alphabet; Pronunciation Lexicon; Phonetically Balanced Word; Speech Corpus
