Toward Constructing a Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin

Parallel Abstract


The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung University and sponsored by the National Science Council of Taiwan. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. The goal of this 3-year project is to collect a phonetically rich speech corpus from more than 1,800 speakers, totaling hundreds of hours of speech. The first version of the corpus, containing speech from 600 speakers of Taiwanese and Mandarin, has been completed and is ready for release. It contains about 49 hours of speech and 247,000 utterances.


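Cited By


周哲玄 (2012). Implementation and Comparison of Taiwanese Keyword Spotting [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2002201315383034
李毓哲 (2013). Using Speech Scoring to Assist the Verification of Taiwanese Speech Corpora [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2511201311364800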
