Title

Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpora for Concatenation-based TTS

DOI

10.30019/IJCLCLP.200507.0001

Authors

Cheng-Yuan Lin;Jyh-Shing Roger Jang;Kuan-Ting Chen

Key Words

speech assessment methods phonetic alphabet (SAMPA) ; speech corpus ; sequential forward selection ; k-nearest neighbor rule ; leave-one-out ; speaker-adapted model ; context-dependent hidden Markov model (HMM)

PublicationName

International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)

Volume or Term/Year and Month of Publication

Vol. 10, No. 2 (2005 / 07 / 01)

Page #

145 - 166

Content Language

English

English Abstract

Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a labeling method that suits one language does not necessarily produce desirable results for another. Hence, in this paper we propose a new procedure for refining the boundary labels of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the "periodic voiced + periodic voiced" case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach.
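The key words list sequential forward selection, the k-nearest neighbor rule, and leave-one-out, but the abstract does not say how the paper combines them. One common pairing is to grow a feature subset greedily, scoring each candidate subset by the leave-one-out accuracy of a k-NN classifier. The following is a minimal sketch of that generic pairing on synthetic data, not the authors' procedure; the function names, the toy features, and the choice of k = 1 are all assumptions made for illustration.

```python
import random


def loo_knn_accuracy(X, y, feats, k=1):
    """Leave-one-out accuracy of a k-NN classifier using only the columns in `feats`."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Squared Euclidean distance from held-out sample i to every other sample.
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        if predicted == y[i]:
            correct += 1
    return correct / n


def sequential_forward_selection(X, y, max_feats, k=1):
    """Greedily add the feature that most improves leave-one-out k-NN accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        scored = [(loo_knn_accuracy(X, y, selected + [f], k), f) for f in remaining]
        acc, f = max(scored, key=lambda t: t[0])
        if acc <= best_acc:  # stop when no candidate improves the score
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: column 0 separates the two classes, columns 1-2 are noise.
random.seed(0)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        X.append([label * 10 + random.gauss(0, 1),
                  random.gauss(0, 1),
                  random.gauss(0, 1)])
        y.append(label)

selected, acc = sequential_forward_selection(X, y, max_feats=2)
print("selected feature indices:", selected, "leave-one-out accuracy:", acc)
```

In a boundary-refinement setting, each row would presumably be a vector of acoustic features measured around a candidate boundary and each label a boundary class, but that mapping is an assumption and is not stated in the abstract.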

Topic Category

Humanities > Library and Information Science
Basic and Applied Sciences > Information Science
Engineering > Electrical Engineering

Reference
  1. Bonafonte, A.,A. Nogueiras,A. Rodriguez-Garrido(1996).Proceedings of International Conference on Spoken Language Processing.
  2. Chen, K. J.,S. H. Liu(1992).Proceedings of the Fifteenth International Conference on Computational Linguistics.
  3. Chou, F.-C.,C.-Y. Tseng,L.-S. Lee(1998).Proceedings of International Conference on Spoken Language Processing.
  4. Chou, F.-C.,C.-Y. Tseng,L.-S. Lee(2002).A Set of Corpus-based Text-to-speech Synthesis Technologies for Mandarin Chinese.IEEE Transactions on Speech and Audio Processing,10(7),481-494.
  5. Cosi, P.,D. Falavigna,M. Omologo(1991).A Preliminary Statistical Evaluation of Manual and Automatic Segmentation Discrepancies.Proceedings of European Conference on Speech Communication and Technology,693-696.
  6. Demuynck, K.,T. Laureys(2002).Proceedings of International Conference on Text, Speech and Dialogue.
  7. Duda, R. O.,P. E. Hart,D. G. Stork(2001).Pattern Classification, 2nd ed..New York:Wiley.
  8. Huang, X.,A. Acero,H. W. Hon(2001).Spoken language processing.New Jersey:Prentice Hall.
  9. Lamel, L. F.,J. L. Gauvain(1993).Proceedings of European Conference on Speech Communication and Technology.
  10. Lee, L.-S.(1997).Voice Dictation of Mandarin Chinese.IEEE Signal Processing Magazine,14(4),63-101.
  11. Ljolje, A.,J. Hirschberg,J. P. H. van Santen(1994).Proceedings of ESCA/IEEE Workshop on speech synthesis.
  12. Ljolje, A.,M. D. Riley(1993).Proceedings of European Conference on Speech Communication and Technology.
  13. Lu, H.-M.(2002).An implementation and Analysis of Mandarin Speech Synthesis Technologies.
  14. Makashay, M. J.,C. W. Wightman,A. K. Syrdal,A. Conkie(2000).Proceedings of International Conference on Spoken Language Processing.
  15. Odell, J.,D. Ollason,P. Woodland,S. Young,J. Jansen(1995).The HTK Book for HTK V2.0.Cambridge UK:Cambridge University Press.
  16. Sethy, A.,S. Narayanan(2002).Proceedings of International Conference on Spoken Language Processing.
  17. Shen, J.-L.,J.-W. Hung,L.-S. Lee(1998).Proceedings of International Conference on Spoken Language Processing.
  18. Sproat, R.,C. Shih(1990).Computer Processing of Chinese and Oriental Languages.
  19. Torre Toledano, D.,M. A. Rodríguez Crespo,J. G. Escalada Sardina(1998).Proceedings of Third ESCA/COCOSDA Workshop on speech synthesis.
  20. Van Erp, A.,L. Boves(1988).Proceedings of Speech.
  21. van Santen, J. P. H.,R. Sproat(1990).Proceedings of European Conference on Speech Communication and Technology.
  22. Wang, H. C.,R. L. Chiou,S. K. Chuang,Y. F. Huang(1999).A phonetic labeling method for MAT database processing.Journal of the Chinese Institute of Engineers,22(5),529-534.
  23. Whitney, A.(1971).A direct method of nonparametric measurement selection.IEEE Transactions on Computers,20(9),1100-1103.
  24. Yeh, C. L.,H. J. Lee(1991).Computer Processing of Chinese and Oriental Languages.
Times Cited
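  1. 陳雅婷(2012).Determining Speech Pitch Marks Using an Extended Pruning Algorithm, with Application to Taiwanese Speech Synthesis.Degree thesis, Institute of Statistics, Tsing Hua University.2012.1-40.
  2. 江克敬(2008).Research and Implementation of Mandarin Prosody Conversion.Degree thesis, Department of Computer Science, Tsing Hua University.2008.1-33.
  3. 江蕙如(2009).Improvements to Mandarin Prosody Transplantation.Degree thesis, Department of Computer Science, Tsing Hua University.2009.1-38.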