
HMM-based Mandarin Singing Voice Synthesis Using Tailored Synthesis Units and Question Sets

Abstract


Fluency and continuity are essential to synthesizing a high-quality singing voice. To synthesize a smooth, continuous singing voice, this study employs a hidden Markov model (HMM)-based synthesis approach to construct a Mandarin singing voice synthesis system. The system is designed to generate Mandarin songs with arbitrary lyrics and melody within a certain pitch range. A singing voice database is designed and collected with the phonetic coverage of Mandarin singing voices in mind. Synthesis units and a question set are carefully defined and tailored to meet the minimum requirements for Mandarin singing voice synthesis. In addition, pitch-shift pseudo data extension and vibrato creation are applied to obtain more natural synthesized singing voices. The evaluation results show that the system, based on the tailored synthesis units and question set, improves the quality and intelligibility of the synthesized singing voice. Using pitch-shifted pseudo data and vibrato creation further improves the quality and naturalness of the synthesized singing voices.
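The abstract does not say how the pitch-shift pseudo data or the vibrato are produced. As a rough illustration only, the sketch below applies both ideas to a frame-level F0 contour in Hz: a semitone shift that could be used to derive extra training contours, and a sinusoidal vibrato. The function names, the 5 ms frame period, the 5.5 Hz vibrato rate, and the 50-cent depth are all assumptions chosen for the example, not values from the study.

```python
import math


def shift_semitones(f0_hz, semitones):
    """Shift an F0 contour by a number of semitones.

    Frames with F0 <= 0 are treated as unvoiced and left at 0.0.
    (Hypothetical helper; the study's actual procedure is not given.)
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def add_vibrato(f0_hz, frame_period_s=0.005, rate_hz=5.5, depth_cents=50.0):
    """Superimpose a sinusoidal vibrato on the voiced frames of an F0 contour.

    The modulation is applied in cents so that its depth is the same
    relative interval at every pitch. All default values are assumed.
    """
    out = []
    for i, f in enumerate(f0_hz):
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        t = i * frame_period_s
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f * 2.0 ** (cents / 1200.0))
    return out


if __name__ == "__main__":
    contour = [220.0] * 100 + [0.0] * 10      # 0.5 s voiced note, then silence
    higher = shift_semitones(contour, 2)      # pseudo data two semitones up
    sung = add_vibrato(contour)               # same note with vibrato
    print(round(higher[0], 2), round(min(sung[:100]), 2), round(max(sung[:100]), 2))
```

Working in semitones and cents rather than raw Hz keeps both operations musically uniform across the pitch range, which is why the sketch scales F0 multiplicatively instead of adding a fixed offset.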


