HMM-based Mandarin Singing Voice Synthesis Using Tailored Synthesis Units and Question Sets

Fluency and continuity are essential to synthesizing a high-quality singing voice. To synthesize a smooth and continuous singing voice, this study employs a Hidden Markov Model (HMM)-based synthesis approach to construct a Mandarin singing voice synthesis system. The system is designed to generate Mandarin songs with arbitrary lyrics and melody within a certain pitch range. A singing voice database is designed and collected with the phonetic coverage of Mandarin singing voices in mind. Synthesis units and a question set are carefully defined and tailored to meet the minimum requirements for Mandarin singing voice synthesis. In addition, pitch-shift pseudo data extension and vibrato creation are applied to obtain more natural synthesized singing voices. The evaluation results show that the system, based on the tailored synthesis units and question set, can improve the quality and intelligibility of the synthesized singing voice. Using pitch-shift pseudo data and vibrato creation can further improve the quality and naturalness of the synthesized singing voices.

Keywords

Mandarin Singing Voice Synthesis; Hidden Markov Models; Vibrato
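
The abstract names pitch-shift pseudo data extension and vibrato creation but does not say how either is implemented. As a rough illustration only, the sketch below assumes both operate on a frame-level F0 contour: pitch shifting scales voiced frames by a semitone ratio, and vibrato is modeled as a sinusoidal modulation in cents. The function names, the frame rate, and the vibrato rate and depth defaults are hypothetical and are not taken from the paper, which may instead shift the recorded audio itself or use a different vibrato model.

```python
import math


def shift_semitones(f0_hz, semitones):
    """Scale voiced F0 values by a number of semitones.

    A value of 0.0 is assumed to mark an unvoiced frame and is left as 0.0.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def add_vibrato(f0_hz, frame_rate_hz=200.0, rate_hz=5.5, depth_cents=50.0):
    """Overlay a sinusoidal vibrato on a frame-level F0 contour.

    frame_rate_hz, rate_hz and depth_cents are illustrative defaults,
    not values reported in the paper.
    """
    out = []
    for i, f in enumerate(f0_hz):
        if f <= 0:
            # Unvoiced frame: nothing to modulate.
            out.append(0.0)
            continue
        t = i / frame_rate_hz  # time of this frame in seconds
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f * 2.0 ** (cents / 1200.0))
    return out


if __name__ == "__main__":
    contour = [220.0] * 100 + [0.0] * 10 + [247.0] * 100
    shifted = shift_semitones(contour, 2)   # pseudo data two semitones up
    sung = add_vibrato(contour)             # same notes with vibrato
    print(round(shifted[0], 2), round(max(sung[:100]), 2))
```

Working in cents keeps the vibrato depth perceptually uniform across notes, so the same setting applies to low and high pitches alike.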

