
利用跨語言情緒特徵遷移之台語情緒語音合成

Emotional Taiwanese Speech Synthesis using Cross-Lingual Emotion Feature Transfer

Advisor: 鄭士康

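Abstract

Text-to-speech (TTS) is an important part of today's human-computer interaction. For elderly people who cannot read in particular, being able to communicate by voice can greatly improve their ability to operate machines. With the rapid development of deep learning in recent years, speech synthesis models such as Tacotron and FastSpeech can already produce speech that approaches the level of human speech, and they have found many applications. This thesis aims to develop a Taiwanese speech synthesis system capable of emotional speech. A speech synthesis system, however, requires a certain amount of high-quality speech data, and at present there is little Taiwanese data suitable for synthesis, let alone emotional data. We therefore use publicly available Mandarin and English emotional speech corpora and apply transfer learning to build a cross-lingual, multi-speaker, multi-emotion synthesis system that applies the way emotion is expressed in other languages to Taiwanese.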

Parallel Abstract


Text-to-speech (TTS) is an important part of today's human-computer interaction, especially for elderly people who cannot read: being able to communicate by voice improves their ability to operate machines. With the rapid development of deep learning, speech synthesis models such as Tacotron and FastSpeech can already synthesize speech that approaches the level of human speech, and they have many applications. This thesis aims to develop a Taiwanese speech synthesis system capable of emotional speech. A speech synthesis system requires a large amount of high-quality speech data. However, few Taiwanese corpora suitable for synthesis exist today, let alone emotional corpora. We therefore use public English and Mandarin emotional speech corpora and apply transfer learning to build a cross-lingual, multi-speaker, multi-emotion TTS system that applies emotional expression from other languages to Taiwanese.

References


[1] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, et al. Tacotron: Towards end-to-end speech synthesis. Proc. Interspeech 2017, pages 4006–4010, 2017.
[2] Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4779–4783. IEEE, 2018.
[3] RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron Weiss, Rob Clark, and Rif A Saurous. Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron. In International Conference on Machine Learning, pages 4693–4702. PMLR, 2018.
[4] James A Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161, 1980.
[5] IThuan Khoki Iuhan Kongsi. Suísiann dataset. https://suisiann-dataset.ithuan.tw/, 2019.
