Text-to-speech (TTS) is an important part of modern human-computer interaction, especially for illiterate elderly users: being able to interact through speech greatly improves their ability to operate machines. With the rapid development of deep learning in recent years, speech synthesis models such as Tacotron and FastSpeech can already produce speech that approaches the quality of human speech, and they have found many applications. This thesis aims to develop a Taiwanese speech synthesis system capable of emotional speech. A speech synthesis system, however, requires a certain amount of high-quality corpus data, and there are currently few Taiwanese corpora suitable for synthesis, let alone emotional ones. We therefore use publicly available Mandarin and English emotional corpora and transfer learning to build a cross-lingual, multi-speaker, multi-emotion synthesis system that applies the emotional expression found in other languages to Taiwanese.
Text-to-speech is an important part of today's human-computer interaction, especially for the illiterate elderly: if they can communicate through voice, it greatly improves their ability to operate machines. With the rapid development of deep learning, speech synthesis models such as Tacotron and FastSpeech can already synthesize speech approaching the quality of human speech, and they have many applications. This thesis aims to develop a Taiwanese speech synthesis system with emotional speech. A speech synthesis system requires a large amount of high-quality corpus data. However, there are currently few Taiwanese corpora suitable for synthesis, let alone emotional corpora. Therefore, we use publicly available English and Mandarin emotional corpora to create a cross-lingual, multi-speaker, and multi-emotional TTS system using transfer learning, applying emotional expression from other languages to Taiwanese.
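To make the cross-lingual, multi-speaker, multi-emotion setting concrete, the following is a minimal PyTorch sketch of one common way such conditioning is realized: a Tacotron-style encoder whose outputs are shifted by learned speaker, language, and emotion embeddings. All module names, dimensions, and table sizes here are illustrative assumptions, not the exact architecture used in this thesis.

```python
# Illustrative sketch only: conditioning a Tacotron-style text encoder on
# speaker, language, and emotion identities via learned embeddings.
# Dimensions and vocabulary sizes are placeholder assumptions.
import torch
import torch.nn as nn


class ConditionedEncoder(nn.Module):
    def __init__(self, n_phones=80, n_speakers=10, n_languages=3,
                 n_emotions=5, dim=256):
        super().__init__()
        self.phone_emb = nn.Embedding(n_phones, dim)
        # Separate tables keep the speaker, language, and emotion factors
        # distinct, so knowledge learned from Mandarin/English emotional data
        # can, in principle, be transferred to a Taiwanese speaker.
        self.speaker_emb = nn.Embedding(n_speakers, dim)
        self.language_emb = nn.Embedding(n_languages, dim)
        self.emotion_emb = nn.Embedding(n_emotions, dim)
        self.encoder = nn.LSTM(dim, dim // 2, batch_first=True,
                               bidirectional=True)

    def forward(self, phones, speaker, language, emotion):
        # phones: (batch, time); speaker/language/emotion: (batch,)
        x = self.phone_emb(phones)
        x, _ = self.encoder(x)
        # Broadcast the global condition vectors over the time axis.
        cond = (self.speaker_emb(speaker) + self.language_emb(language)
                + self.emotion_emb(emotion)).unsqueeze(1)
        return x + cond


# Toy usage: a batch of 2 utterances, 12 phone tokens each.
enc = ConditionedEncoder()
phones = torch.randint(0, 80, (2, 12))
out = enc(phones, torch.tensor([0, 1]), torch.tensor([2, 0]),
          torch.tensor([3, 3]))
print(out.shape)  # torch.Size([2, 12, 256])
```

In a transfer-learning setup of this kind, the same conditioned encoder and decoder are typically pre-trained on the resource-rich languages and then fine-tuned or extended with the Taiwanese data, so that the emotion embeddings learned elsewhere can be reused for the target language.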