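In this thesis, we improve the naturalness of synthetic singing voices by adjusting pitch curves. The thesis focuses on how to generate pitch curves close to those of actual singing, to serve as the basis for synthesis, and proposes two implementations: (1) using a support vector machine (SVM) to predict pitch curves; (2) using our proposed rule-based pitch prediction equations to model pitch curves under ten different conditions. In addition, we use a pitch-synchronous cross-fading method to resolve discontinuities at speech concatenation points, and add effects such as vibrato and reverberation to embellish the synthetic singing voice. Finally, a listening test confirms that, compared with traditional singing voice synthesis methods, our proposed rule-based pitch prediction equations make the synthesized timbre more natural and pleasant.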
In this study, we propose a singing voice synthesis system that improves the naturalness of synthetic singing by modifying pitch curves. Our goal is to produce pitch curves similar to those of actual singing. We employ two methods for pitch-curve prediction. In the first, we train a support vector machine (SVM) regression model to predict pitch curves. In the second, we propose a rule-based approach comprising 10 manually tuned equations that model pitch curves under different conditions. In the second half of the thesis, we discuss the signal processing techniques used to modify pitch, duration, and volume. We further address ill-articulated pronunciation and discontinuities in syllable concatenation with a pitch-synchronous cross-fading approach. We also add euphonious effects such as vibrato and reverberation. Finally, we assess the proposed methods through pitch-curve observation and a listening test. The results verify that the proposed rule-based approach makes the synthetic singing voice more natural than traditional singing voice synthesis approaches.
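As an illustration of the vibrato effect mentioned in the abstract, the following Python sketch applies a sinusoidal modulation to a pitch curve. The abstract does not say how vibrato is implemented in the thesis, so this is only a minimal sketch under a common assumption: vibrato is modeled as a periodic pitch deviation measured in cents. The function name `add_vibrato`, the 100 frames-per-second pitch curve, the 5.5 Hz rate, and the 50-cent depth are all hypothetical values chosen for illustration.

```python
import math


def add_vibrato(pitch_hz, frame_rate_hz, rate_hz=5.5, depth_cents=50.0):
    """Return a copy of a pitch curve with sinusoidal vibrato applied.

    pitch_hz      -- list of F0 values in Hz, one per frame
    frame_rate_hz -- number of pitch frames per second (assumed)
    rate_hz       -- vibrato oscillation rate in Hz (assumed)
    depth_cents   -- peak pitch deviation in cents (assumed)
    """
    modulated = []
    for i, f0 in enumerate(pitch_hz):
        t = i / frame_rate_hz  # time of this frame in seconds
        # Deviation in cents oscillates between -depth and +depth.
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        # 1200 cents correspond to one octave, i.e. a factor of 2 in frequency.
        modulated.append(f0 * 2.0 ** (cents / 1200.0))
    return modulated


# A flat 220 Hz note lasting 2 s at 100 frames per second (illustrative input).
flat_note = [220.0] * 200
vibrato_note = add_vibrato(flat_note, frame_rate_hz=100.0)
```

Expressing the depth in cents rather than in Hz keeps the perceived vibrato width the same for low and high notes, which is one reason this formulation is a frequent choice. A real system would typically also fade the vibrato in after the note onset, which this sketch omits.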