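In this paper, we propose an effective method for predicting initial/final duration and volume for corpus-based Mandarin singing voice synthesis, with the main goal of improving the clarity and naturalness of the synthesized singing voice. First, we introduce the architecture of the initial/final duration prediction module. We build a separate initial/final duration prediction module for each Mandarin consonant category, use linguistic and phonological features together with music-score information as input parameters, and apply support vector machine (SVM) regression to predict the initial duration. Next, for the volume model, we propose three adjustment methods. Method 1 assigns the same volume level to all syllables. Method 2 predicts volume from linguistic and phonological features and music-score information using classification and regression trees (CART). Method 3 defines rules that adjust the volume according to different combinations of pitch and duration distributions. Finally, we conduct experiments and listening tests and summarize our conclusions. The experimental results confirm that the proposed initial/final duration prediction and volume models improve the clarity and naturalness of the synthesized singing voice.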
In this research, we propose methods for initial/final (I/F) duration prediction and energy modeling in corpus-based Mandarin singing voice synthesis (SVS). Our goal is to improve the clarity and naturalness of the synthesized singing voice. First, we present the framework of the I/F duration prediction model. We construct a separate I/F duration prediction model for each consonant category, using linguistic/phonetic attributes and music-score information as input features. Each model is trained with support vector machine (SVM) regression. Second, we propose three methods for energy modeling. The first method assigns the same volume to every syllable. The second method predicts energy from the same features used for I/F duration prediction, using classification and regression trees (CART). The third method is a rule-based approach that modifies the energy according to different combinations of pitch and duration. Finally, we conduct experiments and listening tests to assess the feasibility of the proposed methods. The experimental results indicate that the proposed methods improve both the clarity and naturalness of the synthesized singing voice.
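To make the third, rule-based energy method more concrete, the following is a minimal sketch of how a gain could be chosen from the pitch and duration of each note. The abstract does not state the actual rules, so the thresholds (`HIGH_PITCH`, `LONG_NOTE_S`), the gain values, and the names `Note`, `energy_gain`, and `apply_energy_rules` are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    syllable: str      # syllable sung on this note
    midi_pitch: int    # pitch from the music score, as a MIDI note number
    duration_s: float  # note duration in seconds


# Hypothetical thresholds and gains; the paper's real rules are not given here.
HIGH_PITCH = 72     # MIDI 72 (C5) and above is treated as "high"
LONG_NOTE_S = 1.0   # notes of 1 second or more are treated as "long"
BASE_GAIN = 1.0     # gain applied when no rule fires


def energy_gain(note: Note) -> float:
    """Return a linear gain for one syllable from its pitch/duration combination."""
    is_high = note.midi_pitch >= HIGH_PITCH
    is_long = note.duration_s >= LONG_NOTE_S
    if is_high and is_long:
        return BASE_GAIN * 1.2   # assumed: boost sustained high notes most
    if is_high:
        return BASE_GAIN * 1.1   # assumed: mild boost for short high notes
    if is_long:
        return BASE_GAIN * 0.9   # assumed: slight attenuation for long low notes
    return BASE_GAIN


def apply_energy_rules(notes: List[Note]) -> List[Tuple[str, float]]:
    """Pair each syllable with the gain selected by the rules above."""
    return [(n.syllable, energy_gain(n)) for n in notes]
```

A table of explicit rules such as this is easy to inspect and tune by ear, which is one plausible reason to consider a rule-based method alongside a learned one; the specific direction and size of each adjustment here are assumptions, not results from the paper.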