
以韻律模型為基礎之中文韻律轉換研究

A Study on Model-based Prosody Conversion for Mandarin Chinese

Advisor: 陳信宏

Abstract


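This study proposes a prosody-model-based method for prosody conversion in Mandarin Chinese. The system consists of a training part and a conversion part. In training, the A-PLM algorithm is first applied separately to the source and target corpora to label prosody tags and build a prosodic model for each; a conversion relation between the two sets of prosody tags is then established. Two conversion methods are proposed. Method 1 estimates the target prosodic states by a linear transformation and does not require a parallel corpus. Method 2 builds the conversion relation between the source and target prosody tags under the MMSE (Minimum Mean Square Error) criterion and requires a parallel corpus. In conversion, the utterance to be converted is first labeled by the A-PLM algorithm, and the resulting tags are passed through the conversion function to estimate the target speaker's prosody tags. Finally, the syllable pitch (F0) contours, syllable durations, and syllable energy levels are reconstructed from the estimated target-speaker tags and the target prosodic model, and the converted speech is synthesized with the STRAIGHT synthesizer using the original spectral parameters of the target speech. Experimental results on the Academia Sinica COSPRO corpus confirm that the proposed methods convert better than conventional methods. Among the methods that use a parallel corpus, Method 2 outperforms GMM-based (Gaussian mixture model) conversion in every conversion pair tested; among the methods that do not use a parallel corpus, Method 1 outperforms Gaussian normalization conversion.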
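As a rough illustration of the MMSE idea behind Method 2, the sketch below assumes that each source syllable carries a discrete prosody tag and that parallel data supplies an aligned scalar target value for it. Under those assumptions, the estimate that minimizes mean square error for a given source tag is the conditional mean of the aligned target values. The abstract does not specify the actual form of the mapping, so the function names `fit_mmse_table` and `convert_tags`, the integer tags, and the scalar targets are all hypothetical.

```python
from collections import defaultdict


def fit_mmse_table(src_tags, tgt_values):
    """Learn a tag-to-value table from aligned (parallel) training pairs.

    For a discrete source tag, the MMSE estimate of the target value is the
    conditional mean of the target values aligned with that tag.
    Hypothetical helper: not the mapping defined in the thesis.
    """
    if len(src_tags) != len(tgt_values) or not src_tags:
        raise ValueError("need non-empty, equally long aligned sequences")
    groups = defaultdict(list)
    for tag, value in zip(src_tags, tgt_values):
        groups[tag].append(value)
    table = {tag: sum(vals) / len(vals) for tag, vals in groups.items()}
    # Source tags never seen in training fall back to the global target mean.
    fallback = sum(tgt_values) / len(tgt_values)
    return table, fallback


def convert_tags(src_tags, table, fallback):
    """Map each source tag to its estimated target value."""
    return [table.get(tag, fallback) for tag in src_tags]


if __name__ == "__main__":
    # Toy aligned data: source tags and the target values observed with them.
    table, fallback = fit_mmse_table([0, 0, 1, 1, 2], [1.0, 3.0, 10.0, 20.0, 5.0])
    print(convert_tags([1, 0, 3], table, fallback))
```

Averaging per tag keeps the sketch independent of any model family; a richer mapping would condition on more context than a single tag.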

Parallel Abstract


In this thesis, a novel model-based prosody conversion method for Mandarin speech is presented. In the training phase, the source and target speech datasets are first analyzed by the A-PLM method to label all utterances with prosody tags and to construct a prosodic model for each speaker; then, a mapping function is built to relate the prosodic phrase structures of the two speakers. Two schemes for building the mapping function are proposed. Scheme 1 builds a linear mapping function that relates the source and target prosodic states; it needs no parallel training data. Scheme 2 builds a probabilistic mapping function that relates the source and target prosody tags; it requires a set of parallel data for training. In the conversion phase, the source utterance is first analyzed by the A-PLM method. The labeled prosody tags are then converted to target prosody tags by the mapping function. Finally, the transformed syllable pitch contours, durations, and energy levels are generated by the target prosodic model. Experimental results on the Sinica COSPRO corpus confirmed the effectiveness of the proposed method: Scheme 1 outperformed the conventional mean/variance transformation in the setting without parallel data, and Scheme 2 outperformed GMM-based mapping in the setting with parallel data.
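For context, the mean/variance transformation used as the non-parallel baseline can be sketched as below. It is also a linear map, but one applied directly to a prosodic feature rather than to prosodic states as in Scheme 1. This is the standard Gaussian normalization formula, y = mu_t + (sigma_t / sigma_s)(x - mu_s), commonly applied to log-F0; the function name `fit_gaussian_normalization` and the toy values are assumptions for illustration, not taken from the thesis.

```python
from statistics import mean, pstdev


def fit_gaussian_normalization(src_values, tgt_values):
    """Return a converter that matches the source mean/variance to the target.

    The two value lists do not need to be aligned, so no parallel data is
    required. Hypothetical helper illustrating the conventional baseline.
    """
    mu_s, sd_s = mean(src_values), pstdev(src_values)
    mu_t, sd_t = mean(tgt_values), pstdev(tgt_values)
    if sd_s == 0:
        raise ValueError("source values have zero variance")
    scale = sd_t / sd_s

    def convert(x):
        # Standardize with source statistics, then rescale to the target.
        return mu_t + scale * (x - mu_s)

    return convert


if __name__ == "__main__":
    # Toy feature values for a source and a target speaker (unaligned).
    convert = fit_gaussian_normalization([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
    print(convert(2.0), convert(3.0))
```

Because only global statistics are used, this baseline cannot change the shape of a prosodic contour within an utterance, which is the limitation a model-based approach is meant to address.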

