
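An Objective Perceptual Quality Estimation Method for Speech Whose Prosody Is Modified by Time-Domain Pitch-Synchronous Overlap-Add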

Objective Quality Assessment of Prosody-Modified Speech for TD-PSOLA

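Abstract (translated from the Chinese)

In this paper, we propose a method for time-domain pitch-synchronous overlap-add (TD-PSOLA) that estimates subjective perceptual distortion before prosody modification is carried out. We first analyze the relationship between different degrees of prosody modification and subjective listening scores. Based on these relationships, we then propose 27 distance measures and compare how well each one predicts perceptual distortion on its own. Finally, we search exhaustively over the possible combinations of distances; the best combination is a linear regression of four of them, for which the correlation between the objective predicted scores and the subjective listening scores reaches 87.6%. Because the proposed method estimates perceptual quality objectively without synthesizing the target speech, it can be used for perceptually optimized design both in real-time unit selection within a speech synthesis system and in off-line design of a synthesis corpus.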

Parallel Abstract


This paper presents a method for estimating perceptual distortion before prosody modification is performed by TD-PSOLA. First, the relationships between different degrees of prosodic modification and the corresponding subjective scores are investigated. Twenty-seven distance measures are then proposed, and the predictive performance of each measure is reported and compared. Finally, an exhaustive search evaluates every possible combination of these measures; the best correlation between predicted and subjective scores is 87.6%, obtained by a linear regression of four of the proposed distance measures. Because the proposed method does not require synthesizing the target speech, it can be used to perceptually optimize both real-time unit selection and off-line corpus design in a TTS system.
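
The combination step described above (fit a linear regression on each candidate subset of distance measures and keep the subset whose predictions correlate best with the subjective scores) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the measure names `d1` to `d3`, the toy values, the synthetic scores, and the helper functions are hypothetical and are not taken from the paper, which works with 27 measures and real listening-test scores.

```python
from itertools import combinations
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(ata, aty)  # [intercept, weight_1, ..., weight_k]


def predict(weights, rows):
    """Apply the fitted intercept and weights to each row of measures."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in rows]


def best_subset(features, y, size):
    """Try every subset of `size` measures; return (correlation, names, weights) of the best."""
    best = None
    for names in combinations(sorted(features), size):
        rows = list(zip(*(features[n] for n in names)))
        weights = fit_linear(rows, y)
        r = pearson(predict(weights, rows), y)
        if best is None or r > best[0]:
            best = (r, names, weights)
    return best


# Hypothetical toy data: three made-up distance measures over six stimuli.
features = {
    "d1": [0.1, 0.4, 0.2, 0.9, 0.5, 0.7],
    "d2": [0.3, 0.1, 0.8, 0.2, 0.6, 0.4],
    "d3": [0.5, 0.2, 0.1, 0.3, 0.9, 0.6],
}
# Synthetic "subjective" scores built from d1 and d3 only, so the search has a known answer.
scores = [3.0 - 1.5 * a - 1.0 * c for a, c in zip(features["d1"], features["d3"])]

r, names, weights = best_subset(features, scores, 2)
print(names, round(r, 3), [round(w, 3) for w in weights])
```

The sketch scores each subset on the same data it was fitted on, which is enough to show the mechanics; whether and how the original study held out data is not stated in the abstract. With 27 measures there are 17,550 subsets of size four, so a search of this kind is inexpensive when run off-line.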
