
The Segmentation and Synthesis of Front Syllables and Back Syllables in Taiwanese (台語前音節後音節的分割及合成)

Advisor: 江永進


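In speech synthesis, the method of overlapping the shared vowel between the back part of a front syllable and the front part of a back syllable is called same vowel overlap and add (SVOLA). With SVOLA, every tone can be synthesized from a set of basis syllables. Applying SVOLA, however, raises the question of where to cut a front syllable into its front part and its back part. The goal of this thesis is to find that cut point. To segment the syllables, we use four methods: independent-frame maximum-likelihood labeling, maximum segmental total likelihood segmentation without training data, maximum segmental total likelihood segmentation with training data, and VDS sequence segmentation. Of these, VDS sequence segmentation separates the front part and the back part of a front syllable very effectively.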
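The joining step of SVOLA can be illustrated with a minimal sketch. It assumes the cut points inside the shared vowel are already known and blends the two syllables with a plain linear cross-fade. The abstract does not specify the overlap window or any pitch alignment used in the thesis, so the function name `svola_join`, its parameters, and the linear weights are all hypothetical choices for illustration only.

```python
from typing import List, Sequence


def svola_join(front: Sequence[float], back: Sequence[float],
               front_cut: int, back_cut: int, overlap: int) -> List[float]:
    """Join two syllables inside their shared vowel (illustrative sketch).

    front[:front_cut] is kept as-is, the next `overlap` samples of both
    syllables are cross-faded, and back[back_cut + overlap:] is appended.
    The linear cross-fade is an assumption, not the thesis's procedure.
    """
    if overlap < 0 or front_cut < 0 or back_cut < 0:
        raise ValueError("cut points and overlap must be non-negative")
    if front_cut + overlap > len(front) or back_cut + overlap > len(back):
        raise ValueError("overlap region runs past the end of a syllable")

    head = list(front[:front_cut])
    tail = list(back[back_cut + overlap:])

    blend = []
    for i in range(overlap):
        # Weight of the back syllable rises linearly across the overlap.
        w = (i + 1) / (overlap + 1)
        blend.append((1.0 - w) * front[front_cut + i] + w * back[back_cut + i])

    return head + blend + tail


if __name__ == "__main__":
    # Toy "waveforms": constant 1.0 for the front syllable, 0.0 for the back.
    front_syllable = [1.0] * 10
    back_syllable = [0.0] * 10
    print(svola_join(front_syllable, back_syllable,
                     front_cut=4, back_cut=3, overlap=3))
```

The quality of the result depends almost entirely on where `front_cut` and `back_cut` fall within the vowel, which is the segmentation problem the thesis addresses.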


Given two simple syllables that share a common vowel, we find that a new, quite intelligible syllable can be synthesized by splitting both syllables at the common vowel and then concatenating the first part of the first syllable with the second part of the second syllable. We call these two simple syllables the front syllable and the back syllable, respectively, and call this method SVOLA, or “same vowel overlap and add” syllable synthesis. With a near-minimal set of front and back syllables as a basis, a Taiwanese syllable synthesis system can be built easily using SVOLA. In addition to implementing such a synthesis system, this thesis explores four methods for splitting the basis syllables. The first is independent-frame maximum log-likelihood. The second is maximum segmental total likelihood without training data. The third is maximum segmental total likelihood with training data. For the fourth method, we transform the feature vectors into a DS matrix (by smoothing and then taking differences) and split the basis syllable using the VDS sequence, that is, the sequence of column variances of the DS matrix. The VDS approach is more effective in splitting the front syllables.
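The VDS construction described above can be sketched as follows. The sketch makes several assumptions that the abstract does not confirm: each column of the feature matrix is one frame, smoothing is a centered moving average along time, the difference is a first-order difference between adjacent frames, and the split is placed at the frame boundary where the VDS value peaks. All function names and the `width` parameter are hypothetical.

```python
from statistics import pvariance
from typing import List

# matrix[d][t]: value of feature dimension d at frame t (columns are frames).
Matrix = List[List[float]]


def smooth_rows(features: Matrix, width: int = 3) -> Matrix:
    """Centered moving average along time; the window shrinks at the edges."""
    half = width // 2
    smoothed = []
    for row in features:
        n = len(row)
        smoothed.append([
            sum(row[max(0, t - half):min(n, t + half + 1)])
            / (min(n, t + half + 1) - max(0, t - half))
            for t in range(n)
        ])
    return smoothed


def ds_matrix(features: Matrix, width: int = 3) -> Matrix:
    """Smooth each feature track, then difference adjacent frames."""
    return [[row[t + 1] - row[t] for t in range(len(row) - 1)]
            for row in smooth_rows(features, width)]


def vds_sequence(features: Matrix, width: int = 3) -> List[float]:
    """Variance of each DS column, taken across the feature dimensions."""
    ds = ds_matrix(features, width)
    return [pvariance([row[t] for row in ds]) for t in range(len(ds[0]))]


def split_frame(features: Matrix, width: int = 3) -> int:
    """Assumed rule: cut where VDS peaks; frames [0, k) form the first part."""
    vds = vds_sequence(features, width)
    return max(range(len(vds)), key=vds.__getitem__) + 1


if __name__ == "__main__":
    # Two feature dimensions, eight frames; dimension 0 jumps at frame 4.
    toy = [[0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0],
           [1.0] * 8]
    print(vds_sequence(toy))
    print(split_frame(toy, width=1))
```

Because the variance is taken across feature dimensions, at least two dimensions are needed for the sequence to be informative; with a single dimension every VDS value is zero.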


