  • Thesis


Automatic phone boundary detection using sequential text-independent segmentation

Advisor: 王小川


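The aim of this thesis is to build an automatic phoneme segmentation system, with no prior information supplied, by combining parameters that indicate how fast the speech spectrum is changing. The proposed method is a text-independent sequential phoneme segmentation procedure that moves along the time axis and searches for only one candidate phoneme boundary at a time. Each candidate is verified as soon as it is found, and only a verified candidate counts as a detected phoneme boundary and is used for segmentation. Candidate boundaries are detected with wavelet feature parameters computed over frames of variable length. For each candidate, a single Gaussian probability model is built from Mel-frequency cepstral coefficients and a score (Delta BICC) is computed with the Bayesian information criterion corrected (BICC). This score is combined with a normalized spectral variation function that takes the wavelet feature parameters as input, and the two together form the dual condition used to verify the candidate. The system was evaluated on the TIMIT corpus. With a 20 ms tolerance, 422 of the 640 test utterances have an F-value above 70%, and the average F-value of those 422 utterances is 76%; the average F-value over all 640 utterances is 72%. For the R-value, another measure of segmentation quality, the 640 utterances average 75% with a ±20 ms tolerance. The thesis also reports the detection rate (hit rate) for each class of phoneme boundary. The system detects the following boundaries relatively well: stop to vowel, vowel to stop, fricative to vowel, vowel to fricative, vowel to nasal, nasal to vowel, and stop to semivowel or glide. Detection of nasal to nasal, semivowel to silence, nasal to silence, and nasal to stop boundaries still needs improvement.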
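The F-value and R-value quoted above can be illustrated with a minimal Python sketch. It assumes a simple greedy one-to-one matching of detected to reference boundaries within the tolerance, and it assumes the R-value follows the definition of Räsänen, Laine, and Altosaar (2009), computed from the hit rate and the over-segmentation; the abstract does not spell out either choice. The function names and the sample boundary times are hypothetical.

```python
import math


def match_boundaries(reference, detected, tolerance):
    """Count reference boundaries that have a detected boundary within
    `tolerance` (same time unit). Each detection is used at most once.
    Greedy nearest-neighbour matching is an assumption of this sketch."""
    used = set()
    hits = 0
    for ref in reference:
        best = None
        for i, det in enumerate(detected):
            if i in used:
                continue
            dist = abs(det - ref)
            if dist <= tolerance and (best is None or dist < best[0]):
                best = (dist, i)
        if best is not None:
            used.add(best[1])
            hits += 1
    return hits


def segmentation_scores(reference, detected, tolerance=20):
    """Return hits, precision, recall (hit rate), F-value and R-value."""
    if not reference:
        raise ValueError("at least one reference boundary is required")
    hits = match_boundaries(reference, detected, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference)
    total = precision + recall
    f_value = 2 * precision * recall / total if total > 0 else 0.0
    # Over-segmentation: relative surplus of detected boundaries.
    over_seg = len(detected) / len(reference) - 1
    # R-value (assumed definition): distance from the ideal point
    # (hit rate 1, over-segmentation 0), combined with a term that
    # penalises inserting extra boundaries.
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    r_value = 1 - (abs(r1) + abs(r2)) / 2
    return {
        "hits": hits,
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
        "r_value": r_value,
    }


# Hypothetical boundary times in milliseconds, 20 ms tolerance.
scores = segmentation_scores(
    reference=[100, 250, 400, 520],
    detected=[110, 245, 330, 405, 600],
    tolerance=20,
)
print(scores)
```

In this toy case three of the four reference boundaries are hit, while two of the five detections are spurious, so both scores fall below 1; a detector that returned exactly the reference boundaries would score 1 on both.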


This thesis proposes a text-independent sequential phone boundary detection algorithm with which an automatic phone segmentation system can be built without any prior knowledge. The method searches for one candidate phone boundary at a time and then verifies it; segmentation takes place only once a boundary has been verified. Candidate boundaries are searched for using wavelet parameters computed over frames of variable length. The Bayesian information criterion corrected (BICC) and a normalized spectral variation function (SVF) are used to verify the candidates. The algorithm was evaluated on the TIMIT corpus, with segmentation performance measured by F-value. With a 20 ms tolerance, the average F-value over the 640 test utterances is 72%, and 422 of those utterances achieve an F-value above 70%.
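The idea behind verifying a candidate boundary with a BIC-style score can be sketched in Python as follows. This is not the thesis's BICC: it uses the plain BIC penalty, diagonal-covariance Gaussians, and synthetic three-dimensional frames in place of real feature vectors, all of which are assumptions made to keep the example short. The sketch compares one Gaussian fitted to the whole window against two Gaussians fitted to the frames on either side of the candidate; in this convention a positive score favours a boundary.

```python
import math
import random


def diag_logdet(frames, floor=1e-6):
    """Log-determinant of the maximum-likelihood diagonal covariance
    of `frames` (a list of equal-length feature vectors)."""
    n = len(frames)
    dims = len(frames[0])
    total = 0.0
    for k in range(dims):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        total += math.log(variance + floor)  # floor avoids log(0)
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Plain delta-BIC for a candidate boundary at index `split`.
    Positive values favour two separate Gaussians (a boundary)."""
    n = len(frames)
    dims = len(frames[0])
    left, right = frames[:split], frames[split:]
    # A diagonal Gaussian has `dims` means and `dims` variances.
    n_params = 2 * dims
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    gain = (
        0.5 * n * diag_logdet(frames)
        - 0.5 * len(left) * diag_logdet(left)
        - 0.5 * len(right) * diag_logdet(right)
    )
    return gain - penalty


# Synthetic stand-ins for feature frames (hypothetical data).
rng = random.Random(0)
changed = (
    [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
    + [[rng.gauss(5, 1) for _ in range(3)] for _ in range(100)]
)
steady = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]

print("change at frame 100:", delta_bic(changed, 100))
print("no change:          ", delta_bic(steady, 100))
```

The window with a shift in the mean at frame 100 should give a clearly positive score, while the homogeneous window should give a negative one because the penalty outweighs the small likelihood gain. A corrected criterion would adjust the penalty term, which matters most when only a few frames lie on either side of the candidate.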


