Phone Segmentation Using Sample-Based Acoustic Parameters

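Precise automatic speech segmentation is regarded as information that can improve the performance of many speech recognition and speech synthesis systems, but accurately labeling a large corpus by hand is time-consuming and laborious. This study therefore aims to build an accurate phone boundary detection and automatic speech segmentation system, with a view to improving the performance of speech recognition and synthesis systems. This paper proposes several sample-based acoustic parameters, namely sub-band signal envelopes, the rate of rise of the acoustic parameters, spectral entropy, and spectral KL distance, to describe the characteristics of the different phones in a speech signal, and incorporates them into the phone boundary detection and automatic speech segmentation framework. Target functions are then defined separately for phone boundary detection and for automatic speech segmentation, according to the basic speech unit each one uses. Next, a feed-forward neural network (a multilayer perceptron) is trained with a semi-supervised method to build the phone boundary detector model. Finally, phone boundary detection experiments and an analysis of automatic speech segmentation performance are carried out on utterances from different corpora.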

Keywords

Sample-based acoustic parameters; phone segmentation; phone boundary detection

Parallel Abstract

Automatic speech segmentation with high precision and accuracy is considered valuable in speech recognition and speech synthesis research. Manual labeling is the most precise approach, but labeling and segmenting a large database by hand is very time-consuming. To improve the performance of speech recognition and synthesis systems, this paper proposes sample-based phone boundary detection and segmentation algorithms. The proposed method first extracts several sample-based acoustic parameters that model the spectral characteristics of the speech signal: six sub-band signal envelopes, rate of rise, sample-based KL distance, and spectral entropy. The sample-based KL distance is then used to pre-select boundary candidates and to label a target function that specifies the state transitions between classes, which are predefined based on the transcription level. Next, a semi-supervised neural network performs the final phone boundary detection and automatic speech segmentation. Finally, experimental results and analyses of phone boundary detection and automatic speech segmentation on different corpora are discussed.
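As an illustration of two of the parameters named above, the following minimal sketch computes a spectral entropy and a spectral KL distance on a sliding window and picks the largest KL peak as a boundary candidate. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the function names, the 256-sample Hann window, the one-sample hop used to mimic a "sample-based" parameter, the symmetric form of the KL distance, and the toy tone-plus-noise signal. It is not the method's actual implementation.

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=1):
    """Return one normalized power spectrum per frame (each row sums to 1)."""
    window = np.hanning(frame_len)
    starts = np.arange(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] for s in starts]) * window
    # Small floor avoids log(0) in the entropy and KL computations.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12
    return power / power.sum(axis=1, keepdims=True)

def spectral_entropy(spectra):
    """Shannon entropy of each normalized spectrum (low for tonal frames)."""
    return -(spectra * np.log(spectra)).sum(axis=1)

def spectral_kl_distance(spectra, lag):
    """Symmetric KL distance between each frame and the frame `lag` later."""
    earlier, later = spectra[:-lag], spectra[lag:]
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return ((earlier - later) * np.log(earlier / later)).sum(axis=1)

# Toy signal (hypothetical): a 500 Hz tone followed by white noise.
rng = np.random.default_rng(0)
fs, frame_len, true_boundary = 8000, 256, 2000
tone = np.sin(2 * np.pi * 500 * np.arange(true_boundary) / fs)
noise = rng.standard_normal(2000)
signal = np.concatenate([tone, noise])

spectra = frame_spectra(signal, frame_len=frame_len, hop=1)
entropy = spectral_entropy(spectra)
# A lag of one full frame compares two non-overlapping windows.
distance = spectral_kl_distance(spectra, lag=frame_len)

# distance[t] compares the frame starting at t with the one starting at
# t + frame_len, so the candidate boundary is where the later frame begins.
boundary_estimate = int(np.argmax(distance)) + frame_len
```

On this toy signal the entropy is low over the tone and high over the noise, and the KL distance is largest when the earlier window covers only the tone and the later window only the noise. A real detector would pass such candidates, together with the other parameters, to the trained classifier instead of taking a single maximum.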

Parallel Keywords

Sample-based Acoustic Parameters; Phone Segmentation; Phone Boundary Detection
