Improving Automatic Segmentation with Cross-Corpus Boundary Models

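This study proposes a two-stage automatic segmentation method. Existing databases are used to train a conventional GMM-HMM acoustic model and a GMM-based boundary model, which are then applied to segment a completely new target corpus into syllables automatically. In the first stage, forced alignment with the GMM-HMM yields the initial syllable-level segmentation. In the second stage, the boundary model adjusts each of these boundary positions within a local range. For the boundary model, a small number of utterances are selected from the target corpus for speaker adaptation, so that the statistics of the model parameters match those of the test data, which improves the model's ability to correct the automatic segmentation. In the experiments, lecture speech from courses offered on NCTU OCW serves as the test data. The TCC300 corpus is used to train the GMM-HMM baseline model, and Miss Tao's fast read-speech corpus (陶小姐朗讀式快速語料庫) together with part of the OCW data is used to train the boundary model. The aim is a syllable segmentation and labeling system that can process a new corpus with a high degree of automation.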
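The second-stage refinement described above can be illustrated with a minimal sketch. It is not the thesis implementation: the boundary feature (the two frames straddling a candidate boundary), the search window of ±2 frames, the function names, and the toy single-component GMM are all assumptions made for illustration. The only idea taken from the abstract is that each boundary from forced alignment is moved, within a local range, to the position the GMM-based boundary model scores highest.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def boundary_feature(frames, t):
    """Hypothetical boundary feature: frame t-1 concatenated with frame t."""
    return list(frames[t - 1]) + list(frames[t])

def refine_boundary(frames, initial, gmm, window=2):
    """Move one forced-alignment boundary to the best-scoring frame
    within +/- window frames of its initial position."""
    lo = max(1, initial - window)
    hi = min(len(frames) - 1, initial + window)
    best_t, best_score = initial, float("-inf")
    for t in range(lo, hi + 1):
        score = diag_gmm_loglik(boundary_feature(frames, t), *gmm)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Toy data (assumed): 1-D frames that switch from 0 to 1 at frame index 4.
frames = [[0.0]] * 4 + [[1.0]] * 4
# Toy boundary model (assumed): one Gaussian expecting a 0 -> 1 transition.
gmm = ([1.0], [[0.0, 1.0]], [[0.1, 0.1]])
# Suppose forced alignment placed the boundary one frame early, at index 3.
refined = refine_boundary(frames, 3, gmm, window=2)
print(refined)  # 4
```

In a real system the boundary model would be trained on labeled boundaries from a source corpus and then adapted with a few target-corpus utterances, as the abstract describes; neither step is shown in this sketch.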

Keywords

automatic segmentation; forced alignment; boundary model; cross-corpus; speaker adaptation; lecture speech corpus

Parallel Abstract

This thesis proposes a two-stage automatic segmentation method that uses available databases to train a conventional GMM-HMM acoustic model and a GMM-based boundary model, with the aim of automatically producing syllable-level segment boundaries for a new target database. In the first stage, initial syllable-level boundaries are obtained by HMM-based forced alignment. In the second stage, the boundary model refines each boundary within a local range. A small number of utterances serve as adaptation data for speaker-adaptive training of the boundary model, so that the statistics of the model parameters match those of the test data, which improves the boundary refinement. In the experiments, lecture videos and captions from the National Chiao Tung University Open Course Website (NCTU OCW) were chosen as the source of the target database. The TCC300 training set was used to train the GMM-HMM baseline model, and the Fast broadcast read speech database together with part of the OCW training set was used for boundary model training, including background and speaker adaptation. The goal is a highly automatic system for labeling syllable-level segment boundaries.

Parallel Keywords

automatic segmentation; forced alignment; boundary model; cross-database; speaker adaptation; lecture speech

