A Speech Segmentation Technique for Long Audio Streams

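Segmenting long speech streams is increasingly important for processing the ever-growing volume of multimedia corpora. The Viterbi algorithm achieves good accuracy and efficiency when segmenting short speech streams, so combining it with partial text alignment is an intuitive way to handle long speech streams. This paper proposes a reliable-path estimation algorithm that estimates the reliable path within each partial search space, preventing the case in which the best path falls outside a partial search space. By ensuring that the union of the partial search spaces fully covers the best path, the method lets the Viterbi algorithm run more efficiently on long speech streams. Experiments show that the algorithm, combined with partial speech-text alignment, works not only on long speech streams without background noise but also gives good results with background music at high SNR. Finally, using this method as the core, we build speech segmentation software that automatically processes digital learning materials.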

Keywords

Speech segmentation; best path; reliable path

Parallel Abstract

Long audio stream segmentation is receiving more attention as the number of multimedia databases continues to grow. The Viterbi algorithm gives good performance and accuracy in short audio stream segmentation, so it is intuitive to combine it with a partial text alignment method for long audio stream segmentation. In this paper, we propose a partial path decision method that estimates a reliable path within each partial search space, preventing the best path from leaving that space. The method ensures that the best path lies within the union of the partial search spaces, which improves the efficiency of the Viterbi algorithm in long audio stream segmentation. Experimental results show that the method not only works for long audio segmentation without background noise but also gives good segmentation results under high-SNR conditions. Finally, we build text-alignment software based on this method to automatically segment the audio in digital learning content.
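The idea of restricting Viterbi decoding to partial search spaces can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not say how the reliable path is computed, so this sketch swaps in a simple hold-back heuristic that commits only the early part of each window's path and re-decodes the tail with more context. The names `viterbi_window` and `align_long`, the parameters `chunk`, `tok_window`, and `holdback`, and the frame-by-token cost matrix are all assumptions made for illustration.

```python
import math

def viterbi_window(cost, t0, t1, k0, k1, force_end=False):
    """Best monotonic path over frames [t0, t1) and text units [k0, k1).

    cost[t][k] is an assumed per-frame cost (e.g. a negative log-likelihood).
    The path starts at unit k0; each frame either stays on the current unit
    or advances by exactly one unit.
    """
    n_t, n_k = t1 - t0, k1 - k0
    score = [[math.inf] * n_k for _ in range(n_t)]
    back = [[0] * n_k for _ in range(n_t)]
    score[0][0] = cost[t0][k0]
    for i in range(1, n_t):
        for j in range(n_k):
            stay = score[i - 1][j]
            move = score[i - 1][j - 1] if j > 0 else math.inf
            if move < stay:
                best, back[i][j] = move, j - 1
            else:
                best, back[i][j] = stay, j
            score[i][j] = best + cost[t0 + i][k0 + j]
    # Free end inside a window; the final window must end on the last unit.
    j = n_k - 1 if force_end else min(range(n_k), key=lambda c: score[-1][c])
    path = [0] * n_t
    for i in range(n_t - 1, -1, -1):
        path[i] = k0 + j
        j = back[i][j]
    return path

def align_long(cost, n_units, chunk=50, tok_window=10, holdback=10):
    """Align a long stream window by window (hypothetical driver).

    Assumption: the last `holdback` frames of each window are treated as
    unreliable, so only the earlier frames are committed and the next window
    restarts from the first uncommitted frame, anchored at its decoded unit.
    """
    if holdback >= chunk:
        raise ValueError("holdback must be smaller than chunk")
    total, path, t0, k0 = len(cost), [], 0, 0
    while t0 < total:
        t1 = min(total, t0 + chunk)
        last = t1 == total
        k1 = n_units if last else min(n_units, k0 + tok_window)
        seg = viterbi_window(cost, t0, t1, k0, k1, force_end=last)
        if last:
            path.extend(seg)
            break
        anchor = len(seg) - holdback   # first frame that is re-decoded
        path.extend(seg[:anchor])      # commit the part assumed reliable
        t0, k0 = t0 + anchor, seg[anchor]
    return path
```

Each window costs on the order of `chunk * tok_window` operations instead of the full frames-by-units product, which is where the efficiency gain comes from. The result matches full-space decoding only when the true path stays inside every window, which is the condition the proposed reliable-path estimation is meant to guarantee.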

Parallel Keywords

Text Alignment; Best Path; Reliable Path

