
使用加權有限狀態轉換器的基於混合詞與次詞以文字及語音指令偵測口語詞彙

Hybrid Word/Sub-word Based Spoken Term Detection with Text/Spoken Queries Using Weighted Finite State Transducers

Advisor: Lin-shan Lee (李琳山)

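Abstract

Owing to their well-developed theory and efficient algorithms, weighted finite state transducers (WFSTs) have been widely applied in speech processing research, such as large vocabulary continuous speech recognition and spoken information retrieval. This thesis focuses on spoken term detection within spoken information retrieval and uses WFSTs to build word-based, character-based, syllable-based, and hybrid indices. Using this framework, we ran experiments on Chinese broadcast news under two scenarios: text queries and spoken queries. With text queries, we found that only 20% of the computation time was needed to obtain better retrieval performance than the baseline method. With spoken queries, when only a single recognition unit is used, word-based recognition results suit in-vocabulary query terms, while syllable-based recognition results suit out-of-vocabulary query terms. When word and sub-word units are used together, however, the hybrid outperforms every approach based on a single recognition unit, for both in-vocabulary and out-of-vocabulary query terms. This demonstrates the clear complementarity of word-based and sub-word-based recognition results in the retrieval process. The experiments also show that the WFST framework brings substantial improvements in real-time factor and time complexity.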

Abstract (English)


With well-developed theory and high computational efficiency, weighted finite state transducers have been widely used in various speech signal processing tasks, including large vocabulary continuous speech recognition and speech information retrieval. In this thesis, we focus on spoken term detection, which is a sub-task of speech information retrieval, and use weighted finite state transducers to construct word-based, character-based, syllable-based, and hybrid indices. We evaluated this framework on a Chinese broadcast news corpus in two scenarios: text queries and spoken queries. For text queries, we achieved better performance than the baseline with only 20% of the computation. For spoken queries, when the recognition results for only a single unit were used, the word-based index was better for in-vocabulary (IV) queries, while the syllable-based index was better for out-of-vocabulary (OOV) queries. However, the hybrid index integrating results for different units outperformed every index based on a single unit, for both IV and OOV queries. This verified the clear complementarity between recognition results based on words and sub-word units for this task. It was also shown that the real-time factor and the time complexity were dramatically improved by weighted finite state transducers.

