Application and Study of Speech Recognition in Digital Entertainment

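In the digital entertainment industry, interfaces that approach a realistic experience are steadily being developed, and more types of control interface are being proposed and applied. Speech interfaces, once impractical because of low accuracy, are gradually being introduced into digital entertainment as the technology and its usability improve. The speech interface will become a new-generation control method for digital entertainment, but recognition rates remain poor in both speaker-dependent and speaker-independent recognition. To address this problem, this thesis proposes a recognition method suited to both speaker-dependent and speaker-independent use. The Hidden Markov Model (HMM), currently the mainstream recognition approach, achieves good recognition rates for both speaker-dependent and speaker-independent speech, but requires a large database to be prepared in advance to build the speech models. In short-vocabulary recognition systems, by contrast, Dynamic Time Warping (DTW) recognizes short words quite well without extensive model training. This thesis therefore takes the HMM as its basis and adds DTW, combining the scores produced by the two algorithms in a weighted sum whose proportions are adjusted as needed, so as to raise the recognition rate for short vocabulary while retaining speaker-independent recognition capability.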
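As an illustration of why DTW needs no trained model, the following is a minimal sketch of the textbook DTW distance between two utterances, each represented as a sequence of feature vectors. The function name `dtw_distance`, the Euclidean local cost, and the unconstrained step pattern are assumptions made for this example; the abstract does not say which DTW variant or features the thesis uses.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors.

    a, b: lists of equal-dimension float vectors (e.g. one vector per frame).
    Assumption: Euclidean local cost and the basic three-way step pattern.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] against b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

In a template-matching recognizer of this kind, an unknown utterance would be compared against one stored reference per word and the word with the smallest distance chosen, which is why only a handful of recordings are needed rather than a large training corpus.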

Keywords

speech recognition; speaker-dependent; Hidden Markov Model; Dynamic Time Warping

Parallel Abstract

Interfaces that approach a realistic experience are gradually being developed for digital entertainment, and many kinds of user interface are being proposed and applied. Speech recognition systems have so far been impractical because of poor recognition accuracy. However, technological progress in this field has led to growing interest in speech interfaces for digital entertainment. The speech interface is becoming a new control method for this generation of digital entertainment. However, recognition rates remain poor in both speaker-dependent and speaker-independent recognition. This thesis presents a hybrid speech recognition method that applies to both the speaker-independent and the speaker-dependent case. The most popular method of speech recognition is currently the HMM (Hidden Markov Model). This model requires a large database to be prepared in advance to build the speech models. In small-vocabulary speech recognition, DTW (Dynamic Time Warping) does not need a large corpus, yet recognizes short words effectively. This thesis presents a hybrid approach that combines HMM with DTW. The results produced by each method are weighted, and the weights are adjusted as needed, to strengthen the recognition of short vocabulary while retaining speaker-independent recognition capability.
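The weighted combination described in the abstract can be sketched as follows. This is only one plausible reading: the abstract does not state how the HMM and DTW outputs are put on a common scale, so the min-max normalization, the function names `min_max` and `recognize`, and the `hmm_weight` parameter are all assumptions introduced for illustration.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def recognize(words, hmm_loglik, dtw_dist, hmm_weight=0.5):
    """Pick the word with the best weighted HMM/DTW score.

    words:      candidate vocabulary entries
    hmm_loglik: HMM log-likelihood per word (higher is better)
    dtw_dist:   DTW distance per word (lower is better)
    hmm_weight: share given to the HMM score; the rest goes to DTW
    Assumption: min-max normalization makes the two scores comparable.
    """
    hmm_norm = min_max(hmm_loglik)
    # Turn distances into similarities so that higher is better for both.
    dtw_sim = [1.0 - d for d in min_max(dtw_dist)]
    fused = [
        hmm_weight * h + (1.0 - hmm_weight) * s
        for h, s in zip(hmm_norm, dtw_sim)
    ]
    best = max(range(len(words)), key=fused.__getitem__)
    return words[best], fused
```

Setting `hmm_weight` to 1.0 reduces the recognizer to the HMM alone and 0.0 to DTW alone; intermediate values trade speaker-independent robustness against short-word accuracy, which matches the adjustable proportion the abstract mentions.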

Parallel Keywords

speech recognition; speaker-independent; HMM (Hidden Markov Model); DTW (Dynamic Time Warping)

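Cited By

林瑞峯 (2009). Monitoring and protection of activities in smart hospital wards using contactless sensing technology [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916283834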
