In the digital entertainment industry, interfaces that approximate real-world sensation are steadily being developed, and more types of control interface are being proposed and applied. Speech interfaces, once impractical because of low recognition accuracy, are gradually being adopted as the technology and its usability improve. Speech is set to become the next generation of control for digital entertainment, yet recognition rates remain unsatisfactory in both speaker-dependent and speaker-independent recognition. This thesis addresses the problem by proposing a recognition method that applies to both speaker-dependent and speaker-independent settings.

The Hidden Markov Model (HMM), currently the mainstream recognition approach, achieves good recognition rates for both speaker-dependent and speaker-independent recognition, but it requires a large database to be prepared in advance for building the speech models. For short-vocabulary recognition systems, by contrast, Dynamic Time Warping (DTW) recognizes short words quite well without extensive model training.

This thesis therefore takes the HMM as its basis and adds DTW: the scores produced by the two algorithms are combined by weighting, with the ratio adjusted as needed, in order to improve the recognition rate for short vocabulary while retaining speaker-independent recognition capability.
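The weighted combination of HMM and DTW scores described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact fusion formula, so everything here is an assumption made for illustration: the function names (`dtw_distance`, `min_max`, `fuse`), the weight parameter `w_hmm`, the use of min-max normalization to put the two scores on a common scale, and the premise that per-word HMM log-likelihoods have already been computed elsewhere.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    Each element of `a` and `b` is a list of floats of equal dimension.
    Local cost is the Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a frame repeated
                                 cost[i][j - 1],      # b frame repeated
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


def min_max(scores):
    """Rescale a {word: score} dict to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.5 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(hmm_loglik, dtw_dist, w_hmm=0.5):
    """Weighted fusion of HMM and DTW scores (hypothetical scheme).

    hmm_loglik: {word: log-likelihood}, higher is better.
    dtw_dist:   {word: DTW distance to the word's template}, lower is better.
    w_hmm:      weight on the HMM score; the DTW score gets 1 - w_hmm.
    Returns the best word and the fused score of every candidate.
    """
    hmm_n = min_max(hmm_loglik)
    # Negate distances so that, after scaling, higher means more similar.
    dtw_n = min_max({k: -v for k, v in dtw_dist.items()})
    fused = {k: w_hmm * hmm_n[k] + (1.0 - w_hmm) * dtw_n[k]
             for k in hmm_loglik}
    return max(fused, key=fused.get), fused


if __name__ == "__main__":
    # Made-up scores for three candidate command words.
    hmm_scores = {"up": -120.0, "down": -118.0, "left": -130.0}
    dtw_scores = {"up": 3.0, "down": 9.0, "left": 12.0}
    best, scores = fuse(hmm_scores, dtw_scores, w_hmm=0.5)
    print(best, scores)
```

Raising `w_hmm` toward 1 leans on the HMM, which the abstract credits with speaker-independent capability; lowering it leans on the DTW template match, which the abstract credits with good short-vocabulary recognition. The numbers in the usage example are invented and do not come from the thesis.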