Abstract

Speech coding and speech recognition systems are widely used in mobile phones, PDAs (Personal Digital Assistants), in-vehicle applications, and similar devices. Because the two systems differ in their requirements and processing techniques, they have developed different algorithms. Some of their processing steps are nevertheless similar, so how to integrate the two systems is the question this thesis addresses. On the hardware side, SoC (System on Chip) has become a widely adopted design concept in recent years, and with it the use of IP (Intellectual Property) blocks to build prototype systems quickly. Planning modules that are efficient and genuinely reusable is therefore also an important issue in this study.

We first analyze several speech coding and speech recognition algorithms. We then choose CELP (Code Excited Linear Prediction) as the speech codec and, for recognition, a DTW (Dynamic Time Warping) based recognizer that uses LPCC (Linear Prediction Cepstral Coefficients) as its features. The main reason is that both systems can share LPC (Linear Prediction Coefficient) extraction as a core module, which can be integrated in the hardware implementation.

To obtain hardware modules that operate independently and produce meaningful outputs, we divide the overall algorithm into several modules with specific functions: LPC analysis, pitch period detection, codebook search, cepstrum coefficient computation, and recognition. Each module is specified only by its input/output relationship, so its internal architecture can be adjusted flexibly to meet different area, speed, and power requirements. Finally, we build the modules with Altera's DSP Builder and verify them against timing simulation in Quartus; the two sets of results show no difference. We also use Quartus to evaluate the area of each module in logic elements and its speed in MHz.

Keywords: Speech Coding, Speech Recognition, FPGA Hardware Implementation, DSP Builder
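The sharing of LPC extraction between the codec and the recognizer can be illustrated in software. The sketch below is not the thesis's hardware design: it is a minimal floating-point illustration that assumes the textbook autocorrelation method with the Levinson-Durbin recursion for LPC, the standard LPC-to-cepstrum recursion for LPCC, and a basic unconstrained DTW with Euclidean local distance. All function names, the predictor order, and the number of cepstral coefficients are hypothetical choices made for illustration.

```python
import math

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] for the model
    s[n] ~ sum_k a[k] * s[n-k], plus the final prediction error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c[1..n_ceps]."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)    # c[0] (gain term) is not used here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpcc_features(frame, order=10, n_ceps=12):
    """Recognizer front end: reuses the same LPC step a CELP coder needs.

    order=10 and n_ceps=12 are assumed values, not taken from the thesis.
    """
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return lpc_to_cepstrum(a, n_ceps)

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = math.sqrt(sum((x - y) ** 2
                              for x, y in zip(seq_a[i - 1], seq_b[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]
```

In this arrangement a CELP encoder would call `levinson_durbin` once per frame for its synthesis filter, while the recognizer passes the same coefficients through `lpc_to_cepstrum` and compares the resulting sequence against stored templates with `dtw_distance`. A hardware version would typically use fixed-point arithmetic and may constrain the warping path, neither of which is modeled here.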