Modular Design Integrating Speech Coding and Recognition
and Its FPGA Implementation

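Abstract

Speech coding and speech recognition systems are widely used in devices such as mobile phones, PDAs, and in-vehicle applications. Because their requirements and processing techniques differ, the two kinds of system have given rise to different algorithms, yet some of their processing steps are similar. How to integrate the two systems is therefore the starting point of this thesis. On the hardware side, the SoC (System on Chip) design concept has become prevalent in recent years and has led to methods for rapidly building prototype systems from IP (Intellectual Property) cores, so planning modules that are worth reusing during hardware design is also an important topic of this study. The thesis first analyzes different speech coding and recognition algorithms, then selects CELP (Code Excited Linear Prediction) as the speech coder and Dynamic Time Warping (DTW) as the speech recognizer, mainly because the two systems can share the core module that extracts LPC (Linear Prediction Coefficient) parameters. To obtain hardware modules that operate independently and produce meaningful outputs, the algorithms are decomposed into several function-specific modules: linear prediction coefficients, pitch period, codebook search, cepstrum coefficients, and the recognizer. Each module is designed by considering only the relationship between its inputs and outputs, while its internal hardware architecture can be adjusted flexibly to meet different area, speed, and power requirements. Finally, the algorithms are built as modules in Altera DSP Builder and checked against timing simulation in Quartus with no discrepancies, and Quartus is used to evaluate each module's area in logic elements and its speed in MHz.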
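The shared LPC extraction module described above can be illustrated in software. The following is a minimal floating-point sketch, assuming the common autocorrelation method with the Levinson-Durbin recursion; the abstract does not state which LPC method, frame length, or word length the thesis hardware uses, and the function names `autocorrelation` and `levinson_durbin` are hypothetical.

```python
def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation lags r.

    Returns (a, err): a[k-1] is the predictor coefficient a_k in
    s_hat[n] = sum_{k=1..order} a_k * s[n-k], and err is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or degenerate frame: stop early
            break
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the predictor from order i-1 to order i.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

In an integrated design of the kind the abstract describes, the same coefficient vector would feed both the CELP coder and the cepstrum stage of the recognizer, for example `a, err = levinson_durbin(autocorrelation(frame, 10), 10)` with an assumed order of 10.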

Keywords

FPGA hardware implementation; speech coding; speech recognition

English Abstract

Speech coding and speech recognition systems are widely used in mobile phones, PDAs (Personal Digital Assistants), vehicle applications, and similar devices. The different requirements and processing technologies of the two systems have resulted in different algorithms, but some parts of their processing are similar, which makes integrating the two systems an interesting task. In recent years, SoC (System on Chip) has become a prevailing design concept, and an IP (Intellectual Property) based approach has become important for rapidly prototyping hardware systems. Designing efficient and truly reusable modules is therefore an important issue in this study. We first analyze several different algorithms and then choose CELP (Code Excited Linear Prediction) as our codec and LPCC (Linear Prediction Coefficient Cepstrum) as the recognition features for a DTW (Dynamic Time Warping) based recognition system. One reason is that both systems share the same LPC (Linear Prediction Coefficient) stage, which can be integrated in the hardware implementation. To design independent hardware modules with meaningful outputs, we divide the overall algorithm into parts with specific functions: LPC, pitch period detection, codebook search, cepstrum coefficient, and recognition modules. For each module, only the input/output relationship is specified, so its internal architecture can be adjusted to different requirements in logic elements, speed, and power consumption. Finally, we use Altera's DSP Builder to construct the modules for our algorithm and find no difference between our results and the timing simulation results obtained with Quartus. We also use Quartus to evaluate the area in logic elements and the speed in MHz of each module.
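The recognition path named in the abstract (LPC to LPCC features, then DTW matching) can be sketched as follows. This is an illustrative floating-point version only: the LPC-to-cepstrum step uses the standard recursion for an all-pole model, and the DTW uses symmetric unit steps with a Euclidean local distance, both of which are assumptions, since the abstract does not give the thesis's path constraints, distance measure, or fixed-point format. The names `lpc_to_cepstrum` and `dtw_distance` are hypothetical.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients to LPC cepstrum c_1..c_n_ceps.

    a[k-1] holds a_k of the model 1 / (1 - sum_k a_k z^-k).
    Recursion: c_n = a_n + sum_{k=1..n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n greater than the LPC order.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def dtw_distance(ref, test):
    """Accumulated DTW distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(ref), len(test)
    # d[i][j]: best accumulated cost aligning ref[:i] with test[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(ref[i - 1], test[j - 1])  # Euclidean (assumed)
            d[i][j] = cost + min(d[i - 1][j],      # ref advances alone
                                 d[i][j - 1],      # test advances alone
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]
```

A template-based recognizer would compute `dtw_distance` between the LPCC sequence of the input utterance and each stored reference template and pick the template with the smallest distance.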

English Keywords

FPGA; Speech Recognition; Speech Coding; DSP Builder


Cited By

林玉婷 (2008). 手持式裝置之情境規劃 [Scenario Planning for Handheld Devices] [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0207200917352422
