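A Client-Server Real-Time Speaker Recognition System Based on Speaker Model Mel-Frequency Cepstral Coefficients and Neural Networks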

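This thesis develops a real-time speaker recognition system on an ARM-based embedded platform, using Speaker Model Mel-Frequency Cepstral Coefficients (SMMFCC) derived from the Fast Fourier Transform (FFT) together with a Back-Propagation Neural Network (BPNN). Because the proposed system targets an embedded platform with limited processing power and memory, the speaker feature parameters it extracts must be reduced effectively. In addition, a client-server architecture assigns the computationally heavy neural network training to the server. The client only needs to download and store the trained network weights over Ethernet on first use or when a new user is added, after which the client-side recognition module can perform real-time speaker recognition. The server also records user login activity and maintains the voiceprint database. Experimental results show that the system achieves an average recognition accuracy above 90% with a recognition time under 3 seconds, and that it can be widely applied wherever identity authentication is required, such as home security or vehicle anti-theft systems.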
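As an illustration of the feature pipeline named in the abstract, the sketch below computes textbook MFCCs for a single audio frame: Hamming window, power spectrum, mel filterbank, log, then DCT. It is not the thesis's SMMFCC derivation or its data-reduction step, neither of which is detailed here. It uses a naive DFT in place of an FFT to stay dependency-free, and the function names and defaults (8 kHz sampling, 20 filters, 12 coefficients) are assumptions chosen for illustration.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return its one-sided power spectrum.

    Uses a naive O(N^2) DFT; a real system would use an FFT here.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Build triangular filters whose centers are evenly spaced in mel."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=8000, num_filters=20, num_coeffs=12):
    """Return num_coeffs cepstral coefficients for one frame (c0 is skipped)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    # DCT-II over the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, num_coeffs + 1)]


# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc(frame)
```

Keeping only a dozen coefficients per frame is one common way to shrink the feature vector for a memory-limited device; whether the thesis reduces its features this way is not stated in the abstract.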

Keywords

Speaker Model Mel-Frequency Cepstral Coefficients; Neural Network; Real-Time; Embedded System

Parallel Abstract

The main contribution of this thesis is a real-time speaker recognition system that uses Speaker Model Mel-Frequency Cepstral Coefficients (SMMFCC) derived from the Fast Fourier Transform (FFT). A Back-Propagation Neural Network (BPNN) performs the speaker recognition on an ARM-based embedded platform. Because embedded systems have limited computing capability and memory, the features extracted from the speaker model are reduced. To overcome the computation limit, the thesis proposes a client-server architecture: the server handles neural network training, which requires a great deal of computation, while the client performs real-time speaker recognition using the updated network weights retrieved from the server. Experimental results show that the system's average recognition rate exceeds 90% and that its recognition time is under 3 seconds. The proposed system can be applied broadly to home, office, and factory security systems, among others.
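To make the client-server split concrete, here is a minimal sketch of the client side only: it parses a weight payload and runs a feed-forward pass to pick the most likely speaker, with no training on the device. The JSON layout, the names `load_weights` and `identify`, the speaker labels, and the tiny 2-2-2 network are all hypothetical. The abstract does not describe the thesis's transfer format or network size.

```python
import json
import math


def sigmoid(x):
    """Logistic activation, the usual choice in back-propagation networks."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one fully connected layer followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def load_weights(payload):
    """Parse a weight payload (hypothetical JSON format) sent by the server."""
    model = json.loads(payload)
    return model["speakers"], model["layers"]


def identify(features, speakers, layers):
    """Run the forward pass and return (speaker name, output activation)."""
    activations = features
    for layer in layers:
        activations = layer_forward(activations, layer["weights"], layer["biases"])
    best = max(range(len(activations)), key=activations.__getitem__)
    return speakers[best], activations[best]


# Stand-in for the data the client would download over Ethernet.
# The values are hand-picked for illustration, not trained weights.
PAYLOAD = json.dumps({
    "speakers": ["alice", "bob"],
    "layers": [
        {"weights": [[4.0, -4.0], [-4.0, 4.0]], "biases": [0.0, 0.0]},
        {"weights": [[4.0, -4.0], [-4.0, 4.0]], "biases": [0.0, 0.0]},
    ],
})

speakers, layers = load_weights(PAYLOAD)
name, score = identify([1.0, 0.0], speakers, layers)
```

Because recognition needs only multiply-accumulate operations and a sigmoid per neuron, the client can stay lightweight while the server absorbs the cost of back-propagation training.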

Parallel Keywords

Speaker Model Mel-Frequency Cepstral Coefficients; Neural Network; Real-Time; Embedded System


