
Combining Empirical Mode Decomposition with Constant Q Cepstral Coefficients and Mel-Frequency Cepstral Coefficients on Automatic Speaker Verification System

Advisor: 金仲達

Abstract

In recent years, replay spoofing attacks have threatened Automatic Speaker Verification (ASV) systems. Because relatively high-frequency regions carry more information for distinguishing genuine from spoofed speech, Empirical Mode Decomposition (EMD), which decomposes a signal into several Intrinsic Mode Functions (IMFs), is an effective way to analyze speech signals. We propose an EMD-based method for detecting replay attacks. The main idea is to decompose the signal with EMD and then extract Constant Q Cepstral Coefficients (CQCC) or Mel-Frequency Cepstral Coefficients (MFCC) from different combinations of IMFs. Our experiments on the ASVspoof 2019 database show that each IMF contributes some information, and that combining a subset of the IMFs detects spoofed speech better than using the original signal. The proposed approach attains an accuracy of 92.04% and an Equal Error Rate (EER) of about 0.0693. We also discuss possible reasons for these results, including why EMD combines well with CQCC but not with MFCC.
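The EER quoted in the abstract is the error rate at the decision threshold where the false acceptance rate (spoofed trials accepted) equals the false rejection rate (genuine trials rejected), so 0.0693 corresponds to roughly 6.93% of each. The following is a minimal sketch of how such a figure can be estimated from detector scores. The function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the simple threshold sweep without interpolation are all assumptions made for illustration; the thesis and the ASVspoof evaluation tooling may compute the value differently.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER and its threshold from two lists of detector scores.

    Assumption: a higher score means the trial looks more like genuine speech,
    and a trial is accepted when its score is >= the threshold.
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None  # (|FAR - FRR|, estimated EER, threshold)
    for t in thresholds:
        # False rejection rate: genuine trials that fall below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint because FAR and FRR rarely match exactly
            # on a finite set of scores.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not data from the thesis).
    genuine = [0.9, 0.8, 0.7, 0.35]
    spoof = [0.1, 0.2, 0.3, 0.75]
    eer, threshold = equal_error_rate(genuine, spoof)
    print(f"EER = {eer:.4f} at threshold {threshold}")
```

With the toy scores above, one genuine trial and one spoofed trial land on the wrong side of the chosen threshold, so both error rates are 1 in 4. On real data the sweep is usually replaced by an interpolated ROC or DET curve, which gives a smoother estimate when no threshold makes the two rates exactly equal.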

