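A Study on Combining Empirical Mode Decomposition with Constant Q Cepstral Coefficients and Mel-Frequency Cepstral Coefficients for Automatic Speaker Verification Systems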

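In recent years, replay attacks have been a persistent threat to automatic speaker verification (ASV) systems. Because relatively high-frequency regions contain more of the information that distinguishes genuine from spoofed speech, Empirical Mode Decomposition (EMD), which decomposes a speech signal into several Intrinsic Mode Functions (IMFs), is an effective way to analyze such signals. We propose an EMD-based method for detecting replay attacks. The main idea is to decompose the signal with EMD and then extract Constant Q Cepstral Coefficients (CQCC) or Mel-Frequency Cepstral Coefficients (MFCC) from different combinations of IMFs. Our results on the ASVspoof 2019 database show that every IMF contributes some information, and that combining a subset of the IMFs gives better results than using the original signal. The proposed method achieves an accuracy of 92.04% and an equal error rate (EER) of about 0.0693. We also discuss possible reasons for these results, including why EMD combines well with CQCC but not with MFCC.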

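Keywords

Replay attack; automatic speaker verification; Constant Q Cepstral Coefficients (CQCC); Mel-Frequency Cepstral Coefficients (MFCC); Empirical Mode Decomposition (EMD); ASVspoof 2019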

Parallel Abstract

Replay spoofing attacks have threatened Automatic Speaker Verification (ASV) systems for the past few years. Because relatively high-frequency regions contain more information for differentiating genuine from spoofed speech signals, Empirical Mode Decomposition (EMD), which decomposes a signal into several Intrinsic Mode Functions (IMFs), is an effective method for analyzing speech. We propose an EMD-based method for detecting spoofed speech signals. The main idea is to decompose the signal with EMD and then extract Constant Q Cepstral Coefficients (CQCC) or Mel-Frequency Cepstral Coefficients (MFCC) from different combinations of IMFs. Our experiments on the ASVspoof 2019 database show that each IMF provides some information, and that combining some of the IMFs detects spoofed speech better than using the original signal. The proposed approach attains an accuracy of 92.04% and an Equal Error Rate (EER) of about 0.0693. We also discuss possible reasons for these results, e.g., why EMD combines well with CQCC but not with MFCC.
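The step of forming "different combinations of IMFs" can be illustrated with a minimal sketch. It assumes the IMFs are already available as equal-length sequences of samples; `toy_imfs`, `combine_imfs`, and `imf_subsets` are hypothetical names introduced here, and the toy values stand in for the output of a real EMD implementation. The abstract does not say which subsets were evaluated or how they were merged, so summing the selected IMFs sample by sample is an assumption.

```python
from itertools import combinations


def combine_imfs(imfs, indices):
    """Sum the selected IMFs sample by sample to form one partial signal."""
    length = len(imfs[0])
    return [sum(imfs[i][n] for i in indices) for n in range(length)]


def imf_subsets(num_imfs, min_size=1):
    """Yield every index subset of the IMFs with at least min_size members."""
    for size in range(min_size, num_imfs + 1):
        yield from combinations(range(num_imfs), size)


# Hypothetical stand-in for EMD output: three short "IMFs",
# ordered from fastest to slowest oscillation.
toy_imfs = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
    [0.1, 0.2, 0.3, 0.4],
]

subsets = list(imf_subsets(len(toy_imfs)))
# Keep only the two fastest components, i.e. the higher-frequency content.
partial = combine_imfs(toy_imfs, (0, 1))
```

Each partial signal produced this way would then be passed to a CQCC or MFCC extractor, which is not shown here.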

Parallel Keywords

Replay spoofing; Automatic speaker verification; Constant Q Cepstral Coefficients (CQCC); Mel-Frequency Cepstral Coefficients (MFCC); Empirical mode decomposition (EMD); ASVspoof 2019
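The abstract reports an Equal Error Rate of about 0.0693, i.e. roughly 6.93% when read as a fraction of trials. As a reminder of what that metric measures, the following is a minimal sketch of one common way to estimate an EER from detection scores. It assumes higher scores mean "more likely genuine", and the function name and toy scores are hypothetical; it is not the scoring code used in the study.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more likely genuine'."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoof)
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; evaluation toolkits may interpolate instead.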
